In this Python tutorial, we will discuss how to split a sentence into a list of words in Python, exploring several approaches with examples and output.
There are different ways to split a sentence into a list of words in Python. A few common ones are:
- Using the split() method
- Using list comprehension with split()
- Using regular expressions (re module)
- Using the nltk library
Split a Sentence into a List of Words in Python
Now, let us see the above methods with examples.
Method-1: Using the split() method
The simplest and most common way to split a sentence into a list of words in Python is by using the split() method available on strings. The split() method splits a string based on a specified delimiter (by default, any run of whitespace).
Example:
sentence = "Python is one of the popular programming languages in the United States of America."
# Splitting the sentence into words
words = sentence.split()
print(words)
Output:
['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America.']
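The split() method also accepts an explicit delimiter. For instance, a comma-separated string (the variable names here are just for illustration) can be split like this:

```python
# Splitting on an explicit delimiter instead of whitespace
csv_line = "red,green,blue"
colors = csv_line.split(",")
print(colors)
# Output: ['red', 'green', 'blue']
```

Note that with an explicit delimiter, consecutive delimiters produce empty strings in the result, unlike the default whitespace behavior.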
Method-2: Using list comprehension with split()
List comprehension provides a concise way to create lists in Python. You can use list comprehension along with the split() method to split a sentence into a list of words.
Example:
sentence = "Python is one of the popular programming languages in the United States of America."
# Splitting the sentence into words using list comprehension
words = [word for word in sentence.split()]
print(words)
Output:
['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America.']
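On its own, [word for word in sentence.split()] produces the same result as sentence.split(); a list comprehension becomes genuinely useful when you also transform or filter each word. A small sketch that lowercases each word and strips surrounding punctuation in one pass:

```python
sentence = "Python is one of the popular programming languages in the United States of America."
# Transform while splitting: strip punctuation and lowercase each word
words = [word.strip(".,!?").lower() for word in sentence.split()]
print(words)
```

Here the final 'america' comes out without the trailing period, which the plain split() example above leaves attached.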
Method-3: Using regular expressions (re module)
The re module in Python provides functionality for working with regular expressions. The split() function from the re module can be used to split a sentence into a list of words based on a given pattern.
Example:
import re
sentence = "Python is one of the popular programming languages in the United States of America."
# Splitting the sentence into words using regular expressions
words = re.split(r'\W+', sentence)
print(words)
Output:
['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America', '']
Note that the output contains an empty string at the end (caused by the trailing period), which can be removed using the filter() function.
Example:
import re
sentence = "Python is one of the popular programming languages in the United States of America."
# Splitting the sentence into words using regular expressions
words = re.split(r'\W+', sentence)
# Filtering out the empty strings
words = list(filter(None, words))
print(words)
Output:
['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America']
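Alternatively, you can avoid the empty string altogether by matching the words themselves with re.findall() instead of splitting on the separators:

```python
import re

sentence = "Python is one of the popular programming languages in the United States of America."
# Match runs of word characters directly; no empty strings to filter out
words = re.findall(r'\w+', sentence)
print(words)
```

This produces the same filtered list in a single step, since re.findall() only returns the matched word characters.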
Method-4: Using the nltk library
The Natural Language Toolkit (nltk) is a popular Python library for working with human language data. It provides a wide range of functionalities for text processing and natural language processing. One of the tools provided by nltk is the word_tokenize() function, which can be used to split a sentence into a list of words.
First, you will need to install the nltk library if you haven’t already:
pip install nltk
Then, you will need to download the ‘punkt’ tokenizer models:
import nltk
nltk.download('punkt')
Now you can use the word_tokenize() function to split a sentence into a list of words.
Example:
import nltk
sentence = "Python is one of the popular programming languages in the United States of America."
# Splitting the sentence into words using nltk.word_tokenize()
words = nltk.word_tokenize(sentence)
print(words)
Output:
['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America', '.']
Note that the output includes punctuation marks as separate tokens. If you want to exclude them, you can use a list comprehension along with the isalnum() method to filter out the non-alphanumeric tokens.
Example:
import nltk
sentence = "Python is one of the popular programming languages in the United States of America."
# Splitting the sentence into words using nltk.word_tokenize()
words = nltk.word_tokenize(sentence)
# Filtering out non-alphanumeric tokens
words = [word for word in words if word.isalnum()]
print(words)
Output:
['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America']
Conclusion
In this tutorial, we have explored four different ways to split a sentence into a list of words in Python: using the split() method, using list comprehension with split(), using the re module, and using the nltk library.
You may also like the following Python string tutorials:
- How to Split a String Using Regex in Python
- Split a String into an Array in Python
- How to split a string into equal half in Python?
- How to split a string by index in Python
- Create a String of Same Character in Python
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries. Check out my profile.