How to Split a Sentence into a List of Words in Python

In this Python tutorial, we will discuss different ways to split a sentence into a list of words in Python, with examples and output.

There are different ways to split a sentence into a list of words in Python. A few are:

  • Using the split() method
  • Using list comprehension with split()
  • Using regular expressions (re module)
  • Using the nltk library

Split a Sentence into a List of Words in Python

Now, let us see the above methods with examples.

Method 1: Using the split() method

The simplest and most common way to split a sentence into a list of words in Python is by using the split() method available on strings. The split() method splits a string on a specified delimiter (by default, any run of whitespace).

Example:

sentence = "Python is one of the popular programming languages in the United States of America."

# Splitting the sentence into words
words = sentence.split()

print(words)

Output:

['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America.']
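The split() method also accepts an explicit delimiter and an optional maxsplit count, which limits how many splits are performed. A small sketch using a comma-separated string (the example string is ours, not from the tutorial above):

```python
# split() with an explicit delimiter and an optional maxsplit count
csv_line = "red,green,blue,yellow"

# Split on every comma
colors = csv_line.split(",")
print(colors)  # ['red', 'green', 'blue', 'yellow']

# Stop after two splits; the remainder stays joined
first_two = csv_line.split(",", 2)
print(first_two)  # ['red', 'green', 'blue,yellow']
```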

Method 2: Using list comprehension with split()

List comprehension provides a more concise way to create lists in Python. You can use a list comprehension along with the split() method to split a sentence into a list of words in Python.

Example:

sentence = "Python is one of the popular programming languages in the United States of America."

# Splitting the sentence into words using list comprehension
words = [word for word in sentence.split()]

print(words)

Output:

['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America.']
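As written, the comprehension produces the same result as plain split(). Its real advantage is that you can transform each word as you collect it. A sketch that lowercases every word and strips surrounding punctuation in the same pass (the cleanup step is our addition for illustration):

```python
sentence = "Python is one of the popular programming languages in the United States of America."

# Lowercase each word and strip leading/trailing punctuation in one pass
words = [word.strip(".,!?").lower() for word in sentence.split()]

print(words)
```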

Method 3: Using regular expressions (re module)

The re module in Python provides functionality to work with regular expressions. The split() function from the re module can be used to split a sentence into a list of words in Python based on a given pattern.

Example:

import re

sentence = "Python is one of the popular programming languages in the United States of America."

# Splitting the sentence into words using regular expressions
words = re.split(r'\W+', sentence)

print(words)

Output:

['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America', '']

Note that the output contains an empty string at the end, which can be removed using the filter() function.

Example:

import re

sentence = "Python is one of the popular programming languages in the United States of America."

# Splitting the sentence into words using regular expressions
words = re.split(r'\W+', sentence)

# Filtering out the empty strings
words = list(filter(None, words))

print(words)

Output:

['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America']
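An alternative that avoids the trailing empty string altogether is re.findall(), which collects every run of word characters instead of splitting on the non-word characters between them:

```python
import re

sentence = "Python is one of the popular programming languages in the United States of America."

# findall() returns every match of the pattern, so no empty strings appear
words = re.findall(r'\w+', sentence)

print(words)
```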

Method 4: Using the nltk library

The Natural Language Toolkit (nltk) is a popular Python library for working with human language data. It provides a wide range of functionalities for text processing and natural language processing. One of the tools provided by nltk is the word_tokenize() function, which can be used to split a sentence into a list of words in Python.

First, you will need to install the nltk library if you haven’t already:

pip install nltk

Then, you will need to download the ‘punkt’ tokenizer models:

import nltk

nltk.download('punkt')

Now you can use the word_tokenize() function to split a sentence into a list of words.

Example:

import nltk

sentence = "Python is one of the popular programming languages in the United States of America."

# Splitting the sentence into words using nltk.word_tokenize()
words = nltk.word_tokenize(sentence)

print(words)

Output:

['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America', '.']

Note that the output includes punctuation marks as separate tokens. If you want to exclude them, you can use a list comprehension along with the isalnum() method to filter out the non-alphanumeric tokens.

Example:

import nltk

sentence = "Python is one of the popular programming languages in the United States of America."

# Splitting the sentence into words using nltk.word_tokenize()
words = nltk.word_tokenize(sentence)

# Filtering out non-alphanumeric tokens
words = [word for word in words if word.isalnum()]

print(words)

Output:

['Python', 'is', 'one', 'of', 'the', 'popular', 'programming', 'languages', 'in', 'the', 'United', 'States', 'of', 'America']
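If you want punctuation-free words without installing nltk, the standard library alone is enough. A sketch using str.translate() with string.punctuation to delete ASCII punctuation before splitting (this is an alternative approach, not part of the nltk method above):

```python
import string

sentence = "Python is one of the popular programming languages in the United States of America."

# Build a translation table that deletes all ASCII punctuation characters
table = str.maketrans("", "", string.punctuation)

# Remove punctuation, then split on whitespace
words = sentence.translate(table).split()

print(words)
```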

Conclusion

In this tutorial, we have explored four different ways to split a sentence into a list of words in Python: using the split() method, using list comprehension with split(), using the re module, and using the nltk library.
