How to Split a Sentence into Words in Python [3 Ways]

Do you want to split a sentence into words in Python? In this Python tutorial, you will learn multiple ways to split a string into a list of words.

While working on a Python project, I had a text file containing the names of all the employees, and I needed to get that data into a list of individual names.

While looking for a solution, I found three different ways to convert a sentence into a list of words in Python.

3 Ways to Convert a Sentence into a List of Words in Python

Let’s understand the scenario more clearly. Suppose there is a string that contains data like this:

emp_names = "Hannah, Ian, Alice, Dave, Jack, George, Bob, Carol, Eve, Frank"

Now I need to turn it into a list of separate words.

Let’s go through the approaches one by one, with practical examples of splitting sentences into words in Python.

How to Split Sentences into Words in Python using the split() Method

First, we will use Python’s built-in split() string method, which splits a string wherever the separator you pass as a parameter occurs and returns the pieces as a list.

The split() method is the simplest and usually the preferred way to handle this type of task, because it is built into every Python string and needs no imports.

Syntax

"string".split(separator)
  • separator (optional): the string at which to split. When you don’t provide a separator (or pass None), the method splits on any run of white space (spaces, tabs, newlines) and ignores leading and trailing white space.
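To see how the default behaves, here is a small sketch comparing split() with no argument against split(" ") on a string with irregular spacing. The sample string is made up for illustration.

```python
# A made-up string with leading/trailing spaces, repeated spaces, a tab and a newline
text = "  Hannah   Ian\tAlice\nDave  "

# No separator: splits on runs of any white space and ignores leading/trailing white space
by_whitespace = text.split()

# Explicit " " separator: every single space is a boundary, so empty strings appear,
# and the tab and newline are not treated as separators at all
by_space = text.split(" ")

print(by_whitespace)  # ['Hannah', 'Ian', 'Alice', 'Dave']
print(by_space)
```

For plain white-space-separated words, calling split() with no argument is usually the safer choice.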

Let’s execute an example to split the sentence into words in Python.

emp_names = "Hannah, Ian, Alice, Dave, Jack, George, Bob, Carol, Eve, Frank"

list_of_names = emp_names.split(", ")

print("List of Names:", list_of_names)

In the above code, the employee names are stored in a single string, separated by a comma and a space. We need to get each name as a separate item in a list.

So we call the split() method like this: emp_names.split(", "). emp_names is the variable holding the string, and ", " (a comma followed by a space) is passed as the separator, so each name ends up as its own list item without the commas or spaces.
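Passing ", " only works when every comma is followed by exactly one space. If the data is less consistent (for example, names typed into a text file by hand), one option is to split on the comma alone and strip the white space from each piece. The input string below is a hypothetical example.

```python
# Hypothetical input with inconsistent spacing around the commas
emp_names = "Hannah, Ian,Alice ,  Dave"

# Splitting on ", " misses commas that aren't followed by exactly one space
naive = emp_names.split(", ")

# Splitting on "," and stripping each piece handles any amount of surrounding white space
cleaned = [name.strip() for name in emp_names.split(",")]

print(naive)
print(cleaned)  # ['Hannah', 'Ian', 'Alice', 'Dave']
```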

Convert String to List of Words in Python using Regex

Here, we will use the re.split() function from Python’s built-in re module. We will also use \W, which matches any non-word character, that is, anything other than a letter, digit, or underscore, such as white space, punctuation, or other symbols.

Like str.split(), re.split() converts a string to a list of words, but it takes two main arguments: a regular-expression pattern to use as the separator, and the string to split.
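Because the separator is a regular expression, re.split() can handle several different delimiters in one call, which str.split() cannot. In this sketch, the input string and the pattern [,;\s]+ (one or more commas, semicolons, or white-space characters) are illustrative assumptions.

```python
import re

# Illustrative data mixing commas, semicolons and extra spaces as delimiters
data = "Hannah, Ian;Alice  Dave,Jack"

# [,;\s]+ matches any run of commas, semicolons or white-space characters
names = re.split(r"[,;\s]+", data)

# maxsplit limits the number of splits; the rest of the string stays in the last item
first_two = re.split(r"[,;\s]+", data, maxsplit=2)

print(names)      # ['Hannah', 'Ian', 'Alice', 'Dave', 'Jack']
print(first_two)  # ['Hannah', 'Ian', 'Alice  Dave,Jack']
```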

Syntax

re.split(pattern, string)
import re

sentence = "Python is one of the popular programming languages in the United States of America"

words = re.split(r'\W+', sentence)

print(words)

In the above code, we have a sentence, and we need to get all of its words into a list. So we call re.split(r'\W+', sentence). The pattern \W+ matches one or more consecutive non-word characters (here, the spaces between words), and re.split() returns the text between those matches as a list.
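One thing to watch for: if the sentence starts or ends with a non-word character, re.split() returns an empty string at that end of the list, and because \W also matches apostrophes, contractions are broken apart. Here is a minimal sketch with a made-up sentence that filters out the empty strings, plus the alternative re.findall(), which matches the words themselves instead of the separators.

```python
import re

sentence = "Don't stop, Python!"

parts = re.split(r"\W+", sentence)
# The trailing "!" produces an empty string at the end of the list
words = [part for part in parts if part]

# Alternative: match runs of word characters directly, so no empty strings appear
found = re.findall(r"\w+", sentence)

print(parts)  # ['Don', 't', 'stop', 'Python', '']
print(words)  # ['Don', 't', 'stop', 'Python']
print(found)  # ['Don', 't', 'stop', 'Python']
```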

How to Convert a Sentence into a List of Words using nltk.word_tokenize()

Now, we will use the word_tokenize() function of the NLTK library, which is not part of the standard library, so you have to install it separately using this command:

pip install nltk

NLTK (Natural Language Toolkit) is generally used for text processing and natural language processing tasks. It provides a function named nltk.word_tokenize(), which is useful for converting sentences into words.

Syntax

var_name = nltk.word_tokenize("string")
  • nltk.word_tokenize(): Before using this function, you have to import nltk in your program and download the Punkt tokenizer data it relies on.

Let’s see how to split sentences into words in Python using the nltk.word_tokenize() function.

import nltk

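# Uncomment on the first run to download the Punkt tokenizer data.
# (Recent NLTK versions may ask for 'punkt_tab' instead.)
# nltk.download('punkt')
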
string = "United States of America"
list_of_words = nltk.word_tokenize(string)
print(list_of_words)

In the above code, we first import the nltk module. The nltk.download('punkt') line is commented out: uncomment it the first time you run the code so that NLTK downloads the Punkt tokenizer data that word_tokenize() needs (if you get an error mentioning punkt_tab, download 'punkt_tab' instead). After that, you can comment the line out again.

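Then we call nltk.word_tokenize(string), which splits the text into word tokens and returns them as a list. Unlike str.split(), it does not simply split on white space: punctuation marks become separate tokens, and contractions such as "don't" are split into "do" and "n't".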

Conclusion

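In this Python article, you learned three different ways to split a sentence into words in Python: the split() method, regex with re.split(), and the nltk.word_tokenize() function.

We explained each method individually with practical examples, so you can pick the right one for your situation: split() for simple, consistent separators, re.split() when words are separated by a mix of spaces and punctuation, and nltk.word_tokenize() when you need tokenization that treats punctuation and contractions as separate tokens.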