In this Python tutorial, we will see how to remove duplicate elements from a Python list with illustrative examples of different methods.
Python lists are ordered and indexed, and they can contain multiple elements with the same value. For example:
list_with_duplicates = ['sam', 'tom', 'rob', 'betty', 'sam', 'rob']
As we can see, the above Python list has duplicate elements at different index positions. We want to clean up this list by removing the duplicates. Let’s check different ways to remove them.
Removing duplicate elements from a List in Python
There are several different ways to remove duplicate elements from a Python list. They are:
- Using the Set Data Structure
- Using List Comprehension
- Using the dict.fromkeys() method
- Using Pandas Library
Method-1: Removing duplicates from a Python list using set()
One of the most straightforward and efficient ways to remove duplicates from a list is by using a set. A set is a built-in Python data structure that, much like a mathematical set, cannot contain duplicate elements.
Here, we will convert our list into a set and then convert that set back into a list.
In this scenario, let’s assume we have a list containing the names of states in the USA. However, our list has some duplicate entries.
usa_states = ['Texas',
'California',
'New York',
'Texas',
'Florida',
'California',
'Georgia',
'New York',
'Texas']
usa_states_set = set(usa_states)
usa_states = list(usa_states_set)
print(usa_states)
The output:
['New York', 'Florida', 'California', 'Texas', 'Georgia']
Note: The order of the list changes because sets are unordered in Python.
This way we can use the set data structure in Python to remove duplicate elements from a Python list.
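If we want the set approach but also need to keep the original order, one small workaround (a sketch, not part of the method above) is to sort the deduplicated set by each element’s first position in the original list:

```python
usa_states = ['Texas', 'California', 'New York', 'Texas', 'Florida',
              'California', 'Georgia', 'New York', 'Texas']

# Deduplicate with a set, then restore the original ordering by
# sorting on each element's first index in the source list.
ordered_unique = sorted(set(usa_states), key=usa_states.index)
print(ordered_unique)
# ['Texas', 'California', 'New York', 'Florida', 'Georgia']
```

Note that `usa_states.index` rescans the list for every element, so this is convenient for small lists rather than large ones.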
Method-2: Removing duplicates from a Python list using List comprehension
Python list comprehension is a concise way to create lists. We can use a list comprehension with a condition that keeps only the first occurrence of each element, removing duplicates while preserving order:
In this example, we have a Python list of popular sports in the USA, but it has some duplicates. Our goal is to eliminate these duplicates using list comprehension:
usa_sports = ['Basketball',
'Baseball',
'Football',
'Basketball',
'Hockey',
'Football',
'Soccer',
'Basketball']
sports = [sport for i, sport in enumerate(usa_sports) if usa_sports.index(sport) == i]
print(sports)
The output is:
['Basketball', 'Baseball', 'Football', 'Hockey', 'Soccer']
As we can see, all the duplicate elements have been removed and the order is also maintained.
This way we can use list comprehension in Python to remove duplicates in a List.
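One caveat: the `usa_sports.index(sport)` call in the comprehension above rescans the list for every element, which is slow for long lists. A common faster variant (a sketch, not part of the original example) tracks already-seen items in a set:

```python
usa_sports = ['Basketball', 'Baseball', 'Football', 'Basketball',
              'Hockey', 'Football', 'Soccer', 'Basketball']

seen = set()
# seen.add() returns None (falsy), so each element passes the filter
# only the first time it is encountered.
sports = [s for s in usa_sports if s not in seen and not seen.add(s)]
print(sports)
# ['Basketball', 'Baseball', 'Football', 'Hockey', 'Soccer']
```

The set membership check is O(1) on average, so this version scales linearly with the list length.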
Method-3: Removing duplicates from a Python list using dict.fromkeys()
Another method of removing duplicates, which also maintains the order of the elements, is by utilizing Python’s dictionaries. Dictionaries can’t have duplicate keys, so we can leverage this to our advantage.
We will use the dict.fromkeys() method to convert the elements of the Python list into dictionary keys, and then convert those keys back into a Python list using list(). Since Python 3.7, dictionaries preserve insertion order, which is why this method keeps the original order.
Here, we have a list involving the names of popular fast-food chains in the USA:
usa_fast_food = ['McDonalds',
'Subway',
'Starbucks',
'McDonalds',
'KFC',
'Subway',
'Burger King',
'Starbucks']
usa_fast_food = list(dict.fromkeys(usa_fast_food))
print(usa_fast_food)
This method will eliminate duplicates and maintain the order. The output is:
['McDonalds', 'Subway', 'Starbucks', 'KFC', 'Burger King']
This way we can use dict.fromkeys() to remove duplicates from a Python list.
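One limitation worth noting: both set() and dict.fromkeys() require the list elements to be hashable. If the list contains unhashable items such as nested lists, a plain loop (a simple sketch, not one of the methods above) still works:

```python
rows = [[1, 2], [3, 4], [1, 2], [5, 6]]

unique_rows = []
for row in rows:
    # Lists are unhashable, so set() or dict.fromkeys() would raise
    # TypeError here; a membership check on the result list still works.
    if row not in unique_rows:
        unique_rows.append(row)
print(unique_rows)
# [[1, 2], [3, 4], [5, 6]]
```

This approach is O(n²) because of the `in` check, but it handles any element type that supports equality comparison.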
Method-4: Removing duplicates from a Python list using the Pandas library
Lastly, if our list were part of a more complex data structure like a DataFrame, we could use the drop_duplicates() method provided by the Pandas library, which removes duplicate rows from a DataFrame.
Here we have a list of public libraries in different cities of the USA. Our list contains some duplicate entries, and we aim to eliminate these duplicates using the Pandas library.
First, we will import Pandas and convert the list into a DataFrame; then we will remove the duplicates using drop_duplicates(); finally, we will convert the DataFrame column back into a list using tolist().
import pandas as pd
usa_libraries = ['New York Public Library',
'Los Angeles Public Library',
'New York Public Library',
'Boston Public Library',
'Los Angeles Public Library']
df = pd.DataFrame(usa_libraries, columns=['Library'])
df = df.drop_duplicates()
usa_libraries = df['Library'].tolist()
print(usa_libraries)
The output:
['New York Public Library', 'Los Angeles Public Library', 'Boston Public Library']
This way we can use the Pandas library in Python to remove duplicates in a list.
Conclusion:
In conclusion, we have learned that Python offers several powerful techniques to remove duplicates from a list, such as set(), list comprehension, dict.fromkeys(), and the Pandas library, each with its own pros and cons. We should consider our requirements and choose wisely.
You may like to read the following articles:
- Python Append List to another List without Brackets
- How to Sum Elements in List in Python using For Loop
- How to write a list to CSV in Python
- How to remove the last element from the Python list
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and beyond. Check out my profile.