Pandas Unique Values in Column without NaN in Python (4 Methods)

Do you want to know the different methods to get unique values in a dataframe column? In this Pandas blog, I will show four ways to get unique values in a column without NaN in Python, with some illustrative examples.

In Python, we can use df['column name'].dropna().unique() to get the unique values in a column without NaN, or we can use set() type casting, as in set(df['column name'].dropna()). To see the count of each unique value, we can use df['column name'].value_counts(). We can also use groupby() with the agg() method in Pandas to get unique values from a column in the dataframe.

As a Python developer working with dataframes, I had a requirement to list only the unique values from a column in a Pandas dataframe, excluding the NaN values.

Here are the methods I found while researching, all of which can be used to get unique values in a dataframe in Python:

  1. dropna() with unique() function
  2. set() type casting with dropna() function
  3. value_counts() function
  4. groupby() with agg() method

Let’s see them all one by one using some demonstrative examples:

1. How to find unique values and ignore NaN in Pandas with dropna() and unique() function

The unique() function in Pandas returns a NumPy array that contains the unique values from the specified column of the dataframe.

But if the dataframe column contains NaN values (missing values), unique() includes NaN in its result, so we first need to remove them using the dropna() function in Pandas. Then, we use the unique() function.

Let’s take an example that uses both the unique() and dropna() functions to find the unique values from a dataframe column in Python:

import pandas as pd

data_states = {'State': ['California', 'Texas', 'New York', 'Florida', 'California', 'Texas', None, 'Florida', 'Arizona', None],
               'Population_Millions': [39.51, 28.99, 19.45, 21.48, 39.51, 28.99, None, 21.48, 7.28, None],
               'Capital': ['Sacramento', 'Austin', 'Albany', 'Tallahassee', 'Sacramento', 'Austin', None, 'Tallahassee', 'Phoenix', None]}
df_states = pd.DataFrame(data_states)

unique_states_method1 = df_states['State'].dropna().unique()
print("Unique States without NaN:\n", unique_states_method1)

Output:

Unique States without NaN:
 ['California' 'Texas' 'New York' 'Florida' 'Arizona']
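
To make the effect of dropna() visible, here is a minimal sketch that compares unique() with and without it. The small Series is hypothetical sample data, not part of the dataframe above:

```python
import pandas as pd

# Hypothetical sample data (assumption, not the article's dataframe)
states = pd.Series(['California', 'Texas', None, 'Texas', None])

# Without dropna(), the missing value is kept as one of the unique entries
with_nan = states.unique()

# With dropna(), missing values are removed before duplicates are dropped
without_nan = states.dropna().unique()

print(len(with_nan))      # 3
print(list(without_nan))  # ['California', 'Texas']
```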

2. Pandas get unique values in column without NaN using set() function with the dropna() function

As we have seen in the previous method, the dropna() function is used to remove the missing values (NaN) from a dataframe in Python.

We can use the set() type casting function, as a Python set can only contain unique values. So, when we convert the dataframe column values into a set in Python, the duplicate values are removed.

Here is an example to find the unique values in a Pandas dataframe using set() with dropna() function:

import pandas as pd

data_parks = {'National_Park': ['Yosemite', 'Grand Canyon', 'Yellowstone', None, 'Yosemite', 'Grand Canyon', 'Zion', 'Yellowstone', None],
              'Visitors_Millions': [4.5, 6.2, 4.1, None, 4.5, 6.2, 3.2, 4.1, None],
              'Location': ['California', 'Arizona', 'Wyoming', None, 'California', 'Arizona', 'Utah', 'Wyoming', None]}
df_parks = pd.DataFrame(data_parks)

unique_parks = set(df_parks['National_Park'].dropna())
print("Unique National Parks without NaN:\n", unique_parks)

Output: Because we are using the set() function, we get a set of unique values from the dataframe column. A set is unordered, so the order of the elements may differ between runs:

Unique National Parks without NaN:
 {'Yellowstone', 'Grand Canyon', 'Zion', 'Yosemite'}
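
If a stable order is needed, one option is to pass the set to Python's built-in sorted(), which returns a list. This is a small sketch on hypothetical sample data, not the only way to do it:

```python
import pandas as pd

# Hypothetical sample data (assumption, not the article's dataframe)
parks = pd.Series(['Yosemite', 'Zion', None, 'Yosemite', 'Grand Canyon'])

# set() removes duplicates; sorted() turns the set into an ordered list
unique_sorted = sorted(set(parks.dropna()))

print(unique_sorted)  # ['Grand Canyon', 'Yosemite', 'Zion']
```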

3. Pandas unique values without NaN using the value_counts() function

We use the value_counts() function in Pandas to get both the unique values and their frequencies.

The value_counts() function returns a Series whose index holds the unique values and whose values are their counts, sorted in descending order so that the first element is the most frequently occurring one. By default, it excludes the NaN values from the series.

Let’s see an example of how the value_counts() function can be used to get unique values in a dataframe in Pandas:

import pandas as pd

data_cars = {'Car_Brand': ['Ford', 'Chevrolet', 'Toyota', 'Ford', 'Chevrolet', 'Toyota', 'Honda', None, 'Tesla', None],
             'Sales_Thousands': [250, 200, 180, 240, 220, 210, 150, None, 50, None],
             'Country_of_Origin': ['USA', 'USA', 'Japan', 'USA', 'USA', 'Japan', 'Japan', None, 'USA', None]}
df_cars = pd.DataFrame(data_cars)

unique_cars = df_cars['Car_Brand'].value_counts()
print("Unique Car Brands without NaN:\n", unique_cars)

Output:

Unique Car Brands without NaN:
 Car_Brand
Ford         2
Chevrolet    2
Toyota       2
Honda        1
Tesla        1
Name: count, dtype: int64
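
The default NaN handling can be checked with the dropna parameter of value_counts(). The following sketch uses hypothetical sample data and compares the default result with dropna=False:

```python
import pandas as pd

# Hypothetical sample data (assumption, not the article's dataframe)
brands = pd.Series(['Ford', 'Toyota', 'Ford', None, 'Ford', None])

default_counts = brands.value_counts()          # NaN excluded by default
all_counts = brands.value_counts(dropna=False)  # NaN counted as its own entry

print(len(default_counts))     # 2
print(len(all_counts))         # 3
print(default_counts['Ford'])  # 3
```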

To get the unique values as a plain Python list, we can use the code below, which calls value_counts(dropna=False), filters the NaN entry out of the index with notna(), and converts the result with tolist().

import pandas as pd

data_cars = {'Car_Brand': ['Ford', 'Chevrolet', 'Toyota', 'Ford', 'Chevrolet', 'Toyota', 'Honda', None, 'Tesla', None],
             'Sales_Thousands': [250, 200, 180, 240, 220, 210, 150, None, 50, None],
             'Country_of_Origin': ['USA', 'USA', 'Japan', 'USA', 'USA', 'Japan', 'Japan', None, 'USA', None]}
df_cars = pd.DataFrame(data_cars)

# Compute the counts once, then keep only the non-NaN index labels
counts = df_cars['Car_Brand'].value_counts(dropna=False)
unique_cars = counts.index[counts.index.notna()].tolist()
print("Unique Car Brands without NaN:\n", unique_cars)

Output:

Unique Car Brands without NaN:
 ['Ford', 'Chevrolet', 'Toyota', 'Honda', 'Tesla']
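
Since value_counts() drops NaN by default, a shorter route to the same kind of list may be to read the index of the default result directly. A minimal sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data (assumption, not the article's dataframe)
brands = pd.Series(['Ford', 'Toyota', 'Ford', None, 'Honda', 'Ford', 'Toyota'])

# The index of the default value_counts() result holds the non-NaN unique
# values, ordered from most to least frequent
unique_brands = brands.value_counts().index.tolist()

print(unique_brands)  # ['Ford', 'Toyota', 'Honda']
```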

4. Pandas Unique Values in Column without NaN in Python using the groupby() and agg() method

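The groupby() method in Pandas groups the dataframe by the specified column, and we can then use the agg() method with a custom function to drop NaN values and keep only the unique values. Because groupby() excludes NaN keys by default, the index of the result already lists the unique non-NaN values of the grouping column.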

Here is an example that finds the unique values in a dataframe column using the groupby() and agg() methods:

import pandas as pd

data_tech_companies = {'Tech_Company': ['Apple', 'Microsoft', 'Google', 'Apple', 'Microsoft', None, 'Amazon', 'Google', 'Facebook', None],
                       'Market_Cap_Billions': [2370, 2300, 1800, 2370, 2300, None, 1600, 1800, 870, None],
                       'Industry': ['Tech', 'Tech', 'Tech', 'Tech', 'Tech', None, 'E-commerce', 'Tech', 'Social Media', None]}
df_tech_companies = pd.DataFrame(data_tech_companies)

unique_companies = df_tech_companies.groupby('Tech_Company').agg(unique_companies=('Tech_Company', lambda x: x.dropna().unique()))
print("Unique Tech Companies without NaN:\n", unique_companies)

Output:

Unique Tech Companies without NaN:
              unique_companies
Tech_Company                 
Amazon               [Amazon]
Apple                 [Apple]
Facebook           [Facebook]
Google               [Google]
Microsoft         [Microsoft]
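
The groupby() and agg() combination is arguably more useful when the grouping column differs from the column whose unique values we want. The sketch below assumes a hypothetical dataframe with Industry and Company columns and collects the unique, non-NaN companies per industry:

```python
import pandas as pd

# Hypothetical sample data (assumption, not the article's dataframe)
df = pd.DataFrame({
    'Industry': ['Tech', 'Tech', 'Tech', 'E-commerce', None, 'Tech'],
    'Company': ['Apple', 'Microsoft', 'Apple', 'Amazon', 'Unknown', None],
})

# Rows whose Industry is NaN are skipped by groupby() by default;
# the lambda drops NaN companies and removes duplicates inside each group
unique_per_industry = (
    df.groupby('Industry')['Company']
      .agg(lambda s: s.dropna().unique().tolist())
)

print(unique_per_industry.to_dict())
# {'E-commerce': ['Amazon'], 'Tech': ['Apple', 'Microsoft']}
```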

Conclusion

Knowing the different methods in Pandas to get unique values from a column without NaN in Python, such as unique() with dropna(), set() type casting with dropna(), value_counts(), and groupby() with agg(), helps you present the distinct values of a dataframe column cleanly.

I hope I have explained all the methods well enough with the help of some illustrative examples.
