Pandas Unique Values in Column Without NaN in Python

When working with data in Python, I often encounter datasets with missing values. These values, represented as NaN (Not a Number) in pandas, can sometimes interfere with my analysis, especially when I need to identify unique values in a column.

Recently, I was analyzing a customer dataset for a US retail chain where I needed to extract unique store locations without including the NaN values that were cluttering my results.

In this article, I’ll show you several practical methods to get unique values from a pandas DataFrame column while excluding any missing (NaN) values.

This Tutorial Covers:

Get Pandas Unique Values in Column Without NaN in Python

Now, I will explain how to get Pandas’ unique values in a column without NaN in Python.

Method 1: Use dropna() with unique()

The easiest approach to get unique values without NaN is to combine two pandas functions: dropna() and unique() in Python.

import pandas as pd
import numpy as np

# Creating a sample DataFrame with NaN values
data = {
    'State': ['California', 'New York', np.nan, 'Texas', 'California', np.nan, 'Florida']
}

df = pd.DataFrame(data)

# Getting unique values without NaN
unique_states = df['State'].dropna().unique()

print(unique_states)

Output:

['California' 'New York' 'Texas' 'Florida']

You can see the output in the screenshot below.

In this example, I first removed all NaN values from the ‘State’ column using dropna() and then applied unique() to get only the distinct values. This simple two-step process gives you exactly what you need, unique values with no NaN entries.

Check out Python Pandas Write to Excel

Method 2: Use pandas Series.value_counts()

Another effective approach is using the Python value_counts() method that automatically excludes NaN values by default.

# Using value_counts()
unique_states_with_counts = df['State'].value_counts()

print(unique_states_with_counts)

Output:

State
California    2
New York      1
Texas         1
Florida       1
Name: count, dtype: int64

You can see the output in the screenshot below.

The value_counts() method gives us an added advantage, not only does it exclude NaN values, but it also provides the frequency of each value. This is particularly useful when I need to know both unique values and their counts.

To get just the unique values from this result, I can access the index:

unique_states = df['State'].value_counts().index.tolist()
print(unique_states)

Output:

['California', 'New York', 'Texas', 'Florida']

Method 3: Use notnull() with unique()

A third approach combines Python notnull() with unique() method to filter out NaN values:

# Using notnull with unique
unique_states = df.loc[df['State'].notnull(), 'State'].unique()

print(unique_states)

Output:

['California' 'New York' 'Texas' 'Florida']

You can see the output in the screenshot below.

This method first creates a boolean mask using notnull() to identify non-NaN values, then applies unique() to get the distinct values. I find this approach particularly readable and explicit in its intention.

Check out Create Plots Using Pandas crosstab() in Python

Method 4: Use Python’s set() function

If you prefer using Python’s built-in functions, you can convert the non-NaN values to a set:

# Using Python's set() (first dropping NaN values)
unique_states = set(df['State'].dropna())

print(unique_states)

Output:

{'California', 'Florida', 'New York', 'Texas'}

Note that when using set(), the order of elements is not preserved. If the order matters for your analysis, you should stick with one of the previous methods.

Read the Drop the Header Row of Pandas DataFrame

Method 5: Use nunique() to count unique non-NaN values

If you only need the count of unique non-NaN values (rather than the values themselves), pandas provides the convenient nunique() method in Python:

# Count unique non-NaN values
unique_count = df['State'].nunique()

print(f"Number of unique states: {unique_count}")

Output:

Number of unique states: 4

This is extremely useful for quick data profiling when you just need to know how many unique values exist in a column.

Real-World Example: Analyze US Sales Data

Let’s look at a more comprehensive example using a sales dataset:

# Creating a more realistic dataset
sales_data = {
    'State': ['California', 'New York', np.nan, 'Texas', 'California', np.nan, 'Florida'],
    'Sales': [5000, 4200, 3800, 4500, 5600, 3200, 4100],
    'Category': ['Electronics', 'Furniture', np.nan, 'Electronics', 'Clothing', 'Furniture', np.nan]
}

sales_df = pd.DataFrame(sales_data)

# Get unique product categories without NaN
unique_categories = sales_df['Category'].dropna().unique()

# Get states with sales > 4000 (excluding states with NaN)
high_sales_states = sales_df.loc[(sales_df['Sales'] > 4000) & (sales_df['State'].notnull()), 'State'].unique()

print("Unique Product Categories:")
print(unique_categories)
print("\nStates with High Sales:")
print(high_sales_states)

Output:

Unique Product Categories:
['Electronics' 'Furniture' 'Clothing']

States with High Sales:
['California' 'New York' 'Texas']

In this example, I’m analyzing a retail dataset where both ‘State’ and ‘Category’ columns contain NaN values. I’ve extracted unique product categories excluding NaNs, and also identified states with high sales while filtering out any NaN state values.

Check out Pandas Get Index of Row in Python

Performance Considerations

When working with large datasets, performance becomes crucial. Here’s a quick comparison of the methods we’ve discussed:

dropna().unique() – Simple and readable, good for most use cases
value_counts().index – Efficient when you need counts too
loc[notnull(), ...].unique() – Very explicit but slightly more verbose
set(dropna()) – Can be faster for very large datasets due to the set’s O(1) lookup time

For most everyday use cases, the first method (dropna().unique()) offers the best balance of readability and performance.

Read Create Pandas Crosstab Percentage in Python

Conclusion

Handling NaN values properly is an important skill when working with pandas DataFrames. The methods I’ve shared in this article have served me well in countless real-world data analysis projects.

To summarize:

Use df['column'].dropna().unique() for an easy approach
Consider df['column'].value_counts() when you need frequency information
Try df.loc[df['column'].notnull(), 'column'].unique() for explicit filtering
Use set(df['column'].dropna()) when working with very large datasets
Apply df['column'].nunique() when you only need the count

I also covered a real-world example to analyze US sales data and some performance considerations.

Other Python articles you may also like:

Bijay Kumar

Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers. More about us.

enjoysharepoint.com/

How to Get Pandas Unique Values in Column Without NaN in Python

Get Pandas Unique Values in Column Without NaN in Python

Method 1: Use dropna() with unique()

Method 2: Use pandas Series.value_counts()

Method 3: Use notnull() with unique()

Method 4: Use Python’s set() function

Method 5: Use nunique() to count unique non-NaN values

Real-World Example: Analyze US Sales Data

Performance Considerations

Conclusion

51 PYTHON PROGRAMS PDF FREE

Aspiring to be a Python developer?

Let’s be friends