Working with data in Python often requires identifying and extracting unique values from arrays or lists. Recently, I was analyzing a large dataset of customer purchase history across different US states, and I needed to identify the distinct products that were purchased. That’s when I turned to NumPy’s unique function, an efficient tool that saves time and simplifies code.
In this tutorial, I’ll show you how to use NumPy’s unique function to find distinct elements in arrays efficiently. I’ll cover different use cases and practical examples that you can apply to your projects.
NumPy’s Unique Function
NumPy’s unique function finds the distinct elements of an array. When called on an array, it returns a new, sorted array in which each distinct value appears exactly once.
This function is extremely useful when:
- Analyzing categorical data
- Removing duplicate entries
- Counting distinct values
- Creating lookup tables
Let’s dive into how we can use this function effectively.
Basic Usage of np.unique()
The simplest way to use NumPy’s unique function is to call it on a one-dimensional NumPy array. Here’s how:
import numpy as np
# Sample array of US states where sales occurred
states = np.array(['California', 'Texas', 'Florida', 'California', 'New York', 'Texas', 'Florida'])
# Get unique states
unique_states = np.unique(states)
print(unique_states)
Output:
['California' 'Florida' 'New York' 'Texas']
Notice how the function not only removed duplicates but also sorted the values alphabetically. This is the default behavior of np.unique().
Get Both Unique Values and Their Indices
Sometimes you need to know not just the unique values but also where each one first appears in the original array:
import numpy as np
# Sales amounts for different regions (in thousands of dollars)
sales = np.array([120, 85, 120, 150, 85, 200, 120])
# Get unique values and their indices
unique_values, indices = np.unique(sales, return_index=True)
print("Unique values:", unique_values)
print("Indices of first occurrence:", indices)
Output:
Unique values: [ 85 120 150 200]
Indices of first occurrence: [1 0 3 5]
This tells us that the first occurrence of 85 is at index 1, the first 120 at index 0, and so on.
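Because np.unique() always returns sorted values, the first-occurrence indices are also a handy way to recover the distinct values in the order they originally appeared. The following is a minimal sketch of that idea using the same sales array; the variable names first_idx and in_order are my own, chosen for illustration.

```python
import numpy as np

# Same sales amounts as in the example above
sales = np.array([120, 85, 120, 150, 85, 200, 120])

# Sorted unique values plus the index of each value's first occurrence
unique_values, first_idx = np.unique(sales, return_index=True)

# Sorting the first-occurrence indices and indexing the original array
# yields the distinct values in order of first appearance
in_order = sales[np.sort(first_idx)]
print(in_order)
```

This keeps the deduplication inside NumPy while avoiding the alphabetical or numeric reordering that np.unique() applies by default.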
Count Occurrences with np.unique()
One of my favorite features is the ability to count how many times each unique value appears:
import numpy as np
# Customer ratings (1-5) for a product
ratings = np.array([5, 4, 3, 5, 5, 4, 2, 3, 5, 1, 4, 5])
# Get unique values and their counts
unique_ratings, counts = np.unique(ratings, return_counts=True)
# Create a frequency table
for rating, count in zip(unique_ratings, counts):
    print(f"Rating {rating}: {count} customers")
Output:
Rating 1: 1 customers
Rating 2: 1 customers
Rating 3: 2 customers
Rating 4: 3 customers
Rating 5: 5 customers
This makes it incredibly easy to build frequency distributions and analyze data patterns.
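As a small extension of the frequency table, the counts can be turned into relative shares and ranked from most to least common. This is only a sketch built on the same ratings array; the names shares, most_common, and order are illustrative and not part of NumPy.

```python
import numpy as np

# Same customer ratings as in the example above
ratings = np.array([5, 4, 3, 5, 5, 4, 2, 3, 5, 1, 4, 5])
unique_ratings, counts = np.unique(ratings, return_counts=True)

# Fraction of all ratings that each distinct value accounts for
shares = counts / counts.sum()

# The rating with the highest count
most_common = unique_ratings[np.argmax(counts)]

# Rank from most to least frequent; a stable sort keeps ties in rating order
order = np.argsort(-counts, kind="stable")
for rating, count, share in zip(unique_ratings[order], counts[order], shares[order]):
    print(f"Rating {rating}: {count} customers ({share:.0%})")
```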
Find Unique Rows in a 2D Array
NumPy’s unique function also works with multi-dimensional arrays in Python. Here’s how to find unique rows:
import numpy as np
# Customer data: [age_group, income_level, purchase_frequency]
customer_segments = np.array([
    [1, 3, 2],
    [2, 2, 1],
    [1, 3, 2],  # Duplicate of the first row
    [3, 1, 3],
    [2, 2, 1],  # Duplicate of the second row
    [3, 3, 1]
])
# Get unique rows
unique_segments = np.unique(customer_segments, axis=0)
print(unique_segments)
Output:
[[1 3 2]
 [2 2 1]
 [3 1 3]
 [3 3 1]]
The axis=0 parameter tells NumPy to look for unique rows rather than unique elements.
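To make the effect of axis=0 concrete, the sketch below contrasts it with the default call, which flattens the array first, and combines it with return_counts to see how often each row occurs. The variable names flat_unique, unique_rows, and row_counts are illustrative.

```python
import numpy as np

# Same customer segments as in the example above
customer_segments = np.array([
    [1, 3, 2],
    [2, 2, 1],
    [1, 3, 2],
    [3, 1, 3],
    [2, 2, 1],
    [3, 3, 1],
])

# Without axis, the 2D array is flattened and individual elements are compared
flat_unique = np.unique(customer_segments)
print(flat_unique)

# With axis=0, whole rows are compared; return_counts adds how often each row occurs
unique_rows, row_counts = np.unique(customer_segments, axis=0, return_counts=True)
for row, count in zip(unique_rows, row_counts):
    print(row, "appears", count, "time(s)")
```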
Get the Inverse Mapping
Sometimes you need to reconstruct the original array from the unique values:
import numpy as np
# Product IDs from customer orders
order_products = np.array([101, 203, 101, 305, 203, 101])
# Get unique values and inverse mapping
unique_products, inverse_indices = np.unique(order_products, return_inverse=True)
print("Unique products:", unique_products)
print("Inverse mapping:", inverse_indices)
# Recreate the original array
reconstructed = unique_products[inverse_indices]
print("Reconstructed array:", reconstructed)
Output:
Unique products: [101 203 305]
Inverse mapping: [0 1 0 2 1 0]
Reconstructed array: [101 203 101 305 203 101]
The inverse indices tell us which unique value corresponds to each position in the original array.
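The inverse mapping is also useful as a set of integer group codes. As a sketch, it can be passed to np.bincount() to total a second array per unique product. The order_amounts values below are hypothetical data invented for this illustration; they are not from the original example.

```python
import numpy as np

# Same product IDs as in the example above
order_products = np.array([101, 203, 101, 305, 203, 101])
# Hypothetical order amounts, aligned position by position with order_products
order_amounts = np.array([20.0, 35.0, 20.0, 50.0, 40.0, 25.0])

unique_products, inverse_indices = np.unique(order_products, return_inverse=True)

# bincount adds up the weights that share the same group code,
# giving one total per entry in unique_products
revenue_per_product = np.bincount(inverse_indices, weights=order_amounts)

for product, revenue in zip(unique_products, revenue_per_product):
    print(f"Product {product}: {revenue:.2f}")
```

This works because inverse_indices holds, for each order, the position of its product in unique_products, so the two output arrays line up.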
Use np.unique() with Structured Arrays
NumPy’s unique function can also work with structured arrays, which is useful for more complex data:
import numpy as np
# Customer data as a structured array
customers = np.array([
    ('John', 'New York', 35),
    ('Mary', 'California', 28),
    ('John', 'New York', 35),  # Duplicate
    ('Steve', 'Texas', 42),
    ('Mary', 'California', 28)  # Duplicate
], dtype=[('name', 'U10'), ('state', 'U15'), ('age', 'i4')])
# Get unique customers
unique_customers = np.unique(customers)
for customer in unique_customers:
    print(f"Name: {customer[0]}, State: {customer[1]}, Age: {customer[2]}")
Output:
Name: John, State: New York, Age: 35
Name: Mary, State: California, Age: 28
Name: Steve, State: Texas, Age: 42
This is extremely useful when working with tables or records where each entry has multiple fields.
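If only one field matters, indexing the structured array by field name gives a regular array that np.unique() handles like any other. The sketch below counts records per state on the same data; state_counts and unique_states are illustrative names.

```python
import numpy as np

# Same structured customer data as in the example above
customers = np.array([
    ('John', 'New York', 35),
    ('Mary', 'California', 28),
    ('John', 'New York', 35),
    ('Steve', 'Texas', 42),
    ('Mary', 'California', 28),
], dtype=[('name', 'U10'), ('state', 'U15'), ('age', 'i4')])

# Selecting a single field returns a plain array of that field's values
unique_states, state_counts = np.unique(customers['state'], return_counts=True)

for state, count in zip(unique_states, state_counts):
    print(f"{state}: {count} record(s)")
```

Note that this counts raw records, duplicates included; run np.unique(customers) first if you want counts over distinct customers.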
Performance Considerations
NumPy’s unique function runs in compiled code and is typically much faster than pure-Python approaches when the data already lives in a NumPy array, especially for large arrays. Here’s a quick comparison:
import numpy as np
import time
# Create a large array with duplicates
large_array = np.random.randint(0, 1000, size=1000000)
# Time NumPy's unique
start = time.perf_counter()
np_unique = np.unique(large_array)
np_time = time.perf_counter() - start
print(f"NumPy unique time: {np_time:.4f} seconds")
# Time Python's set (iterates over the array one element at a time)
start = time.perf_counter()
py_unique = sorted(set(large_array))
py_time = time.perf_counter() - start
print(f"Python set time: {py_time:.4f} seconds")
print(f"NumPy is {py_time/np_time:.1f}x faster")
The exact ratio depends on the array size, the dtype, and your hardware, so run the comparison on your own data. Much of the gap here comes from set() iterating over the array one NumPy scalar at a time, which is why staying inside NumPy is usually the better choice when your data is already an array.
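A single timed run can be noisy, so a slightly more careful measurement repeats each call and keeps the best time. The sketch below uses the standard library's timeit module for that, and converts the array to a list before building the set so the pure-Python side is not penalized for iterating over NumPy scalars. The seed, the repeat count, and the variable names are arbitrary choices for illustration, and no particular speedup is implied.

```python
import timeit

import numpy as np

# Reproducible test data: one million integers drawn from 1000 distinct values
rng = np.random.default_rng(0)
large_array = rng.integers(0, 1000, size=1_000_000)

# Best of three runs for each approach
np_time = min(timeit.repeat(lambda: np.unique(large_array), number=1, repeat=3))
set_time = min(
    timeit.repeat(lambda: sorted(set(large_array.tolist())), number=1, repeat=3)
)

print(f"np.unique:          {np_time:.4f} seconds")
print(f"sorted(set(list)):  {set_time:.4f} seconds")

# Both approaches should agree on the result
same_result = np.unique(large_array).tolist() == sorted(set(large_array.tolist()))
print("Same result:", same_result)
```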
When to Use np.unique() vs. pandas.unique()
If you’re working with pandas DataFrames, you might wonder whether to use NumPy’s unique or pandas’ unique method. Here’s a quick guideline:
- Use np.unique() when:
  - Working with NumPy arrays
  - You need sorted results
  - You need additional return values like counts or indices
- Use pandas.unique() when:
  - Working with pandas Series
  - You want to preserve the original order
  - You’re dealing with missing values (NaN)
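The ordering difference in the guideline above is easy to see side by side. The following sketch assumes pandas is installed and imported as pd, and uses a small made-up Series of sales amounts; sorted_unique and ordered_unique are illustrative names.

```python
import numpy as np
import pandas as pd

# Small illustrative Series of sales amounts with repeats
sales = pd.Series([120, 85, 120, 150, 85])

# np.unique returns the distinct values in sorted order
sorted_unique = np.unique(sales.to_numpy())
print("np.unique:", sorted_unique)

# pd.unique returns the distinct values in order of first appearance
ordered_unique = pd.unique(sales)
print("pd.unique:", ordered_unique)
```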
I recommend testing both in your specific scenario to determine which one works best for your needs.
In my daily data analysis tasks, the NumPy unique function has become an indispensable tool. Whether I’m analyzing customer segments, cleaning datasets, or preparing data for machine learning models, this function consistently helps me work more efficiently.
The next time you need to find distinct values in your data, remember that NumPy’s unique function offers both simplicity and powerful options to meet your needs. Experiment with the different parameters and use cases I’ve shared to get the most out of this versatile function.
Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers.