NumPy Unique Function in Python

Working with data in Python often requires identifying and extracting unique values from arrays or lists. Recently, I was analyzing a large dataset of customer purchase history across different US states, and I needed to identify the distinct products that were purchased. That’s when I turned to NumPy’s unique function, an efficient tool that saves time and simplifies code.

In this tutorial, I’ll show you how to use NumPy’s unique function to find distinct elements in arrays efficiently. I’ll cover different use cases and practical examples that you can apply to your projects.

NumPy’s Unique Function

The NumPy unique function in Python is used to find and return the unique elements from an array. When called on an array, it returns a new, sorted array containing only the distinct values, with duplicates removed.

This function is extremely useful when:

  • Analyzing categorical data
  • Removing duplicate entries
  • Counting distinct values
  • Creating lookup tables

Let’s dive into how we can use this function effectively.

Basic Usage of np.unique()

The simplest way to use NumPy’s unique function is to call it on a one-dimensional NumPy array. Here’s how:

import numpy as np

# Sample array of US states where sales occurred
states = np.array(['California', 'Texas', 'Florida', 'California', 'New York', 'Texas', 'Florida'])

# Get unique states
unique_states = np.unique(states)
print(unique_states)

Output:

['California' 'Florida' 'New York' 'Texas']

Notice how the function not only removed duplicates but also sorted the values alphabetically. This is the default behavior of np.unique().
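
If you need the distinct values in the order they first appear instead of sorted order, one common approach is to ask for the first-occurrence indices and sort those. This is a minimal sketch that reuses the states array from above; the name states_in_order is only illustrative.

```python
import numpy as np

states = np.array(['California', 'Texas', 'Florida', 'California', 'New York', 'Texas', 'Florida'])

# return_index=True also returns the position of the first occurrence of each unique value
unique_sorted, first_idx = np.unique(states, return_index=True)

# Sorting the first-occurrence positions restores the order of first appearance
states_in_order = states[np.sort(first_idx)]
print(states_in_order)
```

The extra sort works on one index per unique value, so it is cheap compared with the np.unique() call itself.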

Get Both Unique Values and Their Indices

Sometimes you need to know not just the unique values but also where each one first appears in the original array:

import numpy as np

# Sales amounts for different regions (in thousands of dollars)
sales = np.array([120, 85, 120, 150, 85, 200, 120])

# Get unique values and their indices
unique_values, indices = np.unique(sales, return_index=True)

print("Unique values:", unique_values)
print("Indices of first occurrence:", indices)

Output:

Unique values: [ 85 120 150 200]
Indices of first occurrence: [1 0 3 5]

This tells us that the first occurrence of 85 is at index 1, the first 120 at index 0, and so on.

Count Occurrences with np.unique()

One of my favorite features is the ability to count how many times each unique value appears:

import numpy as np

# Customer ratings (1-5) for a product
ratings = np.array([5, 4, 3, 5, 5, 4, 2, 3, 5, 1, 4, 5])

# Get unique values and their counts
unique_ratings, counts = np.unique(ratings, return_counts=True)

# Create a frequency table
for rating, count in zip(unique_ratings, counts):
    print(f"Rating {rating}: {count} customers")

Output:

Rating 1: 1 customers
Rating 2: 1 customers
Rating 3: 2 customers
Rating 4: 3 customers
Rating 5: 5 customers

This makes it incredibly easy to build frequency distributions and analyze data patterns.
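
Once you have the counts, a few follow-up questions are one-liners. The sketch below, using the same ratings array, picks out the most frequent rating, converts counts to shares, and builds a plain dictionary; the names most_common, shares, and freq are only illustrative.

```python
import numpy as np

ratings = np.array([5, 4, 3, 5, 5, 4, 2, 3, 5, 1, 4, 5])
unique_ratings, counts = np.unique(ratings, return_counts=True)

# The most frequent rating sits at the position of the largest count
most_common = unique_ratings[np.argmax(counts)]

# Relative frequency of each rating (the shares sum to 1)
shares = counts / counts.sum()

# A plain Python dict is handy for lookups or for passing to other code
freq = {int(r): int(c) for r, c in zip(unique_ratings, counts)}

print("Most common rating:", most_common)
print("Frequency table:", freq)
```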

Find Unique Rows in a 2D Array

NumPy’s unique function also works with multi-dimensional arrays. Here’s how to find unique rows:

import numpy as np

# Customer data: [age_group, income_level, purchase_frequency]
customer_segments = np.array([
    [1, 3, 2],
    [2, 2, 1],
    [1, 3, 2],  # Duplicate of the first row
    [3, 1, 3],
    [2, 2, 1],  # Duplicate of the second row
    [3, 3, 1]
])

# Get unique rows
unique_segments = np.unique(customer_segments, axis=0)
print(unique_segments)

Output:

[[1 3 2]
 [2 2 1]
 [3 1 3]
 [3 3 1]]

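The axis=0 parameter tells NumPy to compare whole rows rather than individual elements. Without axis, the array is flattened first and the unique scalar values are returned. Note that the unique rows come back sorted, not in their original order.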
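
To see how the axis argument changes the result, here is a small sketch on a made-up 2x3 array (grid is a hypothetical example, not part of the customer data above). It compares the default flattened behavior with axis=1, which looks for unique columns.

```python
import numpy as np

# Hypothetical array whose first and third columns are identical
grid = np.array([
    [1, 2, 1],
    [3, 4, 3],
])

# No axis: the array is flattened and unique scalar values are returned
flat_unique = np.unique(grid)

# axis=1: whole columns are compared, so the duplicate column is dropped
unique_cols = np.unique(grid, axis=1)

print(flat_unique)
print(unique_cols)
```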

Get the Inverse Mapping

Sometimes you need to reconstruct the original array from the unique values:

import numpy as np

# Product IDs from customer orders
order_products = np.array([101, 203, 101, 305, 203, 101])

# Get unique values and inverse mapping
unique_products, inverse_indices = np.unique(order_products, return_inverse=True)

print("Unique products:", unique_products)
print("Inverse mapping:", inverse_indices)

# Recreate the original array
reconstructed = unique_products[inverse_indices]
print("Reconstructed array:", reconstructed)

Output:

Unique products: [101 203 305]
Inverse mapping: [0 1 0 2 1 0]
Reconstructed array: [101 203 101 305 203 101]

The inverse indices tell us which unique value corresponds to each position in the original array.
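
The inverse mapping is also a convenient group label. As a sketch, suppose each order also has an amount (order_amounts below is invented for illustration); np.bincount with weights then sums the amounts per unique product.

```python
import numpy as np

order_products = np.array([101, 203, 101, 305, 203, 101])
# Hypothetical order amounts, one per entry in order_products
order_amounts = np.array([20.0, 35.0, 20.0, 50.0, 35.0, 25.0])

unique_products, inverse_indices = np.unique(order_products, return_inverse=True)

# inverse_indices assigns each order to a group 0..k-1, so bincount can sum per group
totals = np.bincount(inverse_indices, weights=order_amounts)

for product, total in zip(unique_products, totals):
    print(f"Product {product}: {total:.2f}")
```

This avoids a Python loop over the orders, which matters once the array gets large.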

Use np.unique() with Structured Arrays

NumPy’s unique function can also work with structured arrays, which is useful for more complex data:

import numpy as np

# Customer data as a structured array
customers = np.array([
    ('John', 'New York', 35),
    ('Mary', 'California', 28),
    ('John', 'New York', 35),  # Duplicate
    ('Steve', 'Texas', 42),
    ('Mary', 'California', 28)  # Duplicate
], dtype=[('name', 'U10'), ('state', 'U15'), ('age', 'i4')])

# Get unique customers
unique_customers = np.unique(customers)

for customer in unique_customers:
    print(f"Name: {customer[0]}, State: {customer[1]}, Age: {customer[2]}")

Output:

Name: John, State: New York, Age: 35
Name: Mary, State: California, Age: 28
Name: Steve, State: Texas, Age: 42

This is extremely useful when working with tables or records where each entry has multiple fields.
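
The other options combine with structured arrays too. The sketch below counts how many times each record appears and reads fields by name rather than by position, which tends to be easier to follow; it reuses the customers array from above.

```python
import numpy as np

customers = np.array([
    ('John', 'New York', 35),
    ('Mary', 'California', 28),
    ('John', 'New York', 35),
    ('Steve', 'Texas', 42),
    ('Mary', 'California', 28)
], dtype=[('name', 'U10'), ('state', 'U15'), ('age', 'i4')])

# Records are compared field by field, so a record counts as a duplicate only if every field matches
unique_customers, counts = np.unique(customers, return_counts=True)

for customer, count in zip(unique_customers, counts):
    print(f"{customer['name']} ({customer['state']}): {count} record(s)")
```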

Performance Considerations

The NumPy unique function runs in compiled code and is typically faster than passing a NumPy array through Python’s built-in set and sorted, especially for large arrays. Here’s a quick comparison:

import numpy as np
import time

# Create a large array with duplicates
large_array = np.random.randint(0, 1000, size=1000000)

# Time NumPy's unique (perf_counter is a high-resolution clock meant for timing)
start = time.perf_counter()
np_unique = np.unique(large_array)
np_time = time.perf_counter() - start
print(f"NumPy unique time: {np_time:.4f} seconds")

# Time Python's set, built directly from the NumPy array
start = time.perf_counter()
py_unique = sorted(set(large_array))
py_time = time.perf_counter() - start
print(f"Python set time: {py_time:.4f} seconds")
print(f"Speed ratio (set time / NumPy time): {py_time/np_time:.1f}x")

The size of the gap depends on the dtype, the number of distinct values, and your NumPy version, so measure on your own data. For data that already lives in a NumPy array, np.unique() is usually the practical choice for data analysis tasks.
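
A single timed run can be noisy, so for a steadier measurement you might use the standard timeit module. The sketch below is one possible harness, not a definitive benchmark: the helper names with_numpy and with_set are hypothetical, the array is smaller to keep the run short, and the set version converts to a Python list first, which is usually the faster way to feed a set. It also checks that both approaches agree before timing them.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 1000, size=100_000)

def with_numpy():
    # Sorted unique values via NumPy
    return np.unique(data)

def with_set():
    # Convert to plain Python ints first, then deduplicate and sort
    return sorted(set(data.tolist()))

# Both approaches should produce the same values
assert with_numpy().tolist() == with_set()

# Take the best of several repeats to reduce noise from other processes
numpy_time = min(timeit.repeat(with_numpy, number=5, repeat=3))
set_time = min(timeit.repeat(with_set, number=5, repeat=3))

print(f"np.unique: {numpy_time:.4f} s for 5 runs")
print(f"sorted(set(...)): {set_time:.4f} s for 5 runs")
```

No expected timings are shown here because they depend on the machine and the library versions.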

When to Use np.unique() vs. pandas.unique()

If you’re working with pandas DataFrames, you might wonder whether to use NumPy’s unique or pandas’ unique method. Here’s a quick guideline:

  • Use np.unique() when:
      • Working with NumPy arrays
      • You need sorted results
      • You need additional return values like counts or indices
  • Use pandas.unique() when:
      • Working with pandas Series
      • You want to preserve the original order
      • You’re dealing with missing values (NaN)
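
The ordering difference is easy to see side by side. This is a small sketch that assumes pandas is installed and reuses the sales figures from earlier as a Series; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

sales = pd.Series([120, 85, 120, 150, 85, 200, 120])

# np.unique returns the distinct values in sorted order
sorted_unique = np.unique(sales.to_numpy())

# pandas.unique returns them in order of first appearance
ordered_unique = pd.unique(sales)

print("np.unique:", sorted_unique)
print("pd.unique:", ordered_unique)
```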

I recommend testing both in your specific scenario to determine which one works best for your needs.

In my daily data analysis tasks, the NumPy unique function has become an indispensable tool. Whether I’m analyzing customer segments, cleaning datasets, or preparing data for machine learning models, this function consistently helps me work more efficiently.

The next time you need to find distinct values in your data, remember that NumPy’s unique function offers both simplicity and powerful options to meet your needs. Experiment with the different parameters and use cases I’ve shared to get the most out of this versatile function.
