How To Filter NumPy 2D Array By Condition In Python

When working with data in Python, I often need to extract specific values from arrays based on certain conditions. NumPy makes this process incredibly efficient, especially when dealing with 2D arrays (matrices).

In this article, I will show you 6 practical methods to filter 2D arrays by condition in NumPy. These techniques have saved me countless hours in my data analysis projects.

Let’s get started!

Filtering in NumPy

Filtering is the process of extracting elements from an array that meet specific criteria. With NumPy, we can perform this filtering efficiently on large datasets.

When working with 2D arrays (like spreadsheets or images), filtering becomes even more powerful as we can extract rows, columns, or individual elements based on complex conditions.

Filter a NumPy 2D Array by Condition in Python

Now, I will explain how to filter a NumPy 2D array by condition in Python.

Method 1 – Use Boolean Indexing

Boolean indexing is the easiest way to filter a 2D array in Python NumPy. It works by creating a mask of True/False values and using it to select elements.

Let’s create a simple 2D array representing sales data for different store locations:

import numpy as np

# Sales data for different store locations (rows) across months (columns)
sales_data = np.array([
    [5000, 7200, 6100, 8500],  # New York
    [4200, 5100, 4800, 6300],  # Chicago
    [9100, 8700, 9200, 10500], # Los Angeles
    [3800, 4100, 3900, 4600]   # Boston
])

# Filter stores with sales > 8000 in any month
high_sales_mask = np.any(sales_data > 8000, axis=1)
high_performing_stores = sales_data[high_sales_mask]

print("Stores with sales exceeding $8000 in any month:")
print(high_performing_stores)

Output:

Stores with sales exceeding $8000 in any month:
[[ 5000  7200  6100  8500]
 [ 9100  8700  9200 10500]]

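In this example, I created a boolean mask that identifies stores (rows) where at least one month had sales over $8000. The axis=1 argument tells np.any() to reduce across the columns, which gives one True/False value per row. Indexing the array with that mask keeps only the rows marked True.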
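The same idea extends to a few other mask shapes. The sketch below reuses the sales data from above; the thresholds and variable names (element_mask, all_months_mask, busy_months) are my own picks for illustration. It shows that an element-wise mask returns a flat 1D array, while a per-row or per-column mask keeps the 2D structure.

```python
import numpy as np

sales_data = np.array([
    [5000, 7200, 6100, 8500],   # New York
    [4200, 5100, 4800, 6300],   # Chicago
    [9100, 8700, 9200, 10500],  # Los Angeles
    [3800, 4100, 3900, 4600],   # Boston
])

# Element-wise mask: same shape as the array, so the result is flattened to 1D
element_mask = sales_data > 8000
high_values = sales_data[element_mask]
print(high_values)        # values: 8500, 9100, 8700, 9200, 10500

# Row mask with np.all(): keep stores where every month exceeded 8000
all_months_mask = np.all(sales_data > 8000, axis=1)
consistent_stores = sales_data[all_months_mask]
print(consistent_stores)  # only the Los Angeles row

# Column mask: keep months in which every store sold more than 4000
busy_months = sales_data[:, np.all(sales_data > 4000, axis=0)]
print(busy_months.shape)  # (4, 2): all four stores, two months
```

Use np.any() when one matching value is enough to keep a row, and np.all() when every value in the row has to match.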

Method 2 – Use np.where()

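NumPy’s np.where() function is another useful tool for filtering. Called with three arguments, it works like an if-else statement across the entire array. Called with only a condition, as in the example below, it returns the indices where the condition is true.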

import numpy as np

# Customer ratings for different products (rows) from different reviewers (columns)
ratings = np.array([
    [4.2, 3.8, 4.5, 4.1],  # Product A
    [3.1, 3.2, 2.9, 3.3],  # Product B
    [4.8, 4.7, 4.9, 4.6],  # Product C
    [3.7, 3.9, 3.5, 3.8]   # Product D
])

# Find indices of highly-rated products (average rating > 4.0)
high_rated_indices = np.where(np.mean(ratings, axis=1) > 4.0)[0]
high_rated_products = ratings[high_rated_indices]

print("Products with average rating above 4.0:")
print(high_rated_products)

Output:

Products with average rating above 4.0:
[[4.2 3.8 4.5 4.1]
 [4.8 4.7 4.9 4.6]]

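With only a condition, np.where() returns a tuple of index arrays, one per dimension of the condition. The condition here is 1D (one average per product), so [0] picks out the row indices, which I then use to filter the original array.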
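For comparison, here is a small sketch of the other ways to call np.where() on the same ratings array. The 4.0 and 4.5 cut-offs and the variable names are assumptions chosen for illustration. The three-argument form keeps the 2D shape and substitutes a fallback value, while the one-argument form on a 2D condition returns separate row and column indices.

```python
import numpy as np

ratings = np.array([
    [4.2, 3.8, 4.5, 4.1],  # Product A
    [3.1, 3.2, 2.9, 3.3],  # Product B
    [4.8, 4.7, 4.9, 4.6],  # Product C
    [3.7, 3.9, 3.5, 3.8],  # Product D
])

# Three-argument form: keep ratings >= 4.0, replace everything else with 0.0
kept_or_zero = np.where(ratings >= 4.0, ratings, 0.0)
print(kept_or_zero)

# One-argument form on a 2D condition: one index array per dimension
rows, cols = np.where(ratings >= 4.5)
for r, c in zip(rows, cols):
    print(f"row {r}, column {c}: {ratings[r, c]}")
```

The three-argument form is handy when the positions of the values matter, because the result has the same shape as the input.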

Method 3 – Use np.extract()

NumPy’s np.extract() function returns the elements of an array that satisfy a condition.

import numpy as np

# Temperature readings (°F) for different cities (rows) across seasons (columns)
temperatures = np.array([
    [72, 85, 68, 45],  # Miami
    [45, 65, 55, 30],  # New York
    [68, 75, 65, 55],  # Los Angeles
    [35, 60, 50, 25]   # Chicago
])

# Extract all temperature readings above 70°F
condition = temperatures > 70
hot_temps = np.extract(condition, temperatures)

print("Temperature readings above 70°F:")
print(hot_temps)

Output:

Temperature readings above 70°F:
[72 85 75]

One thing to note: np.extract() returns a flattened array rather than maintaining the 2D structure.
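Because the result is flattened, the positions of the matching values are lost. The sketch below, using the same temperature data, shows one way to recover them with np.argwhere(), and confirms that np.extract() gives the same values as plain boolean indexing. The variable names are my own.

```python
import numpy as np

temperatures = np.array([
    [72, 85, 68, 45],  # Miami
    [45, 65, 55, 30],  # New York
    [68, 75, 65, 55],  # Los Angeles
    [35, 60, 50, 25],  # Chicago
])

condition = temperatures > 70

# np.extract() and boolean indexing return the same flattened values
hot_temps = np.extract(condition, temperatures)
same_result = temperatures[condition]
print(np.array_equal(hot_temps, same_result))  # True

# np.argwhere() returns one (row, column) pair per matching element
positions = np.argwhere(condition)
for (row, col), value in zip(positions, hot_temps):
    print(f"row {row}, column {col}: {value}")
```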

Method 4 – Filter with np.ma.masked_array

NumPy’s masked arrays are perfect when you want to keep your array’s structure but ignore certain values.

import numpy as np

# Stock price changes (%) for different companies (rows) across quarters (columns)
stock_changes = np.array([
    [2.5, -1.3, 3.2, 0.8],   # Company A
    [-0.7, -2.1, -0.5, -1.2], # Company B
    [1.8, 2.3, -0.4, 1.5],    # Company C
    [3.1, -0.8, 2.7, 4.2]     # Company D
])

# Mask negative values
masked_data = np.ma.masked_array(stock_changes, stock_changes < 0)

print("Stock price changes (negative values masked):")
print(masked_data)

Output:

Stock price changes (negative values masked):
[[2.5 -- 3.2 0.8]
 [-- -- -- --]
 [1.8 2.3 -- 1.5]
 [3.1 -- 2.7 4.2]]

This approach keeps your 2D structure intact while masking values that don’t meet your condition.
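Masked values are also skipped by the masked array's own aggregation methods. The following sketch, on the same stock data, shows a few follow-up operations I find useful: counting the unmasked entries, averaging per row, flattening to the unmasked values, and filling the masked slots. The 0.0 fill value is an arbitrary choice for illustration.

```python
import numpy as np

stock_changes = np.array([
    [2.5, -1.3, 3.2, 0.8],     # Company A
    [-0.7, -2.1, -0.5, -1.2],  # Company B
    [1.8, 2.3, -0.4, 1.5],     # Company C
    [3.1, -0.8, 2.7, 4.2],     # Company D
])

masked_data = np.ma.masked_array(stock_changes, mask=stock_changes < 0)

# Number of unmasked (non-negative) quarters per company
valid_counts = masked_data.count(axis=1)
print(valid_counts)

# Row means ignore masked entries; a fully masked row stays masked
row_means = masked_data.mean(axis=1)
print(row_means)

# compressed() drops the masked entries and returns a plain 1D array
positives = masked_data.compressed()

# filled() returns a regular ndarray with masked slots replaced
filled = masked_data.filled(0.0)
print(filled)
```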

Method 5 – Use np.compress()

NumPy’s np.compress() function lets you select slices along a specific axis using a 1D boolean condition.

import numpy as np

# Website traffic data (visitors per day) for different sites (rows) across weeks (columns)
traffic_data = np.array([
    [1200, 1500, 1100, 1300],  # Site A
    [500, 450, 550, 480],      # Site B 
    [2200, 2500, 2300, 2400],  # Site C
    [850, 900, 830, 870]       # Site D
])

# Filter to keep only high-traffic sites (>1000 visitors on average)
row_condition = np.mean(traffic_data, axis=1) > 1000
high_traffic_sites = np.compress(row_condition, traffic_data, axis=0)

print("High-traffic websites (>1000 daily visitors on average):")
print(high_traffic_sites)

Output:

High-traffic websites (>1000 daily visitors on average):
[[1200 1500 1100 1300]
 [2200 2500 2300 2400]]

The axis=0 parameter tells np.compress() to filter rows based on our condition.
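Switching to axis=1 filters columns instead. As a sketch on the same traffic data, the example below keeps only the weeks whose combined traffic across all sites passed a threshold. The 5000 threshold and the variable names are assumptions made for illustration.

```python
import numpy as np

traffic_data = np.array([
    [1200, 1500, 1100, 1300],  # Site A
    [500, 450, 550, 480],      # Site B
    [2200, 2500, 2300, 2400],  # Site C
    [850, 900, 830, 870],      # Site D
])

# One True/False per column: total traffic in that week above 5000
col_condition = traffic_data.sum(axis=0) > 5000

# axis=1 applies the condition to columns and keeps every row
busy_weeks = np.compress(col_condition, traffic_data, axis=1)
print(busy_weeks)

# Boolean indexing on the column axis gives the same result
same_result = traffic_data[:, col_condition]
print(np.array_equal(busy_weeks, same_result))  # True
```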

Method 6 – Combine Multiple Conditions

In real-world scenarios, you often need to filter based on multiple conditions. NumPy makes this simple using logical operators.

import numpy as np

# Product data: [price, rating, stock]
products = np.array([
    [49.99, 4.2, 15],  # Product A
    [29.99, 3.8, 5],   # Product B
    [99.99, 4.7, 20],  # Product C
    [19.99, 4.0, 3],   # Product D
    [79.99, 4.5, 8]    # Product E
])

# Find products that are:
# 1. Highly rated (>4.0)
# 2. In stock (>5 units)
# 3. Under $80
condition1 = products[:, 1] > 4.0  # rating > 4.0
condition2 = products[:, 2] > 5    # stock > 5
condition3 = products[:, 0] < 80   # price < $80

# Combine all conditions
final_condition = condition1 & condition2 & condition3
filtered_products = products[final_condition]

print("Products meeting all criteria:")
print(filtered_products)

Output:

Products meeting all criteria:
[[49.99  4.2  15.  ]
 [79.99  4.5   8.  ]]

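This approach is useful because you can combine as many conditions as needed using NumPy’s element-wise operators on boolean arrays (& for AND, | for OR, ~ for NOT). Python’s own and, or, and not keywords do not work element-wise on arrays.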
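Here is a short sketch of the OR and NOT operators on the same product data. The thresholds are my own picks for illustration. Note the parentheses around each comparison when conditions are written inline: & and | bind more tightly than < and >, so leaving the parentheses out changes what gets evaluated.

```python
import numpy as np

# Product data: [price, rating, stock]
products = np.array([
    [49.99, 4.2, 15],  # Product A
    [29.99, 3.8, 5],   # Product B
    [99.99, 4.7, 20],  # Product C
    [19.99, 4.0, 3],   # Product D
    [79.99, 4.5, 8],   # Product E
])

price, rating, stock = products[:, 0], products[:, 1], products[:, 2]

# OR: products that are cheap (< $30) or top rated (> 4.5)
cheap_or_top = products[(price < 30) | (rating > 4.5)]
print(cheap_or_top)

# NOT: invert a "low stock" condition to keep well-stocked products
low_stock = stock <= 5
well_stocked = products[~low_stock]
print(well_stocked)
```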

Performance Considerations

When working with large datasets, the method you choose can significantly impact performance:

Boolean indexing is generally the fastest method for simple conditions.
np.where() can be more efficient when you need the indices as well as the values.
np.ma.masked_array has some overhead, but is useful when you need to keep the original array structure.

For very large arrays, build the condition with NumPy’s vectorized functions such as np.sum() and np.mean() rather than Python loops, so that computing the mask does not become the bottleneck.
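Relative speeds depend on the array size, the dtype, and the NumPy version, so it is worth measuring on your own data. The sketch below is one way to do that with the standard library’s timeit module. The random test data, the array size, and the helper function names are all assumptions made for illustration, and no particular timings are implied.

```python
import timeit

import numpy as np

# Synthetic data: 20,000 rows of 4 integer values (illustrative only)
rng = np.random.default_rng(0)
data = rng.integers(0, 10_000, size=(20_000, 4))

# Vectorized row condition, computed once and shared by all methods
row_mask = data.mean(axis=1) > 5000


def with_boolean_indexing():
    return data[row_mask]


def with_where():
    return data[np.where(row_mask)[0]]


def with_compress():
    return np.compress(row_mask, data, axis=0)


for fn in (with_boolean_indexing, with_where, with_compress):
    seconds = timeit.timeit(fn, number=50)
    print(f"{fn.__name__}: {seconds:.4f} s for 50 runs")
```

All three helpers return the same rows, so the comparison only measures the selection step.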

I’ve found that Boolean indexing handles most of my filtering needs efficiently, but it’s good to have these other methods in your toolkit for specific scenarios.

NumPy’s powerful filtering capabilities can significantly simplify your data analysis workflows. Whether you’re working with financial data, scientific measurements, or any other type of numerical information, these techniques will help you extract exactly what you need from your 2D arrays.

If you have any questions or suggestions, please leave them in the comments below.

Bijay Kumar

Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers.