Filter DataFrame in Python Pandas

Recently, I was working on a data analysis project where I needed to filter a large dataset to focus on specific information. Pandas, a useful Python library, makes this task incredibly easy with its filtering capabilities.

In this article, I will share various methods to filter DataFrames in Pandas, from basic boolean filtering to advanced techniques using query() method and more complex conditions.

So let’s get in!

Basic Boolean Filtering in Pandas

The simplest way to filter a DataFrame is by using boolean conditions. This approach is intuitive and works for most basic filtering needs.

Let’s create a sample DataFrame with US sales data:

import pandas as pd

# Sample US sales data
data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard', 'Mouse', 'Laptop', 'Phone'],
    'State': ['California', 'New York', 'Texas', 'Florida', 'California', 'Texas', 'New York', 'Florida'],
    'Price': [1200, 800, 300, 250, 100, 25, 1500, 900],
    'Units': [5, 10, 8, 12, 30, 25, 3, 7]
}

df = pd.DataFrame(data)
print(df)

To filter this DataFrame for products that cost more than $500:

expensive_products = df[df['Price'] > 500]
print(expensive_products)

You can also combine multiple conditions using & (AND) and | (OR):

# Products more than $500 in California
california_expensive = df[(df['Price'] > 500) & (df['State'] == 'California')]
print(california_expensive)

1. Use the query() Method

Python query() method provides a more readable way to filter DataFrames, especially for complex conditions:

# Same filter as above but using query()
california_expensive = df.query('Price > 500 and State == "California"')
print(california_expensive)

Output:

  Product       State  Price  Units
0  Laptop  California   1200      5

You can refer to the screenshot below to see the output.

pandas filter

For expressions involving multiple conditions, query() offers better readability and often cleaner code.

Read Python Pandas Write to Excel

2. Filter with isin() Method

When you need to filter based on multiple possible values, the isin() method in Python is perfect:

# Filter products from specific states
coastal_states = df[df['State'].isin(['California', 'New York', 'Florida'])]
print(coastal_states)

Output:

    Product       State  Price  Units
0    Laptop  California   1200      5
1     Phone    New York    800     10
3   Monitor     Florida    250     12
4  Keyboard  California    100     30
6    Laptop    New York   1500      3
7     Phone     Florida    900      7

You can refer to the screenshot below to see the output.

pandas filter dataframe

This is much cleaner than using multiple OR conditions.

Check out Create Plots Using Pandas crosstab() in Python

3. Filter for Text Patterns with str Methods

Pandas provides string methods for filtering text data:

# Filter states that start with 'C'
c_states = df[df['State'].str.startswith('C')]
print(c_states)

# Filter states that contain 'a'
a_states = df[df['State'].str.contains('a')]
print(a_states)

Output:

    Product       State  Price  Units
0    Laptop  California   1200      5
4  Keyboard  California    100     30
    Product       State  Price  Units
0    Laptop  California   1200      5
2    Tablet       Texas    300      8
3   Monitor     Florida    250     12
4  Keyboard  California    100     30
5     Mouse       Texas     25     25
7     Phone     Florida    900      7

You can refer to the screenshot below to see the output.

filter pandas

This example shows how to filter DataFrame rows based on whether text in a column starts with or contains a specific substring using string methods.

Read Drop the Header Row of Pandas DataFrame

4. Filter Null and Non-Null Values

Let’s add some null values to our DataFrame:

import numpy as np

df2 = df.copy()
df2.loc[2, 'Price'] = np.nan
df2.loc[5, 'Units'] = np.nan

# Filter rows with missing price
missing_price = df2[df2['Price'].isna()]
print(missing_price)

# Filter rows with all values present
complete_rows = df2.dropna()
print(complete_rows)

This example demonstrates how to filter rows in a DataFrame that have missing (NaN) values or only keep rows where all values are present using isna() and dropna() methods.

5. Filter with between() Method

For range filtering, the between() method is very useful:

# Filter products with price between $200 and $1000
mid_range = df[df['Price'].between(200, 1000)]
print(mid_range)

The between() method allows easy filtering of rows with values within a specified numeric range.

Read np.where in Pandas Python

6. Filter with loc and iloc

The loc and iloc indexers provide powerful ways to filter both rows and columns:

# Using loc for label-based filtering
california_data = df.loc[df['State'] == 'California']
print(california_data)

# Using loc to filter both rows and columns
specific_data = df.loc[df['Price'] > 500, ['Product', 'Price']]
print(specific_data)

The loc and iloc indexers let you filter rows and columns precisely using labels (loc) or integer positions (iloc).

Create Custom Filter Functions

For complex filtering logic, you can create custom functions:

def is_high_value_sale(row):
    return (row['Price'] * row['Units']) > 5000

# Filter high-value sales
high_value_sales = df[df.apply(is_high_value_sale, axis=1)]
print(high_value_sales)

Filter using datetime in Pandas

Let’s add a date column to demonstrate filtering with dates:

import datetime

df3 = df.copy()
df3['Date'] = pd.date_range(start='2023-01-01', periods=8)
print(df3)

# Filter sales after January 3rd
recent_sales = df3[df3['Date'] > '2023-01-03']
print(recent_sales)

# Filter sales within a date range
january_first_week = df3[(df3['Date'] >= '2023-01-01') & (df3['Date'] <= '2023-01-07')]
print(january_first_week)

Read Pandas GroupBy Without Aggregation Function in Python

Filtering with nlargest() and nsmallest()

When you need the top or bottom values in a DataFrame:

# Get the 3 most expensive products
top_expensive = df.nlargest(3, 'Price')
print(top_expensive)

# Get the 2 products with fewest units
least_units = df.nsmallest(2, 'Units')
print(least_units)

Both methods work great – simple boolean filtering is quick and intuitive, while methods like query() and isin() make complex filters more readable. The custom function approach gives you unlimited flexibility for complex business logic.

I hope you found this article helpful. I explained the query() method, isin() method, str method, between() method, filtering with loc and iloc, largest(), and smallest() methods.

Other Python articles you may also like:

51 Python Programs

51 PYTHON PROGRAMS PDF FREE

Download a FREE PDF (112 Pages) Containing 51 Useful Python Programs.

pyython developer roadmap

Aspiring to be a Python developer?

Download a FREE PDF on how to become a Python developer.

Let’s be friends

Be the first to know about sales and special discounts.