How to Filter DataFrame in Python Pandas | 11 Ways to Filter Pandas DataFrame

Do you want to know how to filter a DataFrame in Pandas? In this Pandas tutorial, I will walk through 11 ways to filter a Python Pandas DataFrame using the library's inbuilt methods and attributes, each with a practical example.

Pandas offers several inbuilt methods for filtering a DataFrame, such as filter(), query(), isin(), isnull(), and notnull(), along with indexing attributes such as loc[], iloc[], iat[], and at[]. Using these, you can filter a DataFrame and get the output you need based on your conditions.

Ways to Filter DataFrame in Python Pandas

  1. using Slicing
  2. using Column Selection or Column Filtering
  3. using filter() method
  4. using iloc[] attribute
  5. using loc[] attribute
  6. using query() method
  7. using Relational Conditions
  8. using iat[] and at[] attributes
  9. using isin() method
  10. using Regular Expressions
  11. using isnull() and notnull()

Let’s go through each of these methods and attributes step by step to filter a Pandas DataFrame in Python.

1. Pandas DataFrame Filtration using Slicing

We can use slicing to filter rows by position. We provide a range with a start position and an end position, and the slice returns the rows from the start position up to, but not including, the end position.

Syntax :

df[ start_index : end_index ]

Let’s see the implementation of code using slicing.

Code :

import pandas as pd 

Employee_data = pd.DataFrame({
    'name':['Oliver','Sophie','William','Henry','Amelia','Grace','George'],
    'age':[23,45,34,23,26,28,39],
    'state': ['California','Texas','New York','Florida','Georgia','Michigan','Florida']
})
filtered_data = Employee_data[2:5]
print(filtered_data)

Output:

      name  age     state
2  William   34  New York
3    Henry   23   Florida
4   Amelia   26   Georgia


Let’s see what happens when we don’t give a value at start_index or end_index with another example.

Code:

import pandas as pd 

Employee_data = pd.DataFrame({
    'name':['Oliver','Sophie','William','Henry','Amelia','Grace','George'],
    'age':[23,45,34,23,26,28,39],
    'state': ['California','Texas','New York','Florida','Georgia','Michigan','Florida']
})
filtered_data = Employee_data[2:]
print(filtered_data)

Output:

      name  age     state
2  William   34  New York
3    Henry   23   Florida
4   Amelia   26   Georgia
5    Grace   28  Michigan
6   George   39   Florida

Note: If we don’t give start_index, it defaults to 0; if we don’t give end_index, the slice runs through the last row of the DataFrame.

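The slicing rules can be sketched on a small example. This is a minimal sketch, assuming a hypothetical DataFrame named df with string index labels, chosen to show that integer slices count positions and ignore the labels.

```python
import pandas as pd

# Hypothetical data; the string index shows that integer slices ignore labels.
df = pd.DataFrame(
    {"name": ["Oliver", "Sophie", "William", "Henry"], "age": [23, 45, 34, 23]},
    index=["a", "b", "c", "d"],
)

middle = df[1:3]       # positions 1 and 2; position 3 is excluded
first_two = df[:2]     # start defaults to 0
every_other = df[::2]  # a step is allowed, as with Python lists

print(middle)
print(first_two)
print(every_other)
```

Because the end position is excluded, df[1:3] returns two rows, not three.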

2. How to Filter DataFrame using Column Selection or Column Filtering

We can use column selection to filter a Python Pandas DataFrame down to specific columns by passing a list of column names. This keeps every row and returns only the named columns.

Syntax:

df[['column_name','column_name']]

Column Filtering method with a practical example

Code:

import pandas as pd 

Employee_data = pd.DataFrame({
    'name':['Oliver','Sophie','William','Henry','Amelia','Grace','George'],
    'age':[23,45,34,23,26,28,39],
    'state': ['California','Texas','New York','Florida','Georgia','Michigan','Florida'],
    'salary': [5000,9000,8500,15000,21000,4000,7500]
})
filtered_data = Employee_data[['name', 'salary']]
print(filtered_data)

Output:

      name  salary
0   Oliver    5000
1   Sophie    9000
2  William    8500
3    Henry   15000
4   Amelia   21000
5    Grace    4000
6   George    7500

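One detail worth knowing about column selection is the difference between single and double brackets. The sketch below, which uses a hypothetical DataFrame named employees, shows that single brackets return a Series while a list of names returns a DataFrame with the columns in the order requested.

```python
import pandas as pd

# Hypothetical data for illustration only.
employees = pd.DataFrame({
    "name": ["Oliver", "Sophie", "William"],
    "age": [23, 45, 34],
    "salary": [5000, 9000, 8500],
})

one_column = employees["salary"]           # single brackets -> Series
one_column_df = employees[["salary"]]      # list of names -> DataFrame
reordered = employees[["salary", "name"]]  # columns follow the order of the list

print(type(one_column).__name__)
print(type(one_column_df).__name__)
print(list(reordered.columns))
```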

3. Pandas DataFrame Filtration using filter() method in Python

We can filter a DataFrame using the inbuilt filter() method. It selects columns (or rows) by their labels, not by the values stored in them, and it accepts an exact list of labels (items), a substring (like), or a regular expression (regex). Here is a practical example of using the filter() method.

Syntax:

df.filter(["column_name",  "column_name" ,...])

Implementing filter() method with a parameter called items.

Code:

import pandas as pd 

Customer_data = {'Name' : ['John', 'Frankline', 'James'] , 
        'Age': [20, 26, 63],
        'City' : ["London", 'Bristol' ,'Cardiff']}

Customer_df = pd.DataFrame(Customer_data)
filtered_Customer_Data = Customer_df.filter(items=["Age","City"])
print(filtered_Customer_Data)

Output:

   Age     City
0   20   London
1   26  Bristol
2   63  Cardiff

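Besides items, filter() also takes the like and regex parameters, and an axis parameter to match row labels instead of column names. The following is a minimal sketch; the DataFrame and its index labels (cust_1, cust_2, vip_1) are made up for illustration.

```python
import pandas as pd

# Hypothetical data; the index labels are invented for this example.
customers = pd.DataFrame(
    {"Name": ["John", "Frankline", "James"],
     "Age": [20, 26, 63],
     "City": ["London", "Bristol", "Cardiff"]},
    index=["cust_1", "cust_2", "vip_1"],
)

by_substring = customers.filter(like="it")         # column names containing "it"
by_regex = customers.filter(regex="^(Name|Age)$")  # column names matching the pattern
vip_rows = customers.filter(like="vip", axis=0)    # axis=0 matches index labels

print(by_substring)
print(by_regex)
print(vip_rows)
```

In every case the match is made against labels. To filter on the values in a column, use one of the row-based approaches described later in this tutorial.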

4. Way to Filter DataFrame using iloc[] attribute in Python Pandas

To filter a DataFrame in Pandas, we can use iloc[] (integer-location-based indexing). It selects rows and columns by integer position only, and it accepts a single position, a list of positions, or a range. As with slicing, the end of a range is excluded.
Here are some examples of DataFrame filtration using the iloc[] attribute.

Syntax:

i) df.iloc[index_value]
ii) df.iloc[start_index : end_index]

Code :

import pandas as pd 

Employee_data = pd.DataFrame({
    'name':['Oliver','Sophie','William','Henry','Amelia','Grace','George'],
    'age':[23,45,34,23,26,28,39],
    'state': ['California','Texas','New York','Florida','Georgia','Michigan','Florida'],
    'salary': [5000,9000,8500,15000,21000,4000,7500]
})
filtered_data = Employee_data.iloc[3]
print(filtered_data)

Output:

name        Henry
age            23
state     Florida
salary      15000
Name: 3, dtype: object


Let’s see another example, this time passing a range to iloc[].

Code :

import pandas as pd 

Employee_data = pd.DataFrame({
    'name':['Oliver','Sophie','William','Henry','Amelia','Grace','George'],
    'age':[23,45,34,23,26,28,39],
    'state': ['California','Texas','New York','Florida','Georgia','Michigan','Florida'],
    'salary': [5000,9000,8500,15000,21000,4000,7500]
})
filtered_data = Employee_data.iloc[2:6]
print(filtered_data)

Output:

      name  age     state  salary
2  William   34  New York    8500
3    Henry   23   Florida   15000
4   Amelia   26   Georgia   21000
5    Grace   28  Michigan    4000

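iloc[] can also take a second argument that selects columns by position. Here is a minimal sketch on a hypothetical employees DataFrame, showing a row range combined with a column list, a list of row positions, and a negative position.

```python
import pandas as pd

# Hypothetical data for illustration only.
employees = pd.DataFrame({
    "name": ["Oliver", "Sophie", "William", "Henry"],
    "age": [23, 45, 34, 23],
    "state": ["California", "Texas", "New York", "Florida"],
})

rows_and_cols = employees.iloc[1:3, [0, 2]]  # rows at positions 1-2, columns 0 and 2
picked_rows = employees.iloc[[0, 3]]         # a list of row positions
last_row = employees.iloc[-1]                # negative positions count from the end

print(rows_and_cols)
print(picked_rows)
print(last_row)
```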

5. Filter Pandas DataFrame using loc[] attribute in Python Pandas DataFrame

We can use loc[] (label-based indexing) to filter a DataFrame in Pandas by index label. Labels can be numbers or strings, which makes loc[] convenient when the index has been set manually.

Here, you will see how to set index values manually and how to use the loc[] attribute to filter the DataFrame in Pandas.

Syntax:

i) df.loc['index_name'] -> To access the data of a single row
ii) df.loc[['index_name', 'index_name']] -> To access the data of multiple rows
iii) df.loc['start_label' : 'end_label'] -> To access a range of rows by label (both ends are included)

Code :

import pandas as pd

Book_Data = {
    'Title': ['The Great Gatsby', 'To Kill a Mockingbird', '1984'],
    'Author': ['Scott Fitzgerald', 'Harper Lee', 'George Orwell'],
    'Genre': ['Fiction', 'Fiction', 'Dystopian Fiction'],
    'Publication_Year': [1925, 1960, 1949],
    'Pages': [180, 281, 328]
}
df_books = pd.DataFrame(Book_Data, index = ['A','B','C'])
filtered_data = df_books.loc['B']
print(filtered_data)

Output:

Title               To Kill a Mockingbird
Author                         Harper Lee
Genre                             Fiction
Publication_Year                     1960
Pages                                 281
Name: B, dtype: object

Let’s see another example that selects multiple rows by passing a list of index labels to loc.

Code:

import pandas as pd

Book_Data = {
    'Title': ['The Great Gatsby', 'To Kill a Mockingbird', '1984'],
    'Author': ['Scott Fitzgerald', 'Harper Lee', 'George Orwell'],
    'Genre': ['Fiction', 'Fiction', 'Dystopian Fiction'],
    'Publication_Year': [1925, 1960, 1949],
    'Pages': [180, 281, 328]
}
df_books = pd.DataFrame(Book_Data, index = ['A','B','C'])
filtered_data = df_books.loc[['A','C']]
print(filtered_data)

Output:

              Title            Author              Genre  Publication_Year  Pages
A  The Great Gatsby  Scott Fitzgerald            Fiction              1925    180
C              1984     George Orwell  Dystopian Fiction              1949    328
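
The third form of the loc syntax, a label range, differs from positional slicing because both ends are included. The sketch below uses a hypothetical books DataFrame and also shows that loc[] accepts a column list and a boolean condition.

```python
import pandas as pd

# Hypothetical data for illustration only.
books = pd.DataFrame(
    {"Title": ["The Great Gatsby", "To Kill a Mockingbird", "1984"],
     "Pages": [180, 281, 328]},
    index=["A", "B", "C"],
)

label_slice = books.loc["A":"B"]                       # both "A" and "B" are included
rows_and_cols = books.loc[["A", "C"], ["Title"]]       # row labels, then column names
long_titles = books.loc[books["Pages"] > 200, "Title"] # boolean condition plus one column

print(label_slice)
print(rows_and_cols)
print(long_titles)
```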

6. Pandas DataFrame Filtration using query()

We can use the query() method to filter the rows of a Python Pandas DataFrame based on a condition written as a string expression.

Here’s an example of using query() to filter a DataFrame, showing how to pass a condition to it as a parameter.

Syntax:

1) df.query("expression")
2) df.query("expression1 and expression2 ...") -> For multiple conditions, you can use the and / or operators

Code :

import pandas as pd 

Temperature = pd.DataFrame.from_dict({
    "city": ["New York","Los Angeles","Chicago","Phoenix","Dallas","San Francisco"],
    "celsius": [21,28,16,32,38,20],
    'fahrenheit': [69.8, 82.4, 60.8, 89.6, 100.4, 68.0]
})
check_temperature = 20
filtered_data = Temperature.query('celsius >= @check_temperature')
print(filtered_data)

Output:

            city  celsius  fahrenheit
0       New York       21        69.8
1    Los Angeles       28        82.4
3        Phoenix       32        89.6
4         Dallas       38       100.4
5  San Francisco       20        68.0

Note: When you want to access a variable inside the query() method, you must use “@var_name”, otherwise it will search for column_name.


Multiple Conditions Inside Query Method: We can also give multiple conditions inside the query method using the and / or operators. The & and | operators are accepted as well, and inside query() they are treated like and / or.

Code:

import pandas as pd 

Temperature = pd.DataFrame.from_dict({
    "city": ["New York","Los Angeles","Chicago","Phoenix","Dallas","San Francisco"],
    "celsius": [21,28,16,32,38,20],
    'fahrenheit': [69.8, 82.4, 60.8, 89.6, 100.4, 68.0]
})
check_temperature = 20
filtered_data = Temperature.query('celsius >= @check_temperature and fahrenheit <= 80')
print(filtered_data)

Output:

            city  celsius  fahrenheit
0       New York       21        69.8
5  San Francisco       20        68.0

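query() expressions support a few more forms that are handy in practice. This is a minimal sketch, assuming a hypothetical temps DataFrame with a column name that contains a space: membership tests use in with an @ variable, and backticks quote column names that are not valid Python identifiers.

```python
import pandas as pd

# Hypothetical data; "max temp" deliberately contains a space.
temps = pd.DataFrame({
    "city": ["New York", "Los Angeles", "Chicago"],
    "max temp": [21, 28, 16],
})

wanted = ["Chicago", "New York"]
in_list = temps.query("city in @wanted")   # membership test against a Python list
spaced = temps.query("`max temp` > 20")    # backticks quote a name with a space
either = temps.query("`max temp` < 18 or city == 'Los Angeles'")

print(in_list)
print(spaced)
print(either)
```

Rows come back in their original order, regardless of the order of the values in the list.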

7. DataFrame Filtration using Relational Conditions in Python Pandas

We can also filter a DataFrame without calling any filtering method, by placing a boolean condition inside square brackets (often called boolean indexing). This is the most common way to filter rows.

For this, we write a relational condition on a column with one of the comparison operators (<, >, ==, !=, <=, >=). The condition produces a True or False value for every row, and only the rows marked True are kept. Here is a practical example.

Code:

import pandas as pd

Consumer_data = [
    ['John', 'Doe', 30, 'male'],
    ['Jane', 'Smith', 25, 'female'],
    ['Peter', 'Jones', 40, 'male'],
    ['Susan', 'Williams', 55, 'female']
]
Customer_Dataframe = pd.DataFrame(Consumer_data, columns=['name', 'last_name', 'age','gender'])
filter_data = Customer_Dataframe[Customer_Dataframe['age']>32]
print(filter_data)

Output:

    name last_name  age  gender
2  Peter     Jones   40    male
3  Susan  Williams   55  female


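This is how you can give multiple conditions to filter a DataFrame in Python Pandas. Combine the conditions with & (and) or | (or), and wrap each condition in parentheses, because & and | bind more tightly than the comparison operators.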

Code:

import pandas as pd

Consumer_data = [
    ['John', 'Doe', 30, 'male'],
    ['Jane', 'Smith', 25, 'female'],
    ['Peter', 'Jones', 40, 'male'],
    ['Susan', 'Williams', 55, 'female']
]
Customer_Dataframe = pd.DataFrame(Consumer_data, columns=['name', 'last_name', 'age','gender'])
filter_data = Customer_Dataframe[(Customer_Dataframe['age']>32) & (Customer_Dataframe['gender'] == 'female')]
print(filter_data)

Output:

    name last_name  age  gender
3  Susan  Williams   55  female

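Conditions can also be combined with | (or) and inverted with ~ (not). The sketch below uses a hypothetical customers DataFrame and stores each condition in a named variable, which often reads more clearly than one long expression.

```python
import pandas as pd

# Hypothetical data for illustration only.
customers = pd.DataFrame({
    "name": ["John", "Jane", "Peter", "Susan"],
    "age": [30, 25, 40, 55],
    "gender": ["male", "female", "male", "female"],
})

is_female = customers["gender"] == "female"
over_32 = customers["age"] > 32

either = customers[is_female | over_32]      # rows matching at least one condition
neither = customers[~(is_female | over_32)]  # ~ inverts the mask
not_male = customers[customers["gender"] != "male"]

print(either)
print(neither)
print(not_male)
```

Python's and, or, and not keywords do not work on a whole column at once, so the symbol operators are needed here.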

8. Filter Single item from the DataFrame using iat[] and at[] attributes in Python Pandas

The at[] and iat[] attributes access a single value in a Pandas DataFrame, identified by its row and its column.

The at[] attribute looks up the value by label (row index label and column name).
The iat[] attribute looks up the value by integer position (row position and column position).

Syntax:

1) df.at['row_label', 'column_name']
2) df.iat[row_position, column_position]

Implementation of at[] method in Pandas.

import pandas as pd

food_menu = {
    "pizza_type": ['Margherita', "Onion", "Paneer" , "Mushroom"],
    "price": [120, 180 , 150, 220],
    "is_available": [True, False, False, True]
}
food_menu_df = pd.DataFrame(food_menu,index = ['A', 'B', 'C','D'])
filtered_menu = food_menu_df.at['D','pizza_type']
print(filtered_menu)

Output:

Mushroom


Implementation of example using iat[] method.

import pandas as pd

food_menu = {
    "pizza_type": ['Margherita', "Onion", "Paneer" , "Mushroom"],
    "price": [120, 180 , 150, 220],
    "is_available": [True, False, False, True]
}
food_menu_df = pd.DataFrame(food_menu,index = ['A', 'B', 'C','D'])
filtered_menu = food_menu_df.iat[2,0]
print(filtered_menu)

Output:

Paneer

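To make the label-versus-position distinction concrete, the sketch below reads the same cell both ways on a hypothetical menu DataFrame. It also shows that at[] can assign a new value to a single cell.

```python
import pandas as pd

# Hypothetical data for illustration only.
menu = pd.DataFrame(
    {"pizza_type": ["Onion", "Paneer"], "price": [180, 150]},
    index=["A", "B"],
)

by_label = menu.at["B", "price"]  # row label "B", column name "price"
by_position = menu.iat[1, 1]      # second row, second column
print(by_label, by_position)

menu.at["B", "price"] = 160       # at[] can also set a single value
print(menu)
```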

9. Filter DataFrame based on rows using isin() method in Pandas

The Pandas library has a powerful inbuilt method called isin().

It checks, for each value in a column, whether that value appears in a given list, and returns True or False for every row. Using the result as a mask keeps only the rows whose value is in the list.

Syntax :

df['column_name'].isin(['value'])

Code :

import pandas as pd

Consumer_data = [
    ['John', 'California', 30, 'male',True],
    ['Jane', 'Texas', 25, 'female',False],
    ['Peter', 'New York', 40, 'male',True],
    ['Susan', 'Florida', 52, 'female',False],
    ['Oliver', 'Illinois', 54, 'female',True],
    ['Sophie', 'Pennsylvania', 36, 'female',False],
    ['William', 'Ohio', 48, 'male',True],
    ['Henry', 'Georgia', 42, 'male',False],
    ['Amelia', 'North Carolina', 55, 'female',True],
    ['George', 'Michigan', 34, 'male',False],
    ['Grace', 'California', 37, 'female',True],
]
Customer_Dataframe = pd.DataFrame(Consumer_data, columns=['name', 'state', 'age','gender','is_active'])
filtered_data = Customer_Dataframe[Customer_Dataframe['gender'].isin(['male'])]
print(filtered_data)

Output:

      name       state  age gender  is_active
0     John  California   30   male       True
2    Peter    New York   40   male       True
6  William        Ohio   48   male       True
7    Henry     Georgia   42   male      False
9   George    Michigan   34   male      False


We can also combine several isin() checks with & to filter on more than one column.

Code :

import pandas as pd

Consumer_data = [
    ['John', 'California', 30, 'male',True],
    ['Jane', 'Texas', 25, 'female',False],
    ['Peter', 'New York', 40, 'male',True],
    ['Susan', 'Florida', 52, 'female',False],
    ['Oliver', 'Illinois', 54, 'female',True],
    ['Sophie', 'Pennsylvania', 36, 'female',False],
    ['William', 'Ohio', 48, 'male',True],
    ['Henry', 'Georgia', 42, 'male',False],
    ['Amelia', 'North Carolina', 55, 'female',True],
    ['George', 'Michigan', 34, 'male',False],
    ['Grace', 'California', 37, 'female',True],
]
Customer_Dataframe = pd.DataFrame(Consumer_data, columns=['name', 'state', 'age','gender','is_active'])
selected_gender = ["male"]
check_availability = [True]
filter_data = Customer_Dataframe[(Customer_Dataframe['gender'].isin(selected_gender)) & (Customer_Dataframe['is_active'].isin(check_availability))]
print(filter_data)

Output:

      name       state  age gender  is_active
0     John  California   30   male       True
2    Peter    New York   40   male       True
6  William        Ohio   48   male       True

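isin() is most useful when the list holds more than one value, and it can be inverted with ~ to exclude values instead. Here is a minimal sketch on a hypothetical customers DataFrame.

```python
import pandas as pd

# Hypothetical data for illustration only.
customers = pd.DataFrame({
    "name": ["John", "Jane", "Peter", "Susan"],
    "state": ["California", "Texas", "New York", "Florida"],
})

states = ["California", "Texas"]
in_states = customers[customers["state"].isin(states)]       # keep listed states
not_in_states = customers[~customers["state"].isin(states)]  # drop listed states

print(in_states)
print(not_in_states)
```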

10. How to use Regular Expression to Filter DataFrame in Pandas

To filter a DataFrame in Pandas, we can use a regular expression, which is a sequence of characters that defines a search pattern. It is useful for checking whether a string does or does not match a given pattern.

Let’s see how to use it to filter a Python Pandas DataFrame based on a pattern.

Syntax :

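df[df['column_name'].str.contains('pattern')] -> str.contains() returns True for each string in which the pattern is found anywhere, so the pattern does not need to match the whole string. By default, the pattern is treated as a regular expression.

Practical example using the str.contains() method of Pandas.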

Code :

import pandas as pd

data = {'name': ['John', 'George', 'Derik', 'Grace', 'Victor'],
        'age': [24, 32, 23, 47, 25],
        'country': ['Brazil', 'Mexico', 'Cambodia', 'Bolivia', 'India']
        }
df = pd.DataFrame(data)
df_filtered = df[df['name'].str.contains('e')]
print(df_filtered)

Output:

     name  age   country
1  George   32    Mexico
2   Derik   23  Cambodia
3   Grace   47   Bolivia

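The example above uses a plain letter as the pattern. The sketch below shows actual regular-expression syntax with str.contains(), along with its case, na, and regex parameters. The DataFrame is hypothetical and includes a missing name to show why na=False is useful.

```python
import pandas as pd

# Hypothetical data; the None entry shows how missing values are handled.
people = pd.DataFrame({"name": ["John", "George", "Derik", "Grace", None]})

starts_d_or_j = people[people["name"].str.contains(r"^[DJ]", na=False)]        # anchored at the start
ignore_case = people[people["name"].str.contains("GR", case=False, na=False)]  # case-insensitive
regex_dot = people[people["name"].str.contains("o.", na=False)]                # "." matches any character
literal_dot = people[people["name"].str.contains("o.", regex=False, na=False)] # "." is a literal dot

print(starts_d_or_j)
print(ignore_case)
print(regex_dot)
print(literal_dot)
```

Passing na=False treats missing values as non-matches, so the result can be used directly as a filter.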

11. Filter Data on the basis of null and notnull values in Python Pandas.

A null value is a missing value, which Pandas represents as None or NaN. An empty string is not treated as null. To filter a DataFrame on missing or present values, Pandas provides the inbuilt methods isnull() and notnull().

  • isnull() returns True for each value that is missing.
  • notnull() returns True for each value that is not missing.

Let’s understand with a practical example using the isnull() method.

import pandas as pd

Consumer_data = [
    ['John', 'California', 30, 'male',True],
    ['Jane', 'Texas', 25, 'female',False],
    ['Peter', None, 40, 'male',True],
    ['Susan', 'Florida', 52, 'female',False],
    ['Oliver', None, 54, 'female',True],
    ['Sophie', 'Pennsylvania', 36, 'female',False],
    ['William', None, 48, 'male',True],
    ['Henry', 'Georgia', 42, 'male',False],
    ['Amelia', 'North Carolina', 55, 'female',True],
    ['George', None, 34, 'male',False],
    ['Grace', 'California', 37, 'female',True],
]
Customer_Dataframe = pd.DataFrame(Consumer_data, columns=['name', 'state', 'age','gender','is_active'])
filtered_data = Customer_Dataframe[Customer_Dataframe['state'].isnull()]
print(filtered_data)

Output:

      name state  age  gender  is_active
2    Peter  None   40    male       True
4   Oliver  None   54  female       True
6  William  None   48    male       True
9   George  None   34    male      False


Example to use notnull() method to Filter DataFrame in Python Pandas

Code :

import pandas as pd

Consumer_data = [
    ['John', 'California', 30, 'male',True],
    ['Jane', 'Texas', 25, 'female',False],
    ['Peter', None, 40, 'male',True],
    ['Susan', 'Florida', 52, 'female',False],
    ['Oliver', None, 54, 'female',True],
    ['Sophie', 'Pennsylvania', 36, 'female',False],
    ['William', None, 48, 'male',True],
    ['Henry', 'Georgia', 42, 'male',False],
    ['Amelia', 'North Carolina', 55, 'female',True],
    ['George', None, 34, 'male',False],
    ['Grace', 'California', 37, 'female',True],
]
Customer_Dataframe = pd.DataFrame(Consumer_data, columns=['name', 'state', 'age','gender','is_active'])
filtered_data = Customer_Dataframe[Customer_Dataframe['state'].notnull()]
print(filtered_data)

Output:

      name           state  age  gender  is_active
0     John      California   30    male       True
1     Jane           Texas   25  female      False
3    Susan         Florida   52  female      False
5   Sophie    Pennsylvania   36  female      False
7    Henry         Georgia   42    male      False
8   Amelia  North Carolina   55  female       True
10   Grace      California   37  female       True

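Pandas also provides isna() and notna(), which behave the same as isnull() and notnull(), and dropna() as a shortcut for removing rows with missing values. This is a minimal sketch on a hypothetical customers DataFrame.

```python
import pandas as pd

# Hypothetical data for illustration only.
customers = pd.DataFrame({
    "name": ["John", "Peter", "Susan"],
    "state": ["California", None, "Florida"],
})

missing = customers[customers["state"].isna()]  # same result as isnull()
present = customers.dropna(subset=["state"])    # drop rows where state is missing
missing_per_column = customers.isna().sum()     # count of missing values per column

print(missing)
print(present)
print(missing_per_column)
```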

Conclusion

In this tutorial, I have explained 11 ways to filter a DataFrame in Python Pandas using different methods and attributes like filter(), query(), isin(), isnull(), notnull(), at[], iat[], loc[], iloc[], etc., with examples.

Each one suits a different kind of filtering, whether by position, by label, or by the values in the data.
