Pandas count rows with condition in Python [9 different ways]

Do you want to count the rows of a DataFrame that meet a condition? In this Python tutorial, I will show you how to count rows with a condition in Pandas using different methods, with examples.

To count rows in Pandas with a condition, you can filter the DataFrame and read df.shape or call len(), take the length of the filtered df.index, use df.apply() with a lambda for row-wise conditions, df.query() for string-based filtering, np.where() to flag matching rows, df.count() for non-null entries, df.groupby().size() for counts per category, and list comprehensions for an explicit Python loop.

Pandas count rows with condition in Python

There are nine different ways to count rows with a condition in Python Pandas:

  1. Using DataFrame.shape
  2. Using len()
  3. Using DataFrame.index
  4. Using DataFrame.apply() and Lambda Function
  5. Using DataFrame.query()
  6. Using numpy.where()
  7. Using DataFrame.count()
  8. Using DataFrame.groupby().size()
  9. Using List Comprehensions

Let’s see them one by one using some examples:

1. Pandas count rows with condition using df.shape

This method filters the DataFrame in Python Pandas based on the condition and then reads the shape attribute of the result. shape is a tuple of (rows, columns), so its first element is the number of matching rows.

This is how we can use the df.shape attribute to count rows with a condition in Python Pandas:

import pandas as pd

df = pd.DataFrame({'State': ['California', 'New York', 'Texas'], 'Sales': [60000, 40000, 80000]})
count_california = df[(df['State'] == 'California') & (df['Sales'] > 50000)].shape[0]
print("California Sales > $50,000:", count_california)

Output:

California Sales > $50,000: 1
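As a small sketch of what happens in that one-liner, the same filter can be split into steps. The names mask and filtered are only for illustration. Note that each comparison needs its own parentheses, because & binds more tightly than == and >.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'New York', 'Texas'],
    'Sales': [60000, 40000, 80000]
})

# Build the boolean mask once so it can be inspected or reused.
mask = (df['State'] == 'California') & (df['Sales'] > 50000)
filtered = df[mask]

# shape is a (rows, columns) tuple; index 0 is the row count.
n_rows, n_cols = filtered.shape
print(filtered.shape)  # (1, 2)
print(n_rows)          # 1
```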

2. DataFrame count rows with condition using len() function

We can apply a condition to filter the DataFrame in Python, and then use the len() function to count the number of rows in the filtered DataFrame.

Here is the code to count rows with a condition in Python, using the len() function:

import pandas as pd

parks_data = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Grand Canyon', 'Rocky Mountain', 'Yosemite', 'Yellowstone'],
    'Area_sq_miles': [522, 1902, 265, 1189, 3471]
})
large_parks = len(parks_data[parks_data['Area_sq_miles'] > 1000])
print("Parks > 1000 sq miles:", large_parks)

Output:

Parks > 1000 sq miles: 3
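One thing to watch for, shown here as a minimal sketch on the same parks data: len() has to be applied to the filtered DataFrame, not to the boolean mask itself. The mask has one entry per original row, so its length is always the total row count. The variable name mask is only for illustration.

```python
import pandas as pd

parks_data = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Grand Canyon', 'Rocky Mountain', 'Yosemite', 'Yellowstone'],
    'Area_sq_miles': [522, 1902, 265, 1189, 3471]
})

# The mask is a boolean Series with one True/False value per row.
mask = parks_data['Area_sq_miles'] > 1000

# len() of the mask counts every row, matching or not.
print(len(mask))              # 5

# len() of the filtered DataFrame counts only the matching rows.
print(len(parks_data[mask]))  # 3
```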

3. Pandas dataframe count rows with condition using df.index

This method filters the DataFrame based on our condition in Python, and then counts the rows by taking the length of the filtered DataFrame's index. Note that df.index is an attribute, not a function, so it is written without parentheses.

This is how we can use df.index to count rows with a condition in Python Pandas:

import pandas as pd

gdp_data = pd.DataFrame({
    'State': ['California', 'Texas', 'New York', 'Florida', 'Illinois'],
    'GDP_Billion': [3027, 1806, 1690, 1073, 857]
})
rich_states = len(gdp_data[gdp_data['GDP_Billion'] > 1500].index)
print("States with GDP > $1500 Billion:", rich_states)

Output:

States with GDP > $1500 Billion: 3
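The following sketch, which assumes the same GDP data but a different threshold direction, shows why this works: the filtered index keeps the original row labels of the matching rows, and its length is the count. The name filtered_index is only for illustration.

```python
import pandas as pd

gdp_data = pd.DataFrame({
    'State': ['California', 'Texas', 'New York', 'Florida', 'Illinois'],
    'GDP_Billion': [3027, 1806, 1690, 1073, 857]
})

# Filtering keeps the original index labels of the rows that match.
filtered_index = gdp_data[gdp_data['GDP_Billion'] < 1500].index

print(list(filtered_index))  # [3, 4]
print(len(filtered_index))   # 2
```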

4. Python dataframe count rows with condition using df.apply() with lambda function

We can use the apply() method with a lambda function and axis=1 to evaluate a condition on each row. The result is a boolean Series that we use to filter the DataFrame before counting the remaining rows.

Here is the code that uses apply() with a lambda function to count rows with a condition in Python Pandas:

import pandas as pd

senators_data = pd.DataFrame({
    'Senator': ['Bernie Sanders', 'Mitch McConnell', 'Kamala Harris', 'Ted Cruz', 'Elizabeth Warren'],
    'Age': [79, 79, 56, 50, 71]
})
senior_senators = len(senators_data[senators_data.apply(lambda x: x['Age'] > 70, axis=1)])
print("Senators > 70 years old:", senior_senators)

Output:

Senators > 70 years old: 3
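As a hedged aside, the boolean Series returned by apply() can also be summed directly, since True counts as 1. The sketch below does that and compares it with a plain vectorized comparison, which gives the same count for a single-column condition like this one. The names row_mask, count_via_apply, and count_vectorized are only for illustration.

```python
import pandas as pd

senators_data = pd.DataFrame({
    'Senator': ['Bernie Sanders', 'Mitch McConnell', 'Kamala Harris', 'Ted Cruz', 'Elizabeth Warren'],
    'Age': [79, 79, 56, 50, 71]
})

# apply() with axis=1 calls the lambda once per row and returns a boolean Series.
row_mask = senators_data.apply(lambda row: row['Age'] > 70, axis=1)

# True counts as 1, so summing the mask gives the number of matching rows.
count_via_apply = int(row_mask.sum())

# The same single-column condition written as a vectorized comparison.
count_vectorized = int((senators_data['Age'] > 70).sum())

print(count_via_apply, count_vectorized)  # 3 3
```

apply() is most useful when the condition needs custom per-row logic; for a simple comparison the vectorized form avoids calling a Python function for every row.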

5. Pandas count rows using df.query() function

This method filters rows using a string expression as a condition inside the query() method in Python, and then the length of the resulting DataFrame is counted.

This is how we can use the query() method to count rows with a condition in Python Pandas:

import pandas as pd

uni_data = pd.DataFrame({
    'University': ['Harvard', 'MIT', 'Stanford', 'UCLA', 'Columbia'],
    'Students': [31000, 11500, 17000, 45000, 31000]
})
large_universities = len(uni_data.query("Students > 20000"))
print("Universities with > 20,000 students:", large_universities)

Output:

Universities with > 20,000 students: 3
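The query string does not have to hard-code its values. As a minimal sketch, query() can refer to Python variables in the surrounding scope by prefixing them with @, and conditions can be combined with and / or. The variable names min_students and max_students, and the upper bound of 40000, are assumptions made for this example.

```python
import pandas as pd

uni_data = pd.DataFrame({
    'University': ['Harvard', 'MIT', 'Stanford', 'UCLA', 'Columbia'],
    'Students': [31000, 11500, 17000, 45000, 31000]
})

# Hypothetical thresholds kept in ordinary Python variables.
min_students = 20000
max_students = 40000

# @name refers to a local Python variable inside the query string.
above_min = len(uni_data.query("Students > @min_students"))

# Two conditions combined with "and" inside the same query string.
in_range = len(uni_data.query("Students > @min_students and Students < @max_students"))

print(above_min)  # 3
print(in_range)   # 2
```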

6. Python pandas count rows with condition using np.where() function

This method uses the np.where() function in Python to build an array that holds 1 where the condition is true and 0 where it is false, and then sums that array to count the rows satisfying the condition.

Here is an example that shows how to use the np.where() function with Pandas to count rows with a condition in Python:

import pandas as pd
import numpy as np

temp_data = pd.DataFrame({
    'State': ['Alaska', 'Florida', 'Maine', 'Texas', 'Arizona'],
    'Avg_Temp_F': [26, 71, 45, 65, 75]
})
warm_states = np.sum(np.where(temp_data['Avg_Temp_F'] > 60, 1, 0))
print("States with Avg. Temp > 60F:", warm_states)

Output:

States with Avg. Temp > 60F: 3
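np.where() also has a one-argument form that returns the positions where the condition is true, rather than an array of 1s and 0s. The sketch below, on the same temperature data, counts rows by taking the length of that position array. The name positions is only for illustration.

```python
import numpy as np
import pandas as pd

temp_data = pd.DataFrame({
    'State': ['Alaska', 'Florida', 'Maine', 'Texas', 'Arizona'],
    'Avg_Temp_F': [26, 71, 45, 65, 75]
})

# Convert the boolean Series to a plain NumPy array first.
condition = (temp_data['Avg_Temp_F'] > 60).to_numpy()

# With a single argument, np.where returns a tuple with one array of positions
# per dimension; [0] picks the row positions.
positions = np.where(condition)[0]

print(positions.tolist())  # [1, 3, 4]
print(len(positions))      # 3
```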

7. Count rows Pandas using df.count() function

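Calling count() on a column returns the number of non-NA/null entries in it. When applied after filtering the DataFrame on a condition, it gives the number of matching rows that have a value in that column. The example below applies no filter, so the condition is simply that the revenue value is present.

This is how we can use the count() method to count rows in Python Pandas: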

import pandas as pd

company_data = pd.DataFrame({
    'Company': ['Apple', 'Google', 'Microsoft', 'Amazon', 'Facebook'],
    'Revenue_Billion': [274.5, 182.5, 143, 386, 86]
})
complete_revenue_data = company_data['Revenue_Billion'].count()
print("Companies with Complete Revenue Data:", complete_revenue_data)

Output:

Companies with Complete Revenue Data: 5
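To illustrate how count() behaves once a filter and a missing value are involved, here is a hedged sketch. It reuses the company names, but Microsoft's revenue is deliberately replaced with a missing value purely for illustration, and the filter on the company name is an arbitrary example condition.

```python
import pandas as pd

# Same companies as above, with one revenue value deliberately set to
# missing (NaN) so that count() and len() give different answers.
company_data = pd.DataFrame({
    'Company': ['Apple', 'Google', 'Microsoft', 'Amazon', 'Facebook'],
    'Revenue_Billion': [274.5, 182.5, float('nan'), 386, 86]
})

# Example condition: every company except Facebook.
filtered = company_data[company_data['Company'] != 'Facebook']

# len() counts all rows that pass the filter.
print(len(filtered))                         # 4

# count() skips rows whose value in this column is missing.
print(filtered['Revenue_Billion'].count())   # 3
```

So count() equals the filtered row count only when the chosen column has no missing values among the matching rows.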

8. Count number of rows based on condition Pandas using df.groupby().size() function

This method is useful for categorical data. Group the data by categories and use size() function to count the rows in each group.

This is how we can use df.groupby().size() to count the rows in each category in Python Pandas:

import pandas as pd

movie_data = pd.DataFrame({
    'Movie': ['Avengers', 'Titanic', 'Inception', 'The Godfather', 'Joker'],
    'Genre': ['Action', 'Romance', 'Sci-Fi', 'Drama', 'Thriller']
})
movies_by_genre = movie_data.groupby('Genre').size()
print("Number of Movies by Genre:\n", movies_by_genre)

Output:

Number of Movies by Genre:
 Genre
Action      1
Drama       1
Romance     1
Sci-Fi      1
Thriller    1
dtype: int64
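groupby() is not limited to an existing column. As a rough sketch, we can group by the result of a condition and let size() count the rows on each side of it. The labels 'match' and 'other' and the chosen genres are assumptions made for this example.

```python
import pandas as pd

movie_data = pd.DataFrame({
    'Movie': ['Avengers', 'Titanic', 'Inception', 'The Godfather', 'Joker'],
    'Genre': ['Action', 'Romance', 'Sci-Fi', 'Drama', 'Thriller']
})

# Evaluate the condition per row, then turn True/False into readable labels.
is_match = movie_data['Genre'].isin(['Action', 'Thriller'])
labels = is_match.map({True: 'match', False: 'other'})

# Grouping by the labels counts the rows that do and do not meet the condition.
counts = movie_data.groupby(labels).size()

print(counts['match'])  # 2
print(counts['other'])  # 3
```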

9. Python dataframe count rows with condition using list comprehension

We can use a list comprehension to iterate over each row, emit a 1 for every row that meets the condition, and sum the result to get the count.

Here is an example that counts rows with a condition in Python Pandas using a list comprehension:

import pandas as pd

river_data = pd.DataFrame({
    'River': ['Mississippi', 'Missouri', 'Yukon', 'Rio Grande', 'Arkansas'],
    'Length_miles': [2340, 2341, 1980, 1885, 1460]
})
long_rivers = sum([1 for _, row in river_data.iterrows() if row['Length_miles'] > 2000])
print("Rivers > 2000 miles:", long_rivers)

Output:

Rivers > 2000 miles: 2
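When the condition involves only one column, a possible variation is to loop over that column's values directly instead of calling iterrows(), and to use a generator expression so no intermediate list is built. This is a sketch of that variation on the same river data, not a replacement for the code above.

```python
import pandas as pd

river_data = pd.DataFrame({
    'River': ['Mississippi', 'Missouri', 'Yukon', 'Rio Grande', 'Arkansas'],
    'Length_miles': [2340, 2341, 1980, 1885, 1460]
})

# Iterating over a single column yields its values one by one.
long_rivers = sum(1 for length in river_data['Length_miles'] if length > 2000)

print(long_rivers)  # 2
```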

Conclusion

Understanding the various methods to count rows with a condition in Python Pandas is essential for efficient data analysis. These include df.shape, len(), df.index, df.apply() with a lambda function, df.query(), np.where(), df.count(), df.groupby().size(), and list comprehensions.

Each method offers distinct advantages for data analysis tasks.
