Do you need to count the rows in a DataFrame that satisfy a condition? This Python tutorial shows several ways to count rows with a condition in Pandas, each with an example.
To count rows in Pandas with a condition, you can filter the DataFrame and read df.shape[0], call len() on the filtered result, or take the length of its index. You can also build the condition with df.apply() and a lambda, filter with df.query(), sum a 0/1 array from np.where(), call count() for non-null entries, use df.groupby().size() for per-category counts, or use a list comprehension.
Pandas count rows with condition in Python
This tutorial covers nine ways to count rows with a condition in Pandas:
- Using DataFrame.shape
- Using len()
- Using DataFrame.index
- Using DataFrame.apply() and Lambda Function
- Using DataFrame.query()
- Using numpy.where()
- Using DataFrame.count()
- Using DataFrame.groupby().size()
- Using List Comprehensions
Let’s see them one by one using some examples:
1. Pandas count rows with condition using df.shape
This method filters the DataFrame with a boolean condition and then reads the shape attribute of the result. shape is a tuple of (rows, columns), so shape[0] is the number of matching rows.
Here is how to use the df.shape attribute to count rows with a condition in Pandas:
import pandas as pd
df = pd.DataFrame({'State': ['California', 'New York', 'Texas'], 'Sales': [60000, 40000, 80000]})
count_california = df[(df['State'] == 'California') & (df['Sales'] > 50000)].shape[0]
print("California Sales > $50,000:", count_california)
Output:
California Sales > $50,000: 1
Below is a screenshot captured after the code has been implemented in the Pycharm editor.
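To make the steps behind the one-liner above easier to follow, here is a minimal sketch that splits it into the boolean mask, the filtered DataFrame, and the shape tuple. The extra data row and the names mask, filtered, and count_from_mask are assumptions made for illustration; they are not part of the original example.

```python
import pandas as pd

# Illustrative data (assumed for this sketch): two California rows, one above 50000
df = pd.DataFrame({
    'State': ['California', 'California', 'New York', 'Texas'],
    'Sales': [60000, 30000, 40000, 80000],
})

# Step 1: a boolean Series that is True where both conditions hold
mask = (df['State'] == 'California') & (df['Sales'] > 50000)

# Step 2: keep only the rows where the mask is True
filtered = df[mask]

# Step 3: shape is (rows, columns), so the first element is the row count
n_rows, n_cols = filtered.shape
print("Shape of filtered DataFrame:", filtered.shape)
print("Matching rows:", n_rows)

# Summing the mask gives the same count, because True counts as 1
count_from_mask = int(mask.sum())
print("Count from mask:", count_from_mask)
```

Each condition is wrapped in parentheses because & binds more tightly than the comparison operators in Python.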
2. DataFrame count rows with condition using len() function
We can filter the DataFrame with a condition and then pass the filtered DataFrame to len(), which returns its number of rows.
Here is the code to count rows with a condition in Pandas using the len() function:
import pandas as pd
parks_data = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Grand Canyon', 'Rocky Mountain', 'Yosemite', 'Yellowstone'],
    'Area_sq_miles': [522, 1902, 265, 1189, 3471]
})
# Keep only parks larger than 1000 sq miles, then count the remaining rows
large_parks = len(parks_data[parks_data['Area_sq_miles'] > 1000])
print("Parks > 1000 sq miles:", large_parks)
Output:
Parks > 1000 sq miles: 3
Upon executing the code in Pycharm, the resulting output is showcased in the screenshot below.
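The same filter-then-len() pattern extends to combined conditions. The sketch below reuses the parks data from above and shows an OR condition, a negated condition, and a range check; the thresholds and variable names are assumptions chosen only for illustration.

```python
import pandas as pd

parks_data = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Grand Canyon', 'Rocky Mountain', 'Yosemite', 'Yellowstone'],
    'Area_sq_miles': [522, 1902, 265, 1189, 3471],
})
area = parks_data['Area_sq_miles']

# OR: parks that are either very small or very large
small_or_huge = len(parks_data[(area < 300) | (area > 3000)])

# NOT: parks that are not larger than 1000 sq miles
not_large = len(parks_data[~(area > 1000)])

# Range check: between() is inclusive on both ends by default
mid_sized = len(parks_data[area.between(500, 2000)])

print("Small or huge:", small_or_huge)
print("Not large:", not_large)
print("Between 500 and 2000 sq miles:", mid_sized)
```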
3. Pandas dataframe count rows with condition using df.index
This method filters the DataFrame based on our condition and then counts the rows by taking the length of the filtered DataFrame's index. Note that df.index is an attribute, not a function, so it is written without parentheses.
Here is how to use df.index to count rows with a condition in Pandas:
import pandas as pd
gdp_data = pd.DataFrame({
    'State': ['California', 'Texas', 'New York', 'Florida', 'Illinois'],
    'GDP_Billion': [3027, 1806, 1690, 1073, 857]
})
# Filter states above 1500, then take the length of the resulting index
rich_states = len(gdp_data[gdp_data['GDP_Billion'] > 1500].index)
print("States with GDP > $1500 Billion:", rich_states)
Output:
States with GDP > $1500 Billion: 3
Here is a screenshot illustrating the output after the code was implemented in the Pycharm editor.
4. Python dataframe count rows with condition using df.apply() with lambda function
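We can use the apply() method with a lambda function and axis=1 to evaluate a condition on each row. The result is a boolean Series that we use to filter the DataFrame, and the length of the filtered DataFrame is the count. Because the lambda runs once per row, this approach is slower than a direct column comparison on large DataFrames, but it can express conditions that are hard to vectorize.
Here is the code to use apply() with a lambda function to count rows with a condition in Pandas:
import pandas as pd
senators_data = pd.DataFrame({
    'Senator': ['Bernie Sanders', 'Mitch McConnell', 'Kamala Harris', 'Ted Cruz', 'Elizabeth Warren'],
    'Age': [79, 79, 56, 50, 71]
})
# The lambda returns True or False for each row (axis=1)
senior_senators = len(senators_data[senators_data.apply(lambda x: x['Age'] > 70, axis=1)])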
print("Senators > 70 years old:", senior_senators)
Output:
Senators > 70 years old: 3
Following the implementation of the code in the Pycharm editor, the corresponding screenshot is provided below.
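A row-wise lambda is most useful when the condition involves more than one column. The sketch below uses made-up data (the names, ages, and Years_in_office column are assumptions for illustration) and compares the apply() count with the equivalent vectorized condition.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Age': [72, 45, 80, 66],
    'Years_in_office': [30, 10, 4, 25],
})

# Row-wise condition: older than 70 AND more than 20 years in office
row_mask = df.apply(
    lambda row: row['Age'] > 70 and row['Years_in_office'] > 20,
    axis=1,
)
count_apply = int(row_mask.sum())

# Vectorized equivalent: compares whole columns at once
vec_mask = (df['Age'] > 70) & (df['Years_in_office'] > 20)
count_vectorized = int(vec_mask.sum())

print("Count with apply():", count_apply)
print("Count with vectorized mask:", count_vectorized)
```

Both counts agree here; the vectorized form is usually preferable when the condition can be written as column comparisons.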
5. Pandas count rows using df.query() function
This method passes the condition to query() as a string expression, which returns the matching rows. The length of the resulting DataFrame is the count.
Here is how to use the query() method to count rows with a condition in Pandas:
import pandas as pd
uni_data = pd.DataFrame({
    'University': ['Harvard', 'MIT', 'Stanford', 'UCLA', 'Columbia'],
    'Students': [31000, 11500, 17000, 45000, 31000]
})
# Column names can be used directly inside the query string
large_universities = len(uni_data.query("Students > 20000"))
print("Universities with > 20,000 students:", large_universities)
Output:
Universities with > 20,000 students: 3
Below, a screenshot is presented which shows the result of implementing the code in the Pycharm editor.
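Thresholds do not have to be hard-coded in the query string. As a small sketch, query() can reference Python variables by prefixing them with @, and conditions can be combined with and/or. The variable names min_students and max_students are assumptions for illustration.

```python
import pandas as pd

uni_data = pd.DataFrame({
    'University': ['Harvard', 'MIT', 'Stanford', 'UCLA', 'Columbia'],
    'Students': [31000, 11500, 17000, 45000, 31000],
})

# Hypothetical thresholds kept in ordinary Python variables
min_students = 20000
max_students = 40000

# @name refers to a Python variable in the calling scope
count_above = len(uni_data.query("Students > @min_students"))

# Conditions can be combined inside the query string
count_between = len(
    uni_data.query("Students > @min_students and Students < @max_students")
)

print("Above minimum:", count_above)
print("Between minimum and maximum:", count_between)
```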
6. Python pandas count rows with condition using np.where() function
This method uses np.where() with three arguments to build an array that holds 1 where the condition is true and 0 elsewhere. Summing that array gives the number of rows that satisfy the condition.
Here is an example of how to use np.where() to count rows with a condition in Pandas:
import pandas as pd
import numpy as np
temp_data = pd.DataFrame({
    'State': ['Alaska', 'Florida', 'Maine', 'Texas', 'Arizona'],
    'Avg_Temp_F': [26, 71, 45, 65, 75]
})
# 1 for rows above 60F, 0 otherwise; the sum is the number of matches
warm_states = np.sum(np.where(temp_data['Avg_Temp_F'] > 60, 1, 0))
print("States with Avg. Temp > 60F:", warm_states)
Output:
States with Avg. Temp > 60F: 3
Upon executing the code in Pycharm, the resulting output is captured in the screenshot below.
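np.where() behaves differently depending on how many arguments it receives. The sketch below, using the same temperature data, contrasts the three-argument form (an array of 1s and 0s) with the one-argument form (the positions where the condition holds) and with np.count_nonzero(). The variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

temp_data = pd.DataFrame({
    'State': ['Alaska', 'Florida', 'Maine', 'Texas', 'Arizona'],
    'Avg_Temp_F': [26, 71, 45, 65, 75],
})

# Boolean NumPy array: True where the average temperature is above 60F
condition = (temp_data['Avg_Temp_F'] > 60).to_numpy()

# Three-argument form: 1 where True, 0 elsewhere
flags = np.where(condition, 1, 0)
count_from_flags = int(flags.sum())

# One-argument form: a tuple with the positions where the condition is True
(positions,) = np.where(condition)
count_from_positions = len(positions)

# count_nonzero counts the True values directly
count_nonzero = int(np.count_nonzero(condition))

print("Flags:", flags.tolist())
print("Positions:", positions.tolist())
print("Counts:", count_from_flags, count_from_positions, count_nonzero)
```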
7. Pandas count rows using df.count() function
count() returns the number of non-NA/null entries in a column. Applied to a filtered DataFrame, it gives the number of matching rows that have a value in that column. The example below applies no filter, so it counts every company with a non-null revenue value.
Here is how to use count() on a column in Pandas:
import pandas as pd
company_data = pd.DataFrame({
    'Company': ['Apple', 'Google', 'Microsoft', 'Amazon', 'Facebook'],
    'Revenue_Billion': [274.5, 182.5, 143, 386, 86]
})
# Count the non-null entries in the Revenue_Billion column
complete_revenue_data = company_data['Revenue_Billion'].count()
print("Companies with Complete Revenue Data:", complete_revenue_data)
Output:
Companies with Complete Revenue Data: 5
Displayed below is a screenshot that illustrates the output after the code has been executed in the Pycharm editor.
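To combine count() with a condition, filter first and then count. The sketch below uses made-up data with missing values (the company labels, figures, and the Employees_Thousand column are assumptions for illustration) to show how count() on a filtered DataFrame can differ from len() when a column contains NaN.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed): one revenue and one employee figure are missing
company_data = pd.DataFrame({
    'Company': ['A', 'B', 'C', 'D', 'E'],
    'Revenue_Billion': [274.5, 182.5, np.nan, 386.0, 86.0],
    'Employees_Thousand': [150.0, np.nan, 180.0, 1300.0, 60.0],
})

# Filter first: a NaN revenue never satisfies the comparison
high_revenue = company_data[company_data['Revenue_Billion'] > 100]

# len() counts every matching row
rows_matching = len(high_revenue)

# count() on a column skips rows where that column is NaN
employees_known = int(high_revenue['Employees_Thousand'].count())

# count() on the DataFrame returns one non-null count per column
per_column = high_revenue.count()

print("Rows with revenue > 100:", rows_matching)
print("Of those, rows with a known employee figure:", employees_known)
print(per_column)
```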
8. Count number of rows based on condition Pandas using df.groupby().size() function
This method is useful for categorical data. It groups the rows by a column and uses size() to count the rows in each group, so the result is one count per category rather than a single total.
Here is how to use df.groupby().size() to count rows per category in Pandas:
import pandas as pd
movie_data = pd.DataFrame({
    'Movie': ['Avengers', 'Titanic', 'Inception', 'The Godfather', 'Joker'],
    'Genre': ['Action', 'Romance', 'Sci-Fi', 'Drama', 'Thriller']
})
# One row count per distinct value in the Genre column
movies_by_genre = movie_data.groupby('Genre').size()
print("Number of Movies by Genre:\n", movies_by_genre)
Output:
Number of Movies by Genre:
Genre
Action      1
Drama       1
Romance     1
Sci-Fi      1
Thriller    1
dtype: int64
The following screenshot depicts the results post-implementation of the code in the Pycharm editor.
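To apply a condition before counting per category, filter the DataFrame and then group it. The sketch below uses made-up movies with a hypothetical Rating column (all values are assumptions for illustration) so that some genres appear more than once.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
movie_data = pd.DataFrame({
    'Movie': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6'],
    'Genre': ['Action', 'Drama', 'Action', 'Drama', 'Action', 'Sci-Fi'],
    'Rating': [7.9, 8.5, 6.1, 9.0, 8.2, 6.5],
})

# Keep only movies rated above 7, then count the remaining rows per genre
high_rated = movie_data[movie_data['Rating'] > 7]
high_rated_by_genre = high_rated.groupby('Genre').size()

print(high_rated_by_genre)
```

Genres with no matching rows, such as Sci-Fi in this data, do not appear in the result because they are filtered out before grouping.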
9. Python dataframe count rows with condition using list comprehension
We can use a list comprehension to iterate over each row, test the condition, and add 1 for every row that meets it. Since iterrows() visits the rows one at a time, this approach is best kept for small DataFrames.
Here is an example that counts rows with a condition in Pandas using a list comprehension:
import pandas as pd
river_data = pd.DataFrame({
    'River': ['Mississippi', 'Missouri', 'Yukon', 'Rio Grande', 'Arkansas'],
    'Length_miles': [2340, 2341, 1980, 1885, 1460]
})
# Build a list with one 1 per matching row, then sum it
long_rivers = sum([1 for _, row in river_data.iterrows() if row['Length_miles'] > 2000])
print("Rivers > 2000 miles:", long_rivers)
Output:
Rivers > 2000 miles: 2
Shown below is a screenshot captured after the code was implemented in the Pycharm editor.
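When the condition only needs one or two columns, the comprehension can iterate over the column values directly instead of whole rows. This is a sketch of that variation on the same river data; the second condition (names starting with 'M') is an assumption added for illustration.

```python
import pandas as pd

river_data = pd.DataFrame({
    'River': ['Mississippi', 'Missouri', 'Yukon', 'Rio Grande', 'Arkansas'],
    'Length_miles': [2340, 2341, 1980, 1885, 1460],
})

# Iterate over the values of a single column; a generator avoids building a list
long_rivers = sum(1 for length in river_data['Length_miles'] if length > 2000)

# Iterate over two columns in parallel with zip()
long_m_rivers = sum(
    1
    for name, length in zip(river_data['River'], river_data['Length_miles'])
    if name.startswith('M') and length > 2000
)

print("Rivers > 2000 miles:", long_rivers)
print("Rivers > 2000 miles starting with 'M':", long_m_rivers)
```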
Conclusion
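Understanding the various methods to count rows with a condition in Pandas is useful for efficient data analysis. This tutorial covered df.shape, len(), df.index, df.apply() with a lambda function, df.query(), np.where(), df.count(), df.groupby().size(), and list comprehensions.
Boolean filtering combined with shape, len(), or index, along with query() and np.where(), works on whole columns at once and suits large DataFrames. apply() and list comprehensions evaluate the condition row by row, which is slower but more flexible, while count() and groupby().size() are the right tools when you need non-null counts or per-category counts.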
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries.