In this Pandas tutorial, I will explain how to use Pandas groupby without an aggregation function in Python, with some illustrative examples.
Pandas groupby without aggregation in Python means dividing a dataset into groups based on some criteria without reducing each group to a single summary statistic. This keeps every row available, so we can inspect, transform, sort, or filter each subgroup individually.
Pandas groupby with aggregation function in Python
The groupby method in Pandas, combined with an aggregate function, is a powerful tool for data analysis in Python. It enables us to group data by certain columns and then perform various aggregate operations on the grouped data. Here’s how it works:
import pandas as pd
# Sample DataFrame
data = {
'State': ['California', 'Texas', 'New York', 'California', 'Texas', 'New York', 'California'],
'Product': ['Product A', 'Product B', 'Product A', 'Product C', 'Product B', 'Product C', 'Product A'],
'Sales': [200, 150, 400, 300, 250, 350, 500]
}
df = pd.DataFrame(data)
# Group by 'State' and 'Product', then sum 'Sales'
grouped = df.groupby(['State', 'Product']).agg('sum')
print(grouped)
Output:
                      Sales
State      Product
California Product A    700
           Product C    300
New York   Product A    400
           Product C    350
Texas      Product B    400
Pandas groupby without aggregation function in Python
Using Pandas groupby without an aggregation function in Python lets us segment a dataset into groups without collapsing each group into a single value (which is what aggregation does).
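To make the difference concrete, here is a minimal sketch that contrasts an aggregation with a non-aggregating operation on the same groups. The column name StateTotal is my own choice for illustration and is not part of the earlier example.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'Texas', 'New York', 'California', 'Texas', 'New York', 'California'],
    'Sales': [200, 150, 400, 300, 250, 350, 500],
})

# Aggregation: the result has one row per state
totals = df.groupby('State')['Sales'].sum()

# No aggregation of the frame itself: transform returns one value per
# original row, aligned to df's index, so the row count is unchanged.
# 'StateTotal' is a hypothetical column name used only for this sketch.
df['StateTotal'] = df.groupby('State')['Sales'].transform('sum')

print(totals)
print(df)
```

The aggregated result has three rows, while the DataFrame still has all seven rows, each carrying the total of its own state.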
Let’s see some of the use cases and examples for this:
1. Pandas groupby without aggregation for exploring data
Pandas groupby without an aggregation function in Python is a great way to explore and understand our data. For instance, we can group data by a specific category and then iterate over the groups to examine each one individually:
import pandas as pd
df = pd.DataFrame({
'State': ['California', 'Texas', 'New York', 'California', 'Texas'],
'Sales': [200, 150, 250, 300, 100],
'Product': ['Product A', 'Product B', 'Product A', 'Product B', 'Product A']
})
grouped = df.groupby('State')
# Each iteration yields the group key and a DataFrame holding that group's rows
for state, group in grouped:
    print(f"State: {state}")
    print(group)
Output:
State: California
        State  Sales    Product
0  California    200  Product A
3  California    300  Product B
State: New York
      State  Sales    Product
2  New York    250  Product A
State: Texas
   State  Sales    Product
1  Texas    150  Product B
4  Texas    100  Product A
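Looping is not the only way to look inside a grouped object. As a small sketch on the same data, we can also count the groups, list the row labels that belong to each group, and pull out one group by its key. The variable names n_groups, row_labels, and texas are mine, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'Texas', 'New York', 'California', 'Texas'],
    'Sales': [200, 150, 250, 300, 100],
    'Product': ['Product A', 'Product B', 'Product A', 'Product B', 'Product A']
})

grouped = df.groupby('State')

# Number of distinct groups
n_groups = grouped.ngroups

# Mapping of each group key to the index labels of its rows
row_labels = {state: list(idx) for state, idx in grouped.groups.items()}

# Retrieve a single group as a DataFrame without looping
texas = grouped.get_group('Texas')

print(n_groups)
print(row_labels)
print(texas)
```

This is handy when we already know which group we want to inspect and do not need to visit the others.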
2. Pandas groupby without aggregation for applying different functions to groups
We can apply a different function to each group without aggregating it in Python. This is useful when the function returns a result with the same rows as the group instead of reducing the group to a single value.
Here is the code, where Pandas groupby without an aggregation function in Python is used:
import pandas as pd
df = pd.DataFrame({
'Region': ['East', 'West', 'East', 'West', 'East'],
'Price': [20, 30, 25, 35, 22],
'Product': ['Item 1', 'Item 2', 'Item 3', 'Item 4', 'Item 5']
})
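grouped = df.groupby('Region')
def adjust_price(group):
    # Work on a copy so the original DataFrame is not changed through the group
    group = group.copy()
    if group.name == 'East':
        group['Price'] = group['Price'] * 1.1  # 10% increase for East region
    else:
        group['Price'] = group['Price'] * 0.9  # 10% decrease for West region
    return group
# Note: pandas 2.2+ may warn that apply operated on the grouping column
adjusted_prices = grouped.apply(adjust_price)
print(adjusted_prices)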
Output:
         Region  Price Product
Region
East   0   East   22.0  Item 1
       2   East   27.5  Item 3
       4   East   24.2  Item 5
West   1   West   27.0  Item 2
       3   West   31.5  Item 4
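When the per-group function only needs one column, transform is another option that keeps one value per original row. The sketch below is not the price adjustment from above; it shows a different, assumed task (how far each price is from the mean price of its region), and the column name DiffFromRegionMean is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West', 'East'],
    'Price': [20, 30, 25, 35, 22],
    'Product': ['Item 1', 'Item 2', 'Item 3', 'Item 4', 'Item 5']
})

# transform broadcasts each region's mean back to every row of that region,
# so the result has the same length and index as df
region_mean = df.groupby('Region')['Price'].transform('mean')

# 'DiffFromRegionMean' is a hypothetical column name used for illustration
df['DiffFromRegionMean'] = df['Price'] - region_mean

print(df)
```

Because the result is aligned to the original index, it can be assigned straight back to the DataFrame, and no extra index level is added.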
3. Pandas groupby without aggregation for sorting and filtering within groups
With Pandas groupby without an aggregation function in Python, we can sort the rows inside each group and keep or drop whole groups based on a condition. This allows for more fine-grained data analysis.
import pandas as pd
df = pd.DataFrame({
'State': ['California', 'Texas', 'California', 'New York', 'Texas'],
'University': ['UCLA', 'UT Austin', 'UC Berkeley', 'NYU', 'Texas A&M'],
'Enrollment': [45000, 50000, 42000, 30000, 55000]
})
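grouped = df.groupby('State')
# Sort the rows within each state by enrollment, largest first
sorted_universities = grouped.apply(lambda x: x.sort_values('Enrollment', ascending=False))
print(sorted_universities)
# Keep only the states whose largest university has more than 40000 students
filtered_states = grouped.filter(lambda x: x['Enrollment'].max() > 40000)
print(filtered_states)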
Output:
                   State   University  Enrollment
State
California 0  California         UCLA       45000
           2  California  UC Berkeley       42000
New York   3    New York          NYU       30000
Texas      4       Texas    Texas A&M       55000
           1       Texas    UT Austin       50000
        State   University  Enrollment
0  California         UCLA       45000
1       Texas    UT Austin       50000
2  California  UC Berkeley       42000
4       Texas    Texas A&M       55000
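A within-group sort can also be written without apply, by sorting the whole DataFrame on the grouping column and the value column together. The sketch below does that and then uses head(1) on the groups to keep the largest university per state. This is an alternative I am suggesting, not the approach used in the example above, and the names sorted_df and largest_per_state are mine.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'Texas', 'California', 'New York', 'Texas'],
    'University': ['UCLA', 'UT Austin', 'UC Berkeley', 'NYU', 'Texas A&M'],
    'Enrollment': [45000, 50000, 42000, 30000, 55000]
})

# Sort by state, then by enrollment (largest first) inside each state
sorted_df = df.sort_values(['State', 'Enrollment'], ascending=[True, False])

# head(1) keeps the first row of every group, which after the sort
# is the university with the highest enrollment in that state
largest_per_state = sorted_df.groupby('State').head(1)

print(sorted_df)
print(largest_per_state)
```

The result keeps the original flat index, which can be easier to work with than the extra State level that apply adds.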
4. Pandas groupby with multiple columns without aggregation
When we use the groupby method with multiple columns, Pandas groups the data based on the unique combinations of values in these columns. When we iterate over the grouped object, each group key is a tuple with one value per grouping column. A multi-level index only appears later, if we build a result with an aggregation or with apply.
Skipping aggregation means we are not reducing the data to a summary statistic but are interested in the grouped rows themselves. This approach is useful for detailed data exploration and for applying specific operations within each unique group.
import pandas as pd
df = pd.DataFrame({
'State': ['California', 'California', 'Texas', 'Texas', 'New York', 'New York'],
'City': ['Los Angeles', 'San Francisco', 'Houston', 'Austin', 'New York City', 'Buffalo'],
'Clinics': [30, 20, 25, 10, 40, 15],
'Hospitals': [10, 5, 7, 3, 12, 4]
})
grouped = df.groupby(['State', 'City'])
# Each key is a (State, City) tuple, unpacked here into two variables
for (state, city), group in grouped:
    print(f"State: {state}, City: {city}")
    print(group)
Output:
State: California, City: Los Angeles
        State         City  Clinics  Hospitals
0  California  Los Angeles       30         10
State: California, City: San Francisco
        State           City  Clinics  Hospitals
1  California  San Francisco       20          5
State: New York, City: Buffalo
      State     City  Clinics  Hospitals
5  New York  Buffalo       15          4
State: New York, City: New York City
      State           City  Clinics  Hospitals
4  New York  New York City       40         12
State: Texas, City: Austin
   State    City  Clinics  Hospitals
3  Texas  Austin       10          3
State: Texas, City: Houston
   State     City  Clinics  Hospitals
2  Texas  Houston       25          7
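With several grouping columns, a group is addressed by a tuple of values. As a short sketch on the same data, we can fetch one (State, City) group directly and label every row with the number of the group it belongs to. The names austin and GroupId are my own, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'California', 'Texas', 'Texas', 'New York', 'New York'],
    'City': ['Los Angeles', 'San Francisco', 'Houston', 'Austin', 'New York City', 'Buffalo'],
    'Clinics': [30, 20, 25, 10, 40, 15],
    'Hospitals': [10, 5, 7, 3, 12, 4]
})

grouped = df.groupby(['State', 'City'])

# A single group is selected with a tuple, one value per grouping column
austin = grouped.get_group(('Texas', 'Austin'))

# ngroup numbers the groups in the order they are seen when iterating
# and returns one number per original row ('GroupId' is a hypothetical name)
df['GroupId'] = grouped.ngroup()

print(austin)
print(df)
```

Neither call reduces the data: get_group returns the matching rows unchanged, and ngroup returns one label per row.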
Conclusion
Here, I have explained the versatility of Pandas groupby without aggregation in Python through four examples: exploring data, applying different functions to groups, sorting and filtering within groups, and using groupby with multiple columns. These examples demonstrate how groupby enables detailed, nuanced analysis of datasets, providing insights without necessarily condensing data into aggregate statistics.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc. for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc.