Pandas Groupby Without Aggregation Function In Python

Recently, while working on a data analysis project, I needed to group data without performing any aggregation. Many Python developers only use groupby with aggregation functions like sum(), mean(), or count(), but sometimes you want to group the data without summarizing it.

In this article, I’ll cover several simple ways to use pandas groupby without aggregation in Python. I’ll share practical examples that you can apply to your data analysis projects.

So let’s get started.


Pandas Groupby Without Aggregation

When we use groupby in pandas, we typically follow it with an aggregation function to summarize the data. However, there are cases where we want to group the data but keep all the original rows and information.

This approach is useful when you want to:

Perform operations on groups separately
Apply different functions to different groups
Preserve the original data structure
Create groups for visualization purposes
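To make the difference concrete, here is a minimal sketch on a small made-up table (the three rows are illustrative only): an aggregation returns one row per group, while transform() returns one value per original row.

```python
import pandas as pd

# Small hypothetical sales table, used only for this comparison
df = pd.DataFrame({
    'State': ['California', 'California', 'Texas'],
    'Sales': [1200, 800, 1000],
})

# Aggregation: one row per group
totals = df.groupby('State')['Sales'].sum()

# No aggregation: transform() returns one value per original row
state_totals = df.groupby('State')['Sales'].transform('sum')

print(totals)        # 2 rows: California 2000, Texas 1000
print(state_totals)  # 3 rows: 2000, 2000, 1000
```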


Pandas Groupby Without Aggregation Function in Python

Now, let’s look at the methods for using pandas groupby without an aggregation function in Python.

Method 1: Use the groupby Object Directly

The simplest way to use groupby without aggregation in Python is to iterate through the groupby object directly:

import pandas as pd

# Sample data of sales in different US states
data = {
    'State': ['California', 'California', 'New York', 'Texas', 'Texas', 'Texas'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone', 'Tablet'],
    'Sales': [1200, 800, 600, 1000, 700, 500]
}

df = pd.DataFrame(data)

# Group by State without aggregation
grouped = df.groupby('State')

# Iterate through each group
for state, group_data in grouped:
    print(f"Group: {state}")
    print(group_data)
    print()

Output:

Group: California
        State Product  Sales
0  California  Laptop   1200
1  California   Phone    800

Group: New York
      State Product  Sales
2  New York  Tablet    600

Group: Texas
   State Product  Sales
3  Texas  Laptop   1000
4  Texas   Phone    700
5  Texas  Tablet    500


This will display each group separately, maintaining all the original rows and columns within each group.
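The same loop works when you group by more than one column; each group name is then a tuple of key values. A short sketch with a hypothetical State/Product table:

```python
import pandas as pd

# Hypothetical data: grouping by two columns instead of one
df = pd.DataFrame({
    'State': ['California', 'California', 'Texas', 'Texas'],
    'Product': ['Laptop', 'Laptop', 'Laptop', 'Phone'],
    'Sales': [1200, 900, 1000, 700],
})

# With a list of keys, each group name is a tuple of key values
group_sizes = {}
for (state, product), group_data in df.groupby(['State', 'Product']):
    # group_data is still a full DataFrame holding every original row of the group
    group_sizes[(state, product)] = len(group_data)

print(group_sizes)
```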


Method 2: Use groupby.get_group()

If you’re only interested in accessing specific groups, you can use the get_group() method in Python:

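import pandas as pd

# Same sales data as in Method 1
data = {
    'State': ['California', 'California', 'New York', 'Texas', 'Texas', 'Texas'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone', 'Tablet'],
    'Sales': [1200, 800, 600, 1000, 700, 500]
}
df = pd.DataFrame(data)

# Group by State
grouped = df.groupby('State')

# Get a specific group
california_data = grouped.get_group('California')
print("California Sales Data:")
print(california_data)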

Output:

California Sales Data:
        State Product  Sales
0  California  Laptop   1200
1  California   Phone    800


This method is particularly useful when you need to retrieve and work with specific groups individually.
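When you group by several columns, get_group() expects a tuple, and it raises a KeyError for a name that doesn't exist. Here is a minimal sketch of a defensive lookup that reuses the Method 1 sales data. Checking membership in grouped.groups is one reasonable way to avoid the error, not the only one.

```python
import pandas as pd

data = {
    'State': ['California', 'California', 'New York', 'Texas', 'Texas', 'Texas'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone', 'Tablet'],
    'Sales': [1200, 800, 600, 1000, 700, 500]
}
df = pd.DataFrame(data)

grouped = df.groupby(['State', 'Product'])

# With several grouping columns, pass the group name as a tuple
texas_laptops = grouped.get_group(('Texas', 'Laptop'))

# grouped.groups maps each group name to its row labels, so a membership test
# avoids the KeyError; fall back to an empty DataFrame with the same columns
key = ('Florida', 'Laptop')
florida_laptops = grouped.get_group(key) if key in grouped.groups else df.iloc[0:0]

print(texas_laptops)
print(florida_laptops.empty)
```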

Method 3: Use apply() with a Custom Function

The apply() method lets you run a custom function on each group, and that function can return a full DataFrame instead of a single aggregated value:

import pandas as pd

# Sample data of customer purchases
data = {
    'Customer': ['John', 'John', 'Sarah', 'Sarah', 'Mike', 'Mike'],
    'Date': ['2023-01-15', '2023-02-20', '2023-01-10', '2023-03-05', '2023-02-10', '2023-04-15'],
    'Amount': [120, 85, 200, 150, 300, 250]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# Define a function that adds a column showing days since first purchase
def add_days_since_first(group):
    group = group.copy()
    group['FirstPurchaseDate'] = group['Date'].min()
    group['DaysSinceFirst'] = (group['Date'] - group['FirstPurchaseDate']).dt.days
    return group

# Apply the function to each customer group
result = df.groupby('Customer').apply(add_days_since_first)
print(result)

Output:

           Customer       Date  Amount FirstPurchaseDate  DaysSinceFirst
Customer
John     0     John 2023-01-15     120        2023-01-15               0
         1     John 2023-02-20      85        2023-01-15              36
Mike     4     Mike 2023-02-10     300        2023-02-10               0
         5     Mike 2023-04-15     250        2023-02-10              64
Sarah    2    Sarah 2023-01-10     200        2023-01-10               0
         3    Sarah 2023-03-05     150        2023-01-10              54


This approach lets you transform the data within each group while preserving all rows.
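For this particular calculation, a sketch using transform() (covered in more detail in the next method) produces the same DaysSinceFirst values without apply(), and it keeps the original flat index with no extra Customer level:

```python
import pandas as pd

data = {
    'Customer': ['John', 'John', 'Sarah', 'Sarah', 'Mike', 'Mike'],
    'Date': ['2023-01-15', '2023-02-20', '2023-01-10', '2023-03-05', '2023-02-10', '2023-04-15'],
    'Amount': [120, 85, 200, 150, 300, 250]
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# transform('min') broadcasts each customer's earliest date back to that customer's rows
df['FirstPurchaseDate'] = df.groupby('Customer')['Date'].transform('min')
df['DaysSinceFirst'] = (df['Date'] - df['FirstPurchaseDate']).dt.days

print(df)
```

Because the rows stay in their original order, there is no need to drop or reset an added index level afterwards.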


Method 4: Use GroupBy.transform()

The transform() method applies a function to each group independently and returns a Series or DataFrame indexed like the original, so the result can be assigned straight back as a new column:

import pandas as pd

# Sample data of employee salaries in different departments
data = {
    'Department': ['Sales', 'Sales', 'Sales', 'Marketing', 'Marketing', 'IT', 'IT', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Heidi'],
    'Salary': [60000, 65000, 70000, 68000, 72000, 80000, 85000, 90000]
}

df = pd.DataFrame(data)

# Add a column showing each salary as a percentage of department average
df['Dept_Avg'] = df.groupby('Department')['Salary'].transform('mean')
df['Salary_Pct_of_Avg'] = (df['Salary'] / df['Dept_Avg'] * 100).round(2)

print(df)

The transform() method is perfect when you need to add computed group-level information to your original dataframe.
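transform() is not the only group-wise method that keeps the original shape. As a sketch on the same employee data, rank() and cumsum() on a grouped column also return one value per row:

```python
import pandas as pd

data = {
    'Department': ['Sales', 'Sales', 'Sales', 'Marketing', 'Marketing', 'IT', 'IT', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Heidi'],
    'Salary': [60000, 65000, 70000, 68000, 72000, 80000, 85000, 90000]
}
df = pd.DataFrame(data)

# Rank each salary within its department (1 = highest); rank() returns floats, so cast to int
df['Dept_Rank'] = df.groupby('Department')['Salary'].rank(ascending=False).astype(int)

# Running total of salaries within each department, in row order
df['Dept_Running_Total'] = df.groupby('Department')['Salary'].cumsum()

print(df)
```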


Method 5: Use groupby with filter()

The filter() method keeps or drops entire groups based on a condition and returns the original rows of the groups that pass, without aggregating them:

import pandas as pd

# Sample data of store sales
data = {
    'Store': ['NY001', 'NY001', 'NY002', 'LA001', 'LA001', 'LA001', 'CH001', 'CH001'],
    'Date': ['2023-01-15', '2023-01-16', '2023-01-15', '2023-01-15', '2023-01-16', '2023-01-17', '2023-01-15', '2023-01-16'],
    'Sales': [1200, 1300, 800, 1500, 1600, 1700, 900, 950]
}

df = pd.DataFrame(data)

# Filter stores that have more than 2 days of data
filtered_df = df.groupby('Store').filter(lambda x: len(x) > 2)
print("Stores with more than 2 days of data:")
print(filtered_df)

This is useful when you want to include or exclude entire groups based on group-level conditions.
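The condition can also look at the values inside each group, not just its size. Here is a sketch on the same store data that keeps stores averaging more than 1000 in daily sales, plus an equivalent boolean mask built with transform():

```python
import pandas as pd

data = {
    'Store': ['NY001', 'NY001', 'NY002', 'LA001', 'LA001', 'LA001', 'CH001', 'CH001'],
    'Date': ['2023-01-15', '2023-01-16', '2023-01-15', '2023-01-15', '2023-01-16', '2023-01-17', '2023-01-15', '2023-01-16'],
    'Sales': [1200, 1300, 800, 1500, 1600, 1700, 900, 950]
}
df = pd.DataFrame(data)

# Keep only stores whose average daily sales exceed 1000
high_volume = df.groupby('Store').filter(lambda g: g['Sales'].mean() > 1000)

# Same selection as a boolean mask; transform() gives each row its store's mean
mask = df.groupby('Store')['Sales'].transform('mean') > 1000
high_volume_mask = df[mask]

print(high_volume)
```

The mask version can be convenient when you want to combine the group-level condition with row-level conditions using & or |.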

Work with MultiIndex after GroupBy

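When you group by several columns and then combine the groups back into one DataFrame (for example with apply() or pd.concat()), the result often carries a MultiIndex. Here’s how to create and flatten it: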

import pandas as pd

# Sample data of product sales across regions
data = {
    'Region': ['East', 'East', 'West', 'West', 'South', 'South'],
    'Category': ['Electronics', 'Furniture', 'Electronics', 'Furniture', 'Electronics', 'Furniture'],
    'Sales': [10000, 8000, 15000, 12000, 9000, 7500],
    'Profit': [2000, 1500, 3000, 2500, 1800, 1200]
}

df = pd.DataFrame(data)

# Group by Region and Category without aggregation
grouped = df.groupby(['Region', 'Category'])

# Stack the groups back into one DataFrame; the (Region, Category) group names
# become the outer MultiIndex levels and the original row labels the inner level
result = pd.concat({name: group for name, group in grouped}, names=['Region', 'Category'])
print(result)

# Flatten: drop the MultiIndex (Region and Category are still regular columns)
flattened = result.reset_index(drop=True)
print("\nFlattened result:")
print(flattened)

Knowing how to create and flatten this MultiIndex is key to working effectively with data that has been grouped and then recombined.
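Once a MultiIndex exists, you can select from it without flattening. Here is a minimal sketch on the same region data. To keep the example short, it builds the MultiIndex with set_index() rather than from a groupby result.

```python
import pandas as pd

data = {
    'Region': ['East', 'East', 'West', 'West', 'South', 'South'],
    'Category': ['Electronics', 'Furniture', 'Electronics', 'Furniture', 'Electronics', 'Furniture'],
    'Sales': [10000, 8000, 15000, 12000, 9000, 7500],
    'Profit': [2000, 1500, 3000, 2500, 1800, 1200]
}
df = pd.DataFrame(data)

# Two-level (Region, Category) index, sorted so label lookups are efficient
indexed = df.set_index(['Region', 'Category']).sort_index()

# Full key: one value from one group
east_electronics_sales = indexed.loc[('East', 'Electronics'), 'Sales']

# xs(): every Category row for a single Region, indexed by Category only
west_rows = indexed.xs('West', level='Region')

# reset_index() turns both index levels back into regular columns
flat = indexed.reset_index()

print(east_electronics_sales)
print(west_rows)
```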


Practical Example: Analyze US Census Data

Let’s look at a more practical example using simulated US census data:

import pandas as pd
import numpy as np

# Create a simulated US census dataset
np.random.seed(42)
states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
cities_by_state = {
    'California': ['Los Angeles', 'San Francisco', 'San Diego'],
    'Texas': ['Houston', 'Dallas', 'Austin'],
    'New York': ['New York City', 'Buffalo', 'Rochester'],
    'Florida': ['Miami', 'Orlando', 'Tampa'],
    'Illinois': ['Chicago', 'Springfield', 'Peoria']
}

data = []
for state in states:
    for city in cities_by_state[state]:
        # Generate multiple records per city
        for _ in range(5):
            data.append({
                'State': state,
                'City': city,
                'Age': np.random.randint(18, 85),
                'Income': np.random.randint(25000, 150000),
                'Education': np.random.choice(['High School', 'Bachelor', 'Master', 'PhD']),
                'Homeowner': np.random.choice([True, False])
            })

census_df = pd.DataFrame(data)

# Calculate statistics for each group without losing original data
census_df['State_Avg_Income'] = census_df.groupby('State')['Income'].transform('mean')
census_df['City_Avg_Income'] = census_df.groupby(['State', 'City'])['Income'].transform('mean')
census_df['Income_vs_State_Avg'] = census_df['Income'] - census_df['State_Avg_Income']

# Find people with income above their city average
high_earners = census_df[census_df['Income'] > census_df['City_Avg_Income']]
print("High earners by city (sample):")
print(high_earners[['State', 'City', 'Income', 'City_Avg_Income']].head(10))

# Map each education level to an ordinal rank (higher = more education)
edu_order = {'High School': 1, 'Bachelor': 2, 'Master': 3, 'PhD': 4}
census_df['Education_Rank'] = census_df['Education'].map(edu_order)

# Add each (State, Education) group's homeownership percentage to every row in that group
def calc_homeowner_pct(group):
    group = group.copy()
    group['Homeowner_Pct'] = group['Homeowner'].mean() * 100
    return group

ownership_by_edu = census_df.groupby(['State', 'Education']).apply(calc_homeowner_pct)
print("\nHomeownership percentages by state and education (sample):")
print(ownership_by_edu[['Homeowner_Pct']].head(10))

This example demonstrates how groupby without aggregation can be powerful for enriching your dataset with group-level statistics while preserving the original granularity.
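A few other groupby methods also return one value per row. The sketch below uses a tiny hypothetical stand-in for census_df (the rows are made up) to show cumcount(), ngroup(), and rank(pct=True):

```python
import pandas as pd

# Tiny hypothetical stand-in for census_df with the same State/City/Income columns
people = pd.DataFrame({
    'State': ['Texas', 'Texas', 'Texas', 'Florida', 'Florida'],
    'City': ['Austin', 'Austin', 'Dallas', 'Miami', 'Miami'],
    'Income': [52000, 61000, 75000, 48000, 90000],
})

# cumcount(): 0-based position of each row within its (State, City) group
people['Row_In_City'] = people.groupby(['State', 'City']).cumcount()

# ngroup(): integer ID per group, numbered in sorted key order by default
people['City_ID'] = people.groupby(['State', 'City']).ngroup()

# rank(pct=True): income percentile within the state, still one value per row
people['State_Income_Pct'] = people.groupby('State')['Income'].rank(pct=True)

print(people)
```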

I hope you found this article helpful. Using pandas groupby without aggregation functions allows for more flexible data manipulation while preserving your original data structure. These techniques are particularly valuable when you need to enrich your data with group-level statistics or when you want to process groups separately.
