Pandas groupby without aggregation function in Python [4 Examples]

In this Pandas tutorial, I will explain the Pandas groupby without aggregation function in Python, with some illustrative examples.

Pandas groupby without aggregation in Python means dividing a dataset into groups based on some criteria without reducing each group to a single statistical summary. This lets us inspect and manipulate each subgroup on its own, which gives a more detailed view of the dataset’s structure and characteristics.

Pandas groupby with aggregation function in Python

The groupby method in Pandas, combined with an aggregate function, is a powerful tool for data analysis in Python. It enables us to group data by certain columns and then perform various aggregate operations on the grouped data. Here’s how it works:

import pandas as pd

# Sample DataFrame
data = {
    'State': ['California', 'Texas', 'New York', 'California', 'Texas', 'New York', 'California'],
    'Product': ['Product A', 'Product B', 'Product A', 'Product C', 'Product B', 'Product C', 'Product A'],
    'Sales': [200, 150, 400, 300, 250, 350, 500]
}

df = pd.DataFrame(data)
# Group by 'State' and 'Product', then sum 'Sales'
grouped = df.groupby(['State', 'Product']).agg('sum')
print(grouped)

Output:

                      Sales
State      Product         
California Product A    700
           Product C    300
New York   Product A    400
           Product C    350
Texas      Product B    400

Pandas groupby without aggregation function in Python

Using Pandas groupby without an aggregation function in Python lets us segment our dataset into groups without collapsing each group into a single value (which is what aggregation does). Every row of the original data remains available inside its group.
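
As a minimal sketch of that idea, the snippet below builds a small hypothetical DataFrame (its values are illustrative and not taken from the examples that follow) and inspects the object that groupby returns before any aggregation is applied.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'State': ['California', 'Texas', 'California', 'New York'],
    'Sales': [200, 150, 300, 250],
})

grouped = df.groupby('State')

# groupby is lazy: this is a DataFrameGroupBy object, not a new DataFrame
print(type(grouped).__name__)

# Number of distinct groups found in the 'State' column
print(grouped.ngroups)

# Mapping of each group key to the row labels that belong to it
group_labels = {key: list(labels) for key, labels in grouped.groups.items()}
print(group_labels)
```

Nothing is computed per group until we iterate over the object or call a method on it, which is why the rows are still intact at this point.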

Let’s see some of the use cases and examples for this:

1. Pandas groupby without aggregation for exploring data

Grouping without aggregating is a great way to explore and understand our data. For instance, we can group data by a specific category and then loop over the grouped object, which yields each group key together with the rows that belong to it:

import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'Texas', 'New York', 'California', 'Texas'],
    'Sales': [200, 150, 250, 300, 100],
    'Product': ['Product A', 'Product B', 'Product A', 'Product B', 'Product A']
})
grouped = df.groupby('State')
for state, group in grouped:
    print(f"State: {state}")
    print(group)

Output:

State: California
        State  Sales    Product
0  California    200  Product A
3  California    300  Product B
State: New York
      State  Sales    Product
2  New York    250  Product A
State: Texas
   State  Sales    Product
1  Texas    150  Product B
4  Texas    100  Product A
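
When only one group is of interest, looping over all of them is not necessary. The sketch below reuses the DataFrame from this example and shows two options: get_group for a single key, and a dictionary comprehension that stores every group under its key. The variable names texas and frames_by_state are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'Texas', 'New York', 'California', 'Texas'],
    'Sales': [200, 150, 250, 300, 100],
    'Product': ['Product A', 'Product B', 'Product A', 'Product B', 'Product A']
})
grouped = df.groupby('State')

# Pull out a single group by its key; the original row labels are kept
texas = grouped.get_group('Texas')
print(texas)

# Collect every group into a dictionary keyed by state
frames_by_state = {state: group for state, group in grouped}
print(list(frames_by_state))
```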

2. Pandas groupby without aggregation for applying different functions to groups

We can apply a function to each group without aggregating it. This is useful when the function does not reduce the group to a single value, for example when it returns a modified copy of the group.

Here is the code, where apply runs a price adjustment on each group:

import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West', 'East'],
    'Price': [20, 30, 25, 35, 22],
    'Product': ['Item 1', 'Item 2', 'Item 3', 'Item 4', 'Item 5']
})

grouped = df.groupby('Region')

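def adjust_price(group):
    # group.name holds the key of the group being processed ('East' or 'West')
    factor = 1.1 if group.name == 'East' else 0.9  # +10% for East, -10% for West
    # Return a modified copy rather than mutating the group in place
    return group.assign(Price=group['Price'] * factor)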

adjusted_prices = grouped.apply(adjust_price)
print(adjusted_prices)

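Output (as printed by pandas 2.x, where apply adds the group key as an outer index level; pandas 2.2 and later may also show a DeprecationWarning because apply operates on the grouping column):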

         Region  Price Product
Region                        
East   0   East   22.0  Item 1
       2   East   27.5  Item 3
       4   East   24.2  Item 5
West   1   West   27.0  Item 2
       3   West   31.5  Item 4

3. Pandas groupby without aggregation for sorting and filtering within groups

We can also sort and filter within each group. Here, apply with sort_values orders the rows of every group independently, while filter keeps or drops whole groups depending on whether the function returns True for them. This allows for more fine-grained data analysis.

import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'Texas', 'California', 'New York', 'Texas'],
    'University': ['UCLA', 'UT Austin', 'UC Berkeley', 'NYU', 'Texas A&M'],
    'Enrollment': [45000, 50000, 42000, 30000, 55000]
})

grouped = df.groupby('State')

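# Sort the rows of each state by enrollment, largest first
sorted_universities = grouped.apply(lambda x: x.sort_values('Enrollment', ascending=False))
print(sorted_universities)

# Keep only the states whose largest university has more than 40,000 students
filtered_states = grouped.filter(lambda x: x['Enrollment'].max() > 40000)
print(filtered_states)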

Output:

                   State   University  Enrollment
State                                            
California 0  California         UCLA       45000
           2  California  UC Berkeley       42000
New York   3    New York          NYU       30000
Texas      4       Texas    Texas A&M       55000
           1       Texas    UT Austin       50000
        State   University  Enrollment
0  California         UCLA       45000
1       Texas    UT Austin       50000
2  California  UC Berkeley       42000
4       Texas    Texas A&M       55000
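
The same within-group ordering can also be reached without apply. As a sketch, the code below sorts the whole DataFrame by both columns, then uses head and cumcount on the groups. The names ordered, largest_per_state and Rank_In_State are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'Texas', 'California', 'New York', 'Texas'],
    'University': ['UCLA', 'UT Austin', 'UC Berkeley', 'NYU', 'Texas A&M'],
    'Enrollment': [45000, 50000, 42000, 30000, 55000]
})

# Sort by state, then by enrollment (largest first) within each state
ordered = df.sort_values(['State', 'Enrollment'], ascending=[True, False])

# head(1) keeps the first row of every group: the largest university per state
largest_per_state = ordered.groupby('State').head(1)
print(largest_per_state)

# cumcount numbers the rows inside each group starting at 0, so add 1 for a rank
ordered['Rank_In_State'] = ordered.groupby('State').cumcount() + 1
print(ordered)
```

This variant keeps a flat index, which can be easier to work with than the two-level index that apply produces.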

4. Pandas groupby with multiple columns without aggregation

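When we use the groupby method with multiple columns, Pandas groups the data based on unique combinations of values in these columns. Each group is identified by a tuple of keys, one value per grouping column, which allows for a more granular analysis. A multi-level index only appears once a result is produced from the grouped object, for example by an aggregation or by apply.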

Skipping aggregation means we’re not reducing the data to a summary statistic but instead are interested in the grouped data itself. This approach is useful for detailed data exploration and for applying specific operations within each unique group.

import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'California', 'Texas', 'Texas', 'New York', 'New York'],
    'City': ['Los Angeles', 'San Francisco', 'Houston', 'Austin', 'New York City', 'Buffalo'],
    'Clinics': [30, 20, 25, 10, 40, 15],
    'Hospitals': [10, 5, 7, 3, 12, 4]
})

grouped = df.groupby(['State', 'City'])

for (state, city), group in grouped:
    print(f"State: {state}, City: {city}")
    print(group)

Output:

State: California, City: Los Angeles
        State         City  Clinics  Hospitals
0  California  Los Angeles       30         10
State: California, City: San Francisco
        State           City  Clinics  Hospitals
1  California  San Francisco       20          5
State: New York, City: Buffalo
      State     City  Clinics  Hospitals
5  New York  Buffalo       15          4
State: New York, City: New York City
      State           City  Clinics  Hospitals
4  New York  New York City       40         12
State: Texas, City: Austin
   State    City  Clinics  Hospitals
3  Texas  Austin       10          3
State: Texas, City: Houston
   State     City  Clinics  Hospitals
2  Texas  Houston       25          7
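
Building on the multi-column grouping above, this sketch shows how the tuple keys can be used directly: listing them, selecting one group with get_group, and labelling each row with the number of its group through ngroup. The column name Group_Id is a hypothetical choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'California', 'Texas', 'Texas', 'New York', 'New York'],
    'City': ['Los Angeles', 'San Francisco', 'Houston', 'Austin', 'New York City', 'Buffalo'],
    'Clinics': [30, 20, 25, 10, 40, 15],
    'Hospitals': [10, 5, 7, 3, 12, 4]
})

grouped = df.groupby(['State', 'City'])

# With several grouping columns, each key is a tuple (state, city);
# sorted() is used here only to get a stable display order
keys = sorted(grouped.groups)
print(keys)

# Select one group by passing the full tuple
austin = grouped.get_group(('Texas', 'Austin'))
print(austin)

# ngroup labels every row with the number of the group it belongs to
df['Group_Id'] = grouped.ngroup()
print(df)
```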

Conclusion

Here, I have explained the versatility of Pandas groupby without aggregation in Python through four examples: exploring data, applying different functions to groups, sorting and filtering within groups, and using groupby with multiple columns. These examples demonstrate how groupby enables detailed, nuanced analysis of datasets, providing insights without necessarily condensing data into aggregate statistics.
