When I was working on a data analysis project for a U.S. retail chain where I needed to examine the relationship between customer demographics and purchasing patterns. I needed to create a cross-tabulation of these variables to identify trends. That’s when the pandas crosstab function became my go-to solution.
In this article, I’ll explain what the pandas crosstab function is, how it works, and show you practical examples of how to use it effectively in your data analysis projects.
So, let’s get in..
Pandas Crosstab
The pandas crosstab function (pd.crosstab) is a useful tool that computes a cross-tabulation of two or more factors. Think of it as creating a frequency table that shows the relationship between different categorical variables.
This function is particularly useful when you want to understand the distribution of data across multiple dimensions or categories.
Basic Syntax of pd.crosstab
Let’s first look at the basic syntax of the pd.crosstab function in Python:
pd.crosstab(index, columns, values=None, rownames=None, colnames=None,
aggfunc=None, margins=False, margins_name='All',
dropna=True, normalize=False)The two main parameters you’ll always use are:
index: The categorical variable(s) to use for the table rowscolumns: The categorical variable(s) to use for the table columns
1 – Create a Simple Crosstab
Let’s start with a simple example using a dataset of customer information from a U.S. retail chain.
import pandas as pd
import numpy as np
# Create sample data
np.random.seed(42)
data = {
'Gender': np.random.choice(['Male', 'Female'], size=100),
'State': np.random.choice(['California', 'Texas', 'New York', 'Florida'], size=100),
'Purchase': np.random.choice(['Yes', 'No'], size=100)
}
df = pd.DataFrame(data)
# Create a simple crosstab
result = pd.crosstab(df['Gender'], df['State'])
print(result)Output:
State California Florida New York Texas
Gender
Female 11 13 20 12
Male 15 11 10 8You can refer to the screenshot below to see the output.

This gives us a quick overview of how many male and female customers we have from each state.
2 – Add Margins to Your Crosstab
Sometimes, you want to see the row and column totals. This is where the margins parameter comes in handy:
# Crosstab with margins
result_with_margins = pd.crosstab(df['Gender'], df['State'], margins=True, margins_name='Total')
print(result_with_margins)Output:
State California Florida New York Texas Total
Gender
Female 11 13 20 12 56
Male 15 11 10 8 44
Total 26 24 30 20 100You can refer to the screenshot below to see the output.

This gives us the total counts for each gender and state, as well as the overall total.
Read Python Pandas Write to Excel
3 – Normalize Your Crosstab
To convert the raw counts into percentages, you can use the normalize parameter in Python:
# Normalize to get percentages
# Normalize by row
row_normalized = pd.crosstab(df['Gender'], df['State'], normalize='index')
print("Row Normalized:")
print(row_normalized)
# Normalize by column
col_normalized = pd.crosstab(df['Gender'], df['State'], normalize='columns')
print("\nColumn Normalized:")
print(col_normalized)
# Normalize by total
total_normalized = pd.crosstab(df['Gender'], df['State'], normalize=True)
print("\nTotal Normalized:")
print(total_normalized)Output:
Row Normalized:
State California Florida New York Texas
Gender
Female 0.196429 0.232143 0.357143 0.214286
Male 0.340909 0.250000 0.227273 0.181818
Column Normalized:
State California Florida New York Texas
Gender
Female 0.423077 0.541667 0.666667 0.6
Male 0.576923 0.458333 0.333333 0.4
Total Normalized:
State California Florida New York Texas
Gender
Female 0.11 0.13 0.2 0.12
Male 0.15 0.11 0.1 0.08You can refer to the screenshot below to see the output.

These normalized tables show proportions instead of raw counts, which can be more insightful for comparison.
4 – Use Crosstab with a Values Column
We can also use the crosstab function to aggregate values based on the categorical variables:
# Add a 'PurchaseAmount' column
df['PurchaseAmount'] = np.random.randint(10, 500, size=100)
# Use crosstab with values and an aggregation function
result_with_values = pd.crosstab(df['Gender'], df['State'],
values=df['PurchaseAmount'],
aggfunc='mean',
margins=True,
margins_name='Average')
print(result_with_values)Output:
State California Florida New York Texas Average
Gender
Female 256.714286 242.444444 237.250000 254.307692 248.395833
Male 268.142857 246.454545 248.076923 247.571429 252.980769
Average 262.428571 244.650000 242.840000 250.814815 250.770000This shows the average purchase amount for each gender across different states, providing more meaningful insights.
Check out np.where in Pandas Python
5 – Create a Multi-Level Crosstab
For more complex analyses, we can create a multi-level crosstab:
# Add an 'Age' column
df['AgeGroup'] = np.random.choice(['18-25', '26-35', '36-50', '50+'], size=100)
# Create a multi-level crosstab
multi_level = pd.crosstab([df['Gender'], df['AgeGroup']], df['State'])
print(multi_level)Output:
State California Florida New York Texas
Gender AgeGroup
Female 18-25 4 2 2 3
26-35 2 2 5 5
36-50 5 3 3 2
50+ 3 2 2 3
Male 18-25 5 2 4 4
26-35 3 5 3 2
36-50 4 2 4 5
50+ 2 2 2 3This gives us a more detailed breakdown of our customer data, showing the distribution across gender, age groups, and states.
Read Pandas Find Duplicates in Python
6 – Visualize Crosstab Results
One of the most useful ways to use Crosstab is to visualize the results:
import matplotlib.pyplot as plt
import seaborn as sns
# Create a heatmap from a crosstab
plt.figure(figsize=(10, 6))
crosstab_result = pd.crosstab(df['Gender'], df['State'])
sns.heatmap(crosstab_result, annot=True, cmap='YlGnBu', fmt='d')
plt.title('Customer Distribution by Gender and State')
plt.tight_layout()
plt.show()This creates a visual heatmap that makes it much easier to spot patterns in your data.
Practical Applications of pd.crosstab
The pandas crosstab function is incredibly versatile and can be used in various real-world scenarios:
- Market Analysis: Analyzing customer demographics and purchasing behavior.
- Healthcare: Examining patient outcomes across different treatments and demographics.
- Finance: Analyzing transaction patterns across different customer segments.
- Education: Studying student performance across different subjects and demographics.
- Politics: Analyzing voting patterns across different demographics and regions.
In my experience, the crosstab function is particularly useful when you need to quickly understand the relationship between categorical variables in your dataset.
The methods that I explained in this guide are: creating a simple crosstab, adding margins to your crosstab, normalizing your crosstab, using crosstab with a values column, creating a multi-level crosstab, and visualizing crosstab results. I also discussed some practical applications of pd.crosstab.
I hope this article has helped you understand what the pandas crosstab function is and how you can use it in your data analysis projects.
Other Python articles you may also like:
- Pandas str.replace Multiple Values in Python
- Convert Python Dictionary to Pandas DataFrame
- Add Rows to a DataFrame Pandas in a Loop in Python

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I started working on Python, Machine learning, and artificial intelligence for the last 5 years. During this time I got expertise in various Python libraries also like Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, Scikit-Learn, etc… for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.