When analyzing data in Python, I often need to see relationships between categorical variables. Creating cross-tabulations with percentage values has been one of my go-to techniques for years.
While I was working on a project analyzing voter demographics, I needed to see the percentage breakdown across different categories. The solution? Pandas crosstab with normalization.
In this tutorial, I will walk you through several methods to create percentage-based crosstabs in Pandas.
Crosstab in Pandas
A crosstab (cross-tabulation) is a statistical table that shows the frequency distribution of variables. Think of it as a spreadsheet where:
- Rows represent one categorical variable
- Columns represent another categorical variable
- Each cell shows how many observations fall into that row and column combination
By default, these cells display counts, but we can convert them to percentages with normalization.
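To make the idea concrete, here is a minimal sketch on a tiny made-up dataset (the `toy` frame and its values are my own illustration, not part of the project data). It shows that a crosstab of counts is the same table you would get by grouping, counting, and pivoting by hand:

```python
import pandas as pd

# Tiny hand-made dataset (hypothetical values, for illustration only)
toy = pd.DataFrame({
    "age_group": ["18-25", "18-25", "26-35", "26-35", "26-35"],
    "product_category": ["Books", "Home", "Books", "Books", "Home"],
})

# Frequency table: one row per age group, one column per category
counts = pd.crosstab(toy["age_group"], toy["product_category"])
print(counts)

# The same counts built by hand: group, count, then pivot categories into columns
by_hand = (
    toy.groupby(["age_group", "product_category"])
    .size()
    .unstack(fill_value=0)
)
print(by_hand)
```

Both tables should hold the same numbers; pd.crosstab is simply the shorter way to write it.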
Create Pandas Crosstab Percentage in Python
Let me show you the methods to create a pandas crosstab percentage in Python.
Method 1: Use pd.crosstab with the normalize Parameter
The simplest way to create a percentage-based crosstab is to pass the normalize parameter to pd.crosstab().
Let’s look at a practical example with a dataset containing information about customer purchases:
import pandas as pd
import numpy as np
# Sample data: Customer demographics and purchase information
np.random.seed(42)
data = {
    'age_group': np.random.choice(['18-25', '26-35', '36-45', '46+'], size=100),
    'gender': np.random.choice(['Male', 'Female'], size=100),
    'product_category': np.random.choice(['Electronics', 'Clothing', 'Home', 'Books'], size=100)
}
df = pd.DataFrame(data)
# Create a basic crosstab (counts)
basic_crosstab = pd.crosstab(df['age_group'], df['product_category'])
print("Basic Crosstab (counts):")
print(basic_crosstab)
Now, let’s convert this to percentages by adding the normalize parameter:
# Create percentage crosstab (normalized by row)
row_pct = pd.crosstab(df['age_group'], df['product_category'], normalize='index')
print("\nRow Percentages (% within each age group):")
print(row_pct)
The output shows, for each age group, the share of its purchases that falls in each product category. The values are fractions between 0 and 1, and each row sums to 1.
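If you want to convince yourself what normalize='index' is doing, the following sketch may help. It uses a small made-up dataset (the `toy` frame is hypothetical) and checks that dividing each row of counts by its row total gives the same result:

```python
import pandas as pd

# Hypothetical mini dataset: 4 purchases by 18-25, 2 purchases by 26-35
toy = pd.DataFrame({
    "age_group": ["18-25", "18-25", "18-25", "18-25", "26-35", "26-35"],
    "product_category": ["Books", "Books", "Books", "Home", "Books", "Home"],
})

counts = pd.crosstab(toy["age_group"], toy["product_category"])
row_share = pd.crosstab(toy["age_group"], toy["product_category"], normalize="index")

# Dividing each row of counts by its row total reproduces normalize='index'
manual = counts.div(counts.sum(axis=1), axis=0)

print(row_share)
print(row_share.sum(axis=1))  # each row adds up to 1
```

In this toy table, three of the four purchases in the 18-25 group are books, so that cell is 0.75.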
Different Normalization Options
The normalize parameter in pd.crosstab() accepts different values based on how you want to calculate percentages:
1. Row Percentages (normalize='index')
This shows the percentage distribution across each row:
# Row percentages
row_pct = pd.crosstab(df['age_group'], df['product_category'], normalize='index')
print(row_pct.round(2) * 100) # Multiply by 100 for percentage format
2. Column Percentages (normalize='columns')
This shows the percentage distribution down each column:
# Column percentages
col_pct = pd.crosstab(df['age_group'], df['product_category'], normalize='columns')
print(col_pct.round(2) * 100) # Multiply by 100 for percentage format
3. Total Percentages (normalize=True or normalize='all')
This shows each cell as a percentage of the entire table:
# Total percentages
total_pct = pd.crosstab(df['age_group'], df['product_category'], normalize=True)
print(total_pct.round(2) * 100) # Multiply by 100 for percentage format
Method 2: Multi-level Crosstab with Percentages
Sometimes we need to analyze data across more than two dimensions. Python Pandas allows us to create multi-level crosstabs with percentages:
# Multi-level crosstab with percentages
multi_pct = pd.crosstab([df['age_group'], df['gender']],
                        df['product_category'],
                        normalize='index')
print(multi_pct.round(2) * 100) # Multiply by 100 for percentage format
Output:
product_category  Books  Clothing  Electronics  Home
age_group gender
18-25     Female   18.0      27.0         36.0  18.0
          Male     56.0      11.0         11.0  22.0
26-35     Female   36.0      29.0         14.0  21.0
          Male     25.0      33.0         42.0   0.0
36-45     Female   12.0      25.0         25.0  38.0
          Male     25.0      19.0         25.0  31.0
46+       Female   27.0       0.0         45.0  27.0
          Male     21.0      26.0         26.0  26.0
This creates a table showing what percentage of each age/gender combination purchased each product category.
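Because the result has a two-level row index, it can be handy to pull out a single combination or a single level. Here is a small sketch, assuming a made-up `toy` dataset with the same column names as above, that uses .loc and .xs for this:

```python
import pandas as pd

# Hypothetical mini dataset with two row variables and one column variable
toy = pd.DataFrame({
    "age_group": ["18-25", "18-25", "18-25", "26-35", "26-35", "26-35"],
    "gender": ["Female", "Female", "Male", "Female", "Male", "Male"],
    "product_category": ["Books", "Home", "Books", "Home", "Books", "Home"],
})

multi = pd.crosstab(
    [toy["age_group"], toy["gender"]],
    toy["product_category"],
    normalize="index",
)

# One (age_group, gender) combination: a Series of row shares
print(multi.loc[("18-25", "Female")])

# All rows for one gender, with that level dropped from the index
print(multi.xs("Female", level="gender"))
```

The index levels take their names from the Series passed to pd.crosstab, which is why level="gender" works here.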
Method 3: Add Margins with Percentages
Adding margins to your crosstab can provide row and column totals, which is helpful for a complete analysis:
# Crosstab with margins and percentages
margins_pct = pd.crosstab(df['age_group'],
                          df['product_category'],
                          normalize='index',
                          margins=True,
                          margins_name='Total')
print(margins_pct.round(2) * 100) # Multiply by 100 for percentage format
Output:
product_category  Books  Clothing  Electronics  Home
age_group
18-25              35.0      20.0         25.0  20.0
26-35              31.0      31.0         27.0  12.0
36-45              21.0      21.0         25.0  33.0
46+                23.0      17.0         33.0  27.0
Total              27.0      22.0         28.0  23.0
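With normalize='index', margins=True adds only a “Total” row, which holds the overall distribution of product categories across all age groups. No “Total” column appears, because every row already sums to 100%.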
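Which margins you get depends on the normalize setting. The sketch below uses a tiny made-up dataset and the default margin label (All) to compare the three options; treat it as an illustration of the behavior I have seen in recent pandas versions rather than a guarantee for every release:

```python
import pandas as pd

# Hypothetical mini dataset
toy = pd.DataFrame({
    "age_group": ["18-25", "18-25", "18-25", "26-35", "26-35", "26-35"],
    "product_category": ["Books", "Books", "Home", "Books", "Home", "Home"],
})
rows, cols = toy["age_group"], toy["product_category"]

by_row = pd.crosstab(rows, cols, normalize="index", margins=True)
by_col = pd.crosstab(rows, cols, normalize="columns", margins=True)
overall = pd.crosstab(rows, cols, normalize="all", margins=True)

print(by_row)   # extra "All" row, no "All" column
print(by_col)   # extra "All" column, no "All" row
print(overall)  # both an "All" row and an "All" column
```

In the row-normalized table, the All row is the category mix of the whole dataset, which is a useful baseline to compare each age group against.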
Format the Percentage Output
For better readability, you might want to format the percentages:
# Format as percentages with one decimal place
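def format_pct(val):
    return f'{val:.1f}%'

styled_pct = row_pct.round(3) * 100
# DataFrame.map replaces applymap, which is deprecated since pandas 2.1
# (on older pandas versions, use styled_pct.applymap(format_pct) instead)
styled_pct = styled_pct.map(format_pct)
print(styled_pct)
Real-World Example: Analyze Voting Patterns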
Let’s look at a more realistic example that uses simulated voter data loosely modeled on the 2020 U.S. presidential election:
# Sample voting data
voter_data = {
    'state': np.random.choice(['California', 'Texas', 'Florida', 'New York'], size=1000),
    'age_group': np.random.choice(['18-29', '30-44', '45-64', '65+'], size=1000),
    'education': np.random.choice(['High School', 'Some College', 'College Grad', 'Post-Grad'], size=1000),
    'candidate': np.random.choice(['Biden', 'Trump'], size=1000, p=[0.52, 0.48]) # Roughly matching 2020 results
}
voter_df = pd.DataFrame(voter_data)
# Analyze voting patterns by state and education
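state_edu_vote = pd.crosstab([voter_df['state'], voter_df['education']],
                             voter_df['candidate'],
                             normalize='index')
print("Percentage of votes by state and education level:")
print((state_edu_vote * 100).round(1))
With real survey data, this kind of table shows how voting preferences vary with education level within each state. Here the candidate column is generated independently of state and education, so every group hovers around the 52/48 split and the differences are only sampling noise.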
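Percentages alone can hide how few observations sit behind a row. One option, sketched here on a tiny made-up dataset with placeholder candidate labels A and B (both are my own assumptions), is to append the group size next to the row percentages:

```python
import pandas as pd

# Hypothetical mini dataset with placeholder candidate labels
votes = pd.DataFrame({
    "state": ["Texas", "Texas", "Texas", "Florida", "Florida"],
    "candidate": ["A", "B", "B", "A", "A"],
})

counts = pd.crosstab(votes["state"], votes["candidate"])
pct = (pd.crosstab(votes["state"], votes["candidate"], normalize="index") * 100).round(1)

# Append the number of observations per row so small groups are easy to spot
pct["n"] = counts.sum(axis=1)
print(pct)
```

A row showing 100% for one candidate looks very different once you see it is based on only two respondents.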
Common Issues and Solutions
Here are some common issues you may run into when creating a pandas crosstab percentage in Python, along with their solutions.
Issue 1: Deal with NaN Values
If your crosstab contains NaN values, you can handle them like this:
# Fill NaN values with zeros
clean_pct = row_pct.fillna(0)
Issue 2: Rounding Errors
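Sometimes rounded percentages don’t sum exactly to 100%. Because row_pct holds row percentages, rescale along rows (axis=1); using axis=0 would turn them into column percentages. Note that the rescaled values are no longer rounded to one decimal place:
# Rescale each rounded row so it sums to 100%
def normalize_to_100(series):
    return series / series.sum() * 100

rounded_pct = (row_pct * 100).round(1)
adjusted_pct = rounded_pct.apply(normalize_to_100, axis=1)
I hope you found this guide helpful for creating percentage-based crosstabs in Pandas.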
In this guide, I explained how to create a pandas crosstab percentage in Python using three methods: pd.crosstab with the normalize parameter, multi-level crosstabs with percentages, and margins with percentages.
I also covered the different normalization options, a voting-pattern example, and some common issues with their solutions.

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries.