How To Create Pandas Crosstab Percentage In Python

When analyzing data in Python, I often need to see relationships between categorical variables. Creating cross-tabulations with percentage values has been one of my go-to techniques for years.

While I was working on a project analyzing voter demographics, I needed to see the percentage breakdown across different categories. The solution? Pandas crosstab with normalization.

In this tutorial, I will walk you through several methods to create percentage-based crosstabs in Pandas.

Crosstab in Pandas

A crosstab (cross-tabulation) is a table that shows the joint frequency distribution of two or more categorical variables. Think of it as a spreadsheet where:

Rows represent one categorical variable
Columns represent another categorical variable
Each cell counts the observations that fall into that row and column combination

By default, these cells display counts. The normalize parameter converts them to proportions between 0 and 1, which become percentages once multiplied by 100.
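To make the difference between counts and percentages concrete, here is a minimal sketch on a tiny hand-made table. The toy DataFrame and its city and pet columns are hypothetical and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical toy data: six pet owners in two cities
toy = pd.DataFrame({
    "city": ["Austin", "Austin", "Austin", "Boston", "Boston", "Boston"],
    "pet":  ["Cat", "Dog", "Dog", "Cat", "Cat", "Cat"],
})

# Default crosstab: raw counts per (city, pet) combination
counts = pd.crosstab(toy["city"], toy["pet"])

# normalize="index" divides each row by its row total, giving fractions
fractions = pd.crosstab(toy["city"], toy["pet"], normalize="index")

# Multiply by 100 to turn the fractions into percentages
percentages = fractions * 100

print(counts)
print(percentages.round(1))
```

With data this small, you can check the result by hand: one of the three Austin owners has a cat, so that cell is about 33.3%.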

Create Pandas Crosstab Percentage in Python

Let me show you the methods to create a pandas crosstab percentage in Python.

Method 1: Use pd.crosstab with the normalize Parameter

The simplest way to create a percentage-based crosstab is by using the normalize parameter in Python Pandas’ pd.crosstab() function.

Let’s look at a practical example with a dataset containing information about customer purchases:

import pandas as pd
import numpy as np

# Sample data: Customer demographics and purchase information
np.random.seed(42)
data = {
    'age_group': np.random.choice(['18-25', '26-35', '36-45', '46+'], size=100),
    'gender': np.random.choice(['Male', 'Female'], size=100),
    'product_category': np.random.choice(['Electronics', 'Clothing', 'Home', 'Books'], size=100)
}

df = pd.DataFrame(data)

# Create a basic crosstab (counts)
basic_crosstab = pd.crosstab(df['age_group'], df['product_category'])
print("Basic Crosstab (counts):")
print(basic_crosstab)

Now, let’s convert this to percentages by adding the normalize parameter:

# Create percentage crosstab (normalized by row)
row_pct = pd.crosstab(df['age_group'], df['product_category'], normalize='index')
print("\nRow Percentages (% within each age group):")
print(row_pct)

The output shows the share of each age group that purchased each product category, expressed as a fraction between 0 and 1. Multiply by 100 to read the values as percentages.
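A quick sanity check for row normalization is that every row adds up to 1. The sketch below rebuilds the sample data with the same seed so it runs on its own. The row_sums name is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the tutorial's sample data so this snippet runs on its own
np.random.seed(42)
df = pd.DataFrame({
    "age_group": np.random.choice(["18-25", "26-35", "36-45", "46+"], size=100),
    "gender": np.random.choice(["Male", "Female"], size=100),
    "product_category": np.random.choice(["Electronics", "Clothing", "Home", "Books"], size=100),
})

row_pct = pd.crosstab(df["age_group"], df["product_category"], normalize="index")

# Each row holds the shares for one age group, so it should add up to 1.0
row_sums = row_pct.sum(axis=1)
print(row_sums)
print("All rows sum to 1:", np.allclose(row_sums, 1.0))
```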

Different Normalization Options

The normalize parameter in pd.crosstab() accepts different values based on how you want to calculate percentages:

1. Row Percentages (normalize='index')

This shows the percentage distribution across each row:

# Row percentages
row_pct = pd.crosstab(df['age_group'], df['product_category'], normalize='index')
print((row_pct * 100).round(1))  # Scale to percent, then round to one decimal place

2. Column Percentages (normalize='columns')

This shows the percentage distribution down each column:

# Column percentages
col_pct = pd.crosstab(df['age_group'], df['product_category'], normalize='columns')
print((col_pct * 100).round(1))  # Scale to percent, then round to one decimal place

3. Total Percentages (normalize=True or normalize='all')

This shows each cell as a percentage of the entire table:

# Total percentages
total_pct = pd.crosstab(df['age_group'], df['product_category'], normalize=True)
print((total_pct * 100).round(1))  # Scale to percent, then round to one decimal place
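It can help to see what each normalize option computes. The sketch below reproduces the three options by dividing a counts table by its row totals, column totals, and grand total. The toy data and the manual_* names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the tutorial's dataset
toy = pd.DataFrame({
    "region": ["East", "East", "East", "West", "West"],
    "plan":   ["Basic", "Basic", "Pro", "Basic", "Pro"],
})

counts = pd.crosstab(toy["region"], toy["plan"])

# normalize="index": divide every row by its row total
manual_rows = counts.div(counts.sum(axis=1), axis=0)

# normalize="columns": divide every column by its column total
manual_cols = counts.div(counts.sum(axis=0), axis=1)

# normalize="all" (or True): divide every cell by the grand total
manual_all = counts / counts.to_numpy().sum()

# Compare each manual result with the built-in option
for option, manual in [("index", manual_rows), ("columns", manual_cols), ("all", manual_all)]:
    built_in = pd.crosstab(toy["region"], toy["plan"], normalize=option)
    print(option, np.allclose(built_in, manual))
```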

Method 2: Multi-level Crosstab with Percentages

Sometimes we need to analyze data across more than two dimensions. Python Pandas allows us to create multi-level crosstabs with percentages:

# Multi-level crosstab with percentages
multi_pct = pd.crosstab([df['age_group'], df['gender']], 
                         df['product_category'], 
                         normalize='index')

print((multi_pct * 100).round(0))  # Scale to percent, then round to whole numbers

Output:

product_category  Books  Clothing  Electronics  Home
age_group gender
18-25     Female   18.0      27.0         36.0  18.0
          Male     56.0      11.0         11.0  22.0
26-35     Female   36.0      29.0         14.0  21.0
          Male     25.0      33.0         42.0   0.0
36-45     Female   12.0      25.0         25.0  38.0
          Male     25.0      19.0         25.0  31.0
46+       Female   27.0       0.0         45.0  27.0
          Male     21.0      26.0         26.0  26.0

This creates a table showing what percentage of each age/gender combination purchased each product category.
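A multi-level crosstab has a MultiIndex on its rows, so pulling out one group takes a slightly different lookup. Here is a small sketch of two common selections, using hypothetical toy data so the values are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "age_group": ["18-25", "18-25", "18-25", "26-35", "26-35", "26-35"],
    "gender":    ["Female", "Female", "Male", "Female", "Male", "Male"],
    "category":  ["Books", "Home", "Books", "Home", "Books", "Home"],
})

multi_pct = pd.crosstab(
    [toy["age_group"], toy["gender"]], toy["category"], normalize="index"
) * 100

# One (age_group, gender) combination: pass a tuple to .loc
print(multi_pct.loc[("18-25", "Female")])

# Every age group for one gender: take a cross-section on the "gender" level
male_only = multi_pct.xs("Male", level="gender")
print(male_only)
```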

Method 3: Add Margins with Percentages

Adding margins to your crosstab can provide row and column totals, which is helpful for a complete analysis:

# Crosstab with margins and percentages
margins_pct = pd.crosstab(df['age_group'], 
                          df['product_category'], 
                          normalize='index', 
                          margins=True, 
                          margins_name='Total')

print((margins_pct * 100).round(0))  # Scale to percent, then round to whole numbers

Output:

product_category  Books  Clothing  Electronics  Home
age_group
18-25              35.0      20.0         25.0  20.0
26-35              31.0      31.0         27.0  12.0
36-45              21.0      21.0         25.0  33.0
46+                23.0      17.0         33.0  27.0
Total              27.0      22.0         28.0  23.0

Because normalize='index' is set, margins=True adds only a “Total” row, which shows each product category’s share of all purchases. No “Total” column appears, since every row already sums to 100%.
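Which margins appear depends on the normalize option. The sketch below compares normalize="index" with normalize="all" on hypothetical toy data. The by_row and overall names are only illustrative.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "region": ["East", "East", "East", "West"],
    "plan":   ["Basic", "Basic", "Pro", "Basic"],
})

# normalize="index": only a "Total" row is appended (overall share of each plan)
by_row = pd.crosstab(toy["region"], toy["plan"], normalize="index",
                     margins=True, margins_name="Total")

# normalize="all": both a "Total" row and a "Total" column are appended
overall = pd.crosstab(toy["region"], toy["plan"], normalize="all",
                      margins=True, margins_name="Total")

print((by_row * 100).round(1))
print((overall * 100).round(1))
```

If you need both margins, normalize="all" is the option to reach for. Keep in mind that its cells are shares of the whole table rather than shares of each row.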

Format the Percentage Output

For better readability, you might want to format the percentages:

# Format as percentages with one decimal place
def format_pct(val):
    return f'{val:.1f}%'

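styled_pct = (row_pct * 100).round(1)
# Series.map works across pandas versions; DataFrame.applymap is deprecated since pandas 2.1
styled_pct = styled_pct.apply(lambda col: col.map(format_pct))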
print(styled_pct)
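Formatting cells as strings makes the table easier to read, but the values can no longer be used in calculations. As one alternative, here is a sketch that keeps the table numeric and applies the % sign only when rendering, using the float_format argument of DataFrame.to_string. The toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "region": ["East", "East", "East", "West"],
    "plan":   ["Basic", "Basic", "Pro", "Basic"],
})

pct = pd.crosstab(toy["region"], toy["plan"], normalize="index") * 100

# Format only at display time; pct itself stays numeric
text = pct.to_string(float_format=lambda v: f"{v:.1f}%")
print(text)

# The underlying values can still be used in calculations
print(pct["Basic"].mean())
```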

Real-World Example: Analyze Voting Patterns

Let’s look at a more realistic example: a simulated dataset modeled on voting patterns in the 2020 U.S. presidential election. The data is randomly generated, so the results are illustrative only:

# Sample voting data
voter_data = {
    'state': np.random.choice(['California', 'Texas', 'Florida', 'New York'], size=1000),
    'age_group': np.random.choice(['18-29', '30-44', '45-64', '65+'], size=1000),
    'education': np.random.choice(['High School', 'Some College', 'College Grad', 'Post-Grad'], size=1000),
    'candidate': np.random.choice(['Biden', 'Trump'], size=1000, p=[0.52, 0.48])  # Roughly matching 2020 results
}

voter_df = pd.DataFrame(voter_data)

# Analyze voting patterns by state and education
state_edu_vote = pd.crosstab([voter_df['state'], voter_df['education']], 
                             voter_df['candidate'], 
                             normalize='index')

print("Percentage of votes by state and education level:")
print((state_edu_vote * 100).round(1))

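With real survey data, this kind of table shows how voting preferences vary by education level within each state. Because the sample here is random, every row should land close to the 52/48 split used to generate it.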
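Percentages alone hide how many observations each row is based on. One possible refinement, sketched here on a smaller synthetic sample, is to attach the group size next to the row percentages. The voters DataFrame, the seed, and the n column are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data with a fixed seed (hypothetical, in the spirit of the example above)
rng = np.random.default_rng(0)
voters = pd.DataFrame({
    "state": rng.choice(["California", "Texas"], size=200),
    "candidate": rng.choice(["Biden", "Trump"], size=200, p=[0.52, 0.48]),
})

pct = pd.crosstab(voters["state"], voters["candidate"], normalize="index") * 100

# Group sizes: how many respondents each percentage row is based on
pct["n"] = voters.groupby("state").size()

print(pct.round(1))
```

A row built from a handful of respondents deserves less weight than one built from hundreds, and the n column makes that visible.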

Common Issues and Solutions

Here are some common issues you may run into when creating a pandas crosstab percentage in Python, along with solutions.

Issue 1: Deal with NaN Values

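A plain crosstab fills empty combinations with 0, but NaN values can appear when you aggregate with the values and aggfunc parameters or reindex the result. You can replace them like this: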

# Fill NaN values with zeros
clean_pct = row_pct.fillna(0)
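To show where NaN values can come from, here is a minimal sketch that aggregates a value column with values and aggfunc. The toy data and the spend column are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: nobody in the West bought the Pro plan
toy = pd.DataFrame({
    "region": ["East", "East", "West"],
    "plan":   ["Basic", "Pro", "Basic"],
    "spend":  [10.0, 30.0, 20.0],
})

# Plain counts: the empty (West, Pro) cell is 0, not NaN
counts = pd.crosstab(toy["region"], toy["plan"])

# Aggregating a value column: the empty (West, Pro) cell has nothing to sum, so it is NaN
spend = pd.crosstab(toy["region"], toy["plan"], values=toy["spend"], aggfunc="sum")

# Treat "no purchases" as zero spend, then convert each row to percentages
spend_pct = spend.fillna(0)
spend_pct = spend_pct.div(spend_pct.sum(axis=1), axis=0) * 100

print(spend)
print(spend_pct)
```

Whether zero is the right replacement depends on the aggregation. It is reasonable for sums, but for a mean it would invent a value that was never observed.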

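Issue 2: Rounding Errors

Rounded percentages don’t always sum to exactly 100%. Since row_pct holds row percentages, each row (not each column) should sum to 100%, so rescale along the rows:

# Rescale each row of rounded percentages so it sums to 100%
def normalize_to_100(series):
    return series / series.sum() * 100

rounded_pct = (row_pct * 100).round(1)
adjusted_pct = rounded_pct.apply(normalize_to_100, axis=1)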
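Rescaling fixes the total but leaves values that are no longer neatly rounded. If you need rounded values that still sum to exactly 100, one option is the largest-remainder method. The sketch below is my own illustration: round_to_100 is a hypothetical helper, not a pandas function, and it assumes the input values already sum to 100.

```python
import numpy as np
import pandas as pd

def round_to_100(shares, decimals=0):
    """Round percentages so they still sum to 100 (largest-remainder method).

    Assumes the unrounded values in `shares` already sum to 100.
    """
    scale = 10 ** decimals
    scaled = shares.to_numpy(dtype=float) * scale
    floored = np.floor(scaled)
    # Units still missing after flooring every value
    shortfall = int(round(100 * scale - floored.sum()))
    # Hand the missing units to the values with the largest fractional parts
    order = np.argsort(-(scaled - floored), kind="stable")
    floored[order[:shortfall]] += 1
    return pd.Series(floored / scale, index=shares.index)

# Hypothetical row of percentages: three equal shares
row = pd.Series([100 / 3, 100 / 3, 100 / 3], index=["Books", "Clothing", "Home"])

plain_total = row.round(0).sum()   # plain rounding loses a point
fixed = round_to_100(row)

print(plain_total)
print(fixed)
print(fixed.sum())
```

To apply it to a whole table of row percentages, something like (row_pct * 100).apply(round_to_100, axis=1) should work, since the helper returns a Series with the same index as its input.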

I hope you found this guide helpful for creating percentage-based crosstabs in Pandas.

In this guide, I have explained how to create a pandas crosstab percentage in Python. The methods I discussed are: using pd.crosstab with the normalize parameter, building multi-level crosstabs with percentages, and adding margins with percentages.

I also covered different normalization options, real-world examples of analyzing voting patterns, and some common issues and solutions.

Bijay Kumar

Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers.
