What is the pd.crosstab function in Python [with 2 Examples]

In this Python tutorial, I will explain the pd.crosstab function in Python: its syntax, its parameters, and its return value. I will also walk through some examples of crosstab in Python Pandas.

Pandas is a powerful Python library for data analysis that offers many functions to manipulate and analyze data. One of them is the pd.crosstab function.

While working with some data in Python, I needed a simple cross-tabulation of two factors in a Pandas dataset, and that is how I came across the pd.crosstab() function.

pd.crosstab function in Python

The Pandas crosstab function in Python is used to compute a simple cross tabulation of two (or more) factors.

By default, it computes a frequency table of the factors. If you also pass an array of values and an aggregation function, it aggregates those values instead of counting rows.
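To make the difference concrete, here is a minimal sketch on a small made-up sales table (the Region, Product and Units columns and their values are hypothetical). The first call uses the default frequency count; the second passes values and aggfunc='sum' so each cell holds a total instead of a count.

```python
import pandas as pd

# Hypothetical sales records, used only for illustration
df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Product": ["A", "A", "B", "B", "A"],
    "Units": [10, 5, 7, 3, 2],
})

# Default behaviour: count how many rows fall in each Region/Product pair
counts = pd.crosstab(df["Region"], df["Product"])

# With values and aggfunc: sum the Units in each Region/Product pair
totals = pd.crosstab(df["Region"], df["Product"], values=df["Units"], aggfunc="sum")

print(counts)
print(totals)
```

Here North/A is 2 in the count table (two rows) but 12 in the sum table (10 + 2 units).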

Syntax of the crosstab in Pandas:

The basic syntax of pd.crosstab in Python is as follows:

pd.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
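As a sketch of the syntax with inputs other than DataFrame columns, the call below uses plain NumPy arrays (the shift and team data is hypothetical). Unnamed arrays carry no labels, so rownames and colnames supply the axis names. The data is wrapped in NumPy arrays because a plain Python list passed as index is interpreted as a list of separate factors.

```python
import numpy as np
import pandas as pd

# Hypothetical factors stored as unnamed NumPy arrays
shift = np.array(["Day", "Night", "Day", "Day", "Night"])
team = np.array(["Red", "Red", "Blue", "Red", "Blue"])

# rownames/colnames label the row and column axes of the result
table = pd.crosstab(shift, team, rownames=["Shift"], colnames=["Team"])
print(table)
```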

Parameters of the crosstab function in Python:

Here is the list of all the arguments we can pass to the Pandas crosstab function, and their roles:

Argument | Description
index | Values to group by in the rows (similar to a groupby operation). Accepts an array-like, a Series, or a list of arrays/Series.
columns | Values to group by in the columns.
values | (Optional) Array of values to aggregate according to the factors. Requires aggfunc. If omitted, the function counts occurrences.
rownames | (Optional) Names for the row index levels. If passed, the number of names must match the number of row arrays.
colnames | (Optional) Names for the column index levels. If passed, the number of names must match the number of column arrays.
aggfunc | (Optional) Aggregation function applied to values, such as 'sum', 'mean' or 'count'. Requires values.
margins | (Optional, default False) If True, adds row and column margins (totals).
margins_name | (Optional, default 'All') Name of the row and column that hold the totals when margins is True.
dropna | (Optional, default True) Do not include columns whose entries are all NaN.
normalize | (Optional, default False) Divides the values to give proportions: True or 'all' normalizes over all values, 'index' over each row, and 'columns' over each column.
List of arguments in the pd.crosstab function in Python.
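A short sketch of margins and margins_name, reusing the Gender and Technologies values from the staff example in this tutorial. margins=True appends a totals row and column, and margins_name='Total' replaces the default label 'All'.

```python
import pandas as pd

# Gender/Technologies values from the staff example
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Technologies": ["Python", "Marketing", "Python", "Python", "Marketing", "Marketing"],
})

# Add a totals row and column, labelled "Total" instead of "All"
table = pd.crosstab(df["Gender"], df["Technologies"], margins=True, margins_name="Total")
print(table)
```

The bottom-right cell is the grand total, i.e. the number of rows in the data.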

Below, I explain how some of these arguments are used in pd.crosstab in Python:

Argument | Use
index and columns | These essential parameters represent the categorical variables for which the cross-tabulation is computed. We can pass single columns or lists of columns.
values | Specifies the values to be aggregated. If not provided, the result shows the counts of occurrences.
aggfunc | If values is specified, aggfunc is the aggregation function to apply. Common choices include sum, mean and count.
margins | When set to True, adds a totals row and column (labelled 'All' unless margins_name says otherwise).
normalize | Setting this to True converts the counts to proportions of the grand total (multiply by 100 for percentages); 'index' or 'columns' normalizes within each row or column instead.
Uses of some of the parameters of the crosstab function in Python.
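As a minimal sketch of normalize on the same staff values, the first call divides every cell by the grand total, while normalize='index' divides each row by its own total, giving the share of each technology within a gender. Multiplying by 100 turns the proportions into percentages.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Technologies": ["Python", "Marketing", "Python", "Python", "Marketing", "Marketing"],
})

# normalize=True: each cell divided by the grand total, so all cells sum to 1
overall = pd.crosstab(df["Gender"], df["Technologies"], normalize=True)

# normalize="index": each row divided by its row total, so every row sums to 1
by_row = pd.crosstab(df["Gender"], df["Technologies"], normalize="index")

# Convert the row proportions to percentages for display
print((by_row * 100).round(1))
```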

The return value of the crosstab Pandas function in Python:

The pd.crosstab function in Python returns a DataFrame representing the cross-tabulation of the input variables.

The rows correspond to the values of the index parameter, and the columns correspond to the values of the columns parameter.
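When a list of factors is passed as index, the rows of the returned DataFrame get a MultiIndex with one level per factor. The sketch below uses hypothetical Dept, Gender and Level data; combinations that never occur together within an observed row get a count of 0.

```python
import pandas as pd

# Hypothetical data with two row factors and one column factor
df = pd.DataFrame({
    "Dept": ["IT", "IT", "HR", "HR", "IT"],
    "Gender": ["Male", "Female", "Female", "Female", "Male"],
    "Level": ["Junior", "Senior", "Junior", "Senior", "Senior"],
})

# A list of Series as index produces a two-level row index
table = pd.crosstab([df["Dept"], df["Gender"]], df["Level"])

print(isinstance(table, pd.DataFrame))  # True
print(table)
```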

Pandas crosstab examples

Let’s see some examples and try to understand how we can use the Pandas crosstab function in Python:

1. Basic use of the crosstab function in Python

Let’s take an example that summarizes data in tabular form using pd.crosstab in Python:

import pandas as pd

# Small staff dataset: each person's name, gender and technology
Staff_data = {'Names': ['Joey', 'Monica', 'Chandler', 'Rachel', 'Ross', 'Phoebe'],
              'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
              'Technologies': ['Python', 'Marketing', 'Python', 'Python', 'Marketing', 'Marketing']}
df = pd.DataFrame(Staff_data)

# Count the people in each Gender/Technologies combination
result = pd.crosstab(df['Gender'], df['Technologies'])
print(result)

Output: pd.crosstab builds a cross-tabulation of the ‘Gender’ and ‘Technologies’ columns of the DataFrame and stores it in the variable result. Each cell counts the staff members with that gender and technology:

Technologies  Marketing  Python
Gender                         
Female                2       1
Male                  1       2

Here is a screenshot of the code running in the PyCharm Python editor:

pd.crosstab function in Python
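To see what crosstab counts, the same table can be rebuilt with groupby. This is an alternative technique shown for comparison, not how crosstab is documented to work internally: size() counts the rows in each (Gender, Technologies) group, and unstack moves Technologies into the columns, filling missing pairs with 0.

```python
import pandas as pd

Staff_data = {'Names': ['Joey', 'Monica', 'Chandler', 'Rachel', 'Ross', 'Phoebe'],
              'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
              'Technologies': ['Python', 'Marketing', 'Python', 'Python', 'Marketing', 'Marketing']}
df = pd.DataFrame(Staff_data)

result = pd.crosstab(df['Gender'], df['Technologies'])

# Same counts via groupby: group sizes, then pivot Technologies into columns
manual = df.groupby(['Gender', 'Technologies']).size().unstack(fill_value=0)
print(manual)
```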

2. Python Pandas crosstab using an aggregation function

Let’s take another example that passes two more arguments, values and aggfunc, to pd.crosstab in Python to compute the cross-tabulation:

import pandas as pd

# Small client dataset: each client's name, gender and purchased product
Client_data = {'Names': ['Joey', 'Monica', 'Chandler', 'Rachel', 'Ross', 'Phoebe'],
               'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
               'Products': ['Iphone', 'Mac Book', 'Iphone', 'Iphone', 'Mac Book', 'Iphone']}
df = pd.DataFrame(Client_data)

# Count the rows per Gender/Products pair explicitly via values + aggfunc
result = pd.crosstab(df['Gender'], df['Products'], values=df['Gender'], aggfunc='count')
print(result)

Output: pd.crosstab builds a cross-tabulation of the ‘Gender’ and ‘Products’ columns of the DataFrame.


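The values=df['Gender'] argument supplies the values to aggregate for each combination of gender and product.

The aggfunc='count' argument sets the aggregation function to counting, so each cell holds the number of clients with that gender and product. Because the Gender column has no missing values, this gives the same counts as calling crosstab without values and aggfunc.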

Products  Iphone  Mac Book
Gender                    
Female         2         1
Male           2         1

Upon executing the code in PyCharm, the resulting output is displayed in the screenshot below.

pandas crosstab in python
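Aggregation functions are more informative on numeric data. The sketch below adds a hypothetical 'Price' column to the client data (the prices are made up) and uses aggfunc='mean' to get the average price per gender and product. It also shows that passing values without aggfunc is rejected with a ValueError.

```python
import pandas as pd

# Client data from example 2 plus a hypothetical 'Price' column
Client_data = {'Names': ['Joey', 'Monica', 'Chandler', 'Rachel', 'Ross', 'Phoebe'],
               'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
               'Products': ['Iphone', 'Mac Book', 'Iphone', 'Iphone', 'Mac Book', 'Iphone'],
               'Price': [999, 1299, 899, 999, 1499, 1099]}
df = pd.DataFrame(Client_data)

# Average price for each Gender/Products combination
avg_price = pd.crosstab(df['Gender'], df['Products'], values=df['Price'], aggfunc='mean')
print(avg_price)

# values without an aggfunc raises ValueError
try:
    pd.crosstab(df['Gender'], df['Products'], values=df['Price'])
    values_alone_ok = True
except ValueError as exc:
    values_alone_ok = False
    print("ValueError:", exc)
```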

Conclusion

Hopefully, this tutorial explained the pd.crosstab function in Python clearly. I covered its syntax, its parameters, and its return value, with two examples: a basic frequency table and a table built with an aggregation function.

Understanding crosstab in Pandas can help one efficiently compute the cross-tabulation of variables in a dataset in Python.
