Pandas crosstab()

The crosstab() function in Pandas allows us to create contingency tables, also known as cross-tabulations. It is called directly on the pandas module, as pd.crosstab().

A contingency table helps us understand the relationship between two or more categorical variables within a dataset.

Example

import pandas as pd

# sample DataFrame
data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}

df = pd.DataFrame(data)

# create a cross-tabulation of Gender and Smoker
cross_tab = pd.crosstab(df['Gender'], df['Smoker'])

print(cross_tab)

'''
Output

Smoker  No  Yes
Gender         
Female   2    0
Male     1    2
'''

crosstab() Syntax

The syntax of the crosstab() function in Pandas is:

pd.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

crosstab() Arguments

The crosstab() function has the following arguments:

index: the column, array-like object, or list of them whose values will be used to group the rows
columns: the column, array-like object, or list of them whose values will be used to group the columns
values (optional): the array of values to aggregate at each intersection of index and columns; requires aggfunc
rownames (optional): the names to be used for the row index levels
colnames (optional): the names to be used for the column index levels
aggfunc (optional): the aggregation function to apply to values; requires values
margins (optional): whether to add row and column totals
margins_name (optional): the label to be used for the totals row and column when margins=True
dropna (optional): whether to leave out columns whose entries are all NaN
normalize (optional): whether to show proportions instead of counts; True or 'all' divides by the grand total, 'index' by each row total, and 'columns' by each column total
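The rownames and colnames arguments are easiest to see when the inputs carry no names of their own. The following is a minimal sketch that assumes NumPy arrays as input; the variable names gender and smoker are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# plain NumPy arrays have no name, so we label the axes ourselves
gender = np.array(['Male', 'Female', 'Male', 'Female', 'Male'])
smoker = np.array(['Yes', 'No', 'Yes', 'No', 'No'])

# rownames and colnames take one name per array passed to index/columns
cross_tab = pd.crosstab(gender, smoker, rownames=['Gender'], colnames=['Smoker'])

print(cross_tab)
```

Without rownames and colnames, Pandas falls back to generic axis labels for unnamed arrays.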

crosstab() Return Value

The crosstab() function returns a DataFrame representing the cross-tabulation of the factors specified in index and columns.

Example 1: Basic Cross-Tabulation

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Employed': ['Yes', 'Yes', 'Yes', 'Yes', 'No']}

df = pd.DataFrame(data)

# create a basic cross-tabulation of Gender and Employed
cross_tab = pd.crosstab(df['Gender'], df['Employed'])

print(cross_tab)

Output

Employed  No  Yes
Gender            
Female      0    2
Male        1    2

In this example, we created a basic cross-tabulation of Gender and Employed to understand the distribution of employed and unemployed people among genders.
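A cross-tabulation is not limited to two variables. As a rough sketch, we can pass a list of columns as index to group the rows by more than one factor. The Smoker column here is an assumption added to the data for illustration.

```python
import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Employed': ['Yes', 'Yes', 'Yes', 'Yes', 'No'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}  # extra column, assumed for illustration

df = pd.DataFrame(data)

# group rows by Gender and Employed together, and columns by Smoker
cross_tab = pd.crosstab([df['Gender'], df['Employed']], df['Smoker'])

print(cross_tab)
```

The result has a two-level row index, with one row for each Gender and Employed combination that occurs in the data.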

Example 2: Margins in crosstab()

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}

df = pd.DataFrame(data)

# create a cross-tabulation with margins
cross_tab = pd.crosstab(df['Gender'], df['Smoker'], margins=True, margins_name='Total')

print(cross_tab)

Output

Smoker  No  Yes  Total
Gender                
Female   2    0      2
Male     1    2      3
Total    3    2      5

In this example, we included row and column margins in the cross-tabulation to show the totals for each row and column.
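To see where the margin values come from, we can compare them with sums computed by hand. This is only a sketch; the variable names plain and with_totals are chosen for illustration.

```python
import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}

df = pd.DataFrame(data)

plain = pd.crosstab(df['Gender'], df['Smoker'])
with_totals = pd.crosstab(df['Gender'], df['Smoker'], margins=True, margins_name='Total')

# summing across the columns of the plain table gives the 'Total' column
print(plain.sum(axis=1))

# summing down the rows of the plain table gives the 'Total' row
print(plain.sum(axis=0))

# the bottom-right cell equals the number of rows in the DataFrame
print(with_totals.loc['Total', 'Total'] == len(df))
```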

Example 3: Normalized Cross-Tabulation

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}

df = pd.DataFrame(data)

# create a normalized cross-tabulation of Gender and Smoker
cross_tab = pd.crosstab(df['Gender'], df['Smoker'], normalize=True)

print(cross_tab)

Output

Smoker   No  Yes
Gender          
Female  0.4  0.0
Male    0.2  0.4

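In this example, we created a normalized cross-tabulation to show proportions instead of raw counts. With normalize=True, each count is divided by the total number of observations (5 here), so all the cells together add up to 1.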
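The normalize argument also accepts 'index' and 'columns' to normalize within each row or each column. As a short sketch, here is the same data normalized by row; the variable name row_share is chosen for illustration.

```python
import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}

df = pd.DataFrame(data)

# divide each count by its row total, so every row adds up to 1
row_share = pd.crosstab(df['Gender'], df['Smoker'], normalize='index')

print(row_share)
```

Here, each row shows how smokers and non-smokers are split within one gender. For example, 2 of the 3 males are smokers, so the Male row holds about 0.33 under No and 0.67 under Yes.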

Example 4: Aggregate Functions with crosstab()

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No'],
        'Age': [25, 30, 35, 40, 45]}

df = pd.DataFrame(data)

# create a cross-tabulation of Gender and Smoker with average Age as the aggregation
cross_tab = pd.crosstab(df['Gender'], df['Smoker'], values=df['Age'], aggfunc='mean')

print(cross_tab)

Output

Smoker    No   Yes
Gender            
Female  35.0   NaN
Male    45.0  30.0

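In this example, we used aggfunc='mean' to calculate the mean age for smokers and non-smokers of each gender. The NaN for female smokers appears because the data contains no rows for that combination, so there is nothing to average.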
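Other aggregations work the same way. As a minimal sketch, we can swap in aggfunc='max' to find the oldest person in each group instead of the average age.

```python
import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No'],
        'Age': [25, 30, 35, 40, 45]}

df = pd.DataFrame(data)

# values and aggfunc must be passed together
cross_tab = pd.crosstab(df['Gender'], df['Smoker'], values=df['Age'], aggfunc='max')

print(cross_tab)
```

Combinations that do not occur in the data, such as female smokers here, are still left as NaN.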
