The describe() method in Pandas generates a statistical summary of a dataset, covering its central tendency, dispersion, and the shape of its distribution.
Example
import pandas as pd
# create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
'B': [5, 6, 7, 8, 9]}
df = pd.DataFrame(data)
# use describe() to get the statistical summary of the DataFrame
summary = df.describe()
print(summary)
'''
Output
              A         B
count  5.000000  5.000000
mean   3.000000  7.000000
std    1.581139  1.581139
min    1.000000  5.000000
25%    2.000000  6.000000
50%    3.000000  7.000000
75%    4.000000  8.000000
max    5.000000  9.000000
'''
describe() Syntax
The syntax of the describe() method in Pandas is:
obj.describe(percentiles=None, include=None, exclude=None)
describe() Arguments
The describe() method takes the following arguments:
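percentiles (optional) - a list-like object of numbers between 0 and 1 that determines which percentiles to include in the output; the default is [.25, .5, .75]
include (optional) - a list-like object of data types to include in the output, or 'all' to include every column
exclude (optional) - a list-like object of data types to exclude from the output
The include and exclude arguments apply only to a DataFrame; they are ignored when describe() is called on a Series.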
describe() Return Value
The describe() method returns descriptive statistics of the input object: a DataFrame when called on a DataFrame, and a Series when called on a Series.
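To make the return type concrete, the following minimal sketch calls describe() on a whole DataFrame and on a single column. The variable names frame_summary and series_summary are illustrative only and do not come from the examples in this article.

```python
import pandas as pd

# sample data reused from the first example
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 6, 7, 8, 9]})

# describing the whole DataFrame gives one column of statistics per input column
frame_summary = df.describe()

# describing a single column (a Series) gives a Series indexed by statistic name
series_summary = df['A'].describe()

print(type(frame_summary).__name__)   # DataFrame
print(type(series_summary).__name__)  # Series

# individual statistics can be looked up by their label
print(series_summary['mean'])         # 3.0
```

Because the result is an ordinary Pandas object, it can be indexed, sliced, or saved like any other DataFrame or Series.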
Example 1: describe() for Categorical Data
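We can also use describe() to summarize categorical (non-numeric) data. For such columns, the summary reports the number of values (count), the number of distinct values (unique), the most frequent value (top), and how often it occurs (freq).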
import pandas as pd
# create a sample DataFrame with categorical data
data = {'Colors': ['Red', 'Blue', 'Blue', 'Red', 'Green']}
df = pd.DataFrame(data)
# get the description of categorical data
description = df.describe(include='all')
print(description)
Output
Colors
count 5
unique 3
top Red
freq 2
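The rows in this output can be cross-checked against other Pandas methods. The sketch below is one way to do that, assuming the same color values as above; it compares count, unique, and freq with count(), nunique(), and value_counts(). The top row is not compared, because 'Red' and 'Blue' both occur twice and which one is reported for a tie may depend on the Pandas version.

```python
import pandas as pd

# same values as the 'Colors' column above
colors = pd.Series(['Red', 'Blue', 'Blue', 'Red', 'Green'], name='Colors')

summary = colors.describe()

# frequency of each distinct value, most frequent first
counts = colors.value_counts()

print(summary['count'] == colors.count())     # number of non-missing values
print(summary['unique'] == colors.nunique())  # number of distinct values
print(summary['freq'] == counts.iloc[0])      # occurrences of the most frequent value
```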
Example 2: Custom Percentiles
import pandas as pd
# create a sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
# use describe() specifying custom percentiles
description = df.describe(percentiles=[.1, .5, .9])
print(description)
Output
Values
count 5.000000
mean 30.000000
std 15.811388
min 10.000000
10% 14.000000
50% 30.000000
90% 46.000000
max 50.000000
In this example, we passed custom percentiles (10%, 50% and 90%) to the describe() method, so the output reports those percentiles in place of the default 25%, 50% and 75%.
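The 10% value of 14 does not appear in the data because it is interpolated between the two nearest values. The sketch below reproduces it by hand, assuming the default linear interpolation and already-sorted data; the names position, lower, upper, and manual are illustrative only.

```python
import pandas as pd

# same values as the 'Values' column above, already sorted
values = pd.Series([10, 20, 30, 40, 50], name='Values')

q = 0.1

# fractional position of the percentile among the sorted values (0-based)
position = q * (len(values) - 1)          # 0.1 * 4 = 0.4

# the two neighbouring data points around that position
lower = values.iloc[int(position)]        # 10
upper = values.iloc[int(position) + 1]    # 20

# linear interpolation between the neighbours
manual = lower + (upper - lower) * (position - int(position))

print(round(manual, 6))
print(round(values.quantile(q), 6))
print(round(values.describe(percentiles=[q])['10%'], 6))
```

All three printed values should agree, since describe() computes its percentile rows the same way quantile() does.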
Example 3: Including and Excluding Data Types
import numpy as np
import pandas as pd
# create a mixed DataFrame
data = {
'Age': [25, 30, 35, 40],
'Name': ['Alice', 'Bob', 'Charlie', 'David']
}
df = pd.DataFrame(data)
# describe only numeric columns
numeric_description = df.describe(include=[np.number])
print("Numbers only:")
print(numeric_description)
print()
# describe only non-numeric columns
print("Other types only:")
str_description = df.describe(exclude=[np.number])
print(str_description)
Output
Numbers only:
Age
count 4.000000
mean 32.500000
std 6.454972
min 25.000000
25% 28.750000
50% 32.500000
75% 36.250000
max 40.000000
Other types only:
Name
count 4
unique 4
top Alice
freq 1
In this example, we used the include and exclude arguments to restrict the summary to specific data types.
Here, np.number is NumPy's abstract base type for all numeric data types, so it matches every integer and floating-point column. Pandas accepts it because its numeric column types are built on NumPy.
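The data types can also be given as strings in the style of select_dtypes(), which avoids importing NumPy. The following is a minimal sketch, assuming the installed Pandas version accepts the string alias 'number' in describe() the same way select_dtypes() does; the name other_description is illustrative only.

```python
import pandas as pd

# same mixed DataFrame as above
df = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
})

# 'number' is used here as a string alias for all numeric types (assumption noted above)
numeric_description = df.describe(include=['number'])

# everything that is not numeric
other_description = df.describe(exclude=['number'])

print(list(numeric_description.columns))
print(list(other_description.columns))
```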