The describe() method in Pandas generates a statistical summary of a dataset. The summary describes the central tendency, dispersion, and shape of the distribution, and NaN values are excluded.
Example
import pandas as pd
# create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [5, 6, 7, 8, 9]}
df = pd.DataFrame(data)
# use describe() to get the statistical summary of the DataFrame
summary = df.describe()
print(summary)
'''
Output
              A         B
count  5.000000  5.000000
mean   3.000000  7.000000
std    1.581139  1.581139
min    1.000000  5.000000
25%    2.000000  6.000000
50%    3.000000  7.000000
75%    4.000000  8.000000
max    5.000000  9.000000
'''
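Because the summary is an ordinary DataFrame, you can read individual statistics with the usual indexing tools. The sketch below reuses the sample data from this example. The variable names are illustrative only.

```python
import pandas as pd

# same sample data as the example above
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [5, 6, 7, 8, 9]})
summary = df.describe()

# rows are statistic names, columns are the original column names
mean_a = summary.loc['mean', 'A']
std_b = summary.loc['std', 'B']
print(mean_a)           # 3.0
print(round(std_b, 6))  # 1.581139

# transpose to get one row per original column and one column per statistic
print(summary.T[['mean', 'std']])
```

Transposing is often convenient when a DataFrame has many columns, because each original column then becomes a single row of statistics.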
describe() Syntax
The syntax of the describe() method in Pandas is:
obj.describe(percentiles=None, include=None, exclude=None)
describe() Arguments
The describe() method takes the following arguments:
percentiles (optional) - a list-like of numbers between 0 and 1 that sets which percentiles appear in the output; the default is [.25, .5, .75]
include (optional) - a list-like of data types to include in the output, or 'all' to summarize every column; by default, only the numeric columns of a mixed DataFrame are summarized
exclude (optional) - a list-like of data types to exclude from the output
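To see how include changes the result, here is a minimal sketch with a hypothetical mixed DataFrame that has a numeric Age column and a string Name column. Without include, only Age is summarized. With include='all', both columns appear, and any statistic that does not apply to a column is filled with NaN.

```python
import pandas as pd

# hypothetical mixed DataFrame: one numeric and one string column
df = pd.DataFrame({'Age': [25, 30, 35, 40],
                   'Name': ['Alice', 'Bob', 'Charlie', 'David']})

# default (include=None): only numeric columns are summarized
default_summary = df.describe()
print(default_summary.columns.tolist())  # ['Age']

# include='all': every column is summarized; statistics that don't apply
# to a column (such as 'mean' for strings) are shown as NaN
full_summary = df.describe(include='all')
print(full_summary)
```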
describe() Return Value
The describe() method returns descriptive statistics for its input: a Series when called on a Series, and a DataFrame when called on a DataFrame.
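The return type follows the input type. Here is a quick way to check this, using the same small DataFrame as the first example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [5, 6, 7, 8, 9]})

frame_summary = df.describe()        # DataFrame in -> DataFrame out
series_summary = df['A'].describe()  # Series in -> Series out

print(type(frame_summary).__name__)   # DataFrame
print(type(series_summary).__name__)  # Series
print(series_summary['mean'])         # 3.0
```

The Series summary holds the same values as the matching column of the DataFrame summary.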
Example 1: describe() for Categorical Data
We can also use describe() to get a summary of categorical data.
import pandas as pd
# create a sample DataFrame with categorical data
data = {'Colors': ['Red', 'Blue', 'Blue', 'Red', 'Green']}
df = pd.DataFrame(data)
# get the description of categorical data
description = df.describe(include='all')
print(description)
Output
        Colors
count        5
unique       3
top        Red
freq         2
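The Colors column above holds plain strings. The sketch below converts the same values to the Pandas category dtype, and describe() summarizes it with the same four statistics. Red and Blue each appear twice, so the top value is a tie. In that case it is safer not to depend on which of the two is reported. value_counts() shows the underlying counts.

```python
import pandas as pd

colors = pd.Series(['Red', 'Blue', 'Blue', 'Red', 'Green'], name='Colors')

# convert the strings to the Pandas 'category' dtype and summarize
cat_summary = colors.astype('category').describe()
print(cat_summary)  # count, unique, top and freq

# counts behind 'top' and 'freq'; Red and Blue both occur twice
print(colors.value_counts())
```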
Example 2: Custom Percentiles
import pandas as pd
# create a sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
# use describe() specifying custom percentiles
description = df.describe(percentiles=[.1, .5, .9])
print(description)
Output
          Values
count   5.000000
mean   30.000000
std    15.811388
min    10.000000
10%    14.000000
50%    30.000000
90%    46.000000
max    50.000000
In this example, we passed custom percentiles (10%, 50%, and 90%) to describe(), so the summary reports those percentiles instead of the default 25%, 50%, and 75%.
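The percentile rows match what Series.quantile() returns. By default it interpolates linearly between data points, so 14.0 lies 40% of the way from 10 to 20. The sketch below checks this. It also shows that percentiles must be given as fractions: passing 90 instead of .9 raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
desc = df.describe(percentiles=[.1, .5, .9])

# percentile rows are labelled with strings such as '10%'
p10 = desc.loc['10%', 'Values']
p90 = desc.loc['90%', 'Values']

# the same values come from Series.quantile() with its default linear interpolation
print(p10, df['Values'].quantile(0.1))
print(p90, df['Values'].quantile(0.9))

# percentiles are fractions in [0, 1]; 90 instead of .9 is rejected
raised = False
try:
    df.describe(percentiles=[90])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```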
Example 3: Including and Excluding Data Types
import numpy as np
import pandas as pd
# create a mixed DataFrame
data = {
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}
df = pd.DataFrame(data)
# describe only numeric columns
numeric_description = df.describe(include=[np.number])
print("Numbers only:")
print(numeric_description)
print()
# describe only non-numeric columns
print("Other types only:")
str_description = df.describe(exclude=[np.number])
print(str_description)
Output
Numbers only:
             Age
count   4.000000
mean   32.500000
std     6.454972
min    25.000000
25%    28.750000
50%    32.500000
75%    36.250000
max    40.000000

Other types only:
         Name
count       4
unique      4
top     Alice
freq        1
In this example, include=[np.number] restricts the summary to numeric columns, while exclude=[np.number] summarizes every column except the numeric ones.
Here, we used np.number, the abstract NumPy type that covers all numeric dtypes. Pandas is built on NumPy, so it accepts NumPy types in include and exclude. Categorical data, by contrast, is a Pandas dtype and is selected with the string 'category'.
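Pandas also accepts dtype names as strings in include and exclude, in the style of select_dtypes(). The minimal sketch below reuses the DataFrame from this example and compares the 'number' alias with np.number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35, 40],
                   'Name': ['Alice', 'Bob', 'Charlie', 'David']})

# 'number' is the string alias for np.number
by_alias = df.describe(include=['number'])
by_numpy = df.describe(include=[np.number])
print(by_alias.equals(by_numpy))  # True

# the alias works in exclude as well: everything except numeric columns
non_numeric = df.describe(exclude=['number'])
print(non_numeric.columns.tolist())  # ['Name']
```

The string form saves importing NumPy when you only need to select columns by type.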