Pandas Histogram

Pandas histograms is a graphical representation of the distribution of numerical data. In Pandas, using the hist() function, we can create and plot histograms.

We'll take a closer look at histograms and how they can be created and plotted in Pandas.


Pandas Histogram

Pandas has a built-in function hist() that takes an array of data as a parameter.

In histogram, a bin is a range of values that represents a group of data. bin is an optional parameter.

Let's look at an example.

import pandas as pd
import matplotlib.pyplot as plt

# create a DataFrame
data = {'values': [18, 21, 22, 25, 28, 30, 32, 34, 35, 36, 38, 40, 42, 45, 50, 55, 60, 65, 70]}
df = pd.DataFrame(data)

# plot a histogram
plt.hist(df['values'], bins=10)
plt.show()

Output

Plotting a Histogram
Plotting a Histogram

In this example, we have used the hist() function to create a histogram.

Inside the hist() funcion,

  • df['values'] creates a histogram from the values column of the DataFrame
  • bins=10 specifies that the data should be divided into 10 bins or intervals.

Pandas Customized Histogram

A customized histogram is a type of histogram with specific visual characteristics that effectively communicate information about the data being displayed.

Let's look at an example.

import pandas as pd
import matplotlib.pyplot as plt

# create a DataFrame with more data
data = {'values': [23, 45, 30, 50, 67, 35, 47, 62, 25, 58, 42, 36, 53, 68, 32]}
df = pd.DataFrame(data)

# plot a customized histogram
plt.hist(df['values'], bins=7, edgecolor='black', color='green', alpha=0.7)
plt.title('Customized Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()

Output

Plotting a Customized Histogram
Plotting a Customized Histogram

Here, in the customized histogram, we have used additional visual customizations such as color, transparency, grid lines, making it more visually appealing than the basic one.


Multiple Histograms in Pandas

In Pandas, we can create multiple histograms to compare different datasets. For example,

import pandas as pd
import matplotlib.pyplot as plt

# Create two DataFrames with different datasets
data1 = {'values': [12, 15, 18, 22, 25, 27, 30, 33, 37, 40]}
data2 = {'values': [8, 10, 14, 20, 24, 28, 32, 36, 42, 45]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Plot two histograms side by side
plt.hist(df1['values'], bins=6, edgecolor='black', alpha=0.7, label='Dataset 1')
plt.hist(df2['values'], bins=6, edgecolor='black', alpha=0.7, label='Dataset 2', color='orange')
plt.title('Histogram Comparison')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()

Output

Plotting a Multiple Histogram
Plotting a Multiple Histogram

In this example, we have compared two histograms side by side, illustrating the frequency distribution of values in two separate datasets.

The first dataset's histogram is labeled Dataset 1 and uses default colors, while the second dataset's histogram is labeled Dataset 2, uses orange bars.