Pandas value_counts()

The value_counts() method in Pandas is used to count the number of occurrences of each unique value in a Series.

Example

import pandas as pd

# create a Series
data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])

# use value_counts() to count the occurrences of each unique value
counts = data.value_counts()
print(counts)

'''
Output
apple     3
banana    2
orange    1
Name: count, dtype: int64
'''

value_counts() Syntax

The syntax of the value_counts() method in Pandas is:

Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

value_counts() Arguments

The value_counts() method takes the following arguments:

  • normalize (optional) - if True, returns the relative frequencies (proportions) of the unique values instead of their counts.
  • sort (optional) - if True (the default), sorts the result by frequency; if False, the counts are not sorted.
  • ascending (optional) - if True, sorts the counts in ascending order; the default, False, sorts them in descending order.
  • bins (optional) - if specified, groups numeric data into equal-width bins and counts the values that fall into each bin.
  • dropna (optional) - if True (the default), excludes null values from the counts.

value_counts() Return Value

The value_counts() method in Pandas returns a Series containing counts of unique values.
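
Because the result is itself a Series, its index holds the unique values and its values hold the counts. The short sketch below illustrates this; the sample data and the names pets and pet_counts are made up for the demonstration and are not part of the examples that follow.

```python
import pandas as pd

# hypothetical sample data, used only for illustration
pets = pd.Series(['cat', 'dog', 'cat', 'bird', 'cat', 'dog'])

pet_counts = pets.value_counts()

# the unique values become the index of the result
print(list(pet_counts.index))

# look up the count of a single value by its label
print(pet_counts['cat'])

# the label with the highest count
print(pet_counts.idxmax())

# the counts add up to the number of non-null entries
print(pet_counts.sum())
```

Here, cat is counted 3 times, so it is also the label returned by idxmax(), and the counts sum to 6, the length of the Series.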


Example 1: Count Occurrences of Each Unique Value

import pandas as pd

# create a Series
favorite_colors = pd.Series(['blue', 'red', 'green', 'blue', 'red', 'blue', 'yellow', 'green', 'green', 'red', 'purple'])

# use value_counts() to count the occurrences of each unique value
color_counts = favorite_colors.value_counts()
print(color_counts)

Output

blue      3
red       3
green     3
yellow    1
purple    1
Name: count, dtype: int64

In the above example, the favorite_colors Series contains strings representing different colors.

We used the value_counts() method to count the number of times each color appears in the favorite_colors Series. Here, blue, red, and green each appear 3 times, while yellow and purple appear once.


Example 2: Count Values With Normalization

import pandas as pd

# create a Series
data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])

# set normalize to True to get proportions instead of counts
normalized_counts = data.value_counts(normalize=True)
print(normalized_counts)

Output

apple     0.500000
banana    0.333333
orange    0.166667
Name: proportion, dtype: float64

In this example, apple appears 3 times, banana appears 2 times, and orange appears 1 time.

Setting normalize=True returns the proportion of each fruit in the Series instead of its count. For instance, apple makes up 50% of the entries (3 out of 6), banana 33.33%, and orange 16.67%.
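
As a quick check of how the proportions relate to the counts, the sketch below reuses the same Series and compares normalize=True with a manual division. The names proportions, manual, and percentages are chosen for this illustration only.

```python
import pandas as pd

data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])

proportions = data.value_counts(normalize=True)

# each proportion is the count divided by the number of entries
# (this Series has no missing values, so len(data) is the right total)
manual = data.value_counts() / len(data)

# convert the proportions to percentages rounded to two decimal places
percentages = (proportions * 100).round(2)
print(percentages)

# the proportions add up to 1 (allowing for floating-point rounding)
print(proportions.sum())
```

Multiplying by 100 and rounding is a convenient way to present the result as percentages when the raw proportions are hard to read.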


Example 3: Sort Unique Value Counts in Pandas

import pandas as pd

# create a Series 
data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple', 'banana', 'kiwi', 'kiwi', 'kiwi'])

# use value_counts() without sorting
counts_no_sort = data.value_counts(sort=False)
print("Counts without sorting:")
print(counts_no_sort)
print()

# use value_counts() with sorting
counts_sort = data.value_counts(sort=True)
print("Counts with sorting:")
print(counts_sort)

Output

Counts without sorting:
apple     3
banana    3
orange    1
kiwi      3
Name: count, dtype: int64

Counts with sorting:
apple     3
banana    3
kiwi      3
orange    1
Name: count, dtype: int64

Here,

  1. sort=False - shows the counts of each unique value in the order they appear in the Series (without sorting).
  2. sort=True - shows the counts sorted in descending order, which is the default behavior of value_counts().
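
If we want the result ordered by the labels rather than by the counts, one option is to chain sort_index() after value_counts(). The sketch below reuses the Series from this example; note that sort_index() is a general Series method, not an argument of value_counts(), and the name counts_by_label is chosen for illustration.

```python
import pandas as pd

data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple', 'banana', 'kiwi', 'kiwi', 'kiwi'])

# count without sorting by frequency, then order the labels alphabetically
counts_by_label = data.value_counts(sort=False).sort_index()
print(counts_by_label)
```

This lists apple, banana, kiwi, and orange in alphabetical order, regardless of how often each one occurs.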

Example 4: Specify the Order of Sorting

import pandas as pd

# create a Series
data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'orange', 'orange'])

# count values with sorting in descending order (default behavior)
counts_descending = data.value_counts(ascending=False)
print("Counts (Descending Order):")
print(counts_descending)
print()

# count values with sorting in ascending order
counts_ascending = data.value_counts(ascending=True)
print("Counts (Ascending Order):")
print(counts_ascending)

Output

Counts (Descending Order):
orange    3
apple     2
banana    2
Name: count, dtype: int64

Counts (Ascending Order):
apple     2
banana    2
orange    3
Name: count, dtype: int64

Here,

  • ascending=False - sorts the counts in descending order. This will list the most frequent item first.
  • ascending=True - sorts the counts in ascending order, listing the least frequent item first.

Example 5: Use of bins Argument in value_counts()

The bins argument in the value_counts() method is used to bin continuous data into discrete intervals.

Let's look at an example.

import pandas as pd

# create a Series with continuous data
data = pd.Series([0.1, 0.3, 0.5, 1.2, 1.5, 2.3, 2.8, 3.0, 3.5, 4.1])

# use value_counts() with the bins argument
bin_counts = data.value_counts(bins=4)
print(bin_counts)

Output

(0.095, 1.1]    3
(2.1, 3.1]      3
(1.1, 2.1]      2
(3.1, 4.1]      2
Name: count, dtype: int64

In this example, the value_counts() method divides the range of values in data into 4 equal-width bins and counts the number of values that fall into each bin.

The output lists these counts, with each bin described by its range. For example, (0.095, 1.1] denotes a bin that contains values greater than 0.095 and less than or equal to 1.1. The lower edge of the first bin sits slightly below the smallest value (0.1) so that the minimum is included.
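
The output above is ordered by count, so the bins do not appear in numeric order. As a sketch, combining bins with sort=False should list the bins from lowest to highest instead. This relies on pandas ordering the binned result by interval before any frequency sort, which is the behavior assumed here; the name bins_in_order is chosen for illustration.

```python
import pandas as pd

data = pd.Series([0.1, 0.3, 0.5, 1.2, 1.5, 2.3, 2.8, 3.0, 3.5, 4.1])

# bin the data into 4 equal-width bins without sorting by frequency,
# so the bins are expected to appear from lowest to highest
bins_in_order = data.value_counts(bins=4, sort=False)
print(bins_in_order)
```

Keeping the bins in numeric order makes the result easier to read as a simple histogram of the data.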


Example 6: Use of dropna Argument in value_counts()

import pandas as pd

# create a Series with some missing values
data = pd.Series(['apple', 'banana', 'apple', 'orange', None, 'banana', None])

# use value_counts() without specifying dropna (default is True)
counts_default = data.value_counts()
print("Counts excluding None:")
print(counts_default)
print()

# use value_counts() with dropna=False
counts_including_na = data.value_counts(dropna=False)
print("Counts including None:")
print(counts_including_na)

Output

Counts excluding None:
apple     2
banana    2
orange    1
Name: count, dtype: int64

Counts including None:
apple     2
banana    2
None      2
orange    1
Name: count, dtype: int64

Here,

  • dropna=True counts the occurrences of each fruit and excludes the None values.
  • dropna=False includes the None values in the count, showing how many missing values are present in the data.
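
To see how dropna affects the totals, the sketch below compares the sums of both results with a direct count of the missing entries using isna(). It reuses the Series from this example; the names missing, total_with_na, and total_without_na are chosen for illustration.

```python
import pandas as pd

data = pd.Series(['apple', 'banana', 'apple', 'orange', None, 'banana', None])

# number of missing entries, counted directly
missing = data.isna().sum()
print(missing)

# with dropna=False the counts cover every entry in the Series
total_with_na = data.value_counts(dropna=False).sum()
print(total_with_na == len(data))

# with the default dropna=True the missing entries are left out
total_without_na = data.value_counts().sum()
print(total_without_na == len(data) - missing)
```

If only the number of missing values is needed, isna().sum() gives it directly without building the full table of counts.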