Pandas cut()

The cut() method in Pandas is used for segmenting and sorting data values into bins.

Example

import pandas as pd

# create a list of ages
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]

# define the bins - age ranges
bins = [18, 25, 35, 60, 100]

# use cut() to categorize each age into the defined bins
categories = pd.cut(ages, bins)
print(categories)

'''
Output
[(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
Length: 12
Categories (4, interval[int64, right]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]
'''

cut() Syntax

The syntax of the cut() method in Pandas is:

pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False)

cut() Arguments

The cut() method takes the following arguments:

  • x - the one-dimensional input array to be binned
  • bins - the criteria to bin by, such as an integer number of equal-width bins or a sequence of bin edges
  • right (optional) - indicates whether each bin includes its right edge
  • labels (optional) - specifies the labels for the returned bins
  • retbins (optional) - specifies whether to also return the bin edges
  • precision (optional) - the precision at which to store and display the bin labels
  • include_lowest (optional) - whether the first interval should be left-inclusive
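
To make the bins argument concrete, here is a minimal sketch of its two common forms. The values list and the names by_count and by_edges are made up for illustration and do not come from the examples in this article.

```python
import pandas as pd

# hypothetical sample data, used only for this sketch
values = [1, 3, 5, 7, 9]

# bins as an integer: pandas computes 2 equal-width bins from the data range
by_count = pd.cut(values, bins=2)

# bins as a sequence of edges: the bins are (0, 5] and (5, 10]
by_edges = pd.cut(values, bins=[0, 5, 10])

# .categories lists the intervals, .codes gives the bin index of each value
print(by_count.categories)
print(by_edges.categories)
print(by_edges.codes)
```

With an integer, the edges depend on the minimum and maximum of the data, so the same call can produce different bins for different inputs. Explicit edges keep the bins fixed.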

cut() Return Value

The cut() method in Pandas returns the bin or category that each value in the input data belongs to. The result is a Categorical for list or array input, a Series for Series input, or an array of integer bin indices when labels=False.
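
As a quick check of the return value, the sketch below calls cut() on a list, on a Series, and with labels=False. The sample ages and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# hypothetical sample data, used only for this sketch
ages = [20, 30, 40]
bins = [18, 25, 35, 60]

from_list = pd.cut(ages, bins)                # list in, Categorical out
from_series = pd.cut(pd.Series(ages), bins)   # Series in, Series out
as_codes = pd.cut(ages, bins, labels=False)   # integer bin indices instead of intervals

print(type(from_list).__name__)
print(type(from_series).__name__)
print(as_codes)
```

Passing a Series is usually the convenient choice when the result is going to be stored as a new column in a DataFrame.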


Example 1: Categorizing Data Using cut()

import pandas as pd

# create a list of exam scores
scores = [88, 92, 75, 85, 78, 95, 64, 82, 90, 73, 67, 99]

# define the bins - grading ranges
bins = [0, 60, 70, 80, 90, 100]

# use cut() to categorize each score into the defined grading bins
grade_categories = pd.cut(scores, bins)
print(grade_categories)

Output

[(80, 90], (90, 100], (70, 80], (80, 90], (70, 80], ..., (80, 90], (80, 90], (70, 80], (60, 70], (90, 100]]
Length: 12
Categories (5, interval[int64, right]): [(0, 60] < (60, 70] < (70, 80] < (80, 90] < (90, 100]]

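In the above example, we have created a list named scores containing exam scores.

The bins represent the grading ranges (0, 60], (60, 70], (70, 80], (80, 90], and (90, 100]. Each range excludes its left edge and includes its right edge, so a score of exactly 90 falls in (80, 90].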

Then we used pd.cut() to categorize each score into the corresponding grading bin.
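
A common next step is to count how many values landed in each bin. The sketch below is one possible way to do that, by wrapping the result in a Series and calling value_counts(). The name counts is an assumption for illustration.

```python
import pandas as pd

scores = [88, 92, 75, 85, 78, 95, 64, 82, 90, 73, 67, 99]
bins = [0, 60, 70, 80, 90, 100]

grade_categories = pd.cut(scores, bins)

# count the scores in each bin; sort=False keeps the bins in category order
counts = pd.Series(grade_categories).value_counts(sort=False)
print(counts)
```

Bins that received no values, such as (0, 60] here, still appear in the result with a count of 0.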


Example 2: Control Bin Boundaries Using right Argument in cut()

import pandas as pd

# create a list of data
data = [2, 4, 6, 8, 10]

# define the bins
bins = [0, 5, 10]

# use cut() with right=True (default)
categories_right_true = pd.cut(data, bins, right=True)
print("Bins closed on the right:")
print(categories_right_true)
print()

# use cut() with right=False
categories_right_false = pd.cut(data, bins, right=False)
print("Bins closed on the left:")
print(categories_right_false)

Output

Bins closed on the right:
[(0, 5], (0, 5], (5, 10], (5, 10], (5, 10]]
Categories (2, interval[int64, right]): [(0, 5] < (5, 10]]

Bins closed on the left:
[[0.0, 5.0), [0.0, 5.0), [5.0, 10.0), [5.0, 10.0), NaN]
Categories (2, interval[int64, left]): [[0, 5) < [5, 10)]

Here, with

  1. right=True, the bins are (0, 5] and (5, 10], indicating that the right edge (5 and 10) is included in the bin.
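  2. right=False, the bins are [0, 5) and [5, 10), meaning the left edge (0 and 5) is included in the bin. Because 10 is now the excluded right edge of the last bin, the value 10 falls outside every bin and becomes NaN.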
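
Values that fall outside every bin can be located programmatically. The following sketch reuses the data above and inspects the result with .codes and pd.isna(). The name left_closed is an assumption for illustration.

```python
import pandas as pd

data = [2, 4, 6, 8, 10]
bins = [0, 5, 10]

left_closed = pd.cut(data, bins, right=False)

# a code of -1 marks a value that did not fall into any bin
print(left_closed.codes)

# pd.isna() returns True for the same positions
print(pd.isna(left_closed))
```

Here only the last value, 10, is unbinned, because the last bin [5, 10) stops just short of it.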

Example 3: Naming Bins in Pandas cut()

import pandas as pd

# create a list of data
data = [20, 35, 45, 60, 75, 90]

# define the bins
bins = [0, 25, 50, 75, 100]

# define custom labels for the bins
labels = ['Low', 'Medium', 'High', 'Very High']

# use cut() with custom labels
categories_with_labels = pd.cut(data, bins, labels=labels)
print(categories_with_labels)

Output

['Low', 'Medium', 'Medium', 'High', 'High', 'Very High']
Categories (4, object): ['Low' < 'Medium' < 'High' < 'Very High']

In this example, we have defined a list of custom labels: Low, Medium, High, and Very High, one for each bin.

Then we used pd.cut() to categorize the data into bins and assign the custom labels to these bins.
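
One detail worth checking when naming bins is that the number of labels should be one fewer than the number of bin edges. The sketch below reuses the data above with a deliberately short label list (an assumption for illustration) to show that pandas raises a ValueError in that case.

```python
import pandas as pd

data = [20, 35, 45, 60, 75, 90]
bins = [0, 25, 50, 75, 100]   # 5 edges define 4 bins

error_message = None
try:
    # only 3 labels for 4 bins, so this call is expected to fail
    pd.cut(data, bins, labels=['Low', 'Medium', 'High'])
except ValueError as err:
    error_message = str(err)

print(error_message)
```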


Example 4: Extract Bin Information Using retbins Argument in cut()

import pandas as pd

# create a list of data
data = [10, 15, 20, 25, 30, 35, 40]

# define the bins
bins = [0, 20, 40]

# use cut() with retbins=True
categories, bin_edges = pd.cut(data, bins, retbins=True)
print("Binned Categories:")
print(categories)
print("\nBin Edges:")
print(bin_edges)

Output

Binned Categories:
[(0, 20], (0, 20], (0, 20], (20, 40], (20, 40], (20, 40], (20, 40]]
Categories (2, interval[int64, right]): [(0, 20] < (20, 40]]

Bin Edges:
[ 0 20 40]

In the above example, we used pd.cut() with retbins=True, so it returns two things: the binned categories and the array of bin edges.

The categories variable contains the binned data (each element of data categorized into the bins).

And the bin_edges variable contains the actual edges of the bins used in the process.
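
Returning the edges is most useful when pandas computes them, for example when bins is an integer. As a rough sketch, the edges can then be reused to bin a second dataset in exactly the same way. The new_data list and the name new_categories are hypothetical and used only for illustration.

```python
import pandas as pd

data = [10, 15, 20, 25, 30, 35, 40]

# let pandas compute 3 equal-width bins and return the edges it used
categories, bin_edges = pd.cut(data, bins=3, retbins=True)
print(bin_edges)

# hypothetical second dataset, binned with the same edges
new_data = [12, 28, 39]
new_categories = pd.cut(new_data, bins=bin_edges)
print(new_categories.codes)
```

The first computed edge sits slightly below the smallest value so that the minimum of data is included in the first bin.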


Example 5: Specify the Precision of the Bin Labels

import pandas as pd

# create a list of floating-point data
data = [10.123, 15.456, 20.789, 25.012, 30.345, 35.678, 40.901]

# define the bins
bins = [0, 20, 40, 60]

# use cut() with precision=2
categories = pd.cut(data, bins, precision=2)
print("Binned Categories with Two Decimal Precision:")
print(categories)

Output

Binned Categories with Two Decimal Precision:
[(0, 20], (0, 20], (20, 40], (20, 40], (20, 40], (20, 40], (40, 60]]
Categories (3, interval[int64, right]): [(0, 20] < (20, 40] < (40, 60]]

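Here, we used pd.cut() with precision=2, which sets the number of decimal places used for the bin labels. Because the bin edges in this example are whole numbers, the labels look the same as they would with the default precision. The setting becomes visible when the edges are not whole numbers, for example when bins is an integer and pandas computes the edges from the data.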
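
To see precision change the labels, the edges need fractional parts. The sketch below assumes bins=3, so that pandas computes the edges from the data, and compares the default precision with precision=1. The names default_labels and rounded_labels are made up for illustration.

```python
import pandas as pd

data = [10.123, 15.456, 20.789, 25.012, 30.345, 35.678, 40.901]

# edges are computed from the data, so they have fractional parts
default_labels = pd.cut(data, bins=3)                # default precision=3
rounded_labels = pd.cut(data, bins=3, precision=1)   # labels rounded more coarsely

print(default_labels.categories)
print(rounded_labels.categories)

# the values are assigned to the same bins either way
print(list(default_labels.codes) == list(rounded_labels.codes))
```

Only the displayed interval labels differ between the two calls. The assignment of values to bins stays the same.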


Example 6: Use of include_lowest Argument in cut()

import pandas as pd

# create a list of data
data = [20, 22, 24, 26, 28, 30]

# define the bins
bins = [20, 25, 30]

# use cut() with include_lowest=False (default)
categories_default = pd.cut(data, bins)
print("First bin exclusive of the lower edge:")
print(categories_default)
print()

# use cut() with include_lowest=True
categories_include_lowest = pd.cut(data, bins, include_lowest=True)
print("First bin inclusive of the lower edge:")
print(categories_include_lowest)

Output

First bin exclusive of the lower edge:
[NaN, (20.0, 25.0], (20.0, 25.0], (25.0, 30.0], (25.0, 30.0], (25.0, 30.0]]
Categories (2, interval[int64, right]): [(20, 25] < (25, 30]]

First bin inclusive of the lower edge:
[(19.999, 25.0], (19.999, 25.0], (19.999, 25.0], (25.0, 30.0], (25.0, 30.0], (25.0, 30.0]]
Categories (2, interval[float64, right]): [(19.999, 25.0] < (25.0, 30.0]]

In this example, with

  • include_lowest=False - the first bin (20, 25] does not include the lower edge 20. Thus, the value 20 in the data is not included in any bin, resulting in NaN.
  • include_lowest=True - the first bin is inclusive of the lower edge 20, which pandas displays as (19.999, 25.0]. Therefore, the value 20 is included in the first bin, and there are no NaN values.
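
The effect of include_lowest can also be confirmed by counting the unbinned values. This is a minimal sketch reusing the data above. The names missing_default and missing_inclusive are assumptions for illustration.

```python
import pandas as pd

data = [20, 22, 24, 26, 28, 30]
bins = [20, 25, 30]

categories_default = pd.cut(data, bins)
categories_include_lowest = pd.cut(data, bins, include_lowest=True)

# count how many values were left without a bin in each case
missing_default = int(pd.isna(categories_default).sum())
missing_inclusive = int(pd.isna(categories_include_lowest).sum())

print(missing_default)
print(missing_inclusive)
```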
