The qcut() method in Pandas divides a continuous variable into quantile-based bins, so that each bin holds roughly the same number of values, effectively transforming it into a categorical variable.
Example
import pandas as pd
# define a list of numeric data
data = [320, 280, 345, 378, 290, 310, 260, 300]
# use qcut() to divide the data into 4 quantiles
result = pd.qcut(data, 4)
print(result)
'''
Output
[(305.0, 326.25], (259.999, 287.5], (326.25, 378.0], (326.25, 378.0], (287.5, 305.0], (305.0, 326.25], (259.999, 287.5], (287.5, 305.0]]
Categories (4, interval[float64, right]): [(259.999, 287.5] < (287.5, 305.0] < (305.0, 326.25] < (326.25, 378.0]]
'''
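The interval edges in this output are the 0%, 25%, 50%, 75% and 100% quantiles of data. As a rough check, the sketch below computes them with NumPy's np.quantile(), which by default interpolates linearly between sorted values, and compares them with the edges pd.qcut() returns when retbins=True. It only illustrates where numbers such as 287.5 come from; it is not a description of qcut()'s internal code.

```python
import numpy as np
import pandas as pd

data = [320, 280, 345, 378, 290, 310, 260, 300]

# quantile levels for 4 bins: 0%, 25%, 50%, 75%, 100%
levels = np.linspace(0, 1, 5)

# linearly interpolated quantiles; e.g. the 25% level sits 75% of the way
# from 280 to 290 in the sorted data, which gives 287.5
edges = np.quantile(data, levels)
print(edges)

# the edges qcut() itself uses, returned via retbins=True
_, qcut_edges = pd.qcut(data, 4, retbins=True)
print(np.allclose(edges, qcut_edges))
```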
qcut() Syntax
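The syntax of the qcut() method in Pandas is:
pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')
qcut() Arguments
The qcut() method takes the following arguments:
x - the 1-dimensional array or Series to be binned
q - the number of quantiles (for example, 4 for quartiles) or an array of quantile levels such as [0, 0.25, 0.5, 0.75, 1]
labels (optional) - the labels for the returned bins, one per bin; labels=False returns integer bin indicators instead of intervals
retbins (optional) - whether to also return the bin edges
precision (optional) - the precision at which the bin labels are stored and displayed
duplicates (optional) - what to do if the bin edges are not unique: 'raise' (the default) raises a ValueError, while 'drop' removes the duplicate edges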
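Two of these arguments are easy to miss: q can be a list of quantile levels rather than a count, and labels=False skips the interval labels altogether. The sketch below reuses the data list from the first example; the label names low and high are arbitrary choices for illustration.

```python
import pandas as pd

data = [320, 280, 345, 378, 290, 310, 260, 300]

# q as a list of quantile levels: split at the median into two bins,
# naming the bins with one label per bin (lowest bin first)
halves = pd.qcut(data, [0, 0.5, 1], labels=["low", "high"])
print(halves)

# labels=False returns integer bin indicators (0 = lowest bin) instead of intervals
codes = pd.qcut(data, 4, labels=False)
print(codes)
```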
qcut() Return Value
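The qcut() method in Pandas returns a Categorical object representing the binned variable, with each bin holding approximately the same number of values. If x is a Series, the result is a Series of category dtype instead, and if labels=False, it is an array of integer bin indicators. With retbins=True, the bin edges are also returned as a second value.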
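The sketch below shows how the return type follows the input, using a short made-up list of temperatures: a list gives a Categorical, a Series gives a Series of category dtype, and retbins=True adds the edges as a second return value.

```python
import pandas as pd

temps = [68, 72, 75, 80, 85, 90, 95, 100]

# a plain list in -> a Categorical out
from_list = pd.qcut(temps, 2)
print(type(from_list))

# a Series in -> a Series with category dtype out
from_series = pd.qcut(pd.Series(temps, name="temp"), 2)
print(type(from_series), from_series.dtype)

# retbins=True -> a (result, bin_edges) pair
result, edges = pd.qcut(temps, 2, retbins=True)
print(edges)
```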
Example 1: Categorizing Data Using qcut()
import pandas as pd
# create a list of temperatures
temperatures = [68, 72, 75, 80, 85, 90, 95, 100, 65, 70, 78, 82]
# use qcut() to categorize each temperature into 4 quartile bins holding equal numbers of values
temperature_categories = pd.qcut(temperatures, 4)
print(temperature_categories)
Output
[(64.999, 71.5], (71.5, 79.0], (71.5, 79.0], (79.0, 86.25], (79.0, 86.25], ..., (86.25, 100.0], (64.999, 71.5], (64.999, 71.5], (71.5, 79.0], (79.0, 86.25]]
Length: 12
Categories (4, interval[float64, right]): [(64.999, 71.5] < (71.5, 79.0] < (79.0, 86.25] < (86.25, 100.0]]
In the above example, the list temperatures holds 12 temperature readings. We then used pd.qcut() to divide these values into 4 quartiles. Because 12 values split evenly into 4 groups, each bin receives exactly 3 temperatures.
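One way to confirm the equal split is to count the values in each bin. The sketch below does this with value_counts(); it also shows that when the number of values is not a multiple of the number of bins (here, 10 values in 4 bins), the counts can only be approximately equal.

```python
import pandas as pd

temperatures = [68, 72, 75, 80, 85, 90, 95, 100, 65, 70, 78, 82]
temperature_categories = pd.qcut(temperatures, 4)

# count how many readings fall into each quartile bin (in bin order)
counts = pd.Series(temperature_categories).value_counts(sort=False)
print(counts)

# with 10 values and 4 bins the split cannot be exact
uneven = pd.Series(pd.qcut([12, 20, 19, 27, 25, 35, 29, 40, 31, 38], 4)).value_counts(sort=False)
print(uneven)
```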
Example 2: Naming Bins in Pandas qcut()
import pandas as pd
# create a list of exam scores
scores = [67, 85, 78, 92, 74, 70, 56, 90]
# define custom labels for the bins
bin_labels = ['D', 'C', 'B', 'A']
# use qcut() to divide scores into 4 quantiles and assign the custom labels
score_categories = pd.qcut(scores, 4, labels=bin_labels)
print(score_categories)
Output
['D', 'B', 'B', 'A', 'C', 'C', 'D', 'A']
Categories (4, object): ['D' < 'C' < 'B' < 'A']
In this example, we defined the bin_labels list with the string labels D, C, B, and A, which correspond to quartile grades. The pd.qcut() method then categorizes the scores into 4 bins (quartiles) based on their distribution and assigns the labels in order, from the lowest bin to the highest, so the bottom quartile gets D and the top quartile gets A.
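In practice the scores often live in a DataFrame column. The sketch below (the DataFrame and its column names are illustrative) attaches the grades as a new column and also shows what happens when the number of labels does not match the number of bins: pandas raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({"score": [67, 85, 78, 92, 74, 70, 56, 90]})

# labels go from the lowest bin to the highest: 'D' = bottom quartile, 'A' = top
df["grade"] = pd.qcut(df["score"], 4, labels=["D", "C", "B", "A"])
print(df.sort_values("score"))

# 3 labels for 4 bins is rejected
mismatch_raised = False
try:
    pd.qcut(df["score"], 4, labels=["C", "B", "A"])
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```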
Example 4: Extract Bin Information Using retbins Argument in qcut()
import pandas as pd
# create a list of data points
data_points = [12, 20, 19, 27, 25, 35, 29, 40, 31, 38]
# use qcut() with retbins=True to get both the binned data and the bin edges
binned_data, bins = pd.qcut(data_points, 4, retbins=True)
print("Binned Data:")
print(binned_data)
print("\nBin Edges:")
print(bins)
Output
Binned Data:
[(11.999, 21.25], (11.999, 21.25], (11.999, 21.25], (21.25, 28.0], (21.25, 28.0], (34.0, 40.0], (28.0, 34.0], (34.0, 40.0], (28.0, 34.0], (34.0, 40.0]]
Categories (4, interval[float64, right]): [(11.999, 21.25] < (21.25, 28.0] < (28.0, 34.0] < (34.0, 40.0]]

Bin Edges:
[12.   21.25 28.   34.   40.  ]
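In the above example, we use the pd.qcut() method with the retbins=True argument to categorize a list of numeric data points into quantiles and also to obtain the precise bin edges that define these quantiles. Note that the first edge is returned as 12.0, while the label of the first bin starts at 11.999: because the intervals are closed on the right, pandas lowers the left end of the first label slightly so that the minimum value (12) is covered by it.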
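One practical reason to request the edges is to bin new values the same way. The sketch below passes them to pd.cut(), pandas' fixed-edge binning function; the new_points values are made up for illustration. Values outside the learned range (below 12 or above 40 here) would come back as NaN.

```python
import pandas as pd

data_points = [12, 20, 19, 27, 25, 35, 29, 40, 31, 38]
binned_data, bins = pd.qcut(data_points, 4, retbins=True)

# hypothetical new observations, binned with the edges learned from data_points
new_points = [15, 30, 39]

# include_lowest=True keeps the minimum edge (12) inside the first bin
new_binned = pd.cut(new_points, bins=bins, include_lowest=True)
print(new_binned)

# the same assignment as integer bin indicators
new_codes = pd.cut(new_points, bins=bins, labels=False, include_lowest=True)
print(new_codes)
```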
Example 5: Specify the Precision of the Bin Labels
import pandas as pd
# create a list of floating-point numbers
data = [1.123, 2.345, 3.567, 4.789, 5.901, 6.234, 7.456, 8.678]
# use qcut() to divide data into 4 quantiles
quantiles = pd.qcut(data, 4, precision=2)
print(quantiles)
Output
[(1.11, 3.26], (1.11, 3.26], (3.26, 5.34], (3.26, 5.34], (5.34, 6.54], (5.34, 6.54], (6.54, 8.68], (6.54, 8.68]]
Categories (4, interval[float64, right]): [(1.11, 3.26] < (3.26, 5.34] < (5.34, 6.54] < (6.54, 8.68]]
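Here, we used pd.qcut() with precision=2, so the bin edges shown in the labels are rounded to 2 decimal places. The rounding affects only the labels; each value is still assigned to a bin using the unrounded quantile edges.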
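To check that precision changes only the labels, the sketch below bins the same data with precision=1 and precision=4 and compares the results. Only the returned edges and the bin codes are compared, since those are what the binning actually depends on.

```python
import numpy as np
import pandas as pd

data = [1.123, 2.345, 3.567, 4.789, 5.901, 6.234, 7.456, 8.678]

# same data, two different display precisions
coarse, coarse_edges = pd.qcut(data, 4, precision=1, retbins=True)
fine, fine_edges = pd.qcut(data, 4, precision=4, retbins=True)

# the interval labels differ in how many digits they show
print(coarse.categories)
print(fine.categories)

# the underlying edges and the bin assignments are identical
print(np.array_equal(coarse_edges, fine_edges))
print(list(coarse.codes) == list(fine.codes))
```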