Pandas get_dummies()

The get_dummies() method in Pandas is used to convert categorical variables into dummy variables.

Each category is transformed into a new column holding a binary value (1 or 0) that indicates whether the category is present in the corresponding row of the original data.

Example

import pandas as pd

# create a Series
data = pd.Series(['A', 'B', 'A', 'C', 'B'])

# use get_dummies on the Series
dummies = pd.get_dummies(data)

print(dummies)

'''
Output

   A  B  C
0  1  0  0
1  0  1  0
2  1  0  0
3  0  0  1
4  0  1  0
'''

get_dummies() Syntax

The syntax of the get_dummies() method in Pandas is:

get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, drop_first=False)

get_dummies() Arguments

The get_dummies() method takes the following arguments:

data - the input data to be transformed
prefix (optional) - string to prepend to the dummy column names
prefix_sep (optional) - separator placed between the prefix and the category name
dummy_na (optional) - if True, add a column that marks NaN values; if False (the default), NaNs are ignored
drop_first (optional) - if True, remove the column for the first category level

get_dummies() Return Value

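The get_dummies() method returns a DataFrame in which each unique value in the input becomes a separate column of binary values, indicating the presence or absence of that value in each row of the original data.

Note: In pandas 2.0 and later, the dummy columns have the bool dtype by default, so they print as True and False. Pass dtype=int to get the 1s and 0s shown in the outputs on this page.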
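As a minimal sketch of the return value, the snippet below inspects the type and contents of the result. It passes dtype=int, an argument of get_dummies() that is not listed in the syntax above, so that the columns hold integers regardless of the installed pandas version. The variable names are only illustrative.

```python
import pandas as pd

# a small Series with two categories
data = pd.Series(['A', 'B', 'A'])

# dtype=int requests integer dummy columns
result = pd.get_dummies(data, dtype=int)

print(type(result).__name__)     # DataFrame
print(result.columns.tolist())   # ['A', 'B']
print(result.values.tolist())    # [[1, 0], [0, 1], [1, 0]]
```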

Example 1: Convert a Series Into Dummy Variables

import pandas as pd

# create a Series
data = pd.Series(['apple', 'orange', 'apple', 'banana'])

# use get_dummies() to convert the series into dummy variables
dummy_data = pd.get_dummies(data)

print(dummy_data)

Output

   apple  banana  orange
0      1       0       0
1      0       0       1
2      1       0       0
3      0       1       0

In the above example, we have created the data Series with fruit names.

We then applied get_dummies() which creates a new DataFrame where each fruit name becomes a column.

And for each row in the data Series, the corresponding column in the new DataFrame will have a 1 if the fruit name was present in that row, and 0 otherwise.
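To make the row-by-row rule concrete, here is a rough hand-written equivalent of the encoding, built by comparing the Series against each unique value. This is only a sketch for illustration, not how pandas implements get_dummies() internally; the names manual and builtin are made up for this example.

```python
import pandas as pd

data = pd.Series(['apple', 'orange', 'apple', 'banana'])

# one column per unique value, in sorted order
categories = sorted(data.unique())

# each column is 1 where the row equals that category, else 0
manual = pd.DataFrame({name: (data == name).astype(int) for name in categories})

# the built-in result, with integer columns for comparison
builtin = pd.get_dummies(data, dtype=int)

# compare the two results cell by cell
print((manual.to_numpy() == builtin.to_numpy()).all())   # True
```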

Example 2: Apply get_dummies() With Prefix

import pandas as pd

# sample data
data = {'Color': ['Red', 'Green', 'Blue', 'Green', 'Red']}

# create a DataFrame
df = pd.DataFrame(data)

# get dummies with a specified prefix
dummies = pd.get_dummies(df['Color'], prefix='Color')

print(dummies)

Output

   Color_Blue  Color_Green  Color_Red
0           0            0          1
1           0            1          0
2           1            0          0
3           0            1          0
4           0            0          1

Here, we have passed the prefix='Color' argument to get_dummies(), so the new dummy variable columns are prefixed with Color_.

Hence, the resulting DataFrame contains columns Color_Blue, Color_Green, and Color_Red, representing the presence or absence of the respective color categories.
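A related case, sketched here under a few assumptions: when a whole DataFrame is passed instead of a single column, get_dummies() uses each encoded column's name as the prefix by default. The example relies on the columns and dtype arguments, which are not covered in the argument list above, and the Size column is invented for illustration.

```python
import pandas as pd

# 'Size' is a hypothetical numeric column that should stay untouched
df = pd.DataFrame({
    'Color': ['Red', 'Green', 'Blue'],
    'Size': [1, 2, 3],
})

# encode only the 'Color' column; its name becomes the prefix
encoded = pd.get_dummies(df, columns=['Color'], dtype=int)

print(encoded.columns.tolist())
# ['Size', 'Color_Blue', 'Color_Green', 'Color_Red']
```

Passing the DataFrame this way keeps the non-encoded columns next to the dummy columns, which avoids a separate join step.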

Example 3: Get Dummies With Specified Prefix and Prefix Separator

import pandas as pd

# sample data
data = {'Color': ['Red', 'Green', 'Blue', 'Green', 'Red']}

# create a DataFrame
df = pd.DataFrame(data)

# get dummies with a specified prefix and prefix separator
dummies = pd.get_dummies(df['Color'], prefix='Color', prefix_sep='--')

print(dummies)

Output

   Color--Blue  Color--Green  Color--Red
0            0             0           1
1            0             1           0
2            1             0           0
3            0             1           0
4            0             0           1

In this example, the prefix_sep='--' argument means that the prefix and the original category name will be separated by --.

So, for a color like Blue, the resulting column name in the dummies DataFrame would be Color--Blue and so on.
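One situation where a custom separator may help, shown as a sketch with invented category names: if the categories themselves contain underscores, the default separator makes it ambiguous where the prefix ends. A distinctive separator such as -- lets the category be split back out of the column name.

```python
import pandas as pd

# hypothetical categories that already contain underscores
colors = pd.Series(['dark_red', 'light_blue', 'dark_red'])

dummies = pd.get_dummies(colors, prefix='Color', prefix_sep='--', dtype=int)

print(dummies.columns.tolist())
# ['Color--dark_red', 'Color--light_blue']

# split each column name once on the separator to recover the category
recovered = [name.split('--', 1)[1] for name in dummies.columns]

print(recovered)
# ['dark_red', 'light_blue']
```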

Example 4: Use dummy_na to Manage Missing Data

import pandas as pd

# sample data with a missing value
data = {'Color': ['Red', 'Green', 'Blue', None, 'Red']}

# create a DataFrame
df = pd.DataFrame(data)

# get dummies without considering NaN
dummies_without_nan = pd.get_dummies(df['Color'])

# get dummies considering NaN
dummies_with_nan = pd.get_dummies(df['Color'], dummy_na=True)

print("Dummies without NaN handling:\n", dummies_without_nan)
print("\nDummies with NaN handling:\n", dummies_with_nan)

Output

Dummies without NaN handling:
    Blue  Green  Red
0     0      0    1
1     0      1    0
2     1      0    0
3     0      0    0
4     0      0    1

Dummies with NaN handling:
    Blue  Green  Red  NaN
0     0      0    1    0
1     0      1    0    0
2     1      0    0    0
3     0      0    0    1
4     0      0    1    0

Here,

get_dummies(df['Color']) - generates columns for Blue, Green, and Red only. The row with the missing value (row 3) contains 0 in every column, so nothing marks it as missing.
get_dummies(df['Color'], dummy_na=True) - generates the same columns plus an additional one called NaN, which contains 1 wherever the original data had a missing value.
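The difference can also be checked numerically. In this sketch, the row totals show that a missing value produces an all-zero row without dummy_na, and exactly one 1 per row with it. The dtype=int argument is used only so the sums are plain integers; the variable names are illustrative.

```python
import pandas as pd

colors = pd.Series(['Red', 'Green', 'Blue', None, 'Red'])

without_nan = pd.get_dummies(colors, dtype=int)
with_nan = pd.get_dummies(colors, dummy_na=True, dtype=int)

# sum across columns for each row
totals_without = without_nan.sum(axis=1).tolist()
totals_with = with_nan.sum(axis=1).tolist()

print(totals_without)   # [1, 1, 1, 0, 1]
print(totals_with)      # [1, 1, 1, 1, 1]
```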

Example 5: Drop the First Category With drop_first

import pandas as pd

# sample data
data = {'Color': ['Red', 'Green', 'Blue', 'Green', 'Red']}

# creating a DataFrame
df = pd.DataFrame(data)

# getting dummies without dropping any columns
dummies_all = pd.get_dummies(df['Color'])

print("DataFrame with all dummy columns:")
print(dummies_all)
print("\n")

# getting dummies and dropping the first category column ('Blue' in this case)
dummies = pd.get_dummies(df['Color'], drop_first=True)

print("DataFrame after dropping 'Blue':")
print(dummies)

Output

DataFrame with all dummy columns:
   Blue  Green  Red
0     0      0    1
1     0      1    0
2     1      0    0
3     0      1    0
4     0      0    1


DataFrame after dropping 'Blue':
   Green  Red
0      0    1
1      1    0
2      0    0
3      1    0
4      0    1

Here, the drop_first=True argument tells get_dummies() to drop the column for the first category. The categories are taken in sorted order, so Blue is the one that is dropped.

Hence, the resulting DataFrame contains only the two columns Green and Red. Blue no longer has a column of its own, but it is still identifiable: a Blue row has 0 in both remaining columns, as in row 2.
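As a small sketch of why dropping a column loses no information, the rows belonging to the dropped category can be found by looking for rows where every remaining column is 0. The name is_blue is made up for this example, and dtype=int is passed so the columns can be summed as integers.

```python
import pandas as pd

colors = pd.Series(['Red', 'Green', 'Blue', 'Green', 'Red'])

# only the 'Green' and 'Red' columns remain
reduced = pd.get_dummies(colors, drop_first=True, dtype=int)

# a row with no 1 in any remaining column must be the dropped category
is_blue = reduced.sum(axis=1) == 0

print(reduced.columns.tolist())   # ['Green', 'Red']
print(is_blue.tolist())           # [False, False, True, False, False]
```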
