The set_index() method in Pandas is used to set the index of the DataFrame.
This method allows us to use one or more columns as the index. Once set, the specified column(s) will become the new row labels of the DataFrame.
Example
import pandas as pd
# sample DataFrame
df = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3']
})
# set column 'A' as the index
df = df.set_index('A')
print(df)
'''
Output
     B   C
A
A0  B0  C0
A1  B1  C1
A2  B2  C2
A3  B3  C3
'''
set_index() Syntax
The syntax of the set_index() method in Pandas is:
df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
set_index() Arguments
The set_index() method takes the following arguments:
keys - specifies which column(s) to use as the new index.
drop (optional) - if True, removes the column(s) used as the new index. If False, the column(s) are retained in the DataFrame.
append (optional) - if True, adds the new index alongside the existing index. If False, the existing index is replaced with the new one.
inplace (optional) - if True, modifies the original DataFrame in place and returns None. If False, returns a new DataFrame.
verify_integrity (optional) - if True, raises a ValueError when the new index contains duplicate values. If False, doesn't check for duplicates.
set_index() Return Value
The set_index() method returns a new DataFrame with the specified column(s) set as the index. If inplace=True is passed, it modifies the original DataFrame and returns None instead.
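To make the return behavior concrete, here is a minimal sketch (the two-row DataFrame and the name result are illustrative only) showing that the default call leaves the original DataFrame untouched:

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102], 'Name': ['Alice', 'Bob']})

# default call: a new DataFrame is returned
result = df.set_index('ID')

# the original still has its default integer index and the 'ID' column
print(list(df.columns))      # ['ID', 'Name']
print(list(result.columns))  # ['Name']
print(result.index.name)     # ID
print(result is df)          # False
```

This is why the examples below either assign the result to a variable or pass inplace=True.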
Example 1: Set a Single Column as the Index
import pandas as pd
# creating a sample DataFrame
data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# setting 'ID' column as the index
df_indexed = df.set_index('ID')
print("DataFrame after setting 'ID' as index:")
print(df_indexed)
Output
Original DataFrame:
    ID     Name  Age
0  101    Alice   25
1  102      Bob   30
2  103  Charlie   35
3  104    David   40

DataFrame after setting 'ID' as index:
        Name  Age
ID
101    Alice   25
102      Bob   30
103  Charlie   35
104    David   40
In the above example, after calling set_index('ID'), the ID column becomes the index of the df_indexed DataFrame.
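One practical consequence is that rows can now be looked up by their ID value. The following is a small sketch that reuses the data from this example; the specific lookups are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
})
df_indexed = df.set_index('ID')

# .loc looks rows up by label, which is now the ID value
print(df_indexed.loc[103, 'Name'])  # Charlie
print(df_indexed.loc[102, 'Age'])   # 30

# positional access with .iloc is unaffected by the new index
print(df_indexed.iloc[0]['Name'])   # Alice
```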
Example 2: Retain Columns While Setting Them as Index
import pandas as pd
# creating a sample DataFrame
data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# setting 'ID' column as the index but retaining it in the DataFrame
df_indexed = df.set_index('ID', drop=False)
print("\nDataFrame after setting 'ID' as index but retaining it as a column:")
print(df_indexed)
Output
Original DataFrame:
    ID     Name  Age
0  101    Alice   25
1  102      Bob   30
2  103  Charlie   35
3  104    David   40

DataFrame after setting 'ID' as index but retaining it as a column:
      ID     Name  Age
ID
101  101    Alice   25
102  102      Bob   30
103  103  Charlie   35
104  104    David   40
Here, we have used drop=False inside set_index() to keep the column in the DataFrame while also using it as the index.
As the result shows, the ID column has been set as the index of the df_indexed DataFrame, and it is also retained as a regular column.
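The difference between the two settings of drop can be checked directly. This is a minimal sketch; the variable names kept and dropped are hypothetical and used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
})

kept = df.set_index('ID', drop=False)
dropped = df.set_index('ID')  # drop=True is the default

print('ID' in kept.columns)     # True
print('ID' in dropped.columns)  # False

# with drop=False the values are available both as labels and as data
print(kept.loc[102, 'ID'])      # 102
```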
Example 3: Set Multiple Columns as the Index
import pandas as pd
# create a sample DataFrame
data = {
    'Country': ['USA', 'USA', 'Canada', 'Canada'],
    'State': ['California', 'New York', 'Ontario', 'Quebec'],
    'City': ['Los Angeles', 'New York City', 'Toronto', 'Montreal'],
    'Population': [3977687, 8175133, 2731571, 1704694]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# setting 'Country' and 'State' columns as the index
df_multi_indexed = df.set_index(['Country', 'State'])
print("\nDataFrame after setting 'Country' and 'State' as indices:")
print(df_multi_indexed)
Output
Original DataFrame:
  Country       State           City  Population
0     USA  California    Los Angeles     3977687
1     USA    New York  New York City     8175133
2  Canada     Ontario        Toronto     2731571
3  Canada      Quebec       Montreal     1704694

DataFrame after setting 'Country' and 'State' as indices:
                             City  Population
Country State
USA     California    Los Angeles     3977687
        New York    New York City     8175133
Canada  Ontario           Toronto     2731571
        Quebec           Montreal     1704694
In this example, set_index(['Country', 'State']) sets both the Country and State columns as the index, resulting in a multi-index DataFrame.
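With a multi-index in place, rows can be selected by one level or by the full combination of levels. The sketch below reuses the data from this example; the particular selections are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'Canada', 'Canada'],
    'State': ['California', 'New York', 'Ontario', 'Quebec'],
    'City': ['Los Angeles', 'New York City', 'Toronto', 'Montreal'],
    'Population': [3977687, 8175133, 2731571, 1704694]
})
df_multi_indexed = df.set_index(['Country', 'State'])

print(df_multi_indexed.index.nlevels)      # 2
print(list(df_multi_indexed.index.names))  # ['Country', 'State']

# a full (Country, State) tuple selects a single row
print(df_multi_indexed.loc[('Canada', 'Quebec'), 'City'])  # Montreal

# a first-level label alone selects every row under it
usa = df_multi_indexed.loc['USA']
print(list(usa.index))  # ['California', 'New York']
```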
Example 4: Append a Column to the Existing Index
import pandas as pd
# creating a sample DataFrame
data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
# initially set the 'ID' column as our index
df.set_index('ID', inplace=True)
print("DataFrame with 'ID' as index:")
print(df)
print()
# append 'City' to the existing index, creating a multi-index
df.set_index(['City'], append=True, inplace=True)
print("\nDataFrame after appending 'City' to the existing index:")
print(df)
Output
DataFrame with 'ID' as index:
        Name  Age         City
ID
101    Alice   25     New York
102      Bob   30  Los Angeles
103  Charlie   35      Chicago
104    David   40      Houston

DataFrame after appending 'City' to the existing index:
                    Name  Age
ID  City
101 New York       Alice   25
102 Los Angeles      Bob   30
103 Chicago      Charlie   35
104 Houston        David   40
In the above example, the df DataFrame is initially indexed by ID.
We then used the set_index() method with the append=True argument to append the City column to the index, creating a multi-index consisting of ID and City.
Here, the inplace=True argument modifies the original DataFrame directly instead of creating a new one, and the method returns None.
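Because an in-place call returns None, assigning its result is a common mistake. The following sketch (with a shortened, illustrative DataFrame) contrasts the two styles:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102],
    'City': ['New York', 'Los Angeles'],
    'Name': ['Alice', 'Bob']
})

# inplace=True mutates df and returns None, so the result should not be assigned
result = df.set_index('ID', inplace=True)
print(result)                # None
print(df.index.name)         # ID

# the non-inplace form returns a new DataFrame that can be reassigned
df = df.set_index('City', append=True)
print(list(df.index.names))  # ['ID', 'City']
```

Writing df = df.set_index(..., inplace=True) would replace df with None, so only one of the two styles should be used in a given statement.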
Example 5: Check for Duplicates in the New Index
import pandas as pd
# sample DataFrame
data = {
    'ID': [101, 102, 103, 101],  # note the duplicate ID 101
    'Name': ['Alice', 'Bob', 'Charlie', 'Eve'],
    'Age': [25, 30, 35, 28]
}
df = pd.DataFrame(data)
# attempt to set 'ID' as the index while checking for duplicates
try:
    df.set_index('ID', verify_integrity=True, inplace=True)
except ValueError as e:
    print(e)
Output
Index has duplicate keys: Int64Index([101], dtype='int64', name='ID')
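Here, since there's a duplicate in the ID column, a ValueError is raised indicating the presence of the duplicate key in the index. The exact wording depends on the pandas version: in pandas 2.0 and later, the message shows Index instead of Int64Index.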
Note: To learn more about exception handling, please visit Python Exception Handling.
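As an alternative to catching the exception, duplicates can be inspected before calling set_index(). This is a sketch of one possible approach using Series.is_unique and Series.duplicated(), not part of the original example; the name dupes is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 101],
    'Name': ['Alice', 'Bob', 'Charlie', 'Eve'],
    'Age': [25, 30, 35, 28]
})

# is_unique is False when any value in the column repeats
print(df['ID'].is_unique)   # False

# keep=False marks every occurrence of a repeated value
dupes = df[df['ID'].duplicated(keep=False)]
print(list(dupes['Name']))  # ['Alice', 'Eve']

# only set the index when the column is safe to use as unique labels
if df['ID'].is_unique:
    df = df.set_index('ID', verify_integrity=True)
```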