The pivot() method in Pandas is used to reshape data. It's particularly handy when we have data in a long format and want to pivot it to a wide format based on column values.
Example
import pandas as pd
# create a sample dataset
data = {
'Student': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
'Subject': ['Math', 'History', 'Math', 'History', 'Math', 'History'],
'Score': [90, 85, 78, 88, 92, 79]
}
# convert data dictionary to DataFrame
df = pd.DataFrame(data)
# use pivot() to reshape the data
pivot_df = df.pivot(index='Student', columns='Subject', values='Score')
# display the reshaped data
print(pivot_df)
'''
Output
Subject History Math
Student
Alice 85 90
Bob 88 78
Charlie 79 92
'''
pivot() Syntax
The syntax of the pivot() method in Pandas is:
df.pivot(columns, index=None, values=None)
pivot() Arguments
The pivot() function takes following arguments:
columns- column(s) to use to make new dataframe's columnsindex(optional) - column to use as the new frame's index. IfNone, use the existing indexvalues(optional) - column(s) to use for populating new dataframe's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.
pivot() Return Value
The pivot() method returns a reshaped DataFrame organized by the given index/column values.
Example1: Using pivot() to Reshape the DataFrame
import pandas as pd
# sample DataFrame
data = {
'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
'Temperature': [65, 66, 78, 79],
'Humidity': [56, 57, 54, 52]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# pivot with just the 'City' column
df_pivot = df.pivot(columns='City')
print("\nPivoted DataFrame:")
print(df_pivot)
Output
Original DataFrame:
Date City Temperature Humidity
0 2023-10-01 New York 65 56
1 2023-10-02 New York 66 57
2 2023-10-01 Los Angeles 78 54
3 2023-10-02 Los Angeles 79 52
Pivoted DataFrame:
Date Temperature Humidity
City Los Angeles New York Los Angeles New York Los Angeles New York
0 NaN 2023-10-01 NaN 65.0 NaN 56.0
1 NaN 2023-10-02 NaN 66.0 NaN 57.0
2 2023-10-01 NaN 78.0 NaN 54.0 NaN
3 2023-10-02 NaN 79.0 NaN 52.0 NaN
>
Here, the code above produces a multi-level column DataFrame where the top level of columns represents the unique cities and the second level represents the original columns Temperature and Humidity.
Each row then corresponds to a specific date from the original Date column.
The NaN values in the pivoted DataFrame indicates there's no available data for a particular combination.
Example2: Use pivot() with the index argument
import pandas as pd
# sample DataFrame
data = {
'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
'Temperature': [65, 66, 78, 79],
'Humidity': [56, 57, 54, 52]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# pivot using 'Date' as the index
df_pivot = df.pivot(index='Date', columns='City')
print("\nPivoted DataFrame:")
print(df_pivot)
Output
Original DataFrame:
Date City Temperature Humidity
0 2023-10-01 New York 65 56
1 2023-10-02 New York 66 57
2 2023-10-01 Los Angeles 78 54
3 2023-10-02 Los Angeles 79 52
Pivoted DataFrame:
Temperature Humidity
City Los Angeles New York Los Angeles New York
Date
2023-10-01 78 65 54 56
2023-10-02 79 66 52 57
In this example, the Date column values become the index of the pivoted DataFrame.
The City values become new columns, yielding a multi-level DataFrame with cities as the top columns and Temperature and Humidity as sub-columns.
Example3: Using pivot() in Pandas with the values Argument
import pandas as pd
# sample DataFrame
data = {
'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
'Temperature': [65, 66, 78, 79],
'Humidity': [56, 57, 54, 52]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# pivot with Date as the index, City as the columns,
# and Temperature as the values
df_pivot = df.pivot(index='Date', columns='City', values='Temperature')
print("\nPivoted DataFrame:")
print(df_pivot)
Output
Original DataFrame:
Date City Temperature Humidity
0 2023-10-01 New York 65 56
1 2023-10-02 New York 66 57
2 2023-10-01 Los Angeles 78 54
3 2023-10-02 Los Angeles 79 52
Pivoted DataFrame:
City Los Angeles New York
Date
2023-10-01 78 65
2023-10-02 79 66
In this example, the pivot() method is called on df using the syntax: df.pivot(index='Date', columns='City', values='Temperature').
Here,
index='Date'- unique values inDatebecome rows in the new DataFramecolumns='City'- unique values inCityset the columns (New YorkandLos Angeles)values='Temperature'-Temperaturevalues from original data populate the cells in the pivoted table
And finally, a grid is formed where rows are dates, columns are cities, and cells show temperatures, allowing easy visual comparison.