The pivot()
method in Pandas is used to reshape data. It's particularly handy when we have data in a long format and want to pivot it to a wide format based on column values.
Example
import pandas as pd
# create a sample dataset
data = {
'Student': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
'Subject': ['Math', 'History', 'Math', 'History', 'Math', 'History'],
'Score': [90, 85, 78, 88, 92, 79]
}
# convert data dictionary to DataFrame
df = pd.DataFrame(data)
# use pivot() to reshape the data
pivot_df = df.pivot(index='Student', columns='Subject', values='Score')
# display the reshaped data
print(pivot_df)
'''
Output
Subject History Math
Student
Alice 85 90
Bob 88 78
Charlie 79 92
'''
pivot() Syntax
The syntax of the pivot()
method in Pandas is:
df.pivot(columns, index=None, values=None)
pivot() Arguments
The pivot()
function takes following arguments:
columns
- column(s) to use to make new dataframe's columnsindex
(optional) - column to use as the new frame's index. IfNone
, use the existing indexvalues
(optional) - column(s) to use for populating new dataframe's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.
pivot() Return Value
The pivot()
method returns a reshaped DataFrame organized by the given index/column values.
Example1: Using pivot() to Reshape the DataFrame
import pandas as pd
# sample DataFrame
data = {
'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
'Temperature': [65, 66, 78, 79],
'Humidity': [56, 57, 54, 52]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# pivot with just the 'City' column
df_pivot = df.pivot(columns='City')
print("\nPivoted DataFrame:")
print(df_pivot)
Output
Original DataFrame:
Date City Temperature Humidity
0 2023-10-01 New York 65 56
1 2023-10-02 New York 66 57
2 2023-10-01 Los Angeles 78 54
3 2023-10-02 Los Angeles 79 52
Pivoted DataFrame:
Date Temperature Humidity
City Los Angeles New York Los Angeles New York Los Angeles New York
0 NaN 2023-10-01 NaN 65.0 NaN 56.0
1 NaN 2023-10-02 NaN 66.0 NaN 57.0
2 2023-10-01 NaN 78.0 NaN 54.0 NaN
3 2023-10-02 NaN 79.0 NaN 52.0 NaN
>
Here, the code above produces a multi-level column DataFrame where the top level of columns represents the unique cities and the second level represents the original columns Temperature
and Humidity
.
Each row then corresponds to a specific date from the original Date
column.
The NaN
values in the pivoted DataFrame indicates there's no available data for a particular combination.
Example2: Use pivot() with the index argument
import pandas as pd
# sample DataFrame
data = {
'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
'Temperature': [65, 66, 78, 79],
'Humidity': [56, 57, 54, 52]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# pivot using 'Date' as the index
df_pivot = df.pivot(index='Date', columns='City')
print("\nPivoted DataFrame:")
print(df_pivot)
Output
Original DataFrame:
Date City Temperature Humidity
0 2023-10-01 New York 65 56
1 2023-10-02 New York 66 57
2 2023-10-01 Los Angeles 78 54
3 2023-10-02 Los Angeles 79 52
Pivoted DataFrame:
Temperature Humidity
City Los Angeles New York Los Angeles New York
Date
2023-10-01 78 65 54 56
2023-10-02 79 66 52 57
In this example, the Date
column values become the index of the pivoted DataFrame.
The City
values become new columns, yielding a multi-level DataFrame with cities as the top columns and Temperature
and Humidity
as sub-columns.
Example3: Using pivot() in Pandas with the values Argument
import pandas as pd
# sample DataFrame
data = {
'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
'Temperature': [65, 66, 78, 79],
'Humidity': [56, 57, 54, 52]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# pivot with Date as the index, City as the columns,
# and Temperature as the values
df_pivot = df.pivot(index='Date', columns='City', values='Temperature')
print("\nPivoted DataFrame:")
print(df_pivot)
Output
Original DataFrame:
Date City Temperature Humidity
0 2023-10-01 New York 65 56
1 2023-10-02 New York 66 57
2 2023-10-01 Los Angeles 78 54
3 2023-10-02 Los Angeles 79 52
Pivoted DataFrame:
City Los Angeles New York
Date
2023-10-01 78 65
2023-10-02 79 66
In this example, the pivot()
method is called on df using the syntax: df.pivot(index='Date', columns='City', values='Temperature')
.
Here,
index='Date'
- unique values inDate
become rows in the new DataFramecolumns='City'
- unique values inCity
set the columns (New York
andLos Angeles
)values='Temperature'
-Temperature
values from original data populate the cells in the pivoted table
And finally, a grid is formed where rows are dates, columns are cities, and cells show temperatures, allowing easy visual comparison.