Pandas pivot()

The pivot() method in Pandas is used to reshape data. It's particularly handy when we have data in a long format and want to pivot it to a wide format based on column values.

Example

import pandas as pd

# create a sample dataset
data = {
    'Student': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
    'Subject': ['Math', 'History', 'Math', 'History', 'Math', 'History'],
    'Score': [90, 85, 78, 88, 92, 79]
}

# convert data dictionary to DataFrame
df = pd.DataFrame(data)

# use pivot() to reshape the data pivot_df = df.pivot(index='Student', columns='Subject', values='Score')
# display the reshaped data print(pivot_df) ''' Output Subject History Math Student Alice 85 90 Bob 88 78 Charlie 79 92 '''

pivot() Syntax

The syntax of the pivot() method in Pandas is:

df.pivot(columns, index=None, values=None)

pivot() Arguments

The pivot() function takes following arguments:

  • columns - column(s) to use to make new dataframe's columns
  • index (optional) - column to use as the new frame's index. If None, use the existing index
  • values (optional) - column(s) to use for populating new dataframe's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.

pivot() Return Value

The pivot() method returns a reshaped DataFrame organized by the given index/column values.


Example1: Using pivot() to Reshape the DataFrame

import pandas as pd

# sample DataFrame
data = {
    'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Temperature': [65, 66, 78, 79],
    'Humidity': [56, 57, 54, 52]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# pivot with just the 'City' column
df_pivot = df.pivot(columns='City')

print("\nPivoted DataFrame:")
print(df_pivot)

Output

Original DataFrame:
Date         City  Temperature  Humidity
0  2023-10-01     New York           65        56
1  2023-10-02     New York           66        57
2  2023-10-01  Los Angeles           78        54
3  2023-10-02  Los Angeles           79        52

Pivoted DataFrame:
Date             Temperature             Humidity         
City Los Angeles    New York Los Angeles New York Los Angeles New York
0            NaN  2023-10-01         NaN     65.0         NaN     56.0
1            NaN  2023-10-02         NaN     66.0         NaN     57.0
2     2023-10-01         NaN        78.0      NaN        54.0      NaN
3     2023-10-02         NaN        79.0      NaN        52.0      NaN
> 

Here, the code above produces a multi-level column DataFrame where the top level of columns represents the unique cities and the second level represents the original columns Temperature and Humidity.

Each row then corresponds to a specific date from the original Date column.

The NaN values in the pivoted DataFrame indicates there's no available data for a particular combination.


Example2: Use pivot() with the index argument

import pandas as pd

# sample DataFrame
data = {
    'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Temperature': [65, 66, 78, 79],
    'Humidity': [56, 57, 54, 52]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# pivot using 'Date' as the index df_pivot = df.pivot(index='Date', columns='City')
print("\nPivoted DataFrame:") print(df_pivot)

Output

Original DataFrame:
   Date           City         Temperature   Humidity
0  2023-10-01     New York           65        56
1  2023-10-02     New York           66        57
2  2023-10-01  Los Angeles           78        54
3  2023-10-02  Los Angeles           79        52

Pivoted DataFrame:
                  Temperature             Humidity         
City       Los Angeles New York Los Angeles New York
Date                                                
2023-10-01          78       65          54       56
2023-10-02          79       66          52       57

In this example, the Date column values become the index of the pivoted DataFrame.

The City values become new columns, yielding a multi-level DataFrame with cities as the top columns and Temperature and Humidity as sub-columns.


Example3: Using pivot() in Pandas with the values Argument

import pandas as pd

# sample DataFrame
data = {
    'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Temperature': [65, 66, 78, 79],
    'Humidity': [56, 57, 54, 52]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# pivot with Date as the index, City as the columns, # and Temperature as the values df_pivot = df.pivot(index='Date', columns='City', values='Temperature')
print("\nPivoted DataFrame:") print(df_pivot)

Output

Original DataFrame:
     Date         City         Temperature  Humidity
0  2023-10-01     New York           65        56
1  2023-10-02     New York           66        57
2  2023-10-01  Los Angeles           78        54
3  2023-10-02  Los Angeles           79        52

Pivoted DataFrame:
City        Los Angeles  New York
Date                             
2023-10-01           78        65
2023-10-02           79        66

In this example, the pivot() method is called on df using the syntax: df.pivot(index='Date', columns='City', values='Temperature').

Here,

  • index='Date' - unique values in Date become rows in the new DataFrame
  • columns='City' - unique values in City set the columns (New York and Los Angeles)
  • values='Temperature' - Temperature values from original data populate the cells in the pivoted table

And finally, a grid is formed where rows are dates, columns are cities, and cells show temperatures, allowing easy visual comparison.

Your builder path starts here. Builders don't just know how to code, they create solutions that matter.

Escape tutorial hell and ship real projects.

Try Programiz PRO
  • Real-World Projects
  • On-Demand Learning
  • AI Mentor
  • Builder Community