Pandas Pivot

The pivot() function in Pandas reshapes data based on column values. It takes simple column-wise data as input, and groups the entries into a two-dimensional table.

Working of pivot operation in Pandas
Pivot Operation in Pandas

Let's look at an example.

import pandas as pd

# create a dataframe
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Temperature': [32, 75, 30, 77]}
df = pd.DataFrame(data)

print("Original DataFrame\n", df)
print()

# pivot the dataframe pivot_df = df.pivot(index='Date', columns='City', values='Temperature')
print("Reshaped DataFrame\n", pivot_df)

Output

Original DataFrame
         Date         City  Temperature
0  2023-01-01     New York           32
1  2023-01-01  Los Angeles           75
2  2023-01-02     New York           30
3  2023-01-02  Los Angeles           77

Reshaped DataFrame
City        Los Angeles  New York
Date                             
2023-01-01        75           32
2023-01-02        77           30

In this example, we used pivot() to reshape the DataFrame df. The Date column is set as index, City as columns and Temperature as values.

Notice the original and reshaped DataFrame in the output section. The reshaped DataFrame is a multidimensional table that shows the temperature based on the city and the date.

Thus the pivot() operation reshapes the data to make it clearer for further analysis.


pivot() syntax

The syntax of pivot() in Pandas is:

df.pivot(index=None, columns=None, values=None)

Here,

  • index: the column to use as row labels
  • columns: the column that will be reshaped as columns
  • values: the column(s) to use for the new DataFrame's values

Example: pivot() for Multiple Values

If we omit the values argument in pivot(), it selects all the remaining columns (besides the ones specified index and columns) as values for the pivot table.

Let's see an example.

import pandas as pd

# create a dataframe
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Temperature': [32, 75, 30, 77],
        'Humidity': [80, 10, 85, 5]}

df = pd.DataFrame(data)

print('Original DataFrame')
print(df)
print()

# pivot the dataframe pivot_df = df.pivot(index='Date', columns='City')
print('Reshaped DataFrame') print(pivot_df)

Output

Original DataFrame
         Date         City  Temperature  Humidity
0  2023-01-01     New York           32        80
1  2023-01-01  Los Angeles           75        10
2  2023-01-02     New York           30        85
3  2023-01-02  Los Angeles           77         5

Reshaped DataFrame
           Temperature             Humidity         
City       Los Angeles New York Los Angeles New York
Date                                                
2023-01-01          75       32          10       80
2023-01-02          77       30           5       85

In this example, we created a pivot table for multiple values i.e. Temperature and Humidity.


pivot() vs pivot_table()

The pivot() and pivot_table() functions perform similar operations but with few key differences.

Basis pivot() pivot_table()
Aggregation Does not allow aggregation of data. Allows aggregation (sum, mean, count, etc.).
Duplicate Index Cannot handle duplicate index values. Can handle duplicate index values.
MultiIndex Only accepts a single-level index. Accepts multi-level index for complex data.