The rolling() method in Pandas is used to perform rolling window calculations on sequential data.
A rolling window is a fixed-size subset of the data that moves sequentially through a larger dataset.
It is used to compute statistics such as averages, sums, or maximums, with the window advancing one step at a time to reveal trends and patterns in the data.
Example
import pandas as pd
# create a DataFrame with sequential data
data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
# use rolling() to calculate the rolling maximum
window_size = 3
rolling_max = data['value'].rolling(window=window_size).max()
# display the rolling_max
print(rolling_max)
'''
Output
0 NaN
1 NaN
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
Name: value, dtype: float64
'''
rolling() Syntax
The syntax of the rolling() method in Pandas is:
df.rolling(window, min_periods=None, center=False, on=None, axis=0, closed=None)
rolling() Arguments
The rolling() method takes the following arguments:
- window - size of the rolling window: a fixed number of observations, or a time offset such as '3D'
- min_periods (optional) - minimum number of non-null observations needed for a valid result. By default, it equals the window size for a fixed-size window and 1 for an offset-based window
- center (optional) - use the center label as the result index if True, else the right end label (default)
- on (optional) - specifies the column to use as the rolling window anchor instead of the index
- axis (optional) - specifies the axis along which the rolling window is applied. Default is 0 (along rows)
- closed (optional) - specifies which side of the window interval is closed.
rolling() Return Value
The rolling() method returns a Rolling object. This is not a final computed result but an intermediate object on which we call aggregation functions that are then evaluated within each rolling window.
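To make the two-step nature of this concrete, here is a minimal sketch. The names values, roller, and rolling_total are ours and not part of the original examples. It first creates the intermediate object and only afterwards asks it for a result.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# rolling() alone computes nothing; it only describes the windows
roller = values.rolling(window=3)
print(type(roller).__name__)

# the actual calculation happens when an aggregation is called
rolling_total = roller.sum()
print(rolling_total.tolist())
```

The same roller object can be reused for several aggregations without calling rolling() again.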
Example 1: Use rolling() to Calculate Rolling Minimum
import pandas as pd
# create a DataFrame with sequential data
data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
# use rolling() to calculate the rolling minimum
window_size = 3
rolling_min = data['value'].rolling(window=window_size).min()
# display the rolling_min
print(rolling_min)
Output
0 NaN
1 NaN
2 1.0
3 2.0
4 3.0
5 4.0
6 5.0
7 6.0
8 7.0
Name: value, dtype: float64
In the above example, we have used the rolling() method on the value column of the data DataFrame to calculate the rolling minimum.
The window_size is set to 3, which means it calculates the minimum value within a rolling window of size 3 as it moves through the value column.
In the output, the first two values (at index 0 and 1) are NaN because a window of size 3 does not yet contain enough data points at those positions.
Starting from index 2, each subsequent value represents the minimum value within a rolling window of size 3.
For example, at index 2, the rolling window includes [1, 2, 3], and the minimum is 1.0. Similarly, at index 3, the rolling window includes [2, 3, 4], and the minimum is 2.0, and so on.
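The window contents described above can be reproduced by hand. The following is only an illustrative sketch of the idea, not how Pandas implements it internally; the names values, manual_min, and pandas_min are ours.

```python
import pandas as pd

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
window_size = 3

# rebuild each window by hand: the window ending at position i
# covers up to window_size values, ending with the value at i
manual_min = []
for i in range(len(values)):
    window = values[max(0, i - window_size + 1): i + 1]
    if len(window) < window_size:
        manual_min.append(float('nan'))      # incomplete window
    else:
        manual_min.append(float(min(window)))
    print(i, window, manual_min[-1])

# the hand-built result should agree with rolling().min()
pandas_min = pd.Series(values).rolling(window=window_size).min()
print(pd.Series(manual_min).equals(pandas_min))
```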
Note: After calling rolling(), we can apply any aggregation function to compute calculations within the rolling window, such as mean(), sum(), min(), max(), etc.
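As a short sketch of this note, the code below applies several aggregations to the same rolling window and collects them side by side. The names roller and summary are ours, chosen for illustration.

```python
import pandas as pd

data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# one rolling object, several aggregations
roller = data['value'].rolling(window=3)

summary = pd.DataFrame({
    'mean': roller.mean(),
    'sum': roller.sum(),
    'min': roller.min(),
    'max': roller.max(),
})
print(summary)
```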
Example 2: Handle Missing Data in Rolling Calculations
import pandas as pd
# create a DataFrame with missing values
data = pd.DataFrame({'value': [1, None, 3, 4, 5, None, 7, 8, 9]})
# calculate the rolling mean with
# window size of 2 and min_periods set to 2
window_size = 2
rolling_mean = data['value'].rolling(window=window_size, min_periods=2).mean()
# display the rolling_mean
print(rolling_mean)
Output
0 NaN
1 NaN
2 NaN
3 3.5
4 4.5
5 NaN
6 NaN
7 7.5
8 8.5
Name: value, dtype: float64
In this example, the rolling() method calculates the mean using a specified window size.
We've set window=2, which means it calculates the mean of every 2 consecutive values.
The parameter min_periods is set to 2, which means that at least 2 non-NaN values are needed to compute the mean. If there are fewer than 2 non-NaN values within a window, the result is NaN. This is why every window that contains a missing value (at index 1, 2, 5, and 6) produces NaN.
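For contrast, here is a hedged sketch of the same data with min_periods lowered to 1, so that a window holding a single valid value still yields a result. The names strict, lenient, and comparison are ours.

```python
import pandas as pd

data = pd.DataFrame({'value': [1, None, 3, 4, 5, None, 7, 8, 9]})

# min_periods=2: both values in the window must be present
strict = data['value'].rolling(window=2, min_periods=2).mean()

# min_periods=1: one valid value in the window is enough
lenient = data['value'].rolling(window=2, min_periods=1).mean()

comparison = pd.DataFrame({'min_periods=2': strict, 'min_periods=1': lenient})
print(comparison)
```

With min_periods=1, the mean is taken over whichever values in the window are not missing, so no NaN is expected for this particular data.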
Example 3: Centered Rolling Window Calculations in Pandas
import pandas as pd
# create a DataFrame with sequential data
data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
# calculate a centered rolling sum with a window size of 3
window_size = 3
centered_rolling_sum = data['value'].rolling(window=window_size, center=True).sum()
# display the result
print(centered_rolling_sum)
Output
0 NaN
1 6.0
2 9.0
3 12.0
4 15.0
5 18.0
6 21.0
7 24.0
8 NaN
Name: value, dtype: float64
Here, we have used the rolling() method to apply a moving window calculation on the value column of the data DataFrame.
We've set a window size of 3 and specified the center=True parameter, which means each calculated value is centered on its respective window. For example, the result at index 1 is 1 + 2 + 3 = 6.
Due to the centered approach, the first and last entries don't have both a previous and a next value, so their windows are incomplete. Hence, their rolling sum is NaN.
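One way to see what center=True changes is to compare it with the default right-aligned window. The sketch below uses names of our own (trailing, centered, shifted_trailing). It suggests that, for a window of size 3, the centered result is the default result moved up by one row.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# default: each result is labeled with the right end of its window
trailing = values.rolling(window=3).sum()

# centered: each result is labeled with the middle of its window
centered = values.rolling(window=3, center=True).sum()

# moving the trailing result up one row lines it up with the centered one
shifted_trailing = trailing.shift(-1)

print(pd.DataFrame({'trailing': trailing, 'centered': centered}))
print(centered.equals(shifted_trailing))
```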
Example 4: Use on Argument in rolling() For Date-based Calculations
import pandas as pd
# sample DataFrame
df = pd.DataFrame({
'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
'value': [10, 20, 30, 40, 50]
})
# convert 'date' column to datetime type
df['date'] = pd.to_datetime(df['date'])
# compute rolling sum based on the 'date' column with a window size of 3 days
df['rolling_sum'] = df.rolling(window='3D', on='date')['value'].sum()
print(df)
Output
date value rolling_sum
0 2023-01-01 10 10.0
1 2023-01-02 20 30.0
2 2023-01-03 30 60.0
3 2023-01-04 40 90.0
4 2023-01-05 50 120.0
In the above example, we've used the rolling() method with the window='3D' argument, specifying a rolling window of 3 days.
By setting on='date', we ensure that the rolling calculation is based on the dates in the date column rather than the default index.
The result is stored in a new column called rolling_sum, which contains, for each row, the sum of value over the 3-day window ending at that row's date. Because an offset-based window uses min_periods=1 by default, the first two rows are not NaN.
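The difference between a 3-day window and a 3-row window only shows up when the dates are unevenly spaced. The sketch below uses made-up dates with gaps (our own sample data, not from the example above) and puts both kinds of window side by side.

```python
import pandas as pd

# hypothetical data with gaps between some dates
df = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05',
                            '2023-01-06', '2023-01-10']),
    'value': [10, 20, 30, 40, 50]
})

# time-based window: only rows within the last 3 days are included
df['sum_3_days'] = df.rolling(window='3D', on='date')['value'].sum()

# count-based window: always the last 3 rows, whatever their dates
df['sum_3_rows'] = df['value'].rolling(window=3).sum()

print(df)
```

On 2023-01-05, for instance, the 3-day window no longer reaches back to 2023-01-02, while the 3-row window still includes it.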
Example 5: Applying Column-Wise Rolling Operations
import pandas as pd
# sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [5, 6, 7, 8, 9],
'C': [9, 8, 7, 6, 5]
})
# apply rolling sum with window size of 2, along columns
df_rolling = df.rolling(window=2, axis=1).sum()
print(df_rolling)
Output
A B C
0 NaN 6.0 14.0
1 NaN 8.0 14.0
2 NaN 10.0 14.0
3 NaN 12.0 14.0
4 NaN 14.0 14.0
In this example, we applied the rolling window column-wise by specifying axis=1.
For each row:
- The value in the A column is NaN because there's no preceding column to form a window of size 2.
- The value in the B column is the sum of columns A and B.
- The value in the C column is the sum of columns B and C.
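To our knowledge, recent Pandas releases (2.1 and later) mark the axis argument of rolling() as deprecated and recommend transposing the DataFrame instead. The following is a minimal sketch of that alternative under this assumption. It should give the same column-wise sums without passing axis.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5]
})

# transpose so the columns become rows, roll along the rows,
# then transpose back to the original orientation
df_rolling = df.T.rolling(window=2).sum().T
print(df_rolling)
```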
Example 6: Window Boundaries Using closed Parameter
The possible values for the closed parameter are:
- 'right' (default) - close the right side of the window.
- 'left' - close the left side of the window.
- 'both' - close both sides of the window.
- 'neither' - do not close either side of the window.
Let's look at an example.
import pandas as pd
# sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8]
})
# rolling with different closed values
print("Closed on right (default):")
print(df.rolling(window=2, closed='right').sum())
print("\nClosed on left:")
print(df.rolling(window=2, closed='left').sum())
print("\nClosed on both:")
print(df.rolling(window=2, closed='both').sum())
print("\nClosed on neither:")
print(df.rolling(window=2, closed='neither').sum())
Output
Closed on right (default):
A B
0 NaN NaN
1 3.0 11.0
2 5.0 13.0
3 7.0 15.0
Closed on left:
A B
0 NaN NaN
1 NaN NaN
2 3.0 11.0
3 5.0 13.0
Closed on both:
A B
0 NaN NaN
1 3.0 11.0
2 6.0 18.0
3 9.0 21.0
Closed on neither:
A B
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
Here,
- closed='right' - sums the current row and the row just before it
- closed='left' - sums the row just before the current row and the one before that
- closed='both' - sums three rows (the current row and the two rows before it), so it behaves like a window of size 3 instead of the specified 2
- closed='neither' - all values are NaN because excluding both endpoints leaves only the previous row in the window, which is fewer than the 2 observations required by the default min_periods.
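Which rows each setting includes is easier to see when incomplete windows are allowed to produce a value. The sketch below is our own variation on the example and assumes a Pandas version that supports closed for fixed-size windows. It lowers min_periods to 1 so that even a one-row window returns a sum.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})

# min_periods=1 lets windows with a single row return a result,
# which reveals the rows that each 'closed' option actually covers
results = {}
for closed in ['right', 'left', 'both', 'neither']:
    rolled = df['A'].rolling(window=2, closed=closed, min_periods=1).sum()
    results[closed] = rolled.tolist()
    print(closed, results[closed])
```

With closed='neither', each result is expected to be just the value of the previous row, which is why the default min_periods of 2 turns every entry into NaN.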