In Python, pandas.DataFrame.rolling() is a method used to calculate rolling statistics over a specified window of data. It lets you compute moving averages, sums, or other aggregations by sliding a fixed-size window across the data.
Example
import pandas as pd
data = [1, 2, 3, 4, 5, 6]
df = pd.DataFrame(data, columns=['Numbers'])
# Calculate the rolling mean with a window size of 3
df['Rolling_Mean'] = df['Numbers'].rolling(window=3).mean()
print(df)
This will output:
   Numbers  Rolling_Mean
0        1           NaN
1        2           NaN
2        3           2.0
3        4           3.0
4        5           4.0
5        6           5.0
What it does
The rolling() method creates a sliding window of a specified size (in this case, 3), and mean() then computes the average of the values in each window. The first two results are NaN because the window needs three values before it can produce a result.
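To make the sliding window concrete, the following sketch rebuilds the same rolling mean with an explicit loop and compares it to the built-in result. The names numbers, manual, and builtin are illustrative only, and the loop is a simplified picture of the behaviour, not how pandas implements it internally.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5, 6], name="Numbers")
window = 3

# Build the rolling mean by hand: one result per row, NaN until the window is full.
manual = []
for end in range(len(numbers)):
    start = end - window + 1
    if start < 0:
        manual.append(float("nan"))  # fewer than `window` values available so far
    else:
        manual.append(numbers.iloc[start:end + 1].mean())

manual = pd.Series(manual, name="Numbers")
builtin = numbers.rolling(window=window).mean()

# Put both results side by side for comparison.
print(pd.DataFrame({"manual": manual, "builtin": builtin}))
```

Both columns should agree row by row, including the two leading NaN values.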
Examples
Example 1: Calculating rolling sum
df['Rolling_Sum'] = df['Numbers'].rolling(window=2).sum()
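# Print only the relevant columns; df also holds the columns added earlier
print(df[['Numbers', 'Rolling_Sum']])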
This calculates the sum of values within each window of size 2. It will display:
   Numbers  Rolling_Sum
0        1          NaN
1        2          3.0
2        3          5.0
3        4          7.0
4        5          9.0
5        6         11.0
The sum is computed for every window of 2 values, starting from the second row.
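One way to check the window-of-2 sum is to add each value to the one before it using shift(). This is a small sketch for comparison only; the variable names are illustrative, and the shift-based version is just an equivalent calculation for this particular window size.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5, 6], name="Numbers")

rolling_sum = numbers.rolling(window=2).sum()
shifted_sum = numbers + numbers.shift(1)  # current value plus the previous one

# The first row is NaN in both columns because there is no previous value.
comparison = pd.DataFrame({"rolling_sum": rolling_sum, "shifted_sum": shifted_sum})
print(comparison)
```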
Example 2: Rolling with different window sizes
df['Rolling_Max'] = df['Numbers'].rolling(window=4).max()
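# Print only the relevant columns; df also holds the columns added earlier
print(df[['Numbers', 'Rolling_Max']])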
This finds the maximum value in each rolling window of size 4:
   Numbers  Rolling_Max
0        1          NaN
1        2          NaN
2        3          NaN
3        4          4.0
4        5          5.0
5        6          6.0
The maximum value is calculated for every window of 4, starting from the fourth row.
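The relationship between window size and the number of leading NaN rows can be checked directly. The sketch below, with illustrative variable names, counts the NaN values produced by a rolling max for several window sizes using the default settings.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5, 6], name="Numbers")

leading_nans = {}
for window in (2, 3, 4):
    rolling_max = numbers.rolling(window=window).max()
    # With the default settings, the first (window - 1) rows have no result.
    leading_nans[window] = int(rolling_max.isna().sum())

print(leading_nans)  # {2: 1, 3: 2, 4: 3}
```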
Example 3: Applying a custom function
df['Rolling_Custom'] = df['Numbers'].rolling(window=3).apply(lambda x: x.max() - x.min())
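# Print only the relevant columns; df also holds the columns added earlier
print(df[['Numbers', 'Rolling_Custom']])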
This example uses a custom function to find the difference between the maximum and minimum values in each rolling window of size 3:
   Numbers  Rolling_Custom
0        1             NaN
1        2             NaN
2        3             2.0
3        4             2.0
4        5             2.0
5        6             2.0
Because the values increase by 1 at each step, the range (max – min) of every full window is 2.
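As a variation on the custom function, apply() accepts a raw=True argument that passes each window to the function as a NumPy array instead of a Series. The sketch below uses it and compares the result with the same range built from the built-in max() and min() aggregations; the variable names are illustrative.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5, 6], name="Numbers")
rolling = numbers.rolling(window=3)

# raw=True hands each window to the function as a NumPy array.
custom_range = rolling.apply(lambda values: values.max() - values.min(), raw=True)

# The same quantity computed from built-in aggregations.
builtin_range = rolling.max() - rolling.min()

print(pd.DataFrame({"custom_range": custom_range, "builtin_range": builtin_range}))
```

For simple statistics like this one, combining built-in aggregations is usually the simpler choice, and apply() is best kept for logic that has no built-in equivalent.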
Example 4: Rolling with a specified minimum number of periods
df['Rolling_Mean_Min_Periods'] = df['Numbers'].rolling(window=3, min_periods=1).mean()
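# Print only the relevant columns; df also holds the columns added earlier
print(df[['Numbers', 'Rolling_Mean_Min_Periods']])
Here, min_periods=1 lets the rolling calculation produce a result as soon as the window contains at least one value, even before the window is full: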
   Numbers  Rolling_Mean_Min_Periods
0        1                       1.0
1        2                       1.5
2        3                       2.0
3        4                       3.0
4        5                       4.0
5        6                       5.0
The first two rows are averaged over the values available so far (one value, then two), so the output has no leading NaN. This is useful at the start of a series or when some values are missing.
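The effect on incomplete data can be seen with a series that contains a missing value. In this sketch the readings series is made-up data for illustration; it compares the default behaviour, where a window of 3 needs three non-missing values, with min_periods=1.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
readings = pd.Series([1.0, np.nan, 3.0, 4.0], name="Readings")

# Default: every window here has fewer than 3 non-missing values, so all results are NaN.
strict = readings.rolling(window=3).mean()

# min_periods=1: each window is averaged over whatever non-missing values it holds.
lenient = readings.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```

With min_periods=1, the missing value is skipped, and each mean is taken over the remaining values in the window.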