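Calculating a Rolling Average
A rolling average (rolling mean) is the average of a fixed number of the most recent observations in a time series. The window covers the current observation and the ones immediately before it.
pandas provides a built-in method for this calculation.
Syntax: df['column_name'].rolling(rolling_window).mean()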
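To make the syntax concrete, here is a minimal sketch on a tiny hand-made series. The values and the name `s` are illustrative assumptions, not part of the dataset used in the example. It shows that the window includes the current row, and that the first `window - 1` results are NaN because the window is not yet full.

```python
import pandas as pd

# Illustrative values only (not part of the example dataset)
s = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# 3-period rolling mean: each result averages the current row and the 2 rows before it
rolling_3 = s.rolling(3).mean()

# The same calculation done by hand for the row at position 2
manual = (s[0] + s[1] + s[2]) / 3

print(rolling_3.tolist())  # [nan, nan, 4.0, 6.0, 8.0]
print(manual)              # 4.0
```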
Example:
import numpy as np
import pandas as pd
#make this example reproducible
np.random.seed(0)
#create dataset
period = np.arange(1, 101, 1) # periods 1 through 100
leads = np.random.uniform(1, 20, 100)
sales = 60 + 2*period + np.random.normal(loc=0, scale=.5*period, size=100)
df = pd.DataFrame({'period': period, 'leads': leads, 'sales': sales})
#view first 10 rows
df.head(10)
Result:
   period      leads      sales
0       1  11.427457  61.417425
1       2  14.588598  64.900826
2       3  12.452504  66.698494
3       4  11.352780  64.927513
4       5   9.049441  73.720630
5       6  13.271988  77.687668
6       7   9.314157  78.125728
7       8  17.943687  75.280301
8       9  19.309592  73.181613
9      10   8.285389  85.272259
Use the pandas syntax above to compute the rolling mean of sales over a 5-period window.
#find rolling mean of the 5 most recent sales periods (current period included)
df['rolling_sales_5'] = df['sales'].rolling(5).mean()
#view first 10 rows
df.head(10)
Result:
   period      leads      sales  rolling_sales_5
0       1  11.427457  61.417425              NaN
1       2  14.588598  64.900826              NaN
2       3  12.452504  66.698494              NaN
3       4  11.352780  64.927513              NaN
4       5   9.049441  73.720630        66.332978
5       6  13.271988  77.687668        69.587026
6       7   9.314157  78.125728        72.232007
7       8  17.943687  75.280301        73.948368
8       9  19.309592  73.181613        75.599188
9      10   8.285389  85.272259        77.909514
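The first four rows are NaN because a 5-period window needs five observations before it can produce a value. If averages over partial windows are acceptable, the `min_periods` parameter of `rolling` can be lowered. The sketch below rebuilds the same sales series and compares the two behaviours. The variable names `full_window` and `partial_window` are illustrative, and whether partial windows are appropriate depends on the analysis.

```python
import numpy as np
import pandas as pd

# Rebuild the same sales series as in the example
np.random.seed(0)
period = np.arange(1, 101, 1)
leads = np.random.uniform(1, 20, 100)  # drawn first so the random state matches the example
sales = 60 + 2*period + np.random.normal(loc=0, scale=.5*period, size=100)
df = pd.DataFrame({'period': period, 'sales': sales})

# Default: a value is produced only once all 5 observations are available
full_window = df['sales'].rolling(5).mean()

# min_periods=1: average whatever is available so far (1 to 5 observations)
partial_window = df['sales'].rolling(5, min_periods=1).mean()

print(full_window.head(5))
print(partial_window.head(5))
```

From the fifth row onward both versions agree, since the window is full in either case.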
The same syntax can be used to compute rolling means for other columns:
#find rolling mean of previous 5 leads periods
df['rolling_leads_5'] = df['leads'].rolling(5).mean()
#find rolling mean of previous 5 sales periods (same result as the column computed above)
df['rolling_sales_5'] = df['sales'].rolling(5).mean()
#view first 10 rows
df.head(10)
Result:
   period      leads      sales  rolling_sales_5  rolling_leads_5
0       1  11.427457  61.417425              NaN              NaN
1       2  14.588598  64.900826              NaN              NaN
2       3  12.452504  66.698494              NaN              NaN
3       4  11.352780  64.927513              NaN              NaN
4       5   9.049441  73.720630        66.332978        11.774156
5       6  13.271988  77.687668        69.587026        12.143062
6       7   9.314157  78.125728        72.232007        11.088174
7       8  17.943687  75.280301        73.948368        12.186411
8       9  19.309592  73.181613        75.599188        13.777773
9      10   8.285389  85.272259        77.909514        13.624963
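When several columns need the same window, `rolling` can also be applied to a column selection in one call instead of one column at a time. The following is a sketch under assumptions: the data values are placeholders rather than the dataset above, and the names `cols` and `rolled` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'leads': np.random.uniform(1, 20, 100),
    'sales': np.random.normal(100, 10, 100),  # placeholder values for illustration
})

# Apply the same 5-period window to every selected column at once
cols = ['leads', 'sales']
rolled = df[cols].rolling(5).mean()

# Rename to the rolling_<column>_5 pattern and attach to the original frame
rolled = rolled.add_prefix('rolling_').add_suffix('_5')
df = df.join(rolled)

print(df.columns.tolist())
# ['leads', 'sales', 'rolling_leads_5', 'rolling_sales_5']
```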
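Compare the smoothing effect of different rolling windows:
import matplotlib.pyplot as plt
#find rolling mean of previous 10 sales periods (needed for the comparison plot)
df['rolling_sales_10'] = df['sales'].rolling(10).mean()
plt.plot(df['rolling_sales_5'], label='Rolling Mean Window=5')
plt.plot(df['rolling_sales_10'], label='Rolling Mean Window=10')
plt.plot(df['sales'], label='Raw Data')
plt.legend()
plt.ylabel('Sales')
plt.xlabel('Period')
plt.show()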
Result: [Figure: line plot of the raw sales data together with the 5-period and 10-period rolling means]
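Conclusion:
A rolling average smooths the data: the larger the window, the smoother the resulting curve and the more closely it follows the underlying trend line.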
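One way to check this claim numerically rather than by eye is to measure how much each curve jumps from one period to the next. The sketch below uses the standard deviation of period-to-period changes as a rough smoothness measure. The metric and the helper name `roughness` are assumptions chosen for illustration, not something the original example defines.

```python
import numpy as np
import pandas as pd

# Rebuild the same sales series as in the example
np.random.seed(0)
period = np.arange(1, 101, 1)
leads = np.random.uniform(1, 20, 100)  # drawn first so the random state matches the example
sales = pd.Series(60 + 2*period + np.random.normal(loc=0, scale=.5*period, size=100))

def roughness(series):
    """Standard deviation of period-to-period changes (lower means smoother)."""
    return series.diff().std()

raw = roughness(sales)
w5 = roughness(sales.rolling(5).mean())
w10 = roughness(sales.rolling(10).mean())

print(f"raw: {raw:.2f}, window=5: {w5:.2f}, window=10: {w10:.2f}")
```

On this dataset the measure decreases as the window grows, which matches the visual impression from the plot.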