Forecasting at Scale
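1. Facebook time series forecasting
Facebook has open-sourced a time series forecasting algorithm (Prophet). It is based on an additive model and supports non-linear trends, changepoints, periodicity, seasonality, holidays, and more.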
It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.
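Prophet accepts missing values in `y`, and its documentation suggests handling suspected outliers by setting them to missing while keeping their dates. The sketch below does that with pandas. The helper name `blank_outliers`, the centered 7-day rolling median, and the 3-MAD threshold are illustrative assumptions, not part of Prophet.

```python
import numpy as np
import pandas as pd


def blank_outliers(df, window=7, threshold=3.0):
    """Hypothetical helper: set y to NaN where it strays far from a rolling median.

    The dates stay in place, so Prophet can still fit the series (it accepts NaN in y).
    The window size and threshold are illustrative assumptions.
    """
    out = df.copy()
    # Local level: centered rolling median, robust to a single spike in the window
    local_median = out["y"].rolling(window, center=True, min_periods=1).median()
    resid = out["y"] - local_median
    # Global robust scale: median absolute deviation from the local median,
    # scaled by 1.4826 so it matches a standard deviation for normal data
    mad = resid.abs().median() * 1.4826
    if mad == 0:
        return out  # flat series: no sensible scale, leave the data untouched
    out.loc[resid.abs() > threshold * mad, "y"] = np.nan
    return out


# Usage: a weekly pattern with one injected spike at position 30
ds = pd.date_range("2016-01-01", periods=60, freq="D")
y = 10 + np.sin(2 * np.pi * np.arange(60) / 7)
y[30] += 20
cleaned = blank_outliers(pd.DataFrame({"ds": ds, "y": y}))
print(cleaned["y"].isna().sum())  # number of points blanked
```

The blanked rows can then be passed to `fit` unchanged; Prophet will still produce a prediction for those dates.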
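Time series forecasting comes up constantly in practice: forecasting business growth to set business targets, setting product KPIs, forecasting future UV (unique visitors) and PV (page views), and so on.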
2. Time series forecasting framework
3. Algorithm
Additive model
$$y(t) = g(t) + s(t) + h(t) + \epsilon_t$$
where
$g(t)$ is the growth (trend) function, which models non-periodic changes in the series;
$s(t)$ captures periodic changes such as weekly or yearly seasonality;
$h(t)$ captures the effect of holidays or events on the forecast;
$\epsilon_t$ is the error term for idiosyncratic changes the model does not account for.
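To make the decomposition concrete, the sketch below builds each term with NumPy and adds them. The functional forms follow the Prophet paper: a piecewise-linear trend whose growth rate changes at changepoints (with offsets adjusted so the trend stays continuous), and seasonality as a truncated Fourier series $s(t)=\sum_{n=1}^{N}\left(a_n\cos\frac{2\pi nt}{P}+b_n\sin\frac{2\pi nt}{P}\right)$. The holiday term here is a single constant bump $\kappa$ on listed days. All parameter values, changepoint locations, and holiday days are made up for illustration, and time is in raw days rather than Prophet's internally rescaled time.

```python
import numpy as np


def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """g(t): base growth rate k and offset m; the rate changes by deltas[j] after changepoints[j].

    Each offset adjustment gamma_j = -changepoints[j] * deltas[j] keeps g(t) continuous.
    """
    # a[i, j] = 1 once t[i] has reached changepoint j, else 0
    a = (t[:, None] >= changepoints[None, :]).astype(float)
    gamma = -changepoints * deltas
    return (k + a @ deltas) * t + (m + a @ gamma)


def fourier_seasonality(t, period, coefs):
    """s(t) = sum_n a_n cos(2 pi n t / P) + b_n sin(2 pi n t / P); coefs[n-1] = (a_n, b_n)."""
    n = np.arange(1, len(coefs) + 1)
    angles = 2 * np.pi * np.outer(t, n) / period
    return np.cos(angles) @ coefs[:, 0] + np.sin(angles) @ coefs[:, 1]


def holiday_effect(t, holiday_days, kappa):
    """h(t): a constant effect kappa on each listed day, 0 elsewhere."""
    return kappa * np.isin(t, holiday_days).astype(float)


rng = np.random.default_rng(0)
t = np.arange(730, dtype=float)              # two years of daily time steps
changepoints = np.array([200.0, 500.0])      # made-up changepoint locations (days)
deltas = np.array([0.005, -0.012])           # made-up rate changes at those points
g = piecewise_linear_trend(t, k=0.01, m=8.0, changepoints=changepoints, deltas=deltas)
s = fourier_seasonality(t, period=7.0, coefs=np.array([[0.3, 0.1], [0.05, -0.02]]))
h = holiday_effect(t, holiday_days=np.array([358.0, 723.0]), kappa=0.8)
eps = rng.normal(scale=0.05, size=t.shape)   # noise term epsilon_t
y = g + s + h + eps
```

Prophet estimates the growth rate, offset, rate changes, Fourier coefficients, and holiday effects from data; this sketch only runs the model forward with fixed values.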
4. Example
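```python
# -*- coding: utf-8 -*-
# @DATE : 2017/2/25 18:18
# @Author :
# @File : fb_example1.py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# The package was renamed in Prophet 1.0; newer installs use: from prophet import Prophet
from fbprophet import Prophet

# Prophet expects two columns: ds (date) and y (value to forecast)
data_df = pd.read_csv("data/example_wp_peyton_manning.csv")
# Forecast on the log scale, which compresses the large spikes in daily page views
data_df["y"] = np.log(data_df["y"])
print(data_df.head())
print(data_df.tail())

# Fit the model. Constructor parameters and their defaults in the fbprophet release used here:
# growth = 'linear',
# changepoints = None,
# n_changepoints = 25,
# yearly_seasonality = True,
# weekly_seasonality = True,
# holidays = None,
# seasonality_prior_scale = 10.0,
# holidays_prior_scale = 10.0,
# changepoint_prior_scale = 0.05,
# mcmc_samples = 0,
# interval_width = 0.80,
# uncertainty_samples = 1000
m = Prophet()
m.fit(data_df)

# Make predictions: extend the history by 30 days and forecast every row
data_future = m.make_future_dataframe(periods=30)
print(data_future.tail())
pred_res = m.predict(data_future)
# yhat is the point forecast; yhat_lower/yhat_upper bound the 80% uncertainty interval
print(pred_res[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())

# Visualization: m.plot returns a matplotlib Figure; show it when running as a script
m.plot(pred_res)
plt.show()
```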
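`make_future_dataframe(periods=30)` returns the history's `ds` column followed by 30 new daily dates after the last observed date, which is why the future frame in this run ends 30 days after 2016-01-20. A pandas sketch of that date logic follows; the helper name `future_dates` is hypothetical and this is not Prophet's own code.

```python
import pandas as pd


def future_dates(history_ds, periods, freq="D", include_history=True):
    """Hypothetical re-creation of the date logic behind Prophet.make_future_dataframe."""
    history = pd.Series(pd.to_datetime(history_ds))
    last_date = history.max()
    # date_range starts at last_date itself, so ask for one extra step and drop it
    new_dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    new_dates = pd.Series(new_dates[new_dates > last_date][:periods])
    if include_history:
        new_dates = pd.concat([history, new_dates], ignore_index=True)
    return pd.DataFrame({"ds": new_dates})


# Usage with the last five history dates printed in the output
hist = ["2016-01-16", "2016-01-17", "2016-01-18", "2016-01-19", "2016-01-20"]
future = future_dates(hist, periods=30)
print(future.tail())
```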
Output:

```
           ds         y
0  2007-12-10  9.590761
1  2007-12-11  8.519590
2  2007-12-12  8.183677
3  2007-12-13  8.072467
4  2007-12-14  7.893572
              ds          y
2900  2016-01-16   7.817223
2901  2016-01-17   9.273878
2902  2016-01-18  10.333775
2903  2016-01-19   9.125871
2904  2016-01-20   8.891374
STAN OPTIMIZATION COMMAND (LBFGS)
init = user
save_iterations = 1
init_alpha = 0.001
tol_obj = 1e-12
tol_grad = 1e-08
tol_param = 1e-08
tol_rel_obj = 10000
tol_rel_grad = 1e+07
history_size = 5
seed = 1691376609
initial log joint probability = -19.4685
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
      99       7977.57   0.000941357       431.339      0.3404      0.3404      134
     199        7988.7   0.000894011       356.862       0.739       0.739      241
     299       7996.29    0.00359033       180.856           1           1      358
     399       8000.11   0.000546236       205.358     0.09131      0.7253      481
     499       8002.89    0.00024026        99.613           1           1      608
     514       8003.11   5.25911e-05       135.817   7.646e-07       0.001      671  LS failed, Hessian reset
     580       8003.41   3.04884e-05       92.4947    1.88e-07       0.001      798  LS failed, Hessian reset
     599       8003.49   8.15685e-05        83.046      0.6885      0.6885      821
     607        8003.5   2.60204e-05       67.9783   1.712e-07       0.001      874  LS failed, Hessian reset
     654       8003.64   0.000118504       280.906   6.562e-07       0.001      973  LS failed, Hessian reset
     699       8003.75   2.52751e-06       58.0645      0.3238           1     1029
     705       8003.75   4.61033e-07       59.0008      0.2964           1     1037
Optimization terminated normally:
  Convergence detected: relative gradient magnitude is below tolerance
              ds
2930  2016-02-15
2931  2016-02-16
2932  2016-02-17
2933  2016-02-18
2934  2016-02-19
              ds      yhat  yhat_lower  yhat_upper
2930  2016-02-15  8.021739    7.371417    8.641458
2931  2016-02-16  7.710504    7.079853    8.334700
2932  2016-02-17  7.448298    6.849103    8.012131
2933  2016-02-18  7.370376    6.724225    8.004908
2934  2016-02-19  7.305117    6.683996    8.001754
```
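Because `y` was log-transformed before fitting, `yhat`, `yhat_lower` and `yhat_upper` in this output are on the log scale. A minimal sketch, assuming you want page views back on their original scale, applies `np.exp` to the forecast columns; the numbers are the five forecast rows printed above.

```python
import numpy as np
import pandas as pd

# The five forecast rows printed above (log scale, because y was log-transformed)
pred_tail = pd.DataFrame({
    "ds": pd.to_datetime(["2016-02-15", "2016-02-16", "2016-02-17",
                          "2016-02-18", "2016-02-19"]),
    "yhat": [8.021739, 7.710504, 7.448298, 7.370376, 7.305117],
    "yhat_lower": [7.371417, 7.079853, 6.849103, 6.724225, 6.683996],
    "yhat_upper": [8.641458, 8.334700, 8.012131, 8.004908, 8.001754],
})
cols = ["yhat", "yhat_lower", "yhat_upper"]

# Undo the log transform on the point forecast and both interval bounds
original_scale = pred_tail.copy()
original_scale[cols] = np.exp(pred_tail[cols])
print(original_scale)
```

Since `exp` is monotonic, the interval bounds stay ordered and keep their coverage; the exponentiated point forecast, however, is closer to a median than a mean on the original scale if the log-scale errors are roughly symmetric.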
5. References
Facebook Prophet
https://facebookincubator.github.io/prophet/
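PS: In day-to-day work, forecasts of transaction volume (GMV), sales, PV, and similar metrics can borrow from Facebook's time series approach by modeling seasonal factors, holidays, and promotional events (e.g., Double 11 on November 11 and Double 12 on December 12).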
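As a sketch of how such promotions could be fed to Prophet: the `holidays` argument takes a DataFrame with a `holiday` name and a `ds` date for each occurrence, plus optional `lower_window`/`upper_window` columns that extend the effect to days before and after the event. The event names, years, and ±1-day windows below are illustrative assumptions, and `promo_holidays`/`build_model` are hypothetical helpers. Occurrences inside the forecast horizon must be listed too, or the model cannot apply the effect there.

```python
import pandas as pd


def promo_holidays(years, lower_window=-1, upper_window=1):
    """Hypothetical helper: a Prophet-style holidays table for Double 11 and Double 12.

    lower_window=-1 / upper_window=1 (assumed values) extend each event to the
    day before and the day after.
    """
    rows = []
    for year in years:
        rows.append({"holiday": "double_11", "ds": f"{year}-11-11"})
        rows.append({"holiday": "double_12", "ds": f"{year}-12-12"})
    df = pd.DataFrame(rows)
    df["ds"] = pd.to_datetime(df["ds"])
    df["lower_window"] = lower_window
    df["upper_window"] = upper_window
    return df


def build_model(holidays):
    # Imported lazily so the table can be built without Prophet installed;
    # releases before 1.0 use `from fbprophet import Prophet` instead.
    from prophet import Prophet
    return Prophet(holidays=holidays)


promos = promo_holidays(range(2014, 2017))
print(promos)
# m = build_model(promos); m.fit(df)  # df needs `ds` and `y` columns
```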