第 20 章 时序分析

David R. Brillinger 的 Time Series: Data Analysis and Theory (1975) (Brillinger 2001) 从时间序列分析的综述上开始看 https://www.stat.berkeley.edu/~brill/Papers/encysbs.pdf - 参考 Applied Time Series Analysis 课程 https://newonlinecourses.science.psu.edu/stat510/ - 参考 Time Series Analysis and Its Applications With R Examples - 4th Edition https://www.stat.pitt.edu/stoffer/tsa4/

从时间序列中寻找规律,这样才是真的数据建模,从数据到模型,而不是相反 Finding Patterns in Time Series

可以从雅虎财经获取数据 https://finance.yahoo.com/

从 ARIMA 过渡到异方差,非高斯分布 https://mason.gmu.edu/~jgentle/talks/CompFin_Tutorial.pdf 金融时间序列的模式识别和统计学习

ARCH or GARCH 的综述 http://public.econ.duke.edu/~boller/Papers/glossary_arch.pdf

  • diff 差分
  • filter 时间序列线性过滤
  • fft 快速离散傅里叶变换

时序数据对象 时间日期数据操作 处理时序数据的工具,时序图、相关图、平稳性检验,相关检验,之后才是模型

20.1 时序数据

## [1] "ts"
## [1] "numeric"
##  Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148..
start(AirPassengers)
## [1] 1949    1
end(AirPassengers)
## [1] 1960   12
time(AirPassengers)
##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 1949 1949.000 1949.083 1949.167 1949.250 1949.333 1949.417 1949.500
## 1950 1950.000 1950.083 1950.167 1950.250 1950.333 1950.417 1950.500
## 1951 1951.000 1951.083 1951.167 1951.250 1951.333 1951.417 1951.500
## 1952 1952.000 1952.083 1952.167 1952.250 1952.333 1952.417 1952.500
## 1953 1953.000 1953.083 1953.167 1953.250 1953.333 1953.417 1953.500
## 1954 1954.000 1954.083 1954.167 1954.250 1954.333 1954.417 1954.500
## 1955 1955.000 1955.083 1955.167 1955.250 1955.333 1955.417 1955.500
## 1956 1956.000 1956.083 1956.167 1956.250 1956.333 1956.417 1956.500
## 1957 1957.000 1957.083 1957.167 1957.250 1957.333 1957.417 1957.500
## 1958 1958.000 1958.083 1958.167 1958.250 1958.333 1958.417 1958.500
## 1959 1959.000 1959.083 1959.167 1959.250 1959.333 1959.417 1959.500
## 1960 1960.000 1960.083 1960.167 1960.250 1960.333 1960.417 1960.500
##           Aug      Sep      Oct      Nov      Dec
## 1949 1949.583 1949.667 1949.750 1949.833 1949.917
## 1950 1950.583 1950.667 1950.750 1950.833 1950.917
## 1951 1951.583 1951.667 1951.750 1951.833 1951.917
## 1952 1952.583 1952.667 1952.750 1952.833 1952.917
## 1953 1953.583 1953.667 1953.750 1953.833 1953.917
## 1954 1954.583 1954.667 1954.750 1954.833 1954.917
## 1955 1955.583 1955.667 1955.750 1955.833 1955.917
## 1956 1956.583 1956.667 1956.750 1956.833 1956.917
## 1957 1957.583 1957.667 1957.750 1957.833 1957.917
## 1958 1958.583 1958.667 1958.750 1958.833 1958.917
## 1959 1959.583 1959.667 1959.750 1959.833 1959.917
## 1960 1960.583 1960.667 1960.750 1960.833 1960.917
tsp(AirPassengers)
## [1] 1949.000 1960.917   12.000

20.2 时序图

library(highcharter)
plot(nhtemp, main = "Mean annual temperature in New Haven, CT (deg. F)")

# Average Yearly Temperatures in New Haven 纽黑文
highchart() %>%
  hc_xAxis(type = "datetime") %>%
  hc_add_series(data = nhtemp,  name = "nhtemp")

20.3 时序检验

参数的计算公式,实现的 R 代码

  • Applies linear filtering to a univariate time series or to each series separately of a multivariate time series. 过滤

一元时间序列的线性过滤,或者对多元时间序列的单个序列分别做线性过滤

\[y[i] = x[i] + f[1]*y[i-1] +\ldots+ f[p]*y[i-p]\]

\[ y[i] = f[1]*x[i+o] + \ldots + f[p]*x[i+o-(p-1)] \]

其中 \(o\) 代表 offset

介绍 FTT 算法细节

不同的方法对时间序列平滑的影响 FTT 快速傅里叶变换算法

usage(stats::filter)
## filter(x, filter, method = c("convolution", "recursive"), sides = 2L,
##   circular = FALSE, init = NULL)

20.4 指数平滑

20.5 Holt-Winters

可加 Holt-Winters (Winters 1960; Holt 2004) 预测函数,周期长度为 p

\(\hat{Y}[t+h] = a[t] + h * b[t] + s[t - p + 1 + (h - 1) \mod p]\)

其中 \(a[t], b[t], s[t]\) 由以下决定

\[\begin{align} a[t] &= \alpha (Y[t] - s[t-p]) + (1-\alpha) (a[t-1] + b[t-1]) \\ b[t] &= \beta (a[t] - a[t-1]) + (1-\beta) b[t-1] \\ s[t] &= \gamma (Y[t] - a[t]) + (1-\gamma) s[t-p] \end{align}\]

可乘 Holt-Winters

\[ \hat{Y}[t+h] = (a[t] + h * b[t]) * s[t - p + 1 + (h - 1) \mod p] \]

其中 \(a[t], b[t], s[t]\) 由如下决定

\[\begin{align} a[t] &= \alpha (Y[t] / s[t-p]) + (1-\alpha) (a[t-1] + b[t-1]) \\ b[t] &= \beta (a[t] - a[t-1]) + (1-\beta) b[t-1] \\ s[t] &= \gamma (Y[t] / a[t]) + (1-\gamma) s[t-p] \end{align}\]

  • HoltWinters 用 Shiny App 的形式展示 \(\alpha, \beta, \gamma\) 三个参数对模型预测的影响,参数的确定通过最小化预测均方误差
## Seasonal Holt-Winters
(m <- HoltWinters(co2))
plot(m)
plot(fitted(m))

p <- predict(m, 50, prediction.interval = TRUE)
plot(m, p)

(m <- HoltWinters(AirPassengers, seasonal = "mult"))
plot(m)

## 指数平滑 Exponential Smoothing
m2 <- HoltWinters(x, gamma = FALSE, beta = FALSE)
lines(fitted(m2)[,1], col = 3)

20.6 1749-2013 年太阳黑子数据

再从官网拿到最近的数据

plot(sunspot.month, xlab = "Year", ylab = "Monthly sunspot numbers",
     main = "Monthly mean relative sunspot numbers from 1749 to 2013")

library(ggfortify)
autoplot(sunspot.month,
  main = "Monthly mean relative sunspot numbers from 1749 to 2013",
  xlab = "Year", ylab = "Monthly sunspot numbers"
) +
  theme_minimal()
时序图:太阳黑子月均数量时序图:太阳黑子月均数量

图 20.1: 时序图:太阳黑子月均数量

autoplot(sunspots)

autoplot(sunspot.year, xlab = "Year", ylab = "Yearly Sunspot Data, 1700-1988") +
  theme_minimal()
太阳黑子数量年平均时序图

图 20.2: 太阳黑子数量年平均时序图

library(dygraphs)
dygraph(sunspot.month)
hw <- HoltWinters(sunspot.month)
predicted <- predict(hw, n.ahead = 72, prediction.interval = TRUE)

dygraph(predicted, main = "Predicted sunspot numbers") %>%
  dyAxis("x", drawGrid = FALSE) %>%
  dySeries(c("lwr", "fit", "upr"), label = "sunspot") %>%
  dyOptions(colors = hcl.colors(3))
library(highcharter)
highchart() %>% 
  hc_xAxis(type = "datetime") %>% 
  hc_add_series(data = sunspot.month) %>% 
  hc_add_series(data = sunspots)

20.7 1821-1934 年加拿大山猫陷阱数量

library(ggfortify)
autoplot(lynx)

highchart() %>% 
  hc_xAxis(type = "datetime") %>% 
  hc_add_series(data = lynx,  name = "lynx")

20.8 1991-1998 年欧洲主要股票市场日闭市价格指数

matplot(time(EuStockMarkets), EuStockMarkets,
  main = "",
  xlab = "Date", ylab = "closing prices",
  pch = 17, type = "l", col = 1:4
)
legend("topleft", colnames(EuStockMarkets), pch = 17, lty = 1, col = 1:4)
1991-1998年间欧洲主要股票市场日闭市价格指数图 
 德国 DAX (Ibis), Switzerland SMI, 法国 CAC 和 英国 FTSE

图 20.3: 1991-1998年间欧洲主要股票市场日闭市价格指数图 德国 DAX (Ibis), Switzerland SMI, 法国 CAC 和 英国 FTSE

# 考虑收集加入最新的数据 1991~1998年的数据
plot(EuStockMarkets, plot.type = "single", col = hcl.colors(4))
legend("topleft", colnames(EuStockMarkets),
  col = hcl.colors(4), text.col = hcl.colors(4), lty = 1,
  box.col = NA, inset = 0.05
)

20.9 自回归模型

ar()

20.10 移动平均模型

arima()

20.11 自回归移动平均模型

arima() ARIMA

20.12 自回归条件异方差模型

自回归条件异方差模型 ARCH

20.13 广义自回归条件异方差模型

广义自回归条件异方差模型 (Generalized Autoregressive Conditional Heteroskedasticity,简称 GARCH )

prophet 基于可加模型的时间序列预测

AnomalyDetection 时间序列数据中的异常值检测

20.14 其它特征的时间序列

时间序列差分平稳性

highchart() %>% 
  hc_xAxis(type = "datetime") %>% 
  hc_add_series(data = JohnsonJohnson,  name = "JohnsonJohnson")

时间序列周期性

highchart() %>% 
  hc_xAxis(type = "datetime") %>% 
  hc_add_series(data = AirPassengers, color = "Orange", name = "AirPassengers")

周期性

# Average Monthly Temperatures at Nottingham, 1920-1939
highchart() %>%
  hc_xAxis(type = "datetime") %>%
  hc_add_series(data = nottem,  name = "nottem")

20.15 51Talk 公司股价走势

51talk 于 2016年6月10日在美国纽交所上市,股票代码 COE, 2020年1月22日,武汉封城,受新冠肺炎病毒影响,政府停课不停学的号召,线下教育纷纷转线上,线上教育的春天来临,股价开始回升到发行价的水平,在公司将资源转变为能力后,预期公司股价继续翻倍,回到理性的水平。

coe <- quantmod::getSymbols("COE", auto.assign = FALSE, src = "yahoo", from = '2016-06-10')
hchart(coe)

20.16 运行环境

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 8 (Core)
## 
## Matrix products: default
## BLAS:   /opt/R/R-3.6.3/lib64/R/lib/libRblas.so
## LAPACK: /opt/R/R-3.6.3/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods  
## [7] base     
## 
## other attached packages:
## [1] dygraphs_1.1.1.6  highcharter_0.7.0 ggfortify_0.4.10 
## [4] gganimate_1.0.5   ggplot2_3.3.2     magrittr_1.5     
## [7] formatR_1.7      
## 
## loaded via a namespace (and not attached):
##  [1] progress_1.2.2    zoo_1.8-8         tidyselect_1.1.0 
##  [4] xfun_0.15         purrr_0.3.4       lattice_0.20-41  
##  [7] colorspace_1.4-1  vctrs_0.3.1       generics_0.0.2   
## [10] htmltools_0.5.0   yaml_2.2.1        rlang_0.4.6      
## [13] pillar_1.4.4      glue_1.4.1        withr_2.2.0      
## [16] tweenr_1.0.1      TTR_0.23-6        lifecycle_0.2.0  
## [19] quantmod_0.4.17   stringr_1.4.0     munsell_0.5.0    
## [22] gtable_0.3.0      htmlwidgets_1.5.1 evaluate_0.14    
## [25] labeling_0.3      knitr_1.29        curl_4.3         
## [28] gifski_0.8.6      highr_0.8         broom_0.5.6      
## [31] xts_0.12-0        Rcpp_1.0.4.12     scales_1.1.1     
## [34] backports_1.1.8   jsonlite_1.6.1    farver_2.0.3     
## [37] gridExtra_2.3     hms_0.5.3         digest_0.6.25    
## [40] stringi_1.4.6     rlist_0.4.6.1     bookdown_0.20    
## [43] dplyr_1.0.0       grid_3.6.3        tools_3.6.3      
## [46] tibble_3.0.1      whisker_0.4       crayon_1.3.4     
## [49] tidyr_1.1.0       pkgconfig_2.0.3   ellipsis_0.3.1   
## [52] data.table_1.12.8 prettyunits_1.1.1 lubridate_1.7.9  
## [55] assertthat_0.2.1  rmarkdown_2.3     R6_2.4.1         
## [58] igraph_1.2.5      nlme_3.1-148      compiler_3.6.3

参考文献

Brillinger, David R. 2001. Time Series: Data Analysis and Theory. Philadelphia, PA, USA: Society for Industrial; Applied Mathematics.

Holt, Charles C. 2004. “Forecasting Seasonals and Trends by Exponentially Weighted Moving Averages.” International Journal of Forecasting 20 (1): 5–10. https://doi.org/10.1016/j.ijforecast.2003.09.015.

Winters, Peter R. 1960. “Forecasting Sales by Exponentially Weighted Moving Averages.” Management Science 6 (3): 324–42. https://doi.org/10.1287/mnsc.6.3.324.