Chapter 2 Time Series

Reference : Statistical Inference via Data Science.

2.1 Needed Package

I recommend you to install tidyverse, a collection of packages for data science. To install and load the packages, type

install.packages("tidyverse")
# or the development version
# devtools::install_github("tidyverse/tidyverse")
library(tidyverse)

2.2 For Loop

In this section, we generate a stochastic process following $X_t = (1 + \theta L)\varepsilon_t$ where $\varepsilon_t$ ’s are i.i.d. normally distributed with standard deviation $\sigma = 0.5$ . And we set $\theta = 0.6$ .

#===============================
# Sample code for MA(1) process
#===============================

rm(list = ls(all = TRUE))
set.seed(1)

# Initialize
theta <- .6
sigma <- .5

T <- 1000         # length of stochastic process
x <- numeric(T)  # memory allocation
eps_old <- rnorm(1, mean = 0, sd = sigma)

# Generate Data
for (t in 1 : T){
  eps_new <- rnorm(1, mean = 0, sd = sigma)
  x[t] <- eps_new + theta * eps_old
  eps_old <- eps_new
}


# or more simply,
# set.seed(1)
# eps <- rnorm(T + 1, mean = 0, sd = sigma)
# x <- eps[-1] + theta * eps[-length(eps)]

2.3 Sample Moments

Note that $E(X_t) = 0$ , $\gamma(0) = Var(X_t) = (1 + \theta^2)\sigma^2$ and $\gamma(1) = Cov(X_t, X_{t-1}) = \theta \sigma^2$ . We compute sample moments and compare them with theoretical moments.

#=================================
# Sample code for MA(1) processes
#=================================

rm(list = ls(all = TRUE))
set.seed(1)

# Initialize
theta <- .6
sigma <- .5
N <- 300                        # number of stochastic processes
T <- 1000                        # length of a stochastic process
X <- matrix(nrow = N, ncol = T) # memory allocation

eps_old <- rnorm(N, mean = 0, sd = sigma)

# Generate Data
for (t in 1 : T){
  eps_new <- rnorm(N, mean = 0, sd = sigma)
  X[, t] <- eps_new + theta * eps_old
  eps_old <- eps_new
}

# Sample Moments
mean_est <- colMeans(X)
gamma0_est <- apply(X, MARGIN = 2, FUN = var)
gamma1_est <- sapply(seq(T-1), FUN = function(i) cov(X[,i], X[,i+1]))

# True Moments
mean_true <- 0
gamma0_true <- (1 + theta ^ 2) * sigma ^ 2
gamma1_true <- theta * sigma ^ 2

# Results
abs(c(mean(mean_est - mean_true), 
      mean(gamma0_est - gamma0_true), 
      mean(gamma1_est - gamma1_true)))

## [1] 0.0002941528 0.0006930542 0.0007107480

2.3.1 Note on `apply` function

We may compute gamma0_est via for loop as follows,

# Compute gamma0 via for loop
start_for <- Sys.time()
gamma0_est_for <- numeric(T)

for (t in seq(T)){
  gamma0_est_for[t] <- var(X[,t])
}
end_for <- Sys.time()
time_for <- end_for - start_for

# Compute gamma0 via apply function
start_apply <- Sys.time()
gamma0_est <- apply(X, MARGIN = 2, FUN = var)
end_apply <- Sys.time()
time_apply <- end_apply - start_apply

# time difference
time_for - time_apply

## Time difference of 0.01096916 secs

We can see that the code with apply function is faster than for loop and more readable.

2.4 ggplot2

ggplot2 is one the most widely used packages for data visualization. We already loaded this package by running library(tidyverse). In fact, we loaded all the following packages at once,

ggplot2, dplyr, tidyr, readr, purrr, tibble, strignr, and forcats packages.

If you don’t need other packages, type library(ggplot2).

2.4.1 The Grammar of Graphics

The gg in ggplot2 is an abbreviation for the grammar of graphics. According to Wickham (2016), the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system.

# generate data.frame
MA <- data.frame("x" = seq(T - 1),
                 "mean_est" = mean_est[-T],
                 "mean_true" = rep(mean_true, T - 1),
                 "gamma0_est" = gamma0_est[-T],
                 "gamma0_true" = rep(gamma0_true, T - 1),
                 "gamma1_est" = gamma1_est[-T],
                 "gamma1_true" = rep(gamma1_true, T - 1)
                 )
# View(fig_MA)

# x coordinate
fig <- ggplot(data = MA, mapping = aes(x = x))

# multiple lines
fig <- fig + geom_line(aes(y = mean_true, color = "True Mean"), 
                       linetype = "twodash")
fig <- fig + geom_line(aes(y = mean_est, color = "Sample Mean"), 
                       linetype = "solid")
fig <- fig + geom_line(aes(y = gamma0_true, color = "True Var"), 
                       linetype = "twodash")
fig <- fig + geom_line(aes(y =gamma0_est, color = "Sample Var"), 
                       linetype = "solid")
fig <- fig + geom_line(aes(y = gamma1_true, color = "True Cov"), 
                       linetype = "twodash")
fig <- fig + geom_line(aes(y = gamma1_est, color = "Sample Cov"),                           linetype = "solid")

# text
fig <- fig + labs(x = "Time", y = "Moments", title = "MA(1) Process")
fig <- fig + 
       theme(plot.title=element_text(size=16, family="serif"),
             axis.title.x=element_text(size=13, family="serif"),
             axis.text.x=element_text(size=13, family="serif"),
             axis.title.y=element_text(size=13, family="serif"),
             axis.text.y=element_text(size=13, family="serif"),
             legend.text=element_text(size=13, family="serif"),
             legend.title=element_blank()
             )
show(fig)
ggsave("MA2.png", plot = fig, width = 7, height = 4)