9 (Co)variance estimation

9.1 ARCH and GARCH

This short exercise illustrates how to perform maximum likelihood estimation in R, using ARCH(\(p\)) and GARCH(\(p, q\)) models as a simple example. I advise you to first write the code for the basic specification on your own; afterwards, the exercises will help you get familiar with well-established R packages that provide the same (and many additional, more sophisticated) methods to estimate conditional volatility models.
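
As a reminder, I work throughout with the demeaned return innovations \(\varepsilon_t\) and the standard variance recursions (using the lag convention of the code below, where \(p\) counts the squared-innovation lags and \(q\) the variance lags):

\[
\text{ARCH}(p): \; \sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2, \qquad
\text{GARCH}(p, q): \; \sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2,
\]

with \(\omega > 0\) and \(\alpha_i, \beta_j \geq 0\).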

Exercises:

  1. As a benchmark dataset, download prices and compute daily returns for all stocks that are part of the Dow Jones 30 index (ticker <- "DOW"). Decompose the time series into a predictable part and the residual, \(r_t = E(r_t|\mathcal{F}_{t-1}) + \varepsilon_t\). The easiest approach would be to demean the time series by simply subtracting the sample mean, but in principle more sophisticated methods can be used.

  2. For each of the 30 tickers, illustrate the rolling window standard deviation based on some suitable estimation window length.

  3. Write a small function that computes the maximum likelihood estimator (MLE) of an ARCH(\(p\)) model.

  4. Write a second function that implements MLE estimation for a GARCH(\(p,q\)) model.

  5. What is the unconditional estimated variance of the ARCH(\(1\)) and the GARCH(\(1, 1\)) model for each ticker?

  6. Next, familiarize yourself with the rugarch package to perform more sophisticated volatility modeling. Here you can find a great example of how to unleash the flexibility of rmgarch.

  7. Use either your own code or the rugarch package to fit a GARCH and an ARCH model for each of the 30 time series and create 1-day ahead volatility forecasts with one year as the initial estimation window. Compare the forecasts to a 1-day ahead volatility forecast based on the sample standard deviation (often called the random walk model). Illustrate the forecast performance against the squared return of the next day and compute the mean squared prediction error of each model. What can you conclude about the predictive performance?

  • Compute and illustrate the model-implied Value-at-Risk (VaR), defined as the loss that is exceeded with a probability of at most 5 %. Formally, the VaR is defined as \(\text{VaR}_{\alpha}(X) = -\inf\big\{x \in \mathbb{R} : F_X(x) > \alpha\big\} = F_{-X}^{-1}(1-\alpha)\), where \(X\) is the return distribution. Illustrate stock returns which fall below the estimated Value-at-Risk. What can you conclude?

Solutions:

As usual, I use the tidyquant package to download the price time series of the 30 stocks and to compute (adjusted) net returns.

library(tidyverse)
library(tidyquant)

ticker <- tq_index("DOW") 
prices <- tq_get(ticker %>% 
                   filter(symbol != "DOW"), get = "stock.prices") # exclude the constituent with ticker DOW (Dow Inc.)

returns <- prices %>%
  group_by(symbol) %>%
  mutate(ret = adjusted / lag(adjusted) - 1) %>%
  select(symbol, date, ret) %>%
  drop_na(ret) %>% 
  mutate(ret = ret - mean(ret))

I use the slider package for convenient rolling window computations. For detailed documentation, consult [this homepage](https://github.com/DavisVaughan/slider).

library(slider)

rolling_sd <- returns %>% 
  group_by(symbol) %>% 
  mutate(rolling_sd = slide_dbl(ret, sd, 
                                .before = 100)) # 100 day estimation window

rolling_sd %>%
  drop_na() %>% 
  ggplot(aes(x=date, y = rolling_sd, color = symbol)) + 
  geom_line() +
  labs(x = "", y = "Standard deviation (rolling window") +
  theme(legend.position = "None")

The figure illustrates at least two relevant features: i) although rather persistent, volatilities change over time, and ii) volatilities tend to co-move. Next, I write a short, very simplified script which performs maximum likelihood estimation of the parameters of an ARCH(\(p\)) model in R.

ret <- returns %>% 
  filter(symbol == "AAPL") %>% 
  pull(ret) * 100

# ARCH(p): negative log-likelihood
logL <- function(params, p){
  # Column 1: residuals, column 2: intercept, columns 3:(p + 2): lagged squared residuals
  eps_sqr_lagged <- cbind(ret, 1)
  for(i in 1:p){
    eps_sqr_lagged <- cbind(eps_sqr_lagged, lag(ret, i)^2)
  }
  eps_sqr_lagged <- na.omit(eps_sqr_lagged)
  # Conditional variances: sigma_t^2 = omega + alpha_1 eps_{t-1}^2 + ... + alpha_p eps_{t-p}^2
  sigma <- eps_sqr_lagged[,-1] %*% params
  # Negative Gaussian log-likelihood (additive constant dropped)
  0.5 * sum(log(sigma)) + 0.5 * sum(eps_sqr_lagged[,1]^2/sigma)
}

The function logL computes the log-likelihood of an ARCH specification with \(p\) lags. Note that it returns the negative log-likelihood because most optimization procedures in R are designed to search for minima instead of maxima.
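
For completeness, under the assumption of conditionally Gaussian innovations, the objective that logL evaluates is

\[
-\log L(\omega, \alpha_1, \dots, \alpha_p) = \frac{T}{2}\log(2\pi) + \frac{1}{2}\sum_{t} \log \sigma_t^2 + \frac{1}{2}\sum_{t} \frac{\varepsilon_t^2}{\sigma_t^2},
\]

where the constant \(\frac{T}{2}\log(2\pi)\) is dropped in the code because it does not depend on the parameters.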

The next few lines show how to estimate the model with optim and \(p = 2\) lags. Note my (almost) arbitrary choice of the initial parameter vector. Also note that neither the function logL nor the optimization procedure enforces the regularity conditions which would ensure a stationary volatility process.

p <- 2
initial_params <- (c(sd(ret), rep(1, p)))
fit_manual <- optim(par = initial_params, 
                    fn = logL, 
                    hessian = TRUE, 
                    method="L-BFGS-B", 
                    lower = 0, 
                    p = p)

fitted_params <- (fit_manual$par)
se <- sqrt(diag(solve(fit_manual$hessian)))

cbind(fitted_params, se) %>% knitr::kable(digits = 2)
| fitted_params |   se |
|--------------:|-----:|
|          2.05 | 0.09 |
|          0.19 | 0.03 |
|          0.19 | 0.03 |
# Run the code below to compare the results with tseries package
# library(tseries)
# summary(garch(ret,c(0,p)))

We can implement GARCH estimation with very few adjustments.

# GARCH(p, q): negative log-likelihood (and, optionally, fitted variances)

garch_logL <- function(params, p, q, return_only_loglik = TRUE){
  # Column 1: residuals, column 2: intercept, columns 3:(p + 2): lagged squared residuals
  eps_sqr_lagged <- cbind(ret, 1)
  for(i in 1:p){
    eps_sqr_lagged <- cbind(eps_sqr_lagged, lag(ret, i)^2)
  }
  # Initialize the variance recursion with the sample variance
  sigma.sqrd <- rep(sd(ret)^2, nrow(eps_sqr_lagged))
  # sigma_t^2 = omega + sum_i alpha_i eps_{t-i}^2 + sum_j beta_j sigma_{t-j}^2
  for(t in (1 + max(p, q)):nrow(eps_sqr_lagged)){
    sigma.sqrd[t] <- params[1:(1+p)] %*% eps_sqr_lagged[t,-1] + 
              params[(2+p):length(params)] %*% sigma.sqrd[(t-1):(t-q)]
  }
  # Discard the burn-in period
  sigma.sqrd <- sigma.sqrd[-(1:(max(p, q)))]
  
  if(return_only_loglik){
    # Negative Gaussian log-likelihood (additive constant dropped)
    0.5 * sum(log(sigma.sqrd)) + 
      0.5 * sum(eps_sqr_lagged[(1 + max(p, q)):nrow(eps_sqr_lagged),1]^2/sigma.sqrd)
  }else{
    return(sigma.sqrd)
  }
}

p <- 1 # Lag structure 
q <- 1 
fit_garch_manual <- optim(par = rep(0.01, p + q + 1), 
                    fn = garch_logL, 
                    hessian = TRUE, 
                    method="L-BFGS-B", 
                    lower = 0, 
                    p = p,
                    q = q)

fitted_garch_params <- fit_garch_manual$par

Next, I plot the time series of estimated \(\sigma_t\) based on the output. Note that, to keep the required code clean and simple, my function garch_logL returns the fitted variances when called with return_only_loglik = FALSE.

tibble(vola = garch_logL(fitted_garch_params, p, q, FALSE)) %>% 
  ggplot(aes(x = 1:length(vola), y = sqrt(vola))) +
  geom_line() +
  labs(x = "", y  = "GARCH(1,1) volatility")

Figure 9.1: Conditional volatilities based on GARCH(1,1)

Next, the typical tidyverse approach: once we have implemented the workflow for one asset, we can use mutate() and map() to perform the same computation in a tidy manner for all assets in our sample. Recall that the unconditional variance is defined in the lecture slides.
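
For reference, these are standard results: provided the stationarity conditions hold, the unconditional variances implied by the models are

\[
\text{ARCH}(p): \; \sigma^2 = \frac{\omega}{1 - \sum_{i=1}^p \alpha_i}, \qquad
\text{GARCH}(1,1): \; \sigma^2 = \frac{\omega}{1 - \alpha_1 - \beta_1}.
\]

The summarise() call at the end of the pipeline below computes exactly the ARCH expression: sum(value) adds up \(\omega\) and all \(\alpha_i\), so adding back first(value) \(= \omega\) leaves \(1 - \sum_i \alpha_i\) in the denominator.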

# Estimate an ARCH(p) model (here p = 2) for each ticker
uncond_var <- returns %>% 
  arrange(symbol, date) %>% 
  nest(data = c(date, ret)) %>% 
  mutate(arch = map(data, function(.x){
    logL <- function(params, p){
      eps_sqr_lagged <- cbind(ret, 1)
      for(i in 1:p){
        eps_sqr_lagged <- cbind(eps_sqr_lagged, lag(ret, i)^2)
      }
      eps_sqr_lagged <- na.omit(eps_sqr_lagged)
      sigma <- eps_sqr_lagged[,-1] %*% params
      0.5 * sum(log(sigma)) + 0.5 * sum(eps_sqr_lagged[,1]^2/sigma)
    }
    
    ret <- .x %>% pull(ret) * 100   
    p <- 2
    initial_params <- (c(sd(ret), rep(1, p)))
    fit_manual <- optim(par = initial_params, 
                        fn = logL, 
                        hessian = TRUE, 
                        method="L-BFGS-B", 
                        lower = 0, 
                        p = p)
    
    fitted_params <- (fit_manual$par)
    se <- sqrt(diag(solve(fit_manual$hessian)))
    
    return(tibble(param = c("intercept", paste0("alpha ", 1:p)), value = fitted_params, se = se))
  })) %>% 
  select(-data) %>% 
  unnest(arch) %>%
  group_by(symbol) %>% 
  # Unconditional variance omega / (1 - sum(alpha_i)); sum(value) also contains omega, hence it is added back
  summarise(uncond_var = dplyr::first(value) / (1 - sum(value) + dplyr::first(value)))
  
uncond_var %>% 
  ggplot() + 
  geom_bar(aes(x = reorder(symbol, uncond_var), 
               y = uncond_var), stat = "identity") + 
  coord_flip() +
  labs(x = "Unconditional Variance", 
      y = "")

Advanced question: Make sure you understand what happens if you do not define the function logL within the mutate() call but instead define it once and for all outside the loop. Hint: Because of R's lexical scoping, logL looks up ret in the environment in which it was defined. If you define it in the global environment, it will therefore not use the subset of data inside the mutate() call but whatever ret happens to be in your workspace. This is sometimes a tricky concept in R; consult Hadley Wickham's book Advanced R for more information.
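
A minimal illustration of this scoping behavior (with hypothetical objects y, f, and use_other_data that are not part of the exercise code):

y <- c(1, 2, 3)                   # the "sample" present when f is defined
f <- function() mean(y)           # y is not an argument, so it is looked up via lexical scoping
f()                               # uses the global y: returns 2

use_other_data <- function(){
  y <- c(10, 20, 30)              # a different y, local to this function
  f()                             # f still finds the global y and again returns 2
}
use_other_data()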

Some more code for (in-sample) estimation of a GARCH model for multiple assets is provided below. For out-of-sample computations, consult the Section on multivariate models.

First, I specify the model (a standard GARCH(1,1)). The lines afterwards use the function ugarchfit to fit an individual GARCH model for each ticker, compute the rolling window standard deviation, and extract the fitted conditional volatilities \(\hat\sigma_t\). Note that these are in-sample volatilities in the sense that the entire time series is used to fit the GARCH model. In most applications, however, this is absolutely sufficient.

library(rugarch)
model.spec <- ugarchspec(variance.model = list(model = 'sGARCH' , garchOrder = c(1 , 1)))

model.fit <- returns %>% 
  mutate(Rolling = slide_dbl(ret, sd, 
                             .before = 250)) %>%
  nest(data = c(date, Rolling, ret)) %>% 
  mutate(garch = map(data, function(x) sigma(ugarchfit(spec = model.spec , 
                                                 data = x %>% pull(ret),
                                                 solver = "hybrid"))%>% 
                       as_tibble())) %>% 
  mutate(full_data = map2(data, garch, cbind)) %>% 
  unnest(full_data) %>% 
  select(-data, -garch) %>%
  rename("Garch" = V1)

Next, I provide some illustration of the estimated paths of volatilities for the two estimation procedures.

model.fit %>% 
  pivot_longer(c(Rolling, Garch)) %>% 
  ggplot(aes(x=date, y = value, color = symbol)) + 
  geom_line() + 
  facet_wrap(~name) + 
  theme(legend.position = "bottom")
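
The out-of-sample forecast comparison asked for in the exercises is not spelled out above, so here is a minimal sketch for a single ticker, assuming the demeaned (percentage) AAPL returns stored in the vector ret earlier and the model.spec object defined above; the objects window_init and forecasts are my own. It refits the GARCH model on an expanding window, stores the 1-day-ahead variance forecast, adds the random walk benchmark (the sample variance over the same window), and evaluates both against the next day's squared return via the mean squared prediction error.

# Sketch: expanding-window 1-day-ahead variance forecasts (single ticker)
window_init <- 250                                   # roughly one year of daily data
n <- length(ret)
forecasts <- tibble(t = (window_init + 1):n,
                    garch = NA_real_, rw = NA_real_, realized = NA_real_)

# Refitting every day is slow; refitting less frequently (or rugarch's ugarchroll()) speeds this up
for(j in seq_len(nrow(forecasts))){
  day <- forecasts$t[j]
  estimation_sample <- ret[1:(day - 1)]
  garch_fit <- ugarchfit(spec = model.spec, data = estimation_sample, solver = "hybrid")
  forecasts$garch[j] <- as.numeric(sigma(ugarchforecast(garch_fit, n.ahead = 1)))^2
  forecasts$rw[j] <- var(estimation_sample)          # "random walk" benchmark
  forecasts$realized[j] <- ret[day]^2                # squared return as variance proxy
}

forecasts %>%
  summarise(mspe_garch = mean((garch - realized)^2),
            mspe_rw = mean((rw - realized)^2))

Squared returns are a noisy proxy for realized variance, so do not be surprised if the differences in mean squared prediction errors look modest.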

For an illustration of the value at risk computation, consult the lecture slides.
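
Nevertheless, here is a minimal sketch of the model-implied 5 % Value-at-Risk, assuming conditionally normal returns with zero conditional mean, so that the 5 % return quantile is simply \(q_{0.05}\,\hat\sigma_t\) (and the VaR is its negative). It reuses the in-sample volatilities stored in model.fit; the columns quantile_5 and violation are my own additions.

alpha <- 0.05

var_illustration <- model.fit %>%
  mutate(quantile_5 = qnorm(alpha) * Garch,          # 5% return quantile; VaR = -quantile_5
         violation = ret < quantile_5)               # days on which the return falls below the quantile

var_illustration %>%
  filter(symbol == "AAPL") %>%
  ggplot(aes(x = date)) +
  geom_point(aes(y = ret, color = violation), size = 0.7) +
  geom_line(aes(y = quantile_5)) +
  labs(x = "", y = "Return and 5% quantile (= -VaR)", color = "VaR violation")

# The empirical violation frequency should be close to 5% if the model is adequate
var_illustration %>%
  group_by(symbol) %>%
  summarise(violation_rate = mean(violation))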

9.2 Ledoit-Wolf shrinkage estimation

A severe practical issue with the sample variance-covariance matrix in large dimensions (\(N \gg T\)) is that \(\hat\Sigma\) is singular. Ledoit and Wolf proposed a series of biased estimators of the variance-covariance matrix \(\Sigma\) which overcome this problem. As a result, it is often advised to perform Ledoit-Wolf-like shrinkage of the variance-covariance matrix before proceeding with portfolio optimization.
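
In a nutshell, the estimator proposed in Honey, I shrunk the sample covariance matrix is a convex combination of the sample covariance matrix \(S\) and a highly structured target \(F\) (the constant-correlation model),

\[
\hat\Sigma_{\text{shrink}} = \hat\delta\, F + (1 - \hat\delta)\, S, \qquad \hat\delta \in [0, 1],
\]

where the shrinkage intensity \(\hat\delta\) is estimated from the data; this corresponds to the kappa/shrinkage computation in the code further below.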

Exercises:

  • Read the paper Honey, I shrunk the sample covariance matrix (provided in Absalon) and try to implement the feasible linear shrinkage estimator. If in doubt, consult Michael Wolf's homepage; you will find useful R and Matlab code there. It may also be advisable to take a look at the documentation and source code of the package RiskPortfolios, in particular the function covEstimation().
  • Write a simulation that generates iid returns (you can use the random number generator rnorm()) for some dimensions \(T \times N\) and compute the minimum-variance portfolio weights based on the sample variance-covariance matrix and on Ledoit-Wolf shrinkage. What is the mean squared deviation from the optimal portfolio (\(1/N\))? What can you conclude?

Solutions:

The assignment mainly requires you to understand the original paper in depth - it is arguably a hard task to translate the derivations in the paper into working code. Below you find some sample code which you can use for future assignments.

compute_ledoit_wolf <- function(x) {
  # Computes the Ledoit-Wolf shrinkage covariance estimator
  # as proposed in Ledoit and Wolf (2004), "Honey, I shrunk the sample covariance matrix"
  # x is a (t x n) matrix of returns
  t <- nrow(x)
  n <- ncol(x)
  x <- apply(x, 2, function(x) if (is.numeric(x)) # demean x
    x - mean(x) else x)
  sample <- (1/t) * (t(x) %*% x)
  var <- diag(sample)
  sqrtvar <- sqrt(var)
  rBar <- (sum(sum(sample/(sqrtvar %*% t(sqrtvar)))) - n)/(n * (n - 1))
  prior <- rBar * sqrtvar %*% t(sqrtvar)
  diag(prior) <- var
  y <- x^2
  phiMat <- t(y) %*% y/t - 2 * (t(x) %*% x) * sample/t + sample^2
  phi <- sum(phiMat)

  repmat = function(X, m, n) {
    X <- as.matrix(X)
    mx = dim(X)[1]
    nx = dim(X)[2]
    matrix(t(matrix(X, mx, nx * n)), mx * m, nx * n, byrow = T)
  }

  term1 <- (t(x^3) %*% x)/t
  help <- t(x) %*% x/t
  helpDiag <- diag(help)
  term2 <- repmat(helpDiag, 1, n) * sample
  term3 <- help * repmat(var, 1, n)
  term4 <- repmat(var, 1, n) * sample
  thetaMat <- term1 - term2 - term3 + term4
  diag(thetaMat) <- 0
  rho <- sum(diag(phiMat)) + rBar * sum(sum(((1/sqrtvar) %*% t(sqrtvar)) * thetaMat))

  gamma <- sum(diag(t(sample - prior) %*% (sample - prior)))
  kappa <- (phi - rho)/gamma
  shrinkage <- max(0, min(1, kappa/t))
  if (is.nan(shrinkage))
    shrinkage <- 1
  sigma <- shrinkage * prior + (1 - shrinkage) * sample
  return(sigma)
}
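
As a quick sanity check (a sketch based on simulated data, not part of the original assignment), the shrinkage estimator remains positive definite even when \(N > T\), whereas the sample covariance matrix is singular:

x_sim <- matrix(rnorm(50 * 100), nrow = 50)       # T = 50 observations of N = 100 assets
min(eigen(cov(x_sim))$values)                     # (numerically) zero: the sample covariance matrix is singular
min(eigen(compute_ledoit_wolf(x_sim))$values)     # strictly positive: the shrinkage estimator is invertible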

The simulation can be conducted as below. The individual functions simulate an iid normal "returns" matrix for given dimensions T and N. Next, optimal weights are computed based on either the sample variance-covariance matrix or its Ledoit-Wolf shrinkage equivalent. Note: if you want to generate results that can be replicated, make sure to use the function set.seed(). It takes any integer as input and ensures that whoever runs this code afterwards retrieves exactly the same random numbers within the simulation. Otherwise (or if you change "2022" to an arbitrary other value), there will be some variation in the results because of the random sampling of returns.
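
For reference, both helper functions below implement the textbook minimum-variance weights

\[
w_{\text{mv}} = \frac{\hat\Sigma^{-1}\iota}{\iota'\hat\Sigma^{-1}\iota},
\]

where \(\iota\) is a vector of ones and \(\hat\Sigma\) is either the sample covariance matrix or its Ledoit-Wolf counterpart; solve(.) %*% rep(1, ncol(mat)) followed by the normalization is exactly this expression.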

set.seed(2022)

generate_returns <- function(T, N) matrix(rnorm(T * N), nc = N)
compute_w_lw <- function(mat){
  w <- solve(compute_ledoit_wolf(mat))%*% rep(1,ncol(mat))
  return(w/sum(w))
}
compute_w_sample <- function(mat){
  w <- solve(cov(mat))%*% rep(1,ncol(mat))
  return(w/sum(w))
}

eval_weight <- function(w) length(w)^2 * sum((w - 1/length(w))^2)

Note that I evaluate weights based on the squared deviations from the naive portfolio (which is the minimum-variance portfolio in the case of iid returns). The scaling length(w)^2 is just there to penalize larger dimensions more heavily.

simulation <- function(T = 100, N = 70){
  tmp <- matrix(NA, nc = 2, nr = 100)
  for(i in 1:100){
    mat <- generate_returns(T, N)
    w_lw <- compute_w_lw(mat)
    if(N < T) w_sample <- compute_w_sample(mat)
    if(N >= T) w_sample <- rep(NA, N)
    tmp[i, 1] <- eval_weight(w_lw)
    tmp[i, 2] <- eval_weight(w_sample)
  }
  tibble(model = c("Shrinkage", "Sample"), error = colMeans(tmp), N = N, T = T)
}

result <- bind_rows(simulation(100, 60),
                    simulation(100, 70),
                    simulation(100, 80),
                    simulation(100, 90),
                    simulation(100, 100),
                    simulation(100, 150))

result %>% pivot_wider(names_from = model, values_from = error) %>%
  knitr::kable(digits = 2, "pipe")
|   N |   T | Shrinkage | Sample |
|----:|----:|----------:|-------:|
|  60 | 100 |      1.25 |   85.8 |
|  70 | 100 |      1.46 |  163.0 |
|  80 | 100 |      1.81 |  344.3 |
|  90 | 100 |      1.91 |  862.9 |
| 100 | 100 |      2.19 |     NA |
| 150 | 100 |      3.19 |     NA |

The results show a couple of interesting patterns: First, the sample variance-covariance matrix breaks down if \(N > T\): there is no unique minimum-variance portfolio anymore in that case. In contrast, Ledoit-Wolf-like shrinkage retains the positive definiteness of the estimator even if \(N > T\). Further, the (scaled) mean squared error of the portfolio weights relative to the theoretically optimal naive portfolio is much smaller for the shrinkage version. Although the shrinkage variance-covariance estimator is biased, its variance is reduced substantially, which provides more robust portfolio weights.

9.3 Multivariate dynamic volatility modeling

This exercise is designed to provide the necessary tools for i) high-dimensional volatility modeling with rmgarch and ii) proper volatility backtesting procedures. In my personal opinion, while being the de-facto standard for high-dimensional dynamic volatility estimation, the rmgarch package documentation is next to impossible to decipher. I took some effort to provide running code for rather standard procedures, but if you encounter a more elegant solution, please let me know! First, make sure you install the rmgarch package and familiarize yourself with the documentation. Some code snippets are provided in the paper Boudt et al. (2019), available in Absalon.

Exercises:

  • Use the daily Dow Jones 30 index constituents returns as a baseline return time series. Familiarize yourself with the documentation of the rmgarch package for multivariate volatility modeling. In particular, write a short script which estimates a DCC model based on the return time series.
  • Illustrate the estimated volatilities as well as some of the estimated correlation time series.
  • Write a script which computes rolling window 1-step ahead forecasts of \(\Sigma_{t+1}\) based on the DCC GARCH model. For each iteration, store the corresponding minimum variance portfolio weights and the portfolio’s out-of-sample return.

Solutions:

To get started, I prepare the time series of returns as a \((T \times N)\) matrix.

#install.packages("rmgarch")
library(rmgarch)
library(xts)

returns <- returns %>% 
  pivot_wider(names_from = symbol, values_from = ret)

returns.xts <- xts(returns %>% select(-date), order.by = returns$date)

N <- ncol(returns.xts)

First, rmgarch is the big (multivariate) brother of the rugarch package. Recall that for multivariate models we are typically left with specifying the univariate volatility dynamics first. rmgarch has a rather intuitive interface and (in my opinion) way too many options. In fact, rmgarch allows you to specify all kinds of different GARCH specifications, mean dynamics, and distributional assumptions.

spec <- ugarchspec(variance.model = list(model = 'sGARCH', 
                                        garchOrder = c(1, 1)))

Next, we replicate the specification for each of our \(N\) assets. rmgarch has the convenient multispec function for that purpose. In principle, it is possible to specify individual dynamics for each asset by providing a list of different ugarchspec objects. In what follows, we simply impose the same structure for each ticker.

# Replicate specification for each univariate asset
mspec <- multispec(replicate(N, spec))

# Specify DCC
mspec_dcc <- dccspec(mspec, 
                 model = "DCC",
                 distribution = "mvnorm")

mspec_dcc
## 
## *------------------------------*
## *       DCC GARCH Spec         *
## *------------------------------*
## Model          :  DCC(1,1)
## Estimation     :  2-step
## Distribution   :  mvnorm
## No. Parameters :  582
## No. Series     :  29

The function dccspec imposes the multivariate structure. Consult Boudt et al (2019) for many more alternative specifications.

Next we fit the object to the return time series. Note that you can make use of the fact that the computations can be parallelized very easily (however, DCC is fast enough for this dataset so you can also opt out of using a cluster).

# Generate Cluster for parallelization (you may have to change the number of nodes)
cl <- makeCluster(8)
out_of_sample_periods <- 500

fit <- dccfit(mspec_dcc, 
               out.sample = out_of_sample_periods,
               data = returns.xts, 
               cluster = cl,
               fit.control = list(eval.se = FALSE,
                                  stationarity = TRUE))

# Alternative in case you do not want to use parallel computing
# fit <- dccfit(mspec_dcc, 
#               out.sample = out_of_sample_periods,
#               data = returns.xts,
#               solver = c("hybrid", "lbfgs"),
#               fit.control = list(eval.se = FALSE,
#                                  stationarity = TRUE))

After considerable computing time, the object fit contains all estimated parameters of the DCC-GARCH model. The model has 617 parameters. Note that with the additional controls we enforce stationarity conditions for the estimated parameters and (as we only care about forecasting in this exercise) refrain from estimating standard errors.

Next, I create a plot of all estimated time series of volatilities. Note that transforming rmgarch output into easy-to-work-with tidyverse structures is tedious but possible.

fitted_sigmas <- sigma(fit) %>% 
  fortify() %>% 
  as_tibble() %>% 
  rename(date = Index) %>% 
  pivot_longer(-date, names_to = "ticker", values_to = "sigma")

fitted_sigmas %>% 
  ggplot(aes(x = date, 
             y = sigma,
             color = ticker)) +
  geom_line() +
  labs(x = "", y = "Fitted sigma") +
  theme(legend.position = "bottom")

It is even more tedious to get hold of the estimated time series of \((N\times N)\) correlation matrices. The code snippet below does the job and provides an easily accessible way to illustrate the dynamic correlation coefficients. The figure below shows all correlations with the ticker AAPL.

fitted_correlations <- apply(rcor(fit), 3, . %>% as_tibble(rownames = "ticker") %>% 
                               pivot_longer(-ticker, names_to = "ticker.2")) %>% 
  bind_rows(.id = "date") %>% 
  filter(ticker != ticker.2) 

fitted_correlations %>% 
  filter(ticker.2 == "AAPL") %>% 
  ggplot(aes(x = as.Date(date),
             y = value,
             color = ticker)) +
  geom_line() +
  theme(legend.position = "bottom") + labs(x = "", y = "Fitted correlations", color = NULL)

Next, we can use the function dccroll to compute rolling window forecasts. In other words, we recursively re-estimate the GARCH model and store the one-step-ahead covariance forecasts (and the conditional means, had we specified a dynamic mean model). Luckily, the parallelization makes this process much faster. Note that it is important to stop the cluster after you have finished the job, otherwise you may run into connection issues.

dccrolln <- dccroll(mspec_dcc, 
                   data = returns.xts, 
                   n.ahead = 1,
                   forecast.length = 500, 
                   refit.every = 100,
                   refit.window = "recursive", 
                   fit.control=list(scale=TRUE),
                   solver.control=list(trace=1),
                   cluster = cl)

stopCluster(cl)

Final step: minimum-variance portfolio weights based on the estimated parameters. Again, this is somewhat cumbersome; maybe there are more efficient ways to do this in R.

pred <- apply(rcov(dccrolln), 3, function(x){ 
  w <- solve(x %>% as.matrix()) %*% rep(1, N)
  return(w/sum(w))}) %>% 
  t()

apply(pred * returns.xts %>% tail(500), 1, sum) %>% 
  enframe() %>% 
  summarise(sd = sqrt(250) * sd(value))
## # A tibble: 1 x 1
##      sd
##   <dbl>
## 1 0.163