4 Covariance Estimation

4.1 ARCH and GARCH

This small exercise illustrates how to perform maximum likelihood estimation in R using the simple example of ARCH and GARCH models. I advise you to first write the code for the basic specification on your own; afterwards, the exercises will help you get familiar with well-established R packages that provide the same (and many additional, more sophisticated) methods to estimate conditional volatility models.

4.1.1 Exercises

  • As a benchmark dataset, download prices and compute daily returns for all stocks that are part of the Dow Jones 30 index (ticker <- "DOW"). Decompose the time series into a predictable part and the residual, \(r_t = E(r_t|\mathcal{F}_{t-1}) + \varepsilon_t\). The easiest approach is to demean the time series by simply subtracting the sample mean, but in principle more sophisticated methods can be used.
  • For each of the 30 tickers, illustrate the rolling window standard deviation based on some suitable estimation window length.
  • Write a small function that computes the maximum likelihood estimator (MLE) of an ARCH(\(p\)) model.
  • Write a second function that implements MLE estimation for a GARCH(\(p,q\)) model.
  • What is the unconditional estimated variance of the ARCH(\(1\)) and the GARCH(\(1, 1\)) model for each ticker?

Next, familiarize yourself with the rugarch package to perform more sophisticated volatility modeling. Here you can find a great example of how to unleash the flexibility of rmgarch.

  • Use either your own code or the rugarch package to fit a GARCH and an ARCH model for each of the 30 time series and create 1-day ahead volatility forecasts with one year as the initial estimation window. Compare the forecasts to a 1-day ahead volatility forecast based on the sample standard deviation (often called the random walk model). Illustrate the forecast performance against the squared return of the next day and compute the mean squared prediction error of each model. What can you conclude about the predictive performance?

  • Compute and illustrate the model-implied Value-at-risk (VaR) at the 5 % level, i.e. the return threshold that your model expects to be undercut with a probability of 5 %. Formally, the VaR is defined as \(\text{VaR}_{\alpha}(X) = -\inf\big\{x\in\mathbb{R} : F_X(x)>\alpha\big\} = F_{-X}^{-1}(1-\alpha)\), where \(X\) is the return distribution. Illustrate the stock returns which fall below the estimated Value-at-risk threshold. What can you conclude?

4.1.2 Solutions

library(tidyverse)
library(tidyquant)

As usual, I use the tidyquant package to download price data of the time series of 30 stocks and to compute (adjusted) log returns.

ticker <- tq_index("DOW") 
prices <- tq_get(ticker %>% 
                   filter(ticker!="DOW"), get = "stock.prices") # This is a strange bug/error in tidyquant, the ticker DOW is not part of the index DOW.

returns <- prices %>% 
  group_by(symbol) %>% 
  mutate(log_price = log(adjusted),
         return = 100 * (log_price - lag(log_price))) %>%   
  select(ticker = symbol, date, return) %>%
  na.omit() %>% 
  mutate(return = return - mean(return))

I use the slider package for convenient rolling window computations. For detailed documentation, consult [this homepage](https://github.com/DavisVaughan/slider).

library(slider)

rolling_sd <- returns %>% 
  group_by(ticker) %>% 
  mutate(rolling_sd = slide_dbl(return, sd, 
                                .before = 100)) # 100 day estimation window

rolling_sd %>% 
  ggplot(aes(x=date, y = rolling_sd, color = ticker)) + 
  geom_line() +
  labs(x = "", y = "Standard deviation (rolling window") +
  theme(legend.position = "bottom")
Figure 4.1: Rolling window standard deviations

The figure illustrates at least two relevant features: i) Although rather persistent, volatilities change over time and ii) volatilities tend to co-move.

Next, I write a short, simplified function that performs maximum likelihood estimation of the parameters of an ARCH(\(p\)) model in R.

ret <- returns %>% 
  filter(ticker == "AAPL") %>% 
  pull(return) # One example time series

# Negative Gaussian (quasi) log-likelihood of an ARCH(p) model
logL <- function(params, p){
  # Build a matrix with columns: eps_t, 1, eps_{t-1}^2, ..., eps_{t-p}^2
  eps_sqr_lagged <- cbind(ret, 1)
  for(i in 1:p){
    eps_sqr_lagged <- cbind(eps_sqr_lagged, lag(ret, i)^2)
  }
  eps_sqr_lagged <- na.omit(eps_sqr_lagged) # drop the first p rows with undefined lags
  # Conditional variance: sigma_t^2 = omega + alpha_1 * eps_{t-1}^2 + ... + alpha_p * eps_{t-p}^2
  sigma <- eps_sqr_lagged[,-1] %*% params
  # Negative log-likelihood (up to additive constants)
  0.5 * sum(log(sigma)) + 0.5 * sum(eps_sqr_lagged[,1]^2/sigma)
}

The function logL computes the (negative) log-likelihood of an ARCH specification with \(p\) lags. Note that it returns the negative log-likelihood because most optimization procedures in R are designed to search for minima rather than maxima.
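
Up to additive constants, the quantity being minimized is the negative Gaussian log-likelihood

\[-\ell(\omega, \alpha_1, \dots, \alpha_p) = \frac{1}{2}\sum_{t} \log \sigma_t^2 + \frac{1}{2}\sum_{t} \frac{\varepsilon_t^2}{\sigma_t^2}, \qquad \sigma_t^2 = \omega + \sum_{i=1}^p \alpha_i \varepsilon_{t-i}^2,\]

where the constant term \(\frac{T}{2}\log(2\pi)\) is dropped because it does not affect the optimizer.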

The next few lines show how to estimate the model with optim and p = 2 lags. Note my (almost) arbitrary choice of the initial parameter vector. Also note that neither the function logL nor the optimization procedure enforces the regularity conditions that would ensure a stationary volatility process.

p <- 2
initial_params <- (c(sd(ret), rep(1, p)))
fit_manual <- optim(par = initial_params, 
                    fn = logL, 
                    hessian = TRUE, 
                    method="L-BFGS-B", 
                    lower = 0, 
                    p = p)

fitted_params <- (fit_manual$par)
se <- sqrt(diag(solve(fit_manual$hessian)))

cbind(fitted_params, se) %>% knitr::kable(digits = 2)
| fitted_params |   se |
|--------------:|-----:|
|          2.07 | 0.10 |
|          0.18 | 0.03 |
|          0.19 | 0.03 |
# Run the code below to compare the results with tseries package
# library(tseries)
# summary(garch(ret,c(0,p)))

We can implement GARCH estimation with very few adjustments.
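
The only conceptual change relative to the ARCH case is the variance recursion, which now also feeds back lagged conditional variances,

\[\sigma_t^2 = \omega + \sum_{i=1}^p \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^q \beta_j \sigma_{t-j}^2,\]

which is exactly the loop implemented in garch_logL below.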

# GARCH 

garch_logL <- function(params, p, q, return_only_loglik = TRUE){
  # Build a matrix with columns: eps_t, 1, eps_{t-1}^2, ..., eps_{t-p}^2
  eps_sqr_lagged <- cbind(ret, 1)
  for(i in 1:p){
    eps_sqr_lagged <- cbind(eps_sqr_lagged, lag(ret, i)^2)
  }
  # Initialize the conditional variance with the sample variance
  sigma.sqrd <- rep(sd(ret)^2, nrow(eps_sqr_lagged))
  # GARCH recursion: sigma_t^2 = omega + sum(alpha_i eps_{t-i}^2) + sum(beta_j sigma_{t-j}^2)
  for(t in (1 + max(p, q)):nrow(eps_sqr_lagged)){
    sigma.sqrd[t] <- params[1:(1+p)] %*% eps_sqr_lagged[t,-1] + 
              params[(2+p):length(params)] %*% sigma.sqrd[(t-1):(t-q)]
  }
  sigma.sqrd <- sigma.sqrd[-(1:(max(p, q)))] # discard the initialization period
  
  if(return_only_loglik){
    # Negative log-likelihood (up to additive constants)
    0.5 * sum(log(sigma.sqrd)) + 0.5 * sum(eps_sqr_lagged[(1 + max(p, q)):nrow(eps_sqr_lagged),1]^2/sigma.sqrd)
  }else{
    return(sigma.sqrd)
  }
}

p <- 1 # Lag structure 
q <- 1 
fit_garch_manual <- optim(par = rep(0.01, p + q + 1), 
                    fn = garch_logL, 
                    hessian = TRUE, 
                    method="L-BFGS-B", 
                    lower = 0, 
                    p = p,
                    q = q)

fitted_garch_params <- fit_garch_manual$par

Next, I plot the time series of estimated \(\sigma_t\) based on the output. Note that, to keep the code clean and simple, my function garch_logL returns the fitted variances when called with the option return_only_loglik = FALSE.

tibble(vola = garch_logL(fitted_garch_params, p, q, FALSE)) %>% 
  ggplot(aes(x = 1:length(vola), y = sqrt(vola))) +
  geom_line() +
  labs(x = "", y  = "GARCH(1,1) volatility")
Figure 4.2: Conditional volatilities based on GARCH(1,1)

Next, the typical tidyverse approach: once we have implemented the workflow for one asset, we can use mutate and map() to perform the same computation in a tidy manner for all assets in our sample. Recall that the unconditional variance is defined in the lecture slides (restated below).
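
As a reminder, for parameters satisfying the stationarity conditions the unconditional variance implied by the fitted model is

\[\text{ARCH}(p): \; \sigma^2 = \frac{\omega}{1 - \sum_{i=1}^{p}\alpha_i}, \qquad \text{GARCH}(1,1): \; \sigma^2 = \frac{\omega}{1 - \alpha_1 - \beta_1},\]

which is what the summarise call at the end of the pipeline below computes from the estimated coefficients.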

# Estimate an ARCH(p) model (here p = 2) for each ticker
uncond_var <- returns %>% 
  arrange(ticker, date) %>% 
  nest(data = c(date, return)) %>% 
  mutate(arch = map(data, function(.x){
    logL <- function(params, p){
      eps_sqr_lagged <- cbind(ret, 1)
      for(i in 1:p){
        eps_sqr_lagged <- cbind(eps_sqr_lagged, lag(ret, i)^2)
      }
      eps_sqr_lagged <- na.omit(eps_sqr_lagged)
      sigma <- eps_sqr_lagged[,-1] %*% params
      0.5 * sum(log(sigma)) + 0.5 * sum(eps_sqr_lagged[,1]^2/sigma)
    }
    
    ret <- .x %>% pull(return)    
    p <- 2
    initial_params <- (c(sd(ret), rep(1, p)))
    fit_manual <- optim(par = initial_params, 
                        fn = logL, 
                        hessian = TRUE, 
                        method="L-BFGS-B", 
                        lower = 0, 
                        p = p)
    
    fitted_params <- (fit_manual$par)
    se <- sqrt(diag(solve(fit_manual$hessian)))
    
    return(tibble(param = c("intercept", paste0("alpha ", 1:p)), value = fitted_params, se = se))
  })) %>% 
  select(-data) %>% 
  unnest(arch) %>%
  group_by(ticker) %>% 
  summarise(uncond_var = dplyr::first(value) / (1 - sum(value) + dplyr::first(value)))
  
uncond_var %>% 
  ggplot() + 
  geom_bar(aes(x = reorder(ticker, uncond_var), 
               y = uncond_var), stat = "identity") + 
  coord_flip() +
  labs(x = "Unconditional Variance", 
      y = "")
Figure 4.3: Unconditional variances

Advanced question: make sure you understand what happens if you do not define the function logL within the mutate call but instead define it once and for all outside the loop. Hint: logL then resolves its free variable ret in the environment where the function was defined, so it will not use the subset of data passed inside map() but rather whatever sample was present in the working environment when you defined the function. This is sometimes a tricky concept in R; consult Hadley Wickham’s book Advanced R for more information.
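
A minimal toy sketch of this scoping behaviour (the names x_global, f and g are made up purely for illustration):

x_global <- c(1, 2, 3)             # data present when the function is defined
f <- function() mean(x_global)     # x_global is a free variable of f

g <- function(.x) {
  x_global <- .x                   # a *local* variable with the same name
  f()                              # f() still resolves x_global in the global environment
}

g(c(100, 200, 300))                # returns 2 (the mean of the global vector), not 200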

Some more code for (in-sample) estimation of a GARCH model for multiple assets is provided below. For out-of-sample computations, consult the Section on multivariate models.

First, I specify the model (a standard GARCH(1,1)). The lines afterwards compute the rolling window standard deviation, use the function ugarchfit to fit an individual GARCH model for each ticker, and extract the fitted \(\hat\sigma_t\). Note that these are in-sample volatilities in the sense that the entire time series is used to fit the GARCH model. In most applications, however, this is absolutely sufficient.

library(rugarch)

model.spec <- ugarchspec(variance.model = list(model = 'sGARCH' , garchOrder = c(1 , 1)))

model.fit <- returns %>% 
  mutate(Rolling = slide_dbl(return, sd, 
                             .before = 250)) %>%
  nest(data = c(date, Rolling, return)) %>% 
  mutate(garch = map(data, function(x) sigma(ugarchfit(spec = model.spec , 
                                                 data = x %>% pull(return),
                                                 solver = "hybrid"))%>% 
                       as_tibble())) %>% 
  mutate(full_data = map2(data, garch, cbind)) %>% 
  unnest(full_data) %>% 
  select(-data, -garch) %>%
  rename("Garch" = V1)

Next, I provide some illustration of the estimated paths of volatilities for the two estimation procedures.

model.fit %>% 
  pivot_longer(c(Rolling, Garch)) %>% 
  ggplot(aes(x=date, y = value, color = ticker)) + 
  geom_line() + 
  facet_wrap(~name, scales = "free_y") + 
  theme(legend.position = "bottom")

For an illustration of the value at risk computation, consult the lecture slides.
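
The slides are not reproduced here, but a minimal sketch of the computation could look as follows, assuming conditional normality (so that the 5 % VaR for demeaned returns is simply \(-\Phi^{-1}(0.05)\,\hat\sigma_t\)) and reusing the in-sample model.fit object from above; the choice of AAPL is arbitrary.

var_alpha <- 0.05

aapl_var <- model.fit %>% 
  filter(ticker == "AAPL") %>% 
  mutate(VaR = -qnorm(var_alpha) * Garch,  # positive loss threshold implied by the GARCH sigma
         exceedance = return < -VaR)       # TRUE whenever the realized return breaches the VaR

aapl_var %>% 
  ggplot(aes(x = date)) +
  geom_line(aes(y = -VaR)) +
  geom_point(data = filter(aapl_var, exceedance), aes(y = return), color = "red") +
  labs(x = "", y = "Return and (negative) 5% Value-at-risk")

mean(aapl_var$exceedance) # should be close to 0.05 if the model is well calibrated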

4.2 Ledoit-Wolf shrinkage estimation

A severe practical issue with the sample variance-covariance matrix in large dimensions (\(N \gg T\)) is that \(\hat\Sigma\) is singular. Ledoit and Wolf proposed a series of biased estimators of the variance-covariance matrix \(\Sigma\) which overcome this problem. As a result, it is often advised to perform Ledoit-Wolf-like shrinkage of the variance-covariance matrix before proceeding with portfolio optimization.
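
In a nutshell, the linear shrinkage estimator is a convex combination of the sample covariance matrix \(S\) and a highly structured target \(F\) (the constant-correlation matrix in Honey, I shrunk the sample covariance matrix),

\[\hat\Sigma_{\text{LW}} = \hat\delta^{*} F + (1 - \hat\delta^{*}) S, \qquad \hat\delta^{*} \in [0, 1],\]

where the shrinkage intensity \(\hat\delta^{*}\) is estimated from the data; this is the quantity called shrinkage in the function compute_ledoit_wolf below.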

4.2.1 Exercises

  • Read the paper Honey, I shrunk the sample covariance matrix (provided in Absalon) and try to implement the feasible linear shrinkage estimator. If in doubt, consult Michael Wolf's homepage; you will find useful R and MATLAB code there. It may also be advisable to take a look at the documentation and source code of the package RiskPortfolios, in particular the function covEstimation().
  • Write a simulation that generates iid normally distributed returns (you can use the random number generator rnorm()) for some dimensions \(T \times N\) and compute the minimum-variance portfolio weights based on the sample variance-covariance matrix and Ledoit-Wolf shrinkage. What is the mean squared deviation from the optimal portfolio (\(1/N\))? What can you conclude?

4.2.2 Solutions

The assignment mainly requires you to understand the original paper in depth; it is arguably a hard task to translate the derivations in the paper into working code. Below you will find some sample code which you can use for future assignments.

compute_ledoit_wolf <- function(x) {
  # Computes Ledoit-Wolf shrinkage covariance estimator
  # This function generates the Ledoit-Wolf covariance estimator  as proposed in Ledoit, Wolf 2004 (Honey, I shrunk the sample covariance matrix.)
  # X is a (t x n) matrix of returns
  t <- nrow(x)
  n <- ncol(x)
  x <- apply(x, 2, function(x) if (is.numeric(x)) # demean x
    x - mean(x) else x)
  sample <- (1/t) * (t(x) %*% x)
  var <- diag(sample)
  sqrtvar <- sqrt(var)
  rBar <- (sum(sum(sample/(sqrtvar %*% t(sqrtvar)))) - n)/(n * (n - 1))
  prior <- rBar * sqrtvar %*% t(sqrtvar)
  diag(prior) <- var
  y <- x^2
  phiMat <- t(y) %*% y/t - 2 * (t(x) %*% x) * sample/t + sample^2
  phi <- sum(phiMat)

  repmat = function(X, m, n) {
    X <- as.matrix(X)
    mx = dim(X)[1]
    nx = dim(X)[2]
    matrix(t(matrix(X, mx, nx * n)), mx * m, nx * n, byrow = T)
  }

  term1 <- (t(x^3) %*% x)/t
  help <- t(x) %*% x/t
  helpDiag <- diag(help)
  term2 <- repmat(helpDiag, 1, n) * sample
  term3 <- help * repmat(var, 1, n)
  term4 <- repmat(var, 1, n) * sample
  thetaMat <- term1 - term2 - term3 + term4
  diag(thetaMat) <- 0
  rho <- sum(diag(phiMat)) + rBar * sum(sum(((1/sqrtvar) %*% t(sqrtvar)) * thetaMat))

  gamma <- sum(diag(t(sample - prior) %*% (sample - prior)))
  kappa <- (phi - rho)/gamma
  shrinkage <- max(0, min(1, kappa/t))
  if (is.nan(shrinkage))
    shrinkage <- 1
  sigma <- shrinkage * prior + (1 - shrinkage) * sample
  return(sigma)
}

The simulation can be conducted as below. The individual functions simulate an iid normal “returns” matrix for given dimensions T and N. Next, optimal weights are computed based on either the sample variance-covariance matrix or its Ledoit-Wolf shrinkage equivalent. Note: if you want to generate results that can be replicated, make sure to use the function set.seed(). It takes any integer as input and ensures that whoever runs this code afterwards retrieves exactly the same random numbers within the simulation. Otherwise (or if you change “2021” to an arbitrary other value), there will be some variation in the results because of the random sampling of returns.

set.seed(2021)

generate_returns <- function(T, N) matrix(rnorm(T * N), nc = N)
compute_w_lw <- function(mat){
  w <- solve(compute_ledoit_wolf(mat))%*% rep(1,ncol(mat))
  return(w/sum(w))
}
compute_w_sample <- function(mat){
  w <- solve(cov(mat))%*% rep(1,ncol(mat))
  return(w/sum(w))
}

eval_weight <- function(w) length(w)^2 * sum((w - 1/length(w))^2)

Note that I evaluate weights based on the squared deviations from the naive portfolio (which is the minimum-variance portfolio in the case of iid returns). The scaling length(w)^2 is just there to penalize larger dimensions more heavily.
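
In formulas, the reported error for a weight vector \(w\) of dimension \(N\) is

\[\text{error}(w) = N^2 \sum_{i=1}^{N}\left(w_i - \frac{1}{N}\right)^2.\]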

simulation <- function(T = 100, N = 70){
  tmp <- matrix(NA, nc = 2, nr = 100)
  for(i in 1:100){
    mat <- generate_returns(T, N)
    w_lw <- compute_w_lw(mat)
    if(N < T) w_sample <- compute_w_sample(mat)
    if(N >= T) w_sample <- rep(NA, N)
    tmp[i, 1] <- eval_weight(w_lw)
    tmp[i, 2] <- eval_weight(w_sample)
  }
  tibble(model = c("Shrinkage", "Sample"), error = colMeans(tmp), N = N, T = T)
}

result <- bind_rows(simulation(100, 60),
                    simulation(100, 70),
                    simulation(100, 80),
                    simulation(100, 90),
                    simulation(100, 100),
                    simulation(100, 150))

result %>% pivot_wider(names_from = model, values_from = error) %>%
  knitr::kable(digits = 2, "pipe")
|   N |   T | Shrinkage | Sample |
|----:|----:|----------:|-------:|
|  60 | 100 |      1.31 |  90.40 |
|  70 | 100 |      1.54 | 174.05 |
|  80 | 100 |      1.70 | 319.29 |
|  90 | 100 |      1.92 | 934.57 |
| 100 | 100 |      2.19 |     NA |
| 150 | 100 |      3.19 |     NA |

The results show a couple of interesting findings: first, the sample variance-covariance matrix breaks down if \(N > T\); there is no unique minimum-variance portfolio anymore in that case. In contrast, Ledoit-Wolf-like shrinkage retains positive definiteness of the estimator even if \(N > T\). Further, the (scaled) mean squared error of the portfolio weights relative to the theoretically optimal naive portfolio is far smaller for the shrinkage version. Although the shrinkage estimator of the variance-covariance matrix is biased, its variance is reduced substantially and it thus provides more robust portfolio weights.

4.3 Multivariate dynamic volatility modeling

This exercise is designed to provide the necessary tools for i) high-dimensional volatility modeling with rmgarch and ii) proper volatility backtesting procedures.

In my personal opinion, while being the de facto standard for high-dimensional dynamic volatility estimation, the rmgarch package documentation is next to impossible to decipher. It took some effort to provide running code for rather standard procedures, so if you encounter a more elegant solution, please let me know! First, make sure you install the rmgarch package and familiarize yourself with the documentation. Some code snippets are provided in the paper Boudt et al (2019), available in Absalon.

4.3.1 Exercises

  • Use the daily Dow Jones 30 index constituents returns as a baseline return time series. Familiarize yourself with the documentation of the rmgarch package for multivariate volatility modeling. In particular, write a short script which estimates a DCC model based on the return time series.
  • Illustrate the estimated volatilities as well as some of the estimated correlation time series.
  • Write a script which computes rolling window 1-step ahead forecasts of \(\Sigma_{t+1}\) based on the DCC GARCH model. For each iteration, store the corresponding minimum variance portfolio weights and the portfolio’s out-of-sample return.

4.3.2 Solutions

To get started, I prepare the time series of returns as a \((T \times N)\) matrix.

#install.packages("rmgarch")
library(rmgarch)

returns <- returns %>% 
  pivot_wider(names_from = ticker, values_from = return)

returns.xts <- xts(returns %>% select(-date), order.by = returns$date)

N <- ncol(returns.xts)

First, rmgarch is the big (multivariate) brother of the rugarch package. Recall that for multivariate models we are often left with specifying the univariate volatility dynamics. rmgarch has a rather intuitive interface and (in my opinion) way too many options. In fact, rmgarch allows you to specify all kinds of different GARCH model specifications, mean dynamics and also distributional assumptions.

spec <- ugarchspec(variance.model = list(model = 'sGARCH', 
                                        garchOrder = c(1, 1)))

Next, we replicate the specification for each of our \(N\) assets. rmgarch has the appealing multispec function for that purpose. In principle, it is possible to specify individual dynamics for each asset by providing a list of different ugarchspec objects; a short sketch follows below. In what follows, however, we simply impose the same structure for each ticker.
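
For instance, a purely illustrative sketch of such a heterogeneous specification (an eGARCH(1,1) for the first asset and the sGARCH(1,1) spec defined above for the remaining assets) could look as follows; it is not used in the remainder of this section.

# Illustration only: mix an eGARCH(1,1) for the first asset with the sGARCH(1,1)
# specification `spec` from above for the remaining N - 1 assets
spec_egarch <- ugarchspec(variance.model = list(model = "eGARCH", garchOrder = c(1, 1)))
mspec_mixed <- multispec(c(list(spec_egarch), replicate(N - 1, spec)))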

# Replicate specification for each univariate asset
mspec <- multispec(replicate(N, spec))

# Specify DCC
mspec <- dccspec(mspec, 
                 model = "DCC",
                 distribution = "mvnorm")

mspec
## 
## *------------------------------*
## *       DCC GARCH Spec         *
## *------------------------------*
## Model          :  DCC(1,1)
## Estimation     :  2-step
## Distribution   :  mvnorm
## No. Parameters :  582
## No. Series     :  29

The function dccspec imposes the multivariate structure. Consult Boudt et al (2019) for many more alternative specifications.

Next we fit the object to the return time series. Note that you can make use of the fact that the computations can be parallelized very easily (however, DCC is fast enough for this dataset so you can also opt out of using a cluster).

# Generate cluster for parallelization (you may have to change the number of nodes)
library(parallel)
cl <- makeCluster(8)
out_of_sample_periods <- 500

fit <- dccfit(mspec, 
               out.sample = out_of_sample_periods,
               data = returns.xts, 
               cluster = cl,
               fit.control = list(eval.se = FALSE,
                                  stationarity = TRUE))

# Alternative in case you do not want to use parallel computing
# fit <- dccfit(mspec, 
#               out.sample = 500,
#               data = returns,
#               solver = c("hybrid", "lbfgs"),
#               fit.control = list(eval.se = FALSE,
#                                  stationarity = TRUE))

After considerable computing time, the object fit contains all estimated parameters of the DCC GARCH model (582 parameters according to the specification output above). Note that with the additional controls we can enforce stationarity conditions for the estimated parameters and (as we only care about forecasting in this exercise) refrain from estimating standard errors.

Sometimes it may be advisable to fit the univariate volatility dynamics first and then estimate the conditional correlation model. This may be an advantage if the univariate models are more involved. Also, if you want to keep the univariate models constant but evaluate the effect of different conditional correlation models, the following approach may save some computation time.

mspec_univariate <- multispec(replicate(N, spec)) # Define the univariate models first
univariate_fit <- multifit(mspec_univariate, returns.xts) # Fit each univariate model individually 

my_dccspec <- dccspec(mspec_univariate, 
                 model = "DCC",
                 distribution = "mvnorm") # Define the multivariate model

my_dccfit <- dccfit(my_dccspec, 
                    data = returns.xts, 
                    fit = univariate_fit) # Compute the DCC model based on the pre-estimated univariate GARCH models

Next, I create a plot of all estimated time series of volatilities. Note that transforming rmgarch output into easy-to-work-with tidyverse structures is tedious but possible.

fitted_sigmas <- sigma(fit) %>% 
  fortify() %>% 
  as_tibble() %>% 
  rename(date = Index) %>% 
  pivot_longer(-date, names_to = "ticker", values_to = "sigma")

fitted_sigmas %>% 
  ggplot(aes(x = date, 
             y = sigma,
             color = ticker)) +
  geom_line() +
  labs(x = "", y = "Fitted sigma") +
  theme(legend.position = "bottom")
Figure 4.4: Fitted conditional volatilities from rmgarch

It is even more tedious to get hold of the estimated time series of \((N\times N)\) correlation matrices. The code snippet below does the job and provides an easily accessible way to illustrate the dynamic correlation coefficients. The figure below illustrates all correlations with the ticker AAPL.

fitted_correlations <- apply(rcor(fit), 3, . %>% as_tibble(rownames = "ticker") %>% 
                               pivot_longer(-ticker, names_to = "ticker.2")) %>% 
  bind_rows(.id = "date") %>% 
  filter(ticker != ticker.2) 

fitted_correlations %>% 
  filter(ticker.2 == "AAPL") %>% 
  ggplot(aes(x = as.Date(date),
             y = value,
             color = ticker)) +
  geom_line() +
  theme(legend.position = "bottom") + labs(x = "", y = "Fitted correlations")
Figure 4.5: Fitted conditional correlations from rmgarch

Next, we can use the function dccroll to compute rolling window forecasts. In other words, we recursively re-estimate the GARCH model and store the one-step-ahead covariance forecasts (and conditional means, had we specified a dynamic mean model). Luckily, parallelization makes this process much faster. Note that it is important to stop the cluster after you have finished the job, otherwise you may run into connection issues.

dccrolln <- dccroll(mspec, 
                   data = returns.xts, 
                   n.ahead = 1,
                   forecast.length = 500, 
                   refit.every = 100,
                   refit.window = "recursive", 
                   fit.control=list(scale=TRUE),
                   solver.control=list(trace=1),
                   cluster = cl)

stopCluster(cl)

Final step: minimum-variance portfolio weights based on the estimated parameters. Again, this is somewhat cumbersome; maybe there are more efficient ways to do this in R.

pred <- apply(rcov(dccrolln), 3, function(x){ 
  w <- solve(x %>% as.matrix()) %*% rep(1, N)
  return(w/sum(w))}) %>% 
  t()

apply(pred * returns.xts %>% tail(500), 1, sum) %>% 
  enframe() %>% 
  summarise(sd = sqrt(250) * sd(value))
## # A tibble: 1 x 1
##      sd
##   <dbl>
## 1  21.7

4.4 Portfolio optimization under transaction costs

In this exercise we extend the simple portfolio analysis substantially and bring the simulation closer to a realistic framework. We will penalize turnover, evaluate the out-of-sample performance after transaction costs and introduce some robust optimization procedures in the spirit of the paper Large-scale portfolio allocation under transaction costs and model uncertainty, available in Absalon.

4.4.1 Exercises

  • Download the file returns.rds. The file contains the return time series for 274 different tickers and 2404 trading days used for the paper Hautsch et al (2019).
  • Write a function that computes optimal portfolio weights which adjust for \(L_2\) transaction costs ex ante. Thus, the function should take into account the moment estimates \(\mu_t\) and \(\Sigma_t\), the previous portfolio weight \(\omega_{t^+} := \frac{\omega_{t} \odot \left(1 + r_t\right)}{1 + \omega_t'r_t}\), where \(\odot\) denotes element-wise multiplication, as well as the transaction cost parameter \(\beta\). Within this exercise we assume that transaction costs are quadratic and of the form \[\frac{\beta}{2}\left(\omega_{t+1} - \omega_{t^+}\right)'\left(\omega_{t+1} - \omega_{t^+}\right).\] You can consult Equation (7) in Hautsch et al (2019) for further information.
  • Analyse how different transaction cost values \(\beta\) affect portfolio rebalancing. You can assume that the previous allocation was the naive portfolio. Show that for high values of \(\beta\) the initial portfolio becomes more relevant.
  • Write a script that simulates the performance of 3 different strategies before and after adjusting for transaction costs (you can experiment with different values of \(\beta\)): a (mean-variance) utility maximization (risk aversion \(\gamma = 4\)), a naive allocation that rebalances daily, and a (mean-variance) utility maximization with turnover penalization (risk aversion \(\gamma = 4\)). Assume \(\beta = 1\) as the baseline. Evaluate the out-of-sample mean, standard deviation and Sharpe ratio. What do you conclude about the relevance of turnover penalization?

4.4.2 Solutions

The dataset is provided in Absalon and can be used without further adjustments.

returns <- read_rds("hautsch_voigt_returns.rds")
returns <- returns / 100

First I fix the relevant parameters for the evaluation, specifically the risk aversion \(\gamma\) and the number of assets \(N\).

ticker <- colnames(returns)
N <- length(ticker)
gamma <- 4

The function optimal_tc_weight generates optimal portfolio weights based on the parameters \(\mu\) and \(\Sigma\) and the previous allocation \(\omega_{t^+}\). Note (reassure yourself) that the function returns the mean-variance optimal portfolio if the transaction cost parameter \(\beta = 0\); in that case, the previous allocation vector becomes irrelevant.
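
To see why this works, note that with quadratic transaction costs the investor's problem can be rewritten as a standard mean-variance problem with adjusted inputs,

\[\omega^*_{t+1} = \arg\max_{\omega'\iota = 1}\; \omega'\mu - \frac{\beta}{2}\left(\omega - \omega_{t^+}\right)'\left(\omega - \omega_{t^+}\right) - \frac{\gamma}{2}\omega'\Sigma\omega = \arg\max_{\omega'\iota = 1}\; \omega'\mu^* - \frac{\gamma}{2}\omega'\Sigma^*\omega,\]

with \(\mu^* = \mu + \beta\,\omega_{t^+}\) and \(\Sigma^* = \Sigma + \frac{\beta}{\gamma} I_N\). These adjusted moments are exactly what the function below computes as mu_proc and Sigma_proc (cf. the discussion around Equation (7) in Hautsch et al (2019)).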

optimal_tc_weight <- function(w_prev, 
                              mu, 
                              Sigma, 
                              beta = 0, 
                              gamma = 4){
  N <- ncol(Sigma)
  iota <- rep(1, N)
  Sigma_proc <- Sigma + beta / gamma * diag(N)
  mu_proc <- mu + beta * w_prev
  
  Sigma_inv <- solve(Sigma_proc)
  
  w_mvp <- Sigma_inv %*% iota
  w_mvp <- w_mvp / sum(w_mvp)
  w_opt <- w_mvp  + 1/gamma * (Sigma_inv - w_mvp %*% t(iota) %*% Sigma_inv) %*% mu_proc
  return(w_opt)
}

Next, I fix the parameters \(\mu\) and \(\Sigma\) for the short simulation study. I use Ledoit-Wolf shrinkage and assume that the (conditional) mean returns are 0.

# Illustrate effect of Turnover penalization
Sigma <- compute_ledoit_wolf(returns)
mu <- 0 * colMeans(returns)

beta_concentration <- function(beta){
  return(sum((optimal_tc_weight(rep(1/N, N), mu, Sigma, beta = beta, gamma = 4) - rep(1/N, N))^2))
}

beta_effect <- tibble(beta = 20 * qexp((1:99)/100)) %>% 
  mutate(concentration = map_dbl(beta, beta_concentration))

beta_effect %>% 
  ggplot(aes(x = beta, y = concentration)) + 
  geom_line() +
  scale_x_sqrt() +
  labs(x = "Transaction cost parameter", 
       y = "Rebalancing")
Figure 4.6: Rebalancing as a function of transaction costs

The figure illustrates that rebalancing decreases substantially for increasing transaction costs. In the limit \(\beta\rightarrow\infty\) the optimal portfolio simply remains at \(\omega_{t^+}\), whereas for \(\beta = 0\) the chosen allocation corresponds to the mean-variance efficient portfolio.

Finally, the out-of-sample backtest. As outlined in the paper Hautsch et al (2019), a rather high value of \(\beta\) is required to mimic meaningful transaction costs; thus we set \(\beta\) to 1 (which corresponds to 10,000 basis points). Similar to the simple analysis, the code below performs the following steps: for each day, we recompute the conditional moments of the return distribution and choose optimal weights. Then we compute the out-of-sample return, the turnover and the net return after adjusting for rebalancing costs.

# OOS experiment 
window_length <- 500
periods <- nrow(returns) - window_length # total number of out-of-sample periods

beta <- 1 # Transaction costs
oos_values <- matrix(NA, 
                     nrow = periods, 
                     ncol = 3) # A matrix to collect all returns
colnames(oos_values) <- c("raw_return", "turnover", "net_return") # we implement 3 strategies

all_values <- list(oos_values, 
                   oos_values,
                   oos_values)

w_prev_1 <- w_prev_2 <- w_prev_3 <- rep(1/N ,N)

for(i in 1:periods){ # Rolling window

  # Extract information
  return_window <- returns[i : (i + window_length - 1),] # the last X returns available up to date t

  # Sample moments 
  Sigma <- cov(return_window) 
  mu <- 0 * colMeans(return_window)
  
  # Optimal TC robust portfolio
  w_1 <- optimal_tc_weight(w_prev = w_prev_1, mu = mu, Sigma = Sigma, beta = beta, gamma = gamma)
  # Evaluation
  raw_return <- returns[i + window_length, ] %*% w_1
  turnover <- sum((as.vector(w_1)-as.vector(w_prev_1))^2)
  # Store realized returns
  net_return <- raw_return - beta * turnover
  all_values[[1]][i, ] <- c(raw_return, turnover, net_return)
  #cat(round(c(100 * i/periods, raw_return, turnover, net_return), 2), "\n")
  #Computes adjusted weights based on the weights and next period returns
  w_prev_1 <- w_1 * as.vector(1 + returns[i + window_length, ])
  w_prev_1 <- w_prev_1 / sum(as.vector(w_prev_1))

  # Naive Portfolio 
  w_2 <- rep(1/N, N)
  # Evaluation
  raw_return <- returns[i + window_length, ] %*% w_2
  turnover <- sum((as.vector(w_2)-as.vector(w_prev_2))^2)
  # Store realized returns
  net_return <- raw_return - beta * turnover
  all_values[[2]][i, ] <- c(raw_return, turnover, net_return)
  #Computes adjusted weights based on the weights and next period returns
  w_prev_2 <- w_2 * as.vector(1 + returns[i + window_length, ])
  w_prev_2 <- w_prev_2 / sum(as.vector(w_prev_2))
  
  # Mean-variance efficient portfolio
  w_3 <- optimal_tc_weight(w_prev = w_prev_3, mu = mu, Sigma = Sigma, beta = 0, gamma = gamma)
  # Evaluation
  raw_return <- returns[i + window_length, ] %*% w_3
  turnover <- sum((as.vector(w_3) - as.vector(w_prev_3))^2)
  # Store realized returns
  net_return <- raw_return - beta * turnover
  all_values[[3]][i, ] <- c(raw_return, turnover, net_return)
  #Computes adjusted weights based on the weights and next period returns
  w_prev_3 <- w_3 * as.vector(1 + returns[i + window_length, ])
  w_prev_3 <- w_prev_3 / sum(as.vector(w_prev_3))
}

The final few lines summarize the results. Note the effect of turnover penalization on the Sharpe ratio after adjusting for transaction costs.

all_values <- lapply(all_values, as_tibble) %>% bind_rows(.id = "strategy")

all_values %>%
  group_by(strategy) %>%
  summarise(Mean = 250* 100 * mean(net_return),
            SD = sqrt(250) * 100 * sd(net_return),
            Sharpe = if_else(Mean > 0, Mean/SD, NA_real_),
            Turnover = 10000 * mean(turnover)) %>%
  mutate(strategy = case_when(strategy == 1 ~ "MV (TC)",
                              strategy == 2 ~ "Naive", 
                              strategy == 3 ~ "MV")) %>% 
  knitr::kable(digits = 4)
| strategy |     Mean |      SD | Sharpe | Turnover |
|:---------|---------:|--------:|-------:|---------:|
| MV (TC)  |  13.4877 | 11.1308 | 1.2117 |   0.0055 |
| Naive    |  13.3285 | 20.9976 | 0.6348 |   0.0100 |
| MV       | -80.6357 | 23.4857 |     NA |  36.8183 |