5 Estimating and testing the CAPM

This exercise is closely aligned with the slides on the capital asset pricing model (CAPM). It asks you to estimate the market beta for portfolios of assets sorted by industry, to compute optimal portfolio weights based on the CAPM factor structure, and to test the validity of the CAPM.

Exercises:

  • Read in the monthly return data for 10 industry portfolios (originally from Kenneth French’s data library) from the file tidy_finance.sqlite. Also retrieve the market and risk-free returns from Kenneth French’s data library. Provide some summary statistics for the industry portfolio (excess) returns.
  • For each industry portfolio, estimate the CAPM model with simple OLS. What are the \(\alpha\)’s? What are the \(\beta\)’s? What does the CAPM imply for \(\alpha\)?
  • Perform a joint test on the validity of the CAPM based on the finite sample distribution of a Wald test proposed by MacKinlay (1987) and Gibbons, Ross, and Shanken (1989). What do you conclude?
  • Suppose the CAPM holds. How could you make use of the factor structure of the returns to compute portfolio weights on the efficient frontier? What is the advantage of this procedure? Compare the portfolio weights of risky assets for a desired annualized rate of return of 10% resulting from using the sample mean and covariance, \(\hat\mu\) and \(\hat\Sigma\), as input into the optimization approach with the corresponding weights based on the moments of the return distribution implied by the CAPM.

Solutions:

As usual, we load the required packages and retrieve the data from the SQLite database.

# Load required packages
library(RSQLite)
library(tidyverse)

# Read in the data
tidy_finance <- dbConnect(SQLite(), "../Lectures/data/tidy_finance.sqlite", extended_types = TRUE)

industries_ff_monthly <- tbl(tidy_finance, "industries_ff_monthly") %>%
  collect() %>%
  pivot_longer(-month,
               names_to = "industry", values_to = "ret") %>%
  mutate(industry = as_factor(industry))

factors_ff_monthly <- tbl(tidy_finance, "factors_ff_monthly") %>%
  collect() 

After reading in the tables, we join them by month and compute excess returns. The line mutate(ret_excess = ret - rf) below subtracts the risk-free rate rf from the industry portfolio return ret.

data <- industries_ff_monthly %>%
  left_join(factors_ff_monthly, by = "month") %>%
  mutate(
    ret_excess = ret - rf
  ) %>% 
  select(month, industry, ret_excess, everything()) %>%
  drop_na()

Next, I provide some summary statistics for the excess returns of each industry portfolio: annualized means and volatilities in percent, together with the corresponding Sharpe ratios.

data %>%
  group_by(industry) %>%
  summarise(mean = 12 * mean(100 * ret_excess),   # annualized mean in percent
            sd = sqrt(12) * sd(100 * ret_excess), # annualized volatility in percent
            sharpe = mean / sd) %>%               # annualized Sharpe ratio
  kableExtra::kbl(digits = 2)

The implication of the CAPM is that all elements of the \((N \times 1)\) vector \(\alpha\) are zero in the joint regression framework \[Z_t = \alpha + \beta Z_{m,t} + \varepsilon_t,\] where \(\beta\) is the \((N \times 1)\) vector of market betas (if \(\alpha\) is zero, then \(m\) is the tangency portfolio). The standard ordinary least squares (OLS) estimator then delivers \[\hat\alpha = \hat\mu - \hat\beta\hat\mu_m\] with \[\hat\beta = \frac{\sum_t (Z_t - \hat\mu)(Z_{m,t} - \hat\mu_m)}{\sum_t (Z_{m,t} - \hat\mu_m)^2}.\] Alternatively, let \(X = (1, Z_m)\) be a \((T \times 2)\) matrix; then \((X'X)^{-1}X'Z\) is the \((2 \times N)\) matrix of coefficients. You can also use lm to compute the same coefficients, but for the sake of brevity I provide the explicit estimation below.

returns <- data %>%
  select(month, industry, mkt_excess, ret_excess) %>%
  pivot_wider(names_from = industry, values_from = ret_excess)
Z <- returns %>% select(-month, -mkt_excess) %>% as.matrix()
Z_m <- returns %>% pull(mkt_excess)

X <- cbind(1, Z_m)
mle_estimate <- solve(t(X) %*% X) %*% t(X) %*% Z
alphas <- matrix(mle_estimate[1, ], nrow = 1)
# Exactly the same can be computed with lm(Z~Z_m)
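
To convince yourself that the explicit matrix solution and lm agree, you can compare the two coefficient matrices directly; the following minimal check (the object name fit_lm is only illustrative) should return TRUE up to numerical precision.

# Compare the explicit OLS solution with the coefficients returned by lm()
fit_lm <- lm(Z ~ Z_m)
all.equal(unname(coef(fit_lm)), unname(mle_estimate))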

The estimated intercept coefficients are

mle_estimate[1,]
##     NoDur     Durbl     Manuf     Enrgy     HiTec     Telcm     Shops      Hlth 
##  0.002354 -0.000379 -0.000186  0.000836 -0.000175  0.000931  0.001441  0.002177 
##     Utils     Other 
##  0.001959 -0.000381

To evaluate the significance of the individual parameters, you can either implement the equations provided in Chapter 4 of The Econometrics of Financial Markets (available in Absalon) or directly exploit the functionality provided by lm (which, in principle, also allows you to compute standard errors that are robust to heteroskedasticity). As an example, see the code below.

fit <- lm(Z ~ Z_m)
fit <- summary(fit)
fit$`Response HiTec`
## 
## Call:
## lm(formula = HiTec ~ Z_m)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.13845 -0.01824 -0.00037  0.01698  0.14748 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.000175   0.001189   -0.15     0.88    
## Z_m          1.236516   0.026496   46.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0319 on 730 degrees of freedom
## Multiple R-squared:  0.749,  Adjusted R-squared:  0.749 
## F-statistic: 2.18e+03 on 1 and 730 DF,  p-value: <2e-16
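
The remark about heteroskedasticity above can be made concrete with the sandwich and lmtest packages; the sketch below assumes both packages are installed (they are not used anywhere else in this solution).

# Heteroskedasticity-robust standard errors for the HiTec regression
fit_hitec <- lm(Z[, "HiTec"] ~ Z_m)
lmtest::coeftest(fit_hitec, vcov. = sandwich::vcovHC(fit_hitec, type = "HC1"))
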

The estimated intercepts are economically small and of mixed sign (four of the ten industries have a negative point estimate), and they are not necessarily significant individually: for HiTec, for instance, the t-statistic on the intercept is only -0.15. A nonzero alpha indicates returns that are not fully explained by the exposure to market fluctuations. The estimated betas range from 0.66 (telecommunication) to 1.25 (durable goods) and reflect that industries are affected differently by fluctuations in the economy.

Next, we test whether all alphas are jointly equal to zero; a rejection implies that the market portfolio cannot be the efficient tangency portfolio. The following code just follows the equations outlined in the slides: MacKinlay (1987) and Gibbons, Ross, and Shanken (1989) derived the finite-sample distribution of the test statistic \(J\), which is given by \[J = \frac{T-N-1}{N}\left(1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right)^{-1}\hat\alpha'\hat\Sigma^{-1}\hat\alpha,\] where \(\hat\mu_m\) and \(\hat\sigma_m^2\) are the sample mean and variance of the market excess return and \(\hat\Sigma = Cov(\hat\varepsilon_t)\) is the residual covariance matrix.

# Perform Wald test
N <- ncol(Z)
T <- nrow(Z)
mle_sigma <- cov(Z - X %*% mle_estimate)
test_statistic <- (T-N-1)/N * (1 + mean(Z_m)^2/sd(Z_m)^2)^(-1) * alphas %*% solve(mle_sigma) %*% t(alphas)
test_statistic
##      [,1]
## [1,] 1.71

Under the null hypothesis, \(J\) is unconditionally distributed as a central \(F\) with \(N\) degrees of freedom in the numerator and \(T - N - 1\) degrees of freedom in the denominator. R provides extensive functions to sample from probability distributions, to evaluate quantiles, and to evaluate the density or probability mass at a specific point. You can check ?rnorm for a description based on the normal distribution.
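
As an illustration of this naming convention, the prefixes d, p, q, and r give the density, the cumulative distribution function, the quantile function, and random draws, respectively; for the normal distribution:

dnorm(0)     # density at 0
pnorm(1.96)  # cumulative probability P(X <= 1.96)
qnorm(0.975) # 97.5% quantile
rnorm(3)     # three random draws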

1 - pf(test_statistic, N, T - N - 1) # Compute p-value
qf(0.95, N, T - N - 1) # 95% critical value

With \(N = 10\) and \(T - N - 1 = 721\) degrees of freedom, the test statistic of 1.71 exceeds the 10% critical value but falls short of the 5% critical value of roughly 1.83. The joint test therefore provides only weak evidence against the CAPM: the estimated alphas are not large enough to rule out that our chosen proxy for the market portfolio is the efficient tangency portfolio.

As pointed out in the slides, the assumption of a factor structure in the returns substantially reduces the number of parameters to estimate: an unrestricted covariance matrix of \(N\) assets has \(N(N+1)/2\) free elements, whereas the CAPM structure only requires \(N\) betas, \(N\) residual variances, and the market variance. Next, we compute the moments of the return distribution in two ways, once from the sample and once as implied by the CAPM.
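
Before doing so, note that the reduction for our ten industries is easy to quantify; the back-of-the-envelope count below (reusing N from the test above) is only meant as an illustration.

# Number of parameters needed to pin down the covariance matrix of N = 10 assets
c(
  sample_covariance = N * (N + 1) / 2, # unrestricted: 55 free elements
  capm_structure    = 2 * N + 1        # N betas, N residual variances, sigma_m^2
)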

# Compute sample moments
mu <- returns %>% select(-month, -mkt_excess) %>% colMeans()
sigma <- returns %>% select(-month, -mkt_excess) %>% cov()

# Same with CAPM
# Compute restricted betas
constrained_estimate <- solve(t(Z_m)%*%Z_m) %*% t(Z_m)%*%Z

mu_m <- mean(returns$mkt_excess) # average market risk premium
rf_m <- mean(factors_ff_monthly$rf) # average risk-free rate

mu_capm <- rf_m + constrained_estimate * mu_m
mu_capm <- as.numeric(mu_capm)
sigma_eps <- apply(Z - Z_m %*% constrained_estimate, 2, var) # Column-wise residual variances
sigma_m <- var(returns$mkt_excess)

sigma_capm <- sigma_m * t(constrained_estimate) %*% constrained_estimate + diag(sigma_eps)
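
A quick plausibility check (only a sketch using the objects defined above) is to compare the annualized volatilities implied by the CAPM factor structure with their sample counterparts:

# Annualized volatilities in percent: sample versus CAPM-implied
round(100 * sqrt(12) * cbind(
  sample = sqrt(diag(sigma)),
  capm   = sqrt(diag(sigma_capm))
), 2)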

Recall the function compute_efficient_portfolio() from the lecture slides. We can now assess the resulting portfolio weights based on either the imposed factor structure or the sample moments of the return distribution.

compute_efficient_portfolio <- function(sigma, mu, mu_bar = 0.30 / 12) {
  iota <- rep(1, ncol(sigma))
  sigma_inv <- solve(sigma)
  # Minimum variance portfolio weights
  w_mvp <- sigma_inv %*% iota
  w_mvp <- w_mvp / sum(w_mvp)
  # Scalars of the analytical two-fund separation solution
  C <- as.numeric(t(iota) %*% sigma_inv %*% iota)
  D <- as.numeric(t(iota) %*% sigma_inv %*% mu)
  E <- as.numeric(t(mu) %*% sigma_inv %*% mu)
  lambda_tilde <- as.numeric(2 * (mu_bar - D / C) / (E - D^2 / C))
  # Efficient portfolio weights for the desired expected return mu_bar
  weff <- w_mvp + lambda_tilde / 2 * (sigma_inv %*% mu - D / C * sigma_inv %*% iota)
  return(t(weff))
}

The resulting portfolio weights differ because the inputs to the optimization have changed:

w <- compute_efficient_portfolio(sigma, mu, 0.05 / 12)
w_capm <- compute_efficient_portfolio(sigma_capm, mu_capm, 0.05 / 12)

rbind(w, w_capm)
##       NoDur  Durbl  Manuf   Enrgy  HiTec Telcm   Shops    Hlth Utils  Other
## [1,] -0.310 -0.158  0.758 -0.0737 -0.126 0.427 -0.1080 0.00315 0.783 -0.195
## [2,]  0.513 -0.314 -0.172  0.1826 -0.488 0.436 -0.0221 0.28521 0.931 -0.352
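
As a final sanity check (again only a sketch), both weight vectors should deliver the targeted monthly expected return of 0.05 / 12, i.e. 5% per year, under their respective input moments, since the optimization imposes this as a constraint.

# Both portfolios hit the target expected return under their own input moments
12 * c(
  sample = as.numeric(w %*% mu),
  capm   = as.numeric(w_capm %*% mu_capm)
)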