6 Fama MacBeth Regressions


This exercise asks you to implement Fama MacBeth regressions. For a more detailed theoretical treatment, check out the Chapter of the textbook The econometrics of financial markets (available in Absalon).

  • Load the estimated monthly betas beta from the SQLite database.
  • Before implementing the regression, write a short function to winsorize a variable. Winsorizing simply means replacing values that lie above or below a pre-specified quantile with that quantile. It is common in the asset pricing literature to winsorize the independent variable before running regressions, so it is worth implementing winsorization once.
  • Estimate the market risk premium based on Fama MacBeth regressions. To do this, for each cross-section of returns (e.g., each month), project the realized returns on the estimated betas from the previous exercise and then aggregate the estimates in the time dimension. That is, for each month \(t\), estimate \[Z_t = \gamma_{0,t}\iota + \gamma_{1,t}\beta + \eta_t\] and then compute the time-series mean of \(\hat\gamma_{0,t}\) and \(\hat\gamma_{1,t}\).
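To make the aggregation step explicit, the time-series means and a simple t-statistic can be written as below. This is only a sketch of the textbook version, which treats the monthly estimates as serially uncorrelated; the symbol \(T\) (the number of monthly cross-sections) and the index \(j\) are notation introduced here and do not appear in the exercise text.

\[\hat\gamma_j = \frac{1}{T}\sum_{t=1}^{T}\hat\gamma_{j,t}, \qquad \hat\sigma^2_{\hat\gamma_j} = \frac{1}{T(T-1)}\sum_{t=1}^{T}\left(\hat\gamma_{j,t} - \hat\gamma_j\right)^2, \qquad t_j = \frac{\hat\gamma_j}{\hat\sigma_{\hat\gamma_j}}, \qquad j \in \{0, 1\}.\]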


We start by loading the usual packages and connecting to the database.

library(tidyverse)
library(RSQLite)
library(lubridate)

# Read in the data
tidy_finance <- dbConnect(SQLite(), "../Lectures/data/tidy_finance.sqlite",
  extended_types = TRUE
)

crsp_monthly <- tbl(tidy_finance, "crsp_monthly") %>%
  collect()

factors_ff_monthly <- tbl(tidy_finance, "factors_ff_monthly") %>%
  collect()

beta <- tbl(tidy_finance, "beta") %>%
  collect()

We keep only relevant data from the CRSP sample.

crsp_monthly <- crsp_monthly %>%
  left_join(factors_ff_monthly, by = "month") %>%
  select(permno, month, ret_excess, mkt_excess, mktcap_lag)

Next, we merge the beta estimates with the return data. We use the one-month lagged betas as the explanatory variable to ensure that the regressions rely only on information available at the beginning of each month. To lag stock beta by one month, we add one month to the current date and join the resulting information with our return data. This procedure ensures that the beta estimated in month \(t\) is matched with the excess return in month \(t+1\).

beta_lag <- beta %>%
  mutate(month = month %m+% months(1)) %>%
  select(permno, month, beta_lag = beta_daily)

beta_data <- crsp_monthly %>%
  inner_join(beta_lag, by = c("permno", "month"))
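The date shift can be checked on a toy example. The sketch below uses base R only and made-up numbers; the objects beta_toy, ret_toy and the helper add_one_month() are hypothetical and not part of the exercise data.

```r
# Toy data for a single stock (all values are made up for illustration)
beta_toy <- data.frame(
  permno = 1,
  month = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  beta = c(0.9, 1.0, 1.1)
)
ret_toy <- data.frame(
  permno = 1,
  month = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  ret_excess = c(0.01, -0.02, 0.03)
)

# Hypothetical helper: move a date forward by one calendar month
add_one_month <- function(d) seq(d, by = "month", length.out = 2)[2]

# Shift the beta dates so that the January beta is stamped with February
beta_toy_lag <- data.frame(
  permno = beta_toy$permno,
  month = do.call(c, lapply(beta_toy$month, add_one_month)),
  beta_lag = beta_toy$beta
)

# Inner join: each return is paired with the beta from the previous month
merged_toy <- merge(ret_toy, beta_toy_lag, by = c("permno", "month"))
merged_toy
```

The January return drops out because no earlier beta exists for it, and the beta shifted into April finds no matching return.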

Winsorizing is straightforward to implement (although the choice of cutoff is somewhat arbitrary). The function below implements winsorizing in a pipeable (%>%) fashion.

winsorize <- function(x, cut = 0.005){
  # quantiles that serve as upper and lower bounds
  cut_point_top <- quantile(x, 1 - cut, na.rm = TRUE)
  cut_point_bottom <- quantile(x, cut, na.rm = TRUE)
  # replace values beyond the bounds with the bounds themselves
  i <- which(x >= cut_point_top)
  x[i] <- cut_point_top
  j <- which(x <= cut_point_bottom)
  x[j] <- cut_point_bottom
  return(x)
}
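To see the effect of winsorizing, it can help to apply it to a small vector with two obvious outliers. The sketch below is a self-contained variant that uses pmin() and pmax() instead of index replacement; the name winsorize_sketch and the toy vector are assumptions made for illustration only.

```r
# Alternative formulation of winsorizing via pmin()/pmax() (illustrative only)
winsorize_sketch <- function(x, cut = 0.005) {
  lower <- quantile(x, cut, na.rm = TRUE, names = FALSE)
  upper <- quantile(x, 1 - cut, na.rm = TRUE, names = FALSE)
  # values below `lower` are raised to it, values above `upper` are lowered to it
  pmin(pmax(x, lower), upper)
}

# Toy vector: 98 well-behaved values plus one extreme value in each tail
x <- c(-100, 1:98, 1000)
x_wins <- winsorize_sketch(x, cut = 0.05)

range(x)      # range before winsorizing
range(x_wins) # range after winsorizing
```

The length of the vector is unchanged and values between the two cut points are left as they are; only the tails are pulled in.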

Next, we prepare the data, which simply means that we winsorize extreme beta estimates within each month.

nest_beta_data <- beta_data %>%
  drop_na(ret_excess, beta_lag) %>%
  group_by(month) %>%
  mutate(beta_lag = winsorize(beta_lag)) %>%
  nest()

After winsorizing, we start with the “first” step of the Fama MacBeth two-step regression procedure. For each cross-section (that is, for each month), we regress the (one-month-ahead) excess returns on the (pre-estimated) beta. Note the use of the tidyverse package broom, which provides an easy way to extract summary statistics from the individual regressions. If it is unclear what nest() and map() are doing, check out the chapter Many models in Hadley Wickham’s book R for Data Science.

# perform cross-sectional regressions for each month
cross_sectional_regs <- nest_beta_data %>%
  mutate(model = map(data, ~lm(ret_excess ~ beta_lag, data = .x)),
         tidy = map(model, broom::tidy),
         n = map_dbl(model, stats::nobs))

# extract average coefficient estimates
fama_macbeth_coefs <- cross_sectional_regs %>%
  unnest(tidy) %>%
  group_by(term) %>%
  summarize(coefficient = mean(estimate))
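The same two steps can be sketched on simulated data in base R, which makes it possible to check that the procedure recovers known coefficients. Everything in this sketch is an assumption made for illustration: the panel dimensions, the "true" values true_gamma0 and true_gamma1, and the noise level are made up and unrelated to the CRSP sample.

```r
# Hypothetical simulated panel: 60 months, 200 stocks (all numbers are made up)
set.seed(1)
n_months <- 60
n_stocks <- 200
true_gamma0 <- 0.01
true_gamma1 <- 0.005

panel <- expand.grid(permno = seq_len(n_stocks), month = seq_len(n_months))
panel$beta_lag <- rnorm(nrow(panel), mean = 1, sd = 0.5)
panel$ret_excess <- true_gamma0 + true_gamma1 * panel$beta_lag +
  rnorm(nrow(panel), sd = 0.1)

# Step 1: one cross-sectional regression per month
gammas <- t(sapply(split(panel, panel$month), function(df) {
  coef(lm(ret_excess ~ beta_lag, data = df))
}))

# Step 2: time-series averages and plain t-statistics
# (no autocorrelation correction in this sketch)
gamma_hat <- colMeans(gammas)
t_stat <- gamma_hat / (apply(gammas, 2, sd) / sqrt(n_months))

gamma_hat
t_stat
```

Here gammas holds one row of coefficient estimates per month, which corresponds to the estimate column that broom::tidy() returns for each monthly regression.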

Above, we compute the time-series averages of the estimated regression coefficients \(\hat\gamma_{0,t}\) and \(\hat\gamma_{1,t}\). The remaining code computes Newey-West standard errors of the average coefficient estimates and does some housekeeping to present the regression results in a nice way.

# compute newey-west standard errors of average coefficient estimates
newey_west_std_errors <- cross_sectional_regs %>%
  unnest(tidy) %>%
  group_by(term) %>%
  arrange(month) %>%
  group_modify(~enframe(sqrt(diag(sandwich::NeweyWest(lm(estimate ~ 1, data = .x), lag = 6))))) %>%
  select(term, nw_std_error = value)

# put coefficient estimates and standard errors together and compute t-statistics
fama_macbeth_coefs <- fama_macbeth_coefs %>%
  left_join(newey_west_std_errors, by = "term") %>%
  mutate(nw_t_stat = coefficient / nw_std_error) %>%
  select(term, coefficient, nw_t_stat) %>%
  pivot_longer(cols = c(coefficient, nw_t_stat), names_to = "statistic") %>%
  mutate(statistic = paste(term, statistic, sep = " ")) %>%
  select(statistic, value)

# extract average number of observations
fama_macbeth_stats <- cross_sectional_regs %>%
  ungroup() %>%
  summarize(n = mean(n)) %>%
  pivot_longer(n, names_to = "statistic")

# combine desired output and return results
out <- rbind(fama_macbeth_coefs, fama_macbeth_stats)

out %>% knitr::kable(digits = 2, "pipe")
|statistic               |   value|
|:-----------------------|-------:|
|(Intercept) coefficient |    0.01|
|(Intercept) nw_t_stat   |    3.99|
|beta_lag coefficient    |    0.00|
|beta_lag nw_t_stat      |   -1.71|
|n                       | 4334.45|

What do the results show? On the one hand, we can reject the \(H_0\) that the intercept is zero. On the other hand, the coefficient on the market beta, which is the estimate of the market risk premium, is not significantly different from zero at the 5% level. The point estimate is even slightly negative, which is not what we would expect from a priced risk factor. These regression results therefore do not provide clear evidence that the market factor is priced.
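Since the t-statistics above depend on the Newey-West adjustment, it may help to see what such an adjustment does to the standard error of a time-series mean. The function below is a minimal sketch with Bartlett weights, written in base R; the name nw_se_mean and the simulated series are assumptions for illustration. It applies no prewhitening or small-sample correction, so its output need not coincide with sandwich::NeweyWest(), whose default settings may differ.

```r
# Sketch: Newey-West (Bartlett kernel) standard error of a sample mean
nw_se_mean <- function(x, lag = 6) {
  n <- length(x)
  e <- x - mean(x)
  # lag-0 term: variance of the demeaned series
  lrv <- sum(e^2) / n
  # add weighted autocovariances up to the chosen lag
  for (l in seq_len(lag)) {
    w <- 1 - l / (lag + 1)
    gamma_l <- sum(e[(l + 1):n] * e[1:(n - l)]) / n
    lrv <- lrv + 2 * w * gamma_l
  }
  sqrt(lrv / n)
}

# Made-up autocorrelated series standing in for monthly coefficient estimates
set.seed(42)
x <- as.numeric(stats::filter(rnorm(240), 0.5, method = "recursive"))

naive_se <- sd(x) / sqrt(length(x)) # ignores autocorrelation
nw_se <- nw_se_mean(x, lag = 6)     # accounts for autocorrelation up to lag 6
c(naive = naive_se, newey_west = nw_se)
```

With lag = 0 the function collapses to the usual standard error of the mean (up to the degrees-of-freedom factor), which gives a quick sanity check.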