Chapter 6 Fixed or random effects

This section was originally prepared for the Adanced Methods of Political Analysis (Poli 706) in Spring 2019, which I served as a TA for Tobias Heinrich. The goal of this part is to address one common question we encounter in our research: when to use fixed or random effects?

6.1 Prerequisite

Load the following packages for this section.

library(modelr)
library(tidyverse)
library(foreign)
library(plm)

6.2 Introduction

Let us write down the following model (from Wooldridge p.435).

$y_{it}=\beta_{1}x_{it} + a_{i} + u_{it}$

Essentially, we have some “fixed” errors ( $a_{i}$ ) over time. When we add the additional assumption that $a_{i}$ is uncorrelated with any explanatory variable, we have the random effects model.

Let us get started by using the following data to run a simple OLS.

Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
ols <-lm(y ~ x1, data=Panel)

Now plot the fitted line.

grid <- data.frame(Intercept=1, x1=seq_range(Panel$x1, 10))
grid$pred <- predict(ols,grid)
ggplot(Panel, aes(x1)) +
  geom_point(aes(y = y)) +
  geom_line(aes(y = pred), data = grid, colour = "red", size = 1)

6.3 Fixed effects

There are two common ways to fit fixed effects. The first one is to addd dummy variables for the group indicator.

fixed_dum <-lm(y ~ x1 + factor(country) - 1, data=Panel)
summary(fixed_dum)

## 
## Call:
## lm(formula = y ~ x1 + factor(country) - 1, data = Panel)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -8.634e+09 -9.697e+08  5.405e+08  1.386e+09  5.612e+09 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## x1                2.476e+09  1.107e+09   2.237  0.02889 *  
## factor(country)A  8.805e+08  9.618e+08   0.916  0.36347    
## factor(country)B -1.058e+09  1.051e+09  -1.006  0.31811    
## factor(country)C -1.723e+09  1.632e+09  -1.056  0.29508    
## factor(country)D  3.163e+09  9.095e+08   3.478  0.00093 ***
## factor(country)E -6.026e+08  1.064e+09  -0.566  0.57329    
## factor(country)F  2.011e+09  1.123e+09   1.791  0.07821 .  
## factor(country)G -9.847e+08  1.493e+09  -0.660  0.51190    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.796e+09 on 62 degrees of freedom
## Multiple R-squared:  0.4402, Adjusted R-squared:  0.368 
## F-statistic: 6.095 on 8 and 62 DF,  p-value: 8.892e-06

grid_fixdum <- expand(data.frame(Intercept=1, x1=seq_range(Panel$x1, 14), country=unique(Panel$country)),
                      x1, country)
grid_fixdum$pred <- predict(fixed_dum,grid_fixdum)
ggplot(Panel, aes(x1)) +
  geom_point(aes(y = y, colour=country)) +
  geom_smooth(aes(y = y), method='lm', se=FALSE, colour="black")+
  geom_line(data=grid_fixdum, aes(x=x1, y=pred, colour=country))

The second way is to incorporate fixed effects directly. In R, it is typically referred to as “within” model.

fixed <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
summary(fixed)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = y ~ x1, data = Panel, model = "within", index = c("country", 
##     "year"))
## 
## Balanced Panel: n = 7, T = 10, N = 70
## 
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -8.63e+09 -9.70e+08  5.40e+08  0.00e+00  1.39e+09  5.61e+09 
## 
## Coefficients:
##      Estimate Std. Error t-value Pr(>|t|)  
## x1 2475617827 1106675594   2.237  0.02889 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5.2364e+20
## Residual Sum of Squares: 4.8454e+20
## R-Squared:      0.074684
## Adj. R-Squared: -0.029788
## F-statistic: 5.00411 on 1 and 62 DF, p-value: 0.028892

We can then extract the fixed effects.

fixef(fixed)

##           A           B           C           D           E           F 
##   880542404 -1057858363 -1722810755  3162826897  -602622000  2010731793 
##           G 
##  -984717493

We can also test whether the fixed effects model is better than OLS.

pFtest(fixed, ols)

## 
##  F test for individual effects
## 
## data:  y ~ x1
## F = 2.9655, df1 = 6, df2 = 62, p-value = 0.01307
## alternative hypothesis: significant effects

6.4 Random effects

random <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="random")
summary(random)

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = y ~ x1, data = Panel, model = "random", index = c("country", 
##     "year"))
## 
## Balanced Panel: n = 7, T = 10, N = 70
## 
## Effects:
##                     var   std.dev share
## idiosyncratic 7.815e+18 2.796e+09 0.873
## individual    1.133e+18 1.065e+09 0.127
## theta: 0.3611
## 
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -8.94e+09 -1.51e+09  2.82e+08  0.00e+00  1.56e+09  6.63e+09 
## 
## Coefficients:
##               Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 1037014284  790626206  1.3116   0.1896
## x1          1247001782  902145601  1.3823   0.1669
## 
## Total Sum of Squares:    5.6595e+20
## Residual Sum of Squares: 5.5048e+20
## R-Squared:      0.02733
## Adj. R-Squared: 0.013026
## Chisq: 1.91065 on 1 DF, p-value: 0.16689

6.4.1 Fixed or random

You can run a Hausman test (which tests whether the unique errors are correlated with the regressors, the null is they are not). If the p-value is significant, then you choose fixed effects (since the unique errors are correlated with the regressors).

phtest(fixed, random)

## 
##  Hausman Test
## 
## data:  y ~ x1
## chisq = 3.674, df = 1, p-value = 0.05527
## alternative hypothesis: one model is inconsistent

6.5 More tests

You can also add time fixed effects.

fixed_time_dum <- plm(y ~ x1 + factor(year), data=Panel, index=c("country", "year"), model="within")
summary(fixed_time_dum)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = y ~ x1 + factor(year), data = Panel, model = "within", 
##     index = c("country", "year"))
## 
## Balanced Panel: n = 7, T = 10, N = 70
## 
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -7.92e+09 -1.05e+09 -1.40e+08  0.00e+00  1.63e+09  5.49e+09 
## 
## Coefficients:
##                    Estimate Std. Error t-value Pr(>|t|)  
## x1               1389050354 1319849567  1.0524  0.29738  
## factor(year)1991  296381559 1503368528  0.1971  0.84447  
## factor(year)1992  145369666 1547226548  0.0940  0.92550  
## factor(year)1993 2874386795 1503862554  1.9113  0.06138 .
## factor(year)1994 2848156288 1661498927  1.7142  0.09233 .
## factor(year)1995  973941306 1567245748  0.6214  0.53698  
## factor(year)1996 1672812557 1631539254  1.0253  0.30988  
## factor(year)1997 2991770063 1627062032  1.8388  0.07156 .
## factor(year)1998  367463593 1587924445  0.2314  0.81789  
## factor(year)1999 1258751933 1512397632  0.8323  0.40898  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    5.2364e+20
## Residual Sum of Squares: 4.0201e+20
## R-Squared:      0.23229
## Adj. R-Squared: 0.00052851
## F-statistic: 1.60365 on 10 and 53 DF, p-value: 0.13113

Or simply,

fixed_time <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within", effect="twoway")
summary(fixed_time)

## Twoways effects Within Model
## 
## Call:
## plm(formula = y ~ x1, data = Panel, effect = "twoway", model = "within", 
##     index = c("country", "year"))
## 
## Balanced Panel: n = 7, T = 10, N = 70
## 
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -7.92e+09 -1.05e+09 -1.40e+08  0.00e+00  1.63e+09  5.49e+09 
## 
## Coefficients:
##      Estimate Std. Error t-value Pr(>|t|)
## x1 1389050354 1319849567  1.0524   0.2974
## 
## Total Sum of Squares:    4.1041e+20
## Residual Sum of Squares: 4.0201e+20
## R-Squared:      0.020471
## Adj. R-Squared: -0.27524
## F-statistic: 1.10761 on 1 and 53 DF, p-value: 0.29738

6.5.1 Test whether adding time-fixed effects is necessary

pFtest(fixed_time, fixed)

or (a note for myself, plmtest cannot be deployed into the website as it causes errors, which is odd since other people seem to be able to deploy)

plmtest(fixed, effect="time", type="bp")

6.5.2 Test between OLS and random effects

pool <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="pooling")
plmtest(pool, effect="individual", type="bp")

6.5.3 Test cross-sectional dependence (contemporaneous correlation)

pcdtest(fixed, test = c("lm"))
pcdtest(fixed, test = c("cd"))

6.5.4 Test for serial correlation

pbgtest(fixed)

6.5.5 Test for stationarity

Panel_set <- pdata.frame(Panel, index = c("country", "year"))
library(tseries)
adf.test(Panel_set$y, k=2)

6.5.6 Test for heteroskedasticity

library(lmtest)
bptest(y ~ x1 + factor(country), data = Panel, studentize=F)

To address heteroskedasticity, you can use robust standard errors.

coeftest(fixed) 
coeftest(fixed, vcovHC)

There are different types of estimators and options to choose from.

coeftest(fixed, vcovHC(fixed, method = "arellano")) 
t(sapply(c("HC0", "HC1", "HC2", "HC3", "HC4"), function(x) sqrt(diag(vcovHC(fixed, method = "arellano", type = x)))))

Here are some suggestions from Oscar Torres-Reyna’s tutorial.

The –vcovHC– function estimates three heteroskedasticity-consistent covariance estimators:

“white1” - for general heteroskedasticity but no serial correlation. Recommended for random effects.
“white2” - is “white1” restricted to a common variance within groups. Recommended for random effects.
“arellano” - both heteroskedasticity and serial correlation. Recommended for fixed effects.

The following options apply:

HC0 - heteroskedasticity consistent. The default.
HC1,HC2, HC3 – Recommended for small samples. HC3 gives less weight to influential observations.
HC4 - small samples with influential observations
HAC - heteroskedasticity and autocorrelation consistent (type ?vcovHAC for more details)

For more details, see here at page 4.

6.6 Additional reading

Brandon Bartels offers some suggestions for analyzing multilevel data here. See here for Andrew Gelman’s take.

If you find the use of fixed vs. random effects confusing or unsatisfying, I would highly recommend Gelman and Hill’s book Data Analysis Using Regression and Multilevel/Hierarchical Models, where they urge us to avoid using the term “fixed” and “random” entirely. In particular, you should read at least chapter 11 and 12.

6.7 Acknowledgement

This section is drawn from Princeton’s Getting Started in Fixed/Random Effects Models using R by Oscar Torres-Reyna.