33.2 Estimation
33.2.1 2SLS Estimation
A special case of IV-GMM
Example from the authors of the fixest package
library(fixest)
base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)
# one endogenous regressor, one instrument:
# est_iv = feols(y ~ x1 | x_endo_1 ~ x_inst_1 , base)
# two endogenous regressors, two instruments:
est_iv = feols(y ~ x1 | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, base)
est_iv
#> TSLS estimation - Dep. Var.: y
#> Endo. : x_endo_1, x_endo_2
#> Instr. : x_inst_1, x_inst_2
#> Second stage: Dep. Var.: y
#> Observations: 150
#> Standard-errors: IID
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.831380 0.411435 4.45121 1.6844e-05 ***
#> fit_x_endo_1 0.444982 0.022086 20.14744 < 2.2e-16 ***
#> fit_x_endo_2 0.639916 0.307376 2.08186 3.9100e-02 *
#> x1 0.565095 0.084715 6.67051 4.9180e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.398842 Adj. R2: 0.761653
#> F-test (1st stage), x_endo_1: stat = 903.2 , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (1st stage), x_endo_2: stat = 3.25828, p = 0.041268, on 2 and 146 DoF.
#> Wu-Hausman: stat = 6.79183, p = 0.001518, on 2 and 144 DoF.
Default statistics
- F-test first-stage (weak instrument test)
- Wu-Hausman endogeneity test
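- Over-identifying restriction (Sargan) J-test (reported only when there are more instruments than endogenous regressors; the example above is exactly identified, so it does not appear in the output)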
# request additional fit statistics; the Wald tests use a VCOV clustered on fe
fitstat(
    est_iv,
    type = c(
        "n", "ll", "aic", "bic", "rmse", # ll means log-likelihood
        "my", # mean dependent var
        "g", # degrees of freedom used to compute the t-test
        "r2", "ar2", "wr2", "awr2", "pr2", "apr2", "wpr2", "awpr2",
        "theta", # over-dispersion parameter in Negative Binomial models
        "f", "wf", # F-tests of nullity of the coefficients
        "wald", # Wald test of joint nullity of the coefficients
        "ivf", "ivf1", "ivf2", "ivfall", # IV F-tests
        "ivwald", "ivwald1", "ivwald2", "ivwaldall", # IV Wald tests
        "cd" # Cragg-Donald statistic
        # "kpr"
    ),
    cluster = 'fe'
)
#> Observations: 150
#> Log-Likelihood: -75.0
#> AIC: 157.9
#> BIC: 170.0
#> RMSE: 0.398842
#> Dep. Var. mean: 5.84333
#> G: 3
#> R2: 0.766452
#> Adj. R2: 0.761653
#> Within R2: NA
#> awr2: NA
#> Pseudo R2: 0.592684
#> Adj. Pseudo R2: 0.576383
#> Within Pseudo R2: NA
#> awpr2: NA
#> Over-dispersion: NA
#> F-test: stat = 1.80769, p = 0.375558, on 3 and 2 DoF.
#> F-test (projected): NA
#> Wald (joint nullity): stat = 539,363.2 , p < 2.2e-16 , on 3 and 146 DoF, VCOV: Clustered (fe).
#> F-test (1st stage), x_endo_1: stat = 903.2 , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (1st stage), x_endo_2: stat = 3.25828, p = 0.041268, on 2 and 146 DoF.
#> F-test (2nd stage): stat = 194.2 , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (IV only): stat = 194.2 , p < 2.2e-16 , on 2 and 146 DoF.
#> Wald (1st stage), x_endo_1 : stat = 1,482.6 , p < 2.2e-16 , on 2 and 146 DoF, VCOV: Clustered (fe).
#> Wald (1st stage), x_endo_2 : stat = 2.22157, p = 0.112092, on 2 and 146 DoF, VCOV: Clustered (fe).
#> Wald (2nd stage): stat = 539,363.2 , p < 2.2e-16 , on 2 and 146 DoF, VCOV: Clustered (fe).
#> Wald (IV only): stat = 539,363.2 , p < 2.2e-16 , on 2 and 146 DoF, VCOV: Clustered (fe).
#> Cragg-Donald: 3.11162
To set default printing
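A minimal sketch of changing the default printed statistics, assuming the `setFixest_print()` function from fixest and its `fitstat` formula interface. The data are rebuilt here so the snippet runs on its own; the seed is an arbitrary choice, so the printed numbers will not match the output shown earlier.

```r
library(fixest)

set.seed(1)  # arbitrary seed, only so this sketch is reproducible
base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)
est_iv = feols(y ~ x1 | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, base)

# always add the second-stage Wald test to the printed fit statistics
setFixest_print(fitstat = ~ . + ivwald2)
est_iv
```

The `.` in the formula stands for the statistics that are already printed by default, so this call appends to them rather than replacing them.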
To see results from different stages
# first-stage
summary(est_iv, stage = 1)
# second-stage
summary(est_iv, stage = 2)
# both stages
etable(summary(est_iv, stage = 1:2), fitstat = ~ . + ivfall + ivwaldall.p)
etable(summary(est_iv, stage = 2:1), fitstat = ~ . + ivfall + ivwaldall.p)
# .p means p-value, not statistic
# `all` means IV only
33.2.2 IV-GMM
This is a more general framework.
- 2SLS Estimation is a special case of IV-GMM estimator
\[ Y = X \beta + u, u \sim (0, \Omega) \]
- \(X\) is the \(N \times k\) matrix of regressors, some of which are endogenous
We use an \(N \times l\) matrix of instruments \(Z\), where \(l \ge k\)
Then, we can have a set of \(l\) moments:
\[ g_i (\beta) = Z_i' u_i = Z_i' (Y_i - X_i \beta) \]
- \(i = 1, \dots, N\)
Each of the \(l\) moment equations has a sample counterpart, obtained by averaging over the \(N\) observations
\[ \bar{g}(\beta) = \frac{1}{N} \sum_{i = 1}^N Z_i' (Y_i - X_i \beta) = \frac{1}{N} Z'u \]
GMM then estimates \(\beta\) by making the sample moments \(\bar{g}(\hat{\beta}_{GMM})\) as close to zero as possible.
When \(l = k\), the system \(\bar{g}(\hat{\beta}_{GMM}) = 0\) has a unique solution, which is the IV estimator
\[ \hat{\beta}_{IV} = (Z'X)^{-1}Z'Y \]
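The exactly identified case can be checked numerically. The following is a sketch on simulated data in base R; the data-generating process, the true coefficients (1, 2), and all variable names are assumptions made for illustration only.

```r
set.seed(1)
n = 500
z = rnorm(n)             # instrument
v = rnorm(n)             # first-stage shock
u = 0.8 * v + rnorm(n)   # structural error, correlated with x through v
x = 0.7 * z + v          # endogenous regressor
y = 1 + 2 * x + u        # assumed true coefficients: intercept 1, slope 2

X = cbind(1, x)          # N x k regressors (k = 2)
Z = cbind(1, z)          # N x l instruments (l = 2, so l = k)

# IV estimator: (Z'X)^{-1} Z'Y
beta_iv = solve(crossprod(Z, X), crossprod(Z, y))

# OLS for comparison: (X'X)^{-1} X'Y, biased because x is correlated with u
beta_ols = solve(crossprod(X), crossprod(X, y))

# sample moments evaluated at the IV estimate: zero up to rounding error
g_bar = crossprod(Z, y - X %*% beta_iv) / n
print(cbind(iv = as.vector(beta_iv), ols = as.vector(beta_ols)))
print(g_bar)
```

Because \(l = k\), the \(l\) sample moments can all be set to zero at once, which is what `g_bar` shows.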
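When \(l > k\), there are more moment conditions than parameters, so they cannot all be set to zero. 2SLS combines the \(l\) instruments into a set of \(k\) instruments by projecting \(X\) on \(Z\)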
\[ \hat{X} = Z(Z'Z)^{-1} Z' X = P_ZX \]
then we can use the 2SLS estimator
\[ \begin{aligned} \hat{\beta}_{2SLS} &= (\hat{X}'X)^{-1} \hat{X}' Y \\ &= (X'P_Z X)^{-1}X' P_Z Y \end{aligned} \]
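A sketch in base R of the overidentified case, again on simulated data (the data-generating process and names are illustrative assumptions). It computes 2SLS from the projection formula and checks that it matches an OLS regression of \(Y\) on the first-stage fitted values.

```r
set.seed(2)
n = 500
z1 = rnorm(n)
z2 = rnorm(n)
v = rnorm(n)
u = 0.8 * v + rnorm(n)           # structural error, correlated with x
x = 0.5 * z1 + 0.5 * z2 + v      # endogenous regressor
y = 1 + 2 * x + u

X = cbind(1, x)                  # N x k, k = 2
Z = cbind(1, z1, z2)             # N x l, l = 3 > k

# first stage: X_hat = Z (Z'Z)^{-1} Z'X = P_Z X (without forming the N x N matrix P_Z)
X_hat = Z %*% solve(crossprod(Z), crossprod(Z, X))

# 2SLS: (X_hat'X)^{-1} X_hat'Y
beta_2sls = solve(crossprod(X_hat, X), crossprod(X_hat, y))

# same estimate from OLS of y on the fitted values, since P_Z is idempotent
beta_second_stage = coef(lm(y ~ X_hat[, 2]))

print(cbind(projection = as.vector(beta_2sls),
            second_stage = unname(beta_second_stage)))
```

The point estimates agree, but the standard errors from the manual second-stage `lm()` are not valid for 2SLS, which is one reason to use a dedicated routine such as `feols()`.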
Differences between 2SLS and IV-GMM:
When more instruments are available than are needed for estimation (\(l > k\)), 2SLS collapses them into exactly \(k\) instruments, the fitted values \(\hat{X} = P_Z X\), which simplifies the calculation.
IV-GMM instead keeps all \(l\) moment conditions and combines them through a weighted criterion function. With a well-chosen weighting matrix, the moment conditions that are estimated more precisely receive more weight, which improves the efficiency of the estimator.
In short, prefer IV-GMM when the equation is overidentified, particularly when the errors are not i.i.d.
The GMM estimator minimizes
\[ J (\hat{\beta}_{GMM} ) = N \bar{g}(\hat{\beta}_{GMM})' W \bar{g} (\hat{\beta}_{GMM}) \]
where \(W\) is an \(l \times l\) symmetric weighting matrix
For an overidentified equation, solving the first-order conditions gives the IV-GMM estimator
\[ \hat{\beta}_{GMM} = (X'ZWZ' X)^{-1} X'ZWZ'Y \]
which is identical for all \(W\) matrices that differ only by a factor of proportionality. The optimal choice is \(W = S^{-1}\) (L. P. Hansen 1982), where \(S\) is the covariance matrix of the moment conditions; it produces the most efficient estimator:
\[ S = \lim_{N \to \infty} N^{-1} E[Z'uu'Z] = \lim_{N \to \infty} N^{-1}[Z' \Omega Z] \]
With a consistent estimator of \(S\) from the 2SLS residuals, the feasible IV-GMM estimator can be defined as
\[ \hat{\beta}_{FEGMM} = (X'Z \hat{S}^{-1} Z' X)^{-1} X'Z \hat{S}^{-1} Z'Y \]
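The two-step procedure can be sketched in base R as follows. This is an illustration under stated assumptions, not a reference implementation: the data are simulated with heteroskedastic errors, `gmm_beta()` is a hypothetical helper name, and \(\hat{S}\) is the heteroskedasticity-robust choice \(\frac{1}{N}\sum_i \hat{u}_i^2 Z_i' Z_i\), which is only one of several possible estimators of \(S\).

```r
set.seed(3)
n = 1000
z1 = rnorm(n)
z2 = rnorm(n)
v = rnorm(n)
u = (0.8 * v + rnorm(n)) * sqrt(0.5 + z1^2)   # heteroskedastic structural error
x = 0.5 * z1 + 0.5 * z2 + v                   # endogenous regressor
y = 1 + 2 * x + u

X = cbind(1, x)                               # N x k, k = 2
Z = cbind(1, z1, z2)                          # N x l, l = 3

# hypothetical helper: (X'Z W Z'X)^{-1} X'Z W Z'Y for a given weighting matrix W
gmm_beta = function(X, Z, y, W) {
  XZ = crossprod(X, Z)                        # X'Z, k x l
  solve(XZ %*% W %*% t(XZ), XZ %*% W %*% crossprod(Z, y))
}

# step 1: W = (Z'Z)^{-1} reproduces 2SLS and gives consistent residuals
b_step1 = gmm_beta(X, Z, y, solve(crossprod(Z)))
u_hat = as.vector(y - X %*% b_step1)

# step 2: robust estimate of S, then re-estimate with W = S_hat^{-1}
S_hat = crossprod(Z * u_hat) / n              # (1/N) sum_i u_hat_i^2 Z_i' Z_i
b_fegmm = gmm_beta(X, Z, y, solve(S_hat))

# criterion at the optimum (Hansen J statistic), chi-squared with l - k df
g_bar = crossprod(Z, y - X %*% b_fegmm) / n
J = as.numeric(n * t(g_bar) %*% solve(S_hat) %*% g_bar)
p_value = pchisq(J, df = ncol(Z) - ncol(X), lower.tail = FALSE)

print(cbind(step1_2sls = as.vector(b_step1), fegmm = as.vector(b_fegmm)))
print(c(J = J, p_value = p_value))
```

Here the instruments are valid by construction, so a large `p_value` is the expected outcome for most seeds; rescaling `W` by any positive constant leaves `gmm_beta()` unchanged.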
When \(\Omega\) (i.e., the covariance matrix of the error process \(u\)) satisfies all classical assumptions (i.i.d. errors)
- \(\Omega = \sigma^2_u I_N\), so \(S = \sigma^2_u \lim_{N \to \infty} N^{-1} Z'Z\)
- The optimal weighting matrix is proportional to \((Z'Z)^{-1}\)
Then the IV-GMM estimator is the standard IV (or 2SLS) estimator.
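To see why, here is a short derivation under the assumption of i.i.d. errors, so that the optimal weighting matrix can be taken as \(W = (Z'Z)^{-1}\) (the scalar factors \(\sigma^2_u\) and \(N\) cancel because the estimator is unchanged when \(W\) is rescaled). Substituting this \(W\) into the IV-GMM formula gives

\[ \begin{aligned} \hat{\beta}_{GMM} &= (X'Z (Z'Z)^{-1} Z' X)^{-1} X'Z (Z'Z)^{-1} Z'Y \\ &= (X' P_Z X)^{-1} X' P_Z Y \\ &= \hat{\beta}_{2SLS} \end{aligned} \]

where \(P_Z = Z(Z'Z)^{-1}Z'\) is the projection matrix used in the 2SLS estimator.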
For IV-GMM, an additional test of overidentifying restrictions is also available: the GMM distance test (also known as the Hayashi C statistic)
To account for clustering, one can use code provided by this blog