30.2 Estimation

30.2.1 2SLS Estimation

2SLS is a special case of the IV-GMM estimator (see 30.2.2).

The following example is adapted from the authors of the fixest package.

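library(fixest)

# rename the iris columns to generic names
base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")

# construct a second instrument and a second endogenous regressor
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)

# fixest IV syntax: y ~ exogenous | endogenous ~ instruments
# one endogenous regressor, one instrument:
# est_iv = feols(y ~ x1 | x_endo_1 ~ x_inst_1, base)
# two endogenous regressors, two instruments (exactly identified):
est_iv = feols(y ~ x1 | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, base)
est_iv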
#> TSLS estimation - Dep. Var.: y
#>                   Endo.    : x_endo_1, x_endo_2
#>                   Instr.   : x_inst_1, x_inst_2
#> Second stage: Dep. Var.: y
#> Observations: 150
#> Standard-errors: IID 
#>              Estimate Std. Error  t value   Pr(>|t|)    
#> (Intercept)  1.831380   0.411435  4.45121 1.6844e-05 ***
#> fit_x_endo_1 0.444982   0.022086 20.14744  < 2.2e-16 ***
#> fit_x_endo_2 0.639916   0.307376  2.08186 3.9100e-02 *  
#> x1           0.565095   0.084715  6.67051 4.9180e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.398842   Adj. R2: 0.761653
#> F-test (1st stage), x_endo_1: stat = 903.2    , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (1st stage), x_endo_2: stat =   3.25828, p = 0.041268, on 2 and 146 DoF.
#>                   Wu-Hausman: stat =   6.79183, p = 0.001518, on 2 and 144 DoF.

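Default statistics reported for IV estimations

F-test of the first stage (weak-instrument test)
Wu-Hausman endogeneity test
Over-identifying restrictions (Sargan) test, reported only when the model is over-identified (more excluded instruments than endogenous regressors); the model above is exactly identified, so no Sargan line appears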
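A minimal sketch of requesting these three diagnostics explicitly with `fitstat()`, using an over-identified variant of the example above (one endogenous regressor, two instruments) so that the Sargan test is defined. The model name `est_overid` is hypothetical; because `x_inst_2` is constructed from `y`, it is not a valid instrument, so this only illustrates the syntax, not a credible IV design:

```r
library(fixest)

base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)

# one endogenous regressor, two excluded instruments -> over-identified (l > k)
est_overid = feols(y ~ x1 | x_endo_1 ~ x_inst_1 + x_inst_2, base)

# first-stage F, Wu-Hausman, and Sargan tests, requested explicitly
fitstat(est_overid, ~ ivf + wh + sargan)
```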

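fitstat(
    est_iv,
    type = c(
        "n", "ll", "aic", "bic", "rmse", # ll means log-likelihood
        "my",      # mean of the dependent variable
        "g",       # degrees of freedom used to compute the t-test
        "r2", "ar2", "wr2", "awr2",      # R2, adjusted, within, adjusted within
        "pr2", "apr2", "wpr2", "awpr2",  # same for the pseudo R2
        "theta",   # over-dispersion parameter in Negative Binomial models
        "f", "wf", # F-tests of nullity of the coefficients (wf: on the projected model)
        "wald",    # Wald test of joint nullity of the coefficients
        "ivf", "ivf1", "ivf2", "ivfall",             # IV F-tests: 1st stage, 2nd stage, IV only
        "ivwald", "ivwald1", "ivwald2", "ivwaldall", # Wald versions, using the chosen VCOV
        "cd"       # Cragg-Donald weak-instrument statistic
        # "kpr"    # Kleibergen-Paap rank statistic (not requested here)
    ),
    cluster = 'fe' # cluster-robust VCOV by species (only 3 clusters)
)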
#>                 Observations: 150
#>               Log-Likelihood: -75.0
#>                          AIC: 157.9
#>                          BIC: 170.0
#>                         RMSE: 0.398842
#>               Dep. Var. mean: 5.84333
#>                            G: 3
#>                           R2: 0.766452
#>                      Adj. R2: 0.761653
#>                    Within R2: NA
#>                         awr2: NA
#>                    Pseudo R2: 0.592684
#>               Adj. Pseudo R2: 0.576383
#>             Within Pseudo R2: NA
#>                        awpr2: NA
#>              Over-dispersion: NA
#>                       F-test: stat =       1.80769, p = 0.375558, on 3 and 2 DoF.
#>           F-test (projected): NA
#>         Wald (joint nullity): stat = 539,363.2    , p < 2.2e-16 , on 3 and 146 DoF, VCOV: Clustered (fe).
#> F-test (1st stage), x_endo_1: stat =     903.2    , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (1st stage), x_endo_2: stat =       3.25828, p = 0.041268, on 2 and 146 DoF.
#>           F-test (2nd stage): stat =     194.2    , p < 2.2e-16 , on 2 and 146 DoF.
#>             F-test (IV only): stat =     194.2    , p < 2.2e-16 , on 2 and 146 DoF.
#> Wald (1st stage), x_endo_1  : stat =   1,482.6    , p < 2.2e-16 , on 2 and 146 DoF, VCOV: Clustered (fe).
#> Wald (1st stage), x_endo_2  : stat =       2.22157, p = 0.112092, on 2 and 146 DoF, VCOV: Clustered (fe).
#>             Wald (2nd stage): stat = 539,363.2    , p < 2.2e-16 , on 2 and 146 DoF, VCOV: Clustered (fe).
#>               Wald (IV only): stat = 539,363.2    , p < 2.2e-16 , on 2 and 146 DoF, VCOV: Clustered (fe).
#>                 Cragg-Donald: 3.11162

To change the statistics printed by default (here, always adding the second-stage Wald test):

# always add second-stage Wald test
setFixest_print(fitstat = ~ . + ivwald2)
est_iv

To report the results of each stage:

# first-stage
summary(est_iv, stage = 1)

# second-stage
summary(est_iv, stage = 2)

# both stages
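etable(summary(est_iv, stage = 1:2), fitstat = ~ . + ivfall + ivwaldall.p)
etable(summary(est_iv, stage = 2:1), fitstat = ~ . + ivfall + ivwaldall.p)
# stage = 2:1 puts the second stage first
# the `.p` suffix reports the p-value instead of the statistic
# the `all` suffix gives the "IV only" version of the test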

30.2.2 IV-GMM

IV-GMM is a more general framework: the 2SLS estimator is a special case of the IV-GMM estimator.

\[ Y = X \beta + u, \quad u \sim (0, \Omega) \]

where

\(X\) is an \(N \times k\) matrix of regressors, some of which are endogenous

We use an \(N \times l\) matrix of instruments \(Z\) (with \(l \ge k\)); exogenous regressors in \(X\) act as their own instruments and are included in \(Z\)

Then, we can have a set of \(l\) moments:

\[ g_i (\beta) = Z_i' u_i = Z_i' (Y_i - X_i \beta) \]

where

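\(i = 1, \dots, N\), and \(Z_i\) and \(X_i\) are the \(i\)-th rows of \(Z\) and \(X\)

Each of the \(l\) moment conditions has a sample analog, obtained by averaging over the \(N\) observations:

\[ \bar{g}(\beta) = \frac{1}{N} \sum_{i = 1}^N Z_i' (Y_i - X_i \beta) = \frac{1}{N} Z'u \]

GMM then chooses \(\hat{\beta}_{GMM}\) to set these sample moments to zero, \(\bar{g}(\hat{\beta}_{GMM}) = 0\), or, when that is impossible, as close to zero as possible.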

When \(l = k\) (exact identification), this system of \(l\) equations in \(k\) unknowns has a unique solution (provided \(Z'X\) is invertible), which is the IV estimator:

\[ \hat{\beta}_{IV} = (Z'X)^{-1}Z'Y \]
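A small base-R sketch on simulated data (the data-generating process and the helper name `g_bar` are assumptions for illustration): with one endogenous regressor and one excluded instrument, \(l = k = 2\), the IV estimator sets every sample moment to zero, while OLS does not.

```r
set.seed(1)
n <- 1000
z <- rnorm(n)                      # excluded instrument
v <- rnorm(n)
x <- 1 + 0.8 * z + v               # endogenous regressor
u <- 0.7 * v + rnorm(n)            # structural error, correlated with x through v
y <- 2 + 1.5 * x + u

X <- cbind(1, x)                   # N x k, k = 2
Z <- cbind(1, z)                   # N x l, l = 2 (exactly identified)

# sample moments g_bar(beta) = Z'(Y - X beta) / N
g_bar <- function(beta) drop(crossprod(Z, y - X %*% beta)) / n

beta_iv  <- solve(crossprod(Z, X), crossprod(Z, y))   # (Z'X)^{-1} Z'Y
beta_ols <- solve(crossprod(X), crossprod(X, y))      # OLS, for comparison

g_bar(beta_iv)    # numerically zero: all moments hold exactly
g_bar(beta_ols)   # the instrument moment is violated
```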

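When \(l > k\) (over-identification), there are more equations than unknowns, and in general no \(\beta\) sets all \(l\) sample moments to zero. 2SLS reduces the \(l\) instruments to \(k\) by projecting \(X\) onto the column space of \(Z\):

\[ \hat{X} = Z(Z'Z)^{-1} Z' X = P_Z X \]

and then uses \(\hat{X}\) as the instruments, which gives the 2SLS estimator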

\[ \begin{aligned} \hat{\beta}_{2SLS} &= (\hat{X}'X)^{-1} \hat{X}' Y \\ &= (X'P_Z X)^{-1}X' P_Z Y \end{aligned} \]
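A base-R sketch checking this algebra on simulated data with \(l = 4\) instruments (a constant plus three excluded instruments) and \(k = 2\) regressors; the data-generating process is an assumption. Both forms of the formula agree, and they also match running the two stages by hand with `lm()` (point estimates only: standard errors from a manual second stage are not valid).

```r
set.seed(1)
n <- 1000
Z <- cbind(1, matrix(rnorm(n * 3), n, 3))   # N x l, l = 4
v <- rnorm(n)
x <- drop(Z %*% c(1, 0.5, 0.5, 0.5)) + v    # endogenous regressor
u <- 0.7 * v + rnorm(n)                     # error correlated with x
y <- 2 + 1.5 * x + u
X <- cbind(1, x)                            # N x k, k = 2

# projection matrix P_Z = Z (Z'Z)^{-1} Z' (N x N, fine for a small example)
P_Z   <- Z %*% solve(crossprod(Z), t(Z))
X_hat <- P_Z %*% X                          # first-stage fitted values

beta_a <- solve(crossprod(X_hat, X), crossprod(X_hat, y))   # (X_hat'X)^{-1} X_hat'Y
beta_b <- solve(t(X) %*% P_Z %*% X, t(X) %*% P_Z %*% y)     # (X'P_Z X)^{-1} X'P_Z Y

# same point estimates from two explicit OLS stages
first  <- lm(x ~ Z[, -1])                   # regress x on the instruments
second <- lm(y ~ fitted(first))             # regress y on the fitted x
cbind(beta_a, beta_b, coef(second))
```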

Differences between 2SLS and IV-GMM:

In 2SLS, when there are more instruments than endogenous regressors, the \(l\) instruments are combined into \(k\) constructed instruments \(\hat{X} = P_Z X\), which are the best linear combinations of the instruments under IID errors; no instrument is discarded.
IV-GMM also uses all \(l\) instruments, but weights the \(l\) moment conditions through a weighting matrix \(W\) in a criterion function. With the optimal weighting matrix, moment conditions that are estimated more precisely receive more weight, which yields efficiency gains when the errors are heteroskedastic or clustered.
In short, when the model is over-identified and the errors are not IID, efficient IV-GMM is generally preferred (at least in large samples); under IID errors it reduces to 2SLS

The GMM estimator minimizes the criterion function

\[ J(\beta) = N \bar{g}(\beta)' W \bar{g}(\beta), \qquad \hat{\beta}_{GMM} = \arg\min_{\beta} J(\beta) \]

where \(W\) is an \(l \times l\) symmetric, positive-definite weighting matrix

For an over-identified equation, solving the first-order conditions of this minimization gives

\[ \hat{\beta}_{GMM} = (X'ZWZ' X)^{-1} X'ZWZ'Y \]
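The closed form follows from setting the gradient of \(J(\beta)\) to zero, using \(\bar{g}(\beta) = \frac{1}{N} Z'(Y - X\beta)\) and the symmetry of \(W\) (and assuming \(X'ZWZ'X\) is invertible):

\[ \begin{aligned} J(\beta) &= \frac{1}{N} (Y - X\beta)' Z W Z' (Y - X\beta) \\ \frac{\partial J(\beta)}{\partial \beta} &= -\frac{2}{N} X'ZWZ'(Y - X\beta) = 0 \\ \Rightarrow \quad X'ZWZ'X \hat{\beta}_{GMM} &= X'ZWZ'Y \\ \Rightarrow \quad \hat{\beta}_{GMM} &= (X'ZWZ'X)^{-1} X'ZWZ'Y \end{aligned} \]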

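The estimator depends on \(W\) only up to scale: multiplying \(W\) by any positive constant leaves \(\hat{\beta}_{GMM}\) unchanged, but different weighting matrices generally give different estimates in the over-identified case. The optimal choice is \(W = S^{-1}\) (L. P. Hansen 1982), where \(S\) is the covariance matrix of the moment conditions; it yields the most efficient estimator:

\[ S = \lim_{N \to \infty} \frac{1}{N} E[Z'uu'Z] = \lim_{N \to \infty} \frac{1}{N} E[Z' \Omega Z] \]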
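A base-R sketch, again on assumed simulated data, that minimizes \(J(\beta)\) numerically with `optim()`, compares the result with the closed form, and checks how the estimate responds to \(W\). The helper names `J` and `beta_gmm` and the arbitrary diagonal `W2` are hypothetical:

```r
set.seed(1)
n <- 1000
Z <- cbind(1, matrix(rnorm(n * 3), n, 3))   # l = 4
v <- rnorm(n)
x <- drop(Z %*% c(1, 0.5, 0.5, 0.5)) + v
u <- 0.7 * v + rnorm(n)
y <- 2 + 1.5 * x + u
X <- cbind(1, x)                            # k = 2

# GMM criterion J(beta) = N * g_bar' W g_bar
J <- function(beta, W) {
  g <- crossprod(Z, y - X %*% beta) / n
  drop(n * t(g) %*% W %*% g)
}

# closed form (X'Z W Z'X)^{-1} X'Z W Z'Y
beta_gmm <- function(W) {
  XZ <- crossprod(X, Z)
  drop(solve(XZ %*% W %*% t(XZ), XZ %*% W %*% crossprod(Z, y)))
}

W1 <- diag(ncol(Z))                         # identity weighting
W2 <- diag(c(1, 2, 3, 4))                   # another (arbitrary) weighting

b1  <- beta_gmm(W1)
opt <- optim(c(0, 0), J, W = W1, method = "BFGS",
             control = list(reltol = 1e-12)) # numerical minimizer of J
rbind(closed_form = b1, optim = opt$par)

# rescaling W leaves the estimate unchanged; a different W does not
all.equal(beta_gmm(W1), beta_gmm(10 * W1))
beta_gmm(W1) - beta_gmm(W2)
```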

With a consistent estimator \(\hat{S}\) of \(S\) computed from the 2SLS residuals, the feasible efficient IV-GMM estimator is defined as

\[ \hat{\beta}_{FEGMM} = (X'Z \hat{S}^{-1} Z' X)^{-1} X'Z \hat{S}^{-1} Z'Y \]
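A sketch of the two-step feasible efficient GMM estimator in base R on simulated heteroskedastic data (the data-generating process is an assumption). Also as an assumption, \(\hat{S}\) takes the heteroskedasticity-robust form \(\hat{S} = \frac{1}{N}\sum_i \hat{u}_i^2 Z_i' Z_i\) built from 2SLS residuals; a cluster-robust \(\hat{S}\) would change only step 2:

```r
set.seed(1)
n <- 1000
Z <- cbind(1, matrix(rnorm(n * 3), n, 3))         # l = 4
v <- rnorm(n)
x <- drop(Z %*% c(1, 0.5, 0.5, 0.5)) + v
u <- (0.7 * v + rnorm(n)) * exp(0.5 * Z[, 2])     # heteroskedastic error
y <- 2 + 1.5 * x + u
X <- cbind(1, x)                                  # k = 2

XZ <- crossprod(X, Z)                             # X'Z (k x l)
Zy <- crossprod(Z, y)                             # Z'Y (l x 1)

# step 1: 2SLS, i.e. GMM with W = (Z'Z)^{-1}
W_2sls    <- solve(crossprod(Z))
beta_2sls <- solve(XZ %*% W_2sls %*% t(XZ), XZ %*% W_2sls %*% Zy)

# step 2: S_hat = (1/N) * sum_i u_i^2 Z_i'Z_i from the 2SLS residuals
u_hat <- drop(y - X %*% beta_2sls)
S_hat <- crossprod(Z * u_hat) / n                 # each row of Z scaled by u_i

# step 3: feasible efficient GMM with W = S_hat^{-1}
W_opt      <- solve(S_hat)
beta_fegmm <- solve(XZ %*% W_opt %*% t(XZ), XZ %*% W_opt %*% Zy)

cbind(`2SLS` = drop(beta_2sls), FEGMM = drop(beta_fegmm))
```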

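In cases where \(\Omega\) (the variance-covariance matrix of the error process \(u\)) satisfies the classical assumptions:

Errors are IID, so \(\Omega = \sigma^2_u I_N\)
Then \(S = \sigma^2_u \lim_{N \to \infty} \frac{1}{N} Z'Z\), and the optimal weighting matrix is proportional to \((Z'Z)^{-1}\)

Because any positive scalar multiplying \(W\) cancels in the formula for \(\hat{\beta}_{GMM}\), plugging in \(W = (Z'Z)^{-1}\) gives

\[ \hat{\beta}_{GMM} = (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'Y = (X'P_Z X)^{-1} X'P_Z Y = \hat{\beta}_{2SLS} \]

so the IV-GMM estimator is the standard IV (or 2SLS) estimator.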
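A base-R check of this equivalence under assumed IID simulated errors: with \(\hat{S} = \hat{\sigma}^2_u Z'Z/N\), the IV-GMM formula returns the 2SLS estimate, because the scalar \(\hat{\sigma}^2_u\) cancels. The helper name `gmm` is hypothetical:

```r
set.seed(1)
n <- 1000
Z <- cbind(1, matrix(rnorm(n * 3), n, 3))      # l = 4
v <- rnorm(n)
x <- drop(Z %*% c(1, 0.5, 0.5, 0.5)) + v
u <- 0.7 * v + rnorm(n)                        # IID (homoskedastic) errors
y <- 2 + 1.5 * x + u
X <- cbind(1, x)                               # k = 2

XZ  <- crossprod(X, Z)
Zy  <- crossprod(Z, y)
gmm <- function(W) drop(solve(XZ %*% W %*% t(XZ), XZ %*% W %*% Zy))

# 2SLS via the projection matrix
P_Z       <- Z %*% solve(crossprod(Z), t(Z))
beta_2sls <- drop(solve(t(X) %*% P_Z %*% X, t(X) %*% P_Z %*% y))

# IID form of S: sigma_u^2 * Z'Z / N, with sigma_u^2 from the 2SLS residuals
sigma2 <- mean((y - X %*% beta_2sls)^2)
S_iid  <- sigma2 * crossprod(Z) / n

rbind(two_sls = beta_2sls, gmm_S_iid = gmm(solve(S_iid)))
```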

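For IV-GMM, the minimized criterion \(J(\hat{\beta}_{FEGMM})\) (Hansen's \(J\) statistic) provides a test of the over-identifying restrictions, and the GMM distance statistic (also known as Hayashi's C statistic, or difference-in-Sargan) additionally tests the validity of a subset of the instruments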
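A hedged base-R sketch of both tests on simulated data, where the data-generating process and the choice of `z3` (column 4 of `Z`) as the "suspect" instrument are assumptions. Hansen's \(J\) is \(N \bar{g}' \hat{S}^{-1} \bar{g}\) at the efficient GMM estimate, compared with \(\chi^2_{l-k}\); the C statistic is the \(J\) of the full model minus the \(J\) of a model without the suspect instrument, compared with \(\chi^2\) with degrees of freedom equal to the number of suspect instruments. Reusing the matching block of \(\hat{S}\) for the restricted model is one common implementation choice that keeps C non-negative; the helper `gmm_J` is hypothetical:

```r
set.seed(1)
n <- 1000
Z <- cbind(1, matrix(rnorm(n * 3), n, 3))      # columns: constant, z1, z2, z3
v <- rnorm(n)
x <- drop(Z %*% c(1, 0.5, 0.5, 0.5)) + v
u <- 0.7 * v + rnorm(n)
y <- 2 + 1.5 * x + u
X <- cbind(1, x)                               # k = 2

# efficient GMM with W = S^{-1}, returning coefficients and Hansen's J
gmm_J <- function(Zm, S) {
  XZ <- crossprod(X, Zm)
  W  <- solve(S)
  b  <- solve(XZ %*% W %*% t(XZ), XZ %*% W %*% crossprod(Zm, y))
  g  <- crossprod(Zm, y - X %*% b) / n         # g_bar at the estimate
  list(beta = drop(b), J = drop(n * t(g) %*% W %*% g))
}

# S_hat from 2SLS residuals (heteroskedasticity-robust form)
XZ    <- crossprod(X, Z)
W0    <- solve(crossprod(Z))
b0    <- solve(XZ %*% W0 %*% t(XZ), XZ %*% W0 %*% crossprod(Z, y))
u0    <- drop(y - X %*% b0)
S_hat <- crossprod(Z * u0) / n

full <- gmm_J(Z, S_hat)
p_J  <- pchisq(full$J, df = ncol(Z) - ncol(X), lower.tail = FALSE)  # l - k = 2

# C statistic for the suspect instrument z3: drop it and reuse
# the matching block of S_hat
keep <- 1:3
rest <- gmm_J(Z[, keep], S_hat[keep, keep])
C    <- full$J - rest$J
p_C  <- pchisq(C, df = ncol(Z) - length(keep), lower.tail = FALSE)  # 1 suspect instrument

c(J = full$J, p_J = p_J, C = C, p_C = p_C)
```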

To account for clustering, one can use code provided by this blog

References

Hansen, Lars Peter. 1982. “Large Sample Properties of Generalized Method of Moments Estimators.” Econometrica: Journal of the Econometric Society, 1029–54.