34.3 Estimation

34.3.1 Two-Stage Least Squares Estimation

Two-Stage Least Squares (2SLS) is the most widely used IV estimator. It is a special case of IV-GMM. Consider the structural equation:

Y_i = X_i \beta + \varepsilon_i,

where X_i is endogenous. We introduce an instrument Z_i satisfying:

  1. Relevance: Z_i is correlated with X_i.
  2. Exogeneity: Z_i is uncorrelated with \varepsilon_i.

2SLS Steps

  1. First-Stage Regression: Predict X_i using the instrument: X_i = \pi_0 + \pi_1 Z_i + v_i.

    • Obtain fitted values \hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i.
  2. Second-Stage Regression: Use \hat{X}_i in place of X_i: Y_i = \beta_0 + \beta_1 \hat{X}_i + \varepsilon_i.

    • The estimated \hat{\beta}_1 is our IV estimator.
In fixest, the instrumented variables are specified in the last part of the formula, as endogenous ~ instruments:

library(fixest)
base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)

# IV Estimation
est_iv = feols(y ~ x1 | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, base)
summary(est_iv)
#> TSLS estimation - Dep. Var.: y
#>                   Endo.    : x_endo_1, x_endo_2
#>                   Instr.   : x_inst_1, x_inst_2
#> Second stage: Dep. Var.: y
#> Observations: 150
#> Standard-errors: IID 
#>              Estimate Std. Error  t value   Pr(>|t|)    
#> (Intercept)  1.831380   0.411435  4.45121 1.6844e-05 ***
#> fit_x_endo_1 0.444982   0.022086 20.14744  < 2.2e-16 ***
#> fit_x_endo_2 0.639916   0.307376  2.08186 3.9100e-02 *  
#> x1           0.565095   0.084715  6.67051 4.9180e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.398842   Adj. R2: 0.761653
#> F-test (1st stage), x_endo_1: stat = 903.2    , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (1st stage), x_endo_2: stat =   3.25828, p = 0.041268, on 2 and 146 DoF.
#>                   Wu-Hausman: stat =   6.79183, p = 0.001518, on 2 and 144 DoF.

Diagnostic Tests

To assess instrument validity:

fitstat(est_iv, type = c("n", "f", "ivf", "ivf1", "ivf2", "ivwald", "cd"))
#>                 Observations: 150
#>                       F-test: stat = 132.0    , p < 2.2e-16 , on 3 and 146 DoF.
#> F-test (1st stage), x_endo_1: stat = 903.2    , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (1st stage), x_endo_2: stat =   3.25828, p = 0.041268, on 2 and 146 DoF.
#>           F-test (2nd stage): stat = 194.2    , p < 2.2e-16 , on 2 and 146 DoF.
#> Wald (1st stage), x_endo_1  : stat = 903.2    , p < 2.2e-16 , on 2 and 146 DoF, VCOV: IID.
#> Wald (1st stage), x_endo_2  : stat =   3.25828, p = 0.041268, on 2 and 146 DoF, VCOV: IID.
#>                 Cragg-Donald: 3.11162

To set the default printing options:

# always add second-stage Wald test
setFixest_print(fitstat = ~ . + ivwald2)
est_iv

To see the results from the different stages:

# first-stage
summary(est_iv, stage = 1)

# second-stage
summary(est_iv, stage = 2)

# both stages
etable(summary(est_iv, stage = 1:2), fitstat = ~ . + ivfall + ivwaldall.p)
etable(summary(est_iv, stage = 2:1), fitstat = ~ . + ivfall + ivwaldall.p)
# .p means p-value, not statistic
# `all` means IV only

34.3.2 IV-GMM

The Generalized Method of Moments (GMM) provides a flexible estimation framework that generalizes the Instrumental Variables (IV) approach, including 2SLS as a special case. The key idea behind GMM is to use moment conditions derived from economic models to estimate parameters efficiently, even in the presence of endogeneity.

Consider the standard linear regression model:

Y = X\beta + u, \qquad u \sim (0, \Omega)

where:

  • Y is an N \times 1 vector of the dependent variable.
  • X is an N \times k matrix of endogenous regressors.
  • \beta is a k \times 1 vector of coefficients.
  • u is an N \times 1 vector of error terms.
  • \Omega is the variance-covariance matrix of u.

To address endogeneity in X, we introduce an N \times l matrix of instruments, Z, where l \ge k. The moment conditions are then given by:

E[Z_i' u_i] = E[Z_i'(Y_i - X_i\beta)] = 0.

In practice, these expectations are replaced by their sample analogs. The empirical moment conditions are given by:

\bar{g}(\beta) = \frac{1}{N}\sum_{i=1}^{N} Z_i'(Y_i - X_i\beta) = \frac{1}{N} Z'(Y - X\beta).

GMM estimates \beta by minimizing a quadratic function of these sample moments.


34.3.2.1 IV and GMM Estimators

  1. Exactly Identified Case (l=k)

When the number of instruments equals the number of endogenous regressors (l = k), the moment conditions uniquely determine \beta. In this case, the IV estimator is:

\hat{\beta}_{IV} = (Z'X)^{-1} Z'Y.

This is equivalent to the classical 2SLS estimator.

  2. Overidentified Case (l > k)

When there are more instruments than endogenous variables (l>k), the system has more moment conditions than parameters. In this case, we project X onto the instrument space:

\hat{X} = Z(Z'Z)^{-1} Z'X = P_Z X.

The 2SLS estimator is then given by:

\hat{\beta}_{2SLS} = (\hat{X}'X)^{-1}\hat{X}'Y = (X'P_Z X)^{-1} X'P_Z Y.

However, 2SLS does not optimally weight the instruments when l>k. The IV-GMM approach resolves this issue.
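A minimal matrix sketch of both cases, using the simulated base data from above and treating only x_endo_1 as endogenous (an illustration of the formulas, not how fixest computes its estimates):

# Just-identified IV: one instrument for one endogenous regressor
Y <- base$y
X <- cbind(1, base$x_endo_1)
Z <- cbind(1, base$x_inst_1)
beta_iv <- solve(t(Z) %*% X, t(Z) %*% Y)                  # (Z'X)^{-1} Z'Y

# Overidentified 2SLS: two instruments, projection P_Z = Z(Z'Z)^{-1}Z'
Z2 <- cbind(1, base$x_inst_1, base$x_inst_2)
PZ <- Z2 %*% solve(crossprod(Z2)) %*% t(Z2)
beta_2sls <- solve(t(X) %*% PZ %*% X, t(X) %*% PZ %*% Y)  # (X'P_Z X)^{-1} X'P_Z Y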


34.3.2.2 IV-GMM Estimation

The GMM estimator is obtained by minimizing the objective function:

J(\hat{\beta}_{GMM}) = N \, \bar{g}(\hat{\beta}_{GMM})' \, W \, \bar{g}(\hat{\beta}_{GMM}),

where W is an l \times l symmetric weighting matrix.

For the IV-GMM estimator, solving the first-order conditions yields:

\hat{\beta}_{GMM} = (X'Z W Z'X)^{-1} X'Z W Z'Y.

For any weighting matrix W, this is a consistent estimator. The optimal choice of W is S^{-1}, where S is the covariance matrix of the moment conditions:

S = \lim_{N \to \infty} \frac{1}{N} E[Z'uu'Z].

A feasible estimator replaces S with its sample estimate from the 2SLS residuals:

\hat{\beta}_{FEGMM} = (X'Z \hat{S}^{-1} Z' X)^{-1} X'Z \hat{S}^{-1} Z'Y.
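As a sketch, this two-step feasible efficient estimator can be computed by hand, continuing the matrix example above and using a heteroskedasticity-robust estimate of S:

u_hat <- Y - X %*% beta_2sls                    # first-step (2SLS) residuals
N <- nrow(X)
S_hat <- crossprod(Z2 * as.vector(u_hat)) / N   # (1/N) sum of u_i^2 Z_i Z_i'
W_opt <- solve(S_hat)                           # optimal weighting matrix
beta_fegmm <- solve(t(X) %*% Z2 %*% W_opt %*% t(Z2) %*% X,
                    t(X) %*% Z2 %*% W_opt %*% t(Z2) %*% Y)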

When \Omega satisfies the classical conditions, IV-GMM reduces to standard IV:

  1. If the errors are independently and identically distributed, then \Omega = \sigma_u^2 I_N.
  2. The moment covariance is then S = \sigma_u^2 \frac{1}{N} Z'Z.
  3. Hence the optimal weighting matrix is proportional to (Z'Z)^{-1}.

In this case, the IV-GMM estimator simplifies to the standard IV (or 2SLS) estimator.
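As a quick numerical check of this equivalence (a sketch; it treats x1 as endogenous with two instruments, matching the gmm example below), the two coefficient vectors should coincide up to numerical tolerance:

library(AER)
library(gmm)
coef(ivreg(y ~ x1 | x_inst_1 + x_inst_2, data = base))              # 2SLS
coef(gmm(y ~ x1, ~ x_inst_1 + x_inst_2, data = base, vcov = "iid")) # IV-GMM, iid moments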


Comparison of 2SLS and IV-GMM

  • Instrument usage: 2SLS uses a subset of the available instruments; IV-GMM uses all available instruments.
  • Weighting: 2SLS applies no weighting; IV-GMM weights the instruments for efficiency.
  • Efficiency: 2SLS is suboptimal in overidentified cases; IV-GMM is efficient when W = S^{-1}.
  • Overidentification test: not available for 2SLS; IV-GMM provides Hansen's J-test.

Key Takeaways:

  • Use IV-GMM whenever overidentification is a concern (i.e., l > k).
  • 2SLS is a special case of IV-GMM when the weighting matrix is proportional to the identity matrix.
  • IV-GMM improves efficiency by optimally weighting the moment conditions.
The gmm package estimates this model directly; as the output below shows, it defaults to two-step GMM with an HAC-robust weighting matrix:

# Standard approach

library(gmm)
gmm_model <- gmm(y ~ x1, ~ x_inst_1 + x_inst_2, data = base)
summary(gmm_model)
#> 
#> Call:
#> gmm(g = y ~ x1, x = ~x_inst_1 + x_inst_2, data = base)
#> 
#> 
#> Method:  twoStep 
#> 
#> Kernel:  Quadratic Spectral(with bw =  0.72368 )
#> 
#> Coefficients:
#>              Estimate     Std. Error   t value      Pr(>|t|)   
#> (Intercept)   1.4385e+01   1.8960e+00   7.5871e+00   3.2715e-14
#> x1           -2.7506e+00   6.2101e-01  -4.4292e+00   9.4584e-06
#> 
#> J-Test: degrees of freedom is 1 
#>                 J-test     P-value  
#> Test E(g)=0:    7.9455329  0.0048206
#> 
#> Initial values of the coefficients
#> (Intercept)          x1 
#>   16.117875   -3.360622

34.3.2.3 Overidentification Test: Hansen’s J-Statistic

A key advantage of IV-GMM is that it allows testing of the overidentifying restrictions through Hansen's J-test (also known as the Sargan–Hansen test). The test statistic is:

J = N \bar{g}(\hat{\beta}_{GMM})' \hat{S}^{-1} \bar{g} (\hat{\beta}_{GMM}),

which follows a \chi^2 distribution with degrees of freedom equal to the number of overidentifying restrictions (l - k). A significant J-statistic suggests that the instruments may not be valid.
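Continuing the matrix sketch above, the J-statistic can be computed by hand; for gmm objects, the same test is reported by gmm::specTest():

g_bar <- crossprod(Z2, Y - X %*% beta_fegmm) / N          # moments at the GMM estimate
J <- as.numeric(N * t(g_bar) %*% solve(S_hat) %*% g_bar)
pchisq(J, df = ncol(Z2) - ncol(X), lower.tail = FALSE)    # df = l - k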


34.3.2.4 Cluster-Robust Standard Errors

In empirical applications, errors often exhibit heteroskedasticity or intra-group correlation (clustering), violating the assumption of independently and identically distributed errors. Standard IV-GMM estimators remain consistent but may not be efficient if clustering is ignored.

To address this, we adjust the GMM weighting matrix by incorporating cluster-robust variance estimation. Specifically, the covariance matrix of the moment conditions S is estimated as:

\hat{S} = \frac{1}{N} \sum_{c=1}^{C} \left( \sum_{i \in c} Z_i' u_i \right) \left( \sum_{i \in c} Z_i' u_i \right)',

where:

  • C is the number of clusters,

  • i \in c represents observations belonging to cluster c,

  • u_i is the residual for observation i,

  • Z_i is the vector of instruments.

Using this robust weighting matrix, we compute a clustered GMM estimator that remains consistent and improves inference when clustering is present.


# Load required packages
library(gmm)
library(dplyr)
library(MASS)  # For generalized inverse if needed

# General IV-GMM function with clustering
gmmcl <- function(formula, instruments, data, cluster_var, lambda = 1e-6) {
  
  # Ensure cluster_var exists in data
  if (!(cluster_var %in% colnames(data))) {
    stop("Error: Cluster variable not found in data.")
  }
  
  # Step 1: Initial GMM estimation (identity weighting matrix)
  initial_gmm <- gmm(formula, instruments, data = data, vcov = "TrueFixed", 
                      weightsMatrix = diag(ncol(model.matrix(instruments, data))))
  
  # Extract residuals
  u_hat <- residuals(initial_gmm)
  
  # Matrix of instruments
  Z <- model.matrix(instruments, data)
  
  # Ensure clusters are treated as a factor
  data[[cluster_var]] <- as.factor(data[[cluster_var]])
  
  # Compute clustered weighting matrix
  cluster_groups <- split(seq_along(u_hat), data[[cluster_var]])
  
  # Remove empty clusters (if any)
  cluster_groups <- cluster_groups[lengths(cluster_groups) > 0]
  
  # Initialize cluster-based covariance matrix
  S_cluster <- matrix(0, ncol(Z), ncol(Z))  # Zero matrix
  
  # Compute clustered weight matrix
  for (indices in cluster_groups) {
    if (length(indices) > 0) {  # Ensure valid clusters
      u_cluster <- matrix(u_hat[indices], ncol = 1)  # Convert to column matrix
      Z_cluster <- Z[indices, , drop = FALSE]        # Keep matrix form
      S_cluster <- S_cluster + t(Z_cluster) %*% (u_cluster %*% t(u_cluster)) %*% Z_cluster
    }
  }
  
  # Normalize by sample size
  S_cluster <- S_cluster / nrow(data)
  
  # Ensure S_cluster is invertible
  S_cluster <- S_cluster + lambda * diag(ncol(S_cluster))  # Regularization

  # Compute inverse or generalized inverse if needed
  if (qr(S_cluster)$rank < ncol(S_cluster)) {
    S_cluster_inv <- ginv(S_cluster)  # Use generalized inverse (MASS package)
  } else {
    S_cluster_inv <- solve(S_cluster)
  }

  # Step 2: GMM estimation using clustered weighting matrix
  final_gmm <- gmm(formula, instruments, data = data, vcov = "TrueFixed", 
                    weightsMatrix = S_cluster_inv)
  
  return(final_gmm)
}

# Example: Simulated Data for IV-GMM with Clustering
set.seed(123)
n <- 200   # Total observations
C <- 50    # Number of clusters
data <- data.frame(
  cluster = rep(1:C, each = n / C),  # Cluster variable
  z1 = rnorm(n),
  z2 = rnorm(n),
  x1 = rnorm(n),
  y1 = rnorm(n)
)
data$x1 <- data$z1 + data$z2 + rnorm(n)  # Endogenous regressor
data$y1 <- data$x1 + rnorm(n)            # Outcome variable

# Run standard IV-GMM (without clustering)
gmm_results_standard <- gmm(y1 ~ x1, ~ z1 + z2, data = data)

# Run IV-GMM with clustering
gmm_results_clustered <- gmmcl(y1 ~ x1, ~ z1 + z2, data = data, cluster_var = "cluster")

# Display results for comparison
summary(gmm_results_standard)
#> 
#> Call:
#> gmm(g = y1 ~ x1, x = ~z1 + z2, data = data)
#> 
#> 
#> Method:  twoStep 
#> 
#> Kernel:  Quadratic Spectral(with bw =  1.09893 )
#> 
#> Coefficients:
#>              Estimate     Std. Error   t value      Pr(>|t|)   
#> (Intercept)   4.4919e-02   6.5870e-02   6.8193e-01   4.9528e-01
#> x1            9.8409e-01   4.4215e-02   2.2257e+01  9.6467e-110
#> 
#> J-Test: degrees of freedom is 1 
#>                 J-test  P-value
#> Test E(g)=0:    1.6171  0.2035 
#> 
#> Initial values of the coefficients
#> (Intercept)          x1 
#>  0.05138658  0.98580796
summary(gmm_results_clustered)
#> 
#> Call:
#> gmm(g = formula, x = instruments, vcov = "TrueFixed", weightsMatrix = S_cluster_inv, 
#>     data = data)
#> 
#> 
#> Method:  One step GMM with fixed W 
#> 
#> Kernel:  Quadratic Spectral
#> 
#> Coefficients:
#>              Estimate    Std. Error  t value     Pr(>|t|)  
#> (Intercept)  4.9082e-02  7.0878e-05  6.9249e+02  0.0000e+00
#> x1           9.8238e-01  5.2798e-05  1.8606e+04  0.0000e+00
#> 
#> J-Test: degrees of freedom is 1 
#>                 J-test   P-value
#> Test E(g)=0:    1247099        0

34.3.3 Limited Information Maximum Likelihood

LIML is an alternative to 2SLS that performs better when instruments are weak.

It is a k-class estimator,

\hat{\beta}_{LIML} = \left( X'(I - \lambda M_Z) X \right)^{-1} X'(I - \lambda M_Z) Y,

where M_Z = I - Z(Z'Z)^{-1}Z' and \lambda is the smallest eigenvalue of (W'M_Z W)^{-1}(W'W) with W = [Y \; X] (after partialling out any included exogenous regressors). Setting \lambda = 1 recovers 2SLS, and \lambda = 0 recovers OLS.
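A minimal manual sketch on the base data, with one endogenous regressor, one instrument, and the intercept partialled out by demeaning (in this just-identified case LIML coincides with 2SLS):

yt <- base$y - mean(base$y)
xt <- base$x_endo_1 - mean(base$x_endo_1)
Zt <- cbind(base$x_inst_1 - mean(base$x_inst_1))
Wm <- cbind(yt, xt)                                       # W = [Y, X]
Mz <- diag(length(yt)) - Zt %*% solve(crossprod(Zt)) %*% t(Zt)
lambda <- min(Re(eigen(solve(t(Wm) %*% Mz %*% Wm) %*% crossprod(Wm))$values))
A <- diag(length(yt)) - lambda * Mz
beta_liml <- solve(t(xt) %*% A %*% xt, t(xt) %*% A %*% yt)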

34.3.4 Jackknife IV

JIVE reduces small-sample bias by leaving each observation out when estimating first-stage fitted values:

\begin{aligned} \hat{X}_i^{(-i)} &= Z_i (Z_{-i}'Z_{-i})^{-1} Z_{-i}'X_{-i}, \\ \hat{\beta}_{JIVE} &= (\hat{X}^{(-)\prime} X)^{-1}\hat{X}^{(-)\prime} Y, \end{aligned}

where \hat{X}^{(-)} stacks the leave-one-out fitted values \hat{X}_i^{(-i)}.
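A manual sketch of JIVE using the standard hat-matrix shortcut \hat{X}_i^{(-i)} = (Z_i\hat{\pi} - h_i X_i)/(1 - h_i), again on the base data (an illustration, not a production implementation):

Y <- base$y
X <- cbind(1, base$x_endo_1)
Z <- cbind(1, base$x_inst_1)
H <- Z %*% solve(crossprod(Z)) %*% t(Z)    # hat (projection) matrix of Z
h <- diag(H)
X_loo <- (H %*% X - h * X) / (1 - h)       # leave-one-out first-stage fitted values
beta_jive <- solve(t(X_loo) %*% X, t(X_loo) %*% Y)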

library(AER)
# Note: AER::ivreg() has no `method` argument; unmatched arguments are silently
# ignored, so the output below is identical to plain 2SLS. Dedicated JIVE
# implementations exist elsewhere (e.g., the SteinIV package).
jive_model = ivreg(y ~ x_endo_1 | x_inst_1, data = base, method = "jive")
summary(jive_model)
#> 
#> Call:
#> ivreg(formula = y ~ x_endo_1 | x_inst_1, data = base, method = "jive")
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.2390 -0.3022 -0.0206  0.2772  1.0039 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  4.34586    0.08096   53.68   <2e-16 ***
#> x_endo_1     0.39848    0.01964   20.29   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.4075 on 148 degrees of freedom
#> Multiple R-Squared: 0.7595,  Adjusted R-squared: 0.7578 
#> Wald test: 411.6 on 1 and 148 DF,  p-value: < 2.2e-16

34.3.5 Control Function Approach

The Control Function (CF) approach, also known as two-stage residual inclusion (2SRI), is a method used to address endogeneity in regression models. This approach is particularly suited for models with nonadditive errors, such as discrete choice models or cases where both the endogenous variable and the outcome are binary.

The control function approach is particularly useful in:

  • Binary outcome and binary endogenous variable models:
    • In rare events, the second stage typically uses a logistic model (E. Tchetgen Tchetgen 2014).
    • In non-rare events, a risk ratio regression is often more appropriate.
  • Marketing applications:
    • Used in consumer choice models to account for endogeneity in demand estimation (Petrin and Train 2010).

The general model setup is:

Y = g(X) + U

X = \pi(Z) + V

with the key assumptions:

  1. Conditional mean independence:
    E(U |Z,V) = E(U|V)
    This implies that once we control for V, the instrumental variable Z does not directly affect U.

  2. Instrument relevance:
    E(V|Z) = 0
    This ensures that Z is a valid instrument for X.

Under the control function approach, the expectation of Y conditional on (Z,V) can be rewritten as:

E(Y|Z,V) = g(X) + E(U|Z,V) = g(X) + E(U|V) = g(X) + h(V).

Here, h(V) is the control function that captures endogeneity through the first-stage residuals.

34.3.5.1 Implementation

Rather than replacing the endogenous variable X_i with its predicted value \hat{X}_i, the CF approach explicitly incorporates the residuals from the first-stage regression:

Stage 1: Estimate First-Stage Residuals

Estimate the endogenous variable using its instrumental variables:

X_i = Z_i \pi + v_i.

Obtain the residuals:

\hat{v}_i = X_i - Z_i \hat{\pi}.

Stage 2: Include Residuals in Outcome Equation

Regress the outcome variable on X_i and the first-stage residuals:

Y_i = X_i \beta + \gamma \hat{v}_i + \varepsilon_i.

If endogeneity is present, \gamma \neq 0; if \gamma = 0, the regressor X can be treated as exogenous.
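A compact sketch of these two stages on the base data from earlier; the t-test on the residual coefficient \hat{\gamma} doubles as a Hausman-type test of endogeneity:

first_stage_cf <- lm(x_endo_1 ~ x_inst_1, data = base)
base$v_hat <- resid(first_stage_cf)              # control function \hat{v}
cf_fit <- lm(y ~ x_endo_1 + v_hat, data = base)
summary(cf_fit)  # significant v_hat coefficient => evidence of endogeneity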

34.3.5.2 Comparison to Two-Stage Least Squares

The control function method differs from 2SLS depending on whether the model is linear or nonlinear:

  1. Linear Endogenous Variables:
    • When both X and Y are continuous, the CF approach is equivalent to 2SLS.
  2. Nonlinear Endogenous Variables:
    • If X is nonlinear (e.g., a binary treatment), CF differs from 2SLS and often performs better.
  3. Nonlinear in Parameters:
    • In models where g(X) is nonlinear (e.g., logit/probit models), CF is typically superior to 2SLS because it explicitly models endogeneity via the control function h(V).
The following simulation illustrates these cases:

library(fixest)
library(tidyverse)
library(modelsummary)

# Set the seed for reproducibility
set.seed(123)
n = 1000
# Generate the exogenous variable from a normal distribution
exogenous <- rnorm(n, mean = 5, sd = 1)

# Generate the omitted variable (an unobserved confounder)
omitted <- rnorm(n, mean = 2, sd = 1)

# Generate the endogenous variable as a function of the omitted variable and the exogenous variable
endogenous <- 5 * omitted + 2 * exogenous + rnorm(n, mean = 0, sd = 1)

# Nonlinear (in the omitted variable) endogenous variable
endogenous_nonlinear <- 5 * omitted^2 + 2 * exogenous + rnorm(n, mean = 0, sd = 1)

unrelated <- rexp(n, rate = 1)

# Generate the response variable as a function of the endogenous variable and the omitted variable
response <- 4 +  3 * endogenous + 6 * omitted + rnorm(n, mean = 0, sd = 1)

response_nonlinear <- 4 +  3 * endogenous_nonlinear + 6 * omitted + rnorm(n, mean = 0, sd = 1)

response_nonlinear_para <- 4 +  3 * endogenous ^ 2 + 6 * omitted + rnorm(n, mean = 0, sd = 1)


# Combine the variables into a data frame
my_data <-
    data.frame(
        exogenous,
        omitted,
        endogenous,
        response,
        unrelated,
        response_nonlinear,
        response_nonlinear_para
    )

# View the first few rows of the data frame
# head(my_data)

wo_omitted <- feols(response ~ endogenous + sw0(unrelated), data = my_data)
w_omitted  <- feols(response ~ endogenous + omitted + unrelated, data = my_data)


# ivreg::ivreg(response ~ endogenous + unrelated | exogenous, data = my_data)
iv <- feols(response ~ 1 + sw0(unrelated) | endogenous ~ exogenous, data = my_data)

etable(
    wo_omitted,
    w_omitted,
    iv, 
    digits = 2
    # vcov = list("each", "iid", "hetero")
)
#>                   wo_omitted.1   wo_omitted.2     w_omitted          iv.1
#> Dependent Var.:       response       response      response      response
#>                                                                          
#> Constant        -3.8*** (0.30) -3.6*** (0.31) 3.9*** (0.16) 12.0*** (1.4)
#> endogenous       4.0*** (0.01)  4.0*** (0.01) 3.0*** (0.01) 3.2*** (0.07)
#> unrelated                       -0.14. (0.08)  -0.02 (0.03)              
#> omitted                                       6.0*** (0.08)              
#> _______________ ______________ ______________ _____________ _____________
#> S.E. type                  IID            IID           IID           IID
#> Observations             1,000          1,000         1,000         1,000
#> R2                     0.98756        0.98760       0.99817       0.94976
#> Adj. R2                0.98755        0.98757       0.99816       0.94971
#> 
#>                          iv.2
#> Dependent Var.:      response
#>                              
#> Constant        12.2*** (1.4)
#> endogenous      3.2*** (0.07)
#> unrelated       -0.28. (0.16)
#> omitted                      
#> _______________ _____________
#> S.E. type                 IID
#> Observations            1,000
#> R2                    0.95010
#> Adj. R2               0.95000
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Linear in parameters and linear in the endogenous variable

# Manual 2SLS. Note: the second-stage standard errors below are not valid,
# because they ignore the sampling error of the first-stage fitted values.
first_stage = lm(endogenous ~ exogenous, data = my_data)
new_data = cbind(my_data, new_endogenous = predict(first_stage, my_data))
second_stage = lm(response ~ new_endogenous, data = new_data)
summary(second_stage)
#> 
#> Call:
#> lm(formula = response ~ new_endogenous, data = new_data)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -68.126 -14.949   0.608  15.099  73.842 
#> 
#> Coefficients:
#>                Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)     11.9910     5.7671   2.079   0.0379 *  
#> new_endogenous   3.2097     0.2832  11.335   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 21.49 on 998 degrees of freedom
#> Multiple R-squared:  0.1141, Adjusted R-squared:  0.1132 
#> F-statistic: 128.5 on 1 and 998 DF,  p-value: < 2.2e-16

# Control function: include the first-stage residuals in the outcome equation
new_data_cf = cbind(my_data, residual = resid(first_stage))
second_stage_cf = lm(response ~ endogenous + residual, data = new_data_cf)
summary(second_stage_cf)
#> 
#> Call:
#> lm(formula = response ~ endogenous + residual, data = new_data_cf)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.1039 -1.0065  0.0247  0.9480  4.2521 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 11.99102    0.39849   30.09   <2e-16 ***
#> endogenous   3.20974    0.01957  164.05   <2e-16 ***
#> residual     0.95036    0.02159   44.02   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.485 on 997 degrees of freedom
#> Multiple R-squared:  0.9958, Adjusted R-squared:  0.9958 
#> F-statistic: 1.175e+05 on 2 and 997 DF,  p-value: < 2.2e-16

modelsummary(list(second_stage, second_stage_cf))
                 (1)         (2)
(Intercept)      11.991      11.991
                 (5.767)     (0.398)
new_endogenous   3.210
                 (0.283)
endogenous                   3.210
                             (0.020)
residual                     0.950
                             (0.022)
Num.Obs.         1000        1000
R2               0.114       0.996
R2 Adj.          0.113       0.996
AIC              8977.0      3633.5
BIC              8991.8      3653.2
Log.Lik.         -4485.516   -1812.768
F                128.483     117473.460
RMSE             21.47       1.48

Both procedures recover the same point estimate for the endogenous regressor, but the reported second-stage standard errors differ, and neither is corrected for the first-stage estimation step; use dedicated IV routines (or a bootstrap) for inference.

Nonlinear in the endogenous variable

# 2SLS
first_stage = lm(endogenous_nonlinear ~ exogenous, data = my_data)

new_data = cbind(my_data, new_endogenous_nonlinear = predict(first_stage, my_data))
second_stage = lm(response_nonlinear ~ new_endogenous_nonlinear, data = new_data)
summary(second_stage)
#> 
#> Call:
#> lm(formula = response_nonlinear ~ new_endogenous_nonlinear, data = new_data)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -101.26  -53.01  -13.50   39.33  376.16 
#> 
#> Coefficients:
#>                          Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               11.7539    21.6478   0.543    0.587    
#> new_endogenous_nonlinear   3.1253     0.5993   5.215 2.23e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 70.89 on 998 degrees of freedom
#> Multiple R-squared:  0.02653,    Adjusted R-squared:  0.02555 
#> F-statistic:  27.2 on 1 and 998 DF,  p-value: 2.234e-07

new_data_cf = cbind(my_data, residual = resid(first_stage))
second_stage_cf = lm(response_nonlinear ~ endogenous_nonlinear + residual, data = new_data_cf)
summary(second_stage_cf)
#> 
#> Call:
#> lm(formula = response_nonlinear ~ endogenous_nonlinear + residual, 
#>     data = new_data_cf)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -12.8559  -0.8337   0.4429   1.3432   4.3147 
#> 
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)          11.75395    0.67012  17.540  < 2e-16 ***
#> endogenous_nonlinear  3.12525    0.01855 168.469  < 2e-16 ***
#> residual              0.13577    0.01882   7.213 1.08e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.194 on 997 degrees of freedom
#> Multiple R-squared:  0.9991, Adjusted R-squared:  0.9991 
#> F-statistic: 5.344e+05 on 2 and 997 DF,  p-value: < 2.2e-16

modelsummary(list(second_stage, second_stage_cf))
                           (1)         (2)
(Intercept)                11.754      11.754
                           (21.648)    (0.670)
new_endogenous_nonlinear   3.125
                           (0.599)
endogenous_nonlinear                   3.125
                                       (0.019)
residual                               0.136
                                       (0.019)
Num.Obs.                   1000        1000
R2                         0.027       0.999
R2 Adj.                    0.026       0.999
AIC                        11364.2     4414.7
BIC                        11378.9     4434.4
Log.Lik.                   -5679.079   -2203.371
F                          27.196      534439.006
RMSE                       70.82       2.19

Nonlinear in parameters

# 2SLS
first_stage = lm(endogenous ~ exogenous, data = my_data)

new_data = cbind(my_data, new_endogenous = predict(first_stage, my_data))
second_stage = lm(response_nonlinear_para ~ new_endogenous, data = new_data)
summary(second_stage)
#> 
#> Call:
#> lm(formula = response_nonlinear_para ~ new_endogenous, data = new_data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1402.34  -462.21   -64.22   382.35  3090.62 
#> 
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    -1137.875    173.811  -6.547  9.4e-11 ***
#> new_endogenous   122.525      8.534  14.357  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 647.7 on 998 degrees of freedom
#> Multiple R-squared:  0.1712, Adjusted R-squared:  0.1704 
#> F-statistic: 206.1 on 1 and 998 DF,  p-value: < 2.2e-16

new_data_cf = cbind(my_data, residual = resid(first_stage))
second_stage_cf = lm(response_nonlinear_para ~ endogenous_nonlinear + residual, data = new_data_cf)
summary(second_stage_cf)
#> 
#> Call:
#> lm(formula = response_nonlinear_para ~ endogenous_nonlinear + 
#>     residual, data = new_data_cf)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -904.77 -154.35  -20.41  143.24  953.04 
#> 
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)          492.2494    32.3530   15.21  < 2e-16 ***
#> endogenous_nonlinear  23.5991     0.8741   27.00  < 2e-16 ***
#> residual              30.5914     3.7397    8.18 8.58e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 245.9 on 997 degrees of freedom
#> Multiple R-squared:  0.8806, Adjusted R-squared:  0.8804 
#> F-statistic:  3676 on 2 and 997 DF,  p-value: < 2.2e-16

modelsummary(list(second_stage, second_stage_cf))
                       (1)          (2)
(Intercept)            -1137.875    492.249
                       (173.811)    (32.353)
new_endogenous         122.525
                       (8.534)
endogenous_nonlinear                23.599
                                    (0.874)
residual                            30.591
                                    (3.740)
Num.Obs.               1000         1000
R2                     0.171        0.881
R2 Adj.                0.170        0.880
AIC                    15788.6      13853.1
BIC                    15803.3      13872.7
Log.Lik.               -7891.307    -6922.553
F                      206.123      3676.480
RMSE                   647.01       245.58

34.3.6 Fuller and Bias-Reduced IV

Fuller's modification of LIML replaces \lambda with \lambda - \alpha/(N - l), where l is the number of instruments (including exogenous regressors) and \alpha is a user-chosen constant; \alpha = 1 yields an approximately unbiased estimator.
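Continuing the manual LIML sketch above, a Fuller adjustment with \alpha = 1 looks as follows (taking l = 2 here, the intercept plus x_inst_1, is an assumption of this sketch):

alpha <- 1
k_fuller <- lambda - alpha / (length(yt) - 2)   # lambda, yt from the LIML sketch; N - l
A_f <- diag(length(yt)) - k_fuller * Mz
beta_fuller <- solve(t(xt) %*% A_f %*% xt, t(xt) %*% A_f %*% yt)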

# Note: as with "jive" above, `method` and `k` are not part of AER::ivreg()'s
# interface, so this output is again identical to plain 2SLS.
fuller_model = ivreg(y ~ x_endo_1 | x_inst_1, data = base, method = "fuller", k = 1)
summary(fuller_model)
#> 
#> Call:
#> ivreg(formula = y ~ x_endo_1 | x_inst_1, data = base, method = "fuller", 
#>     k = 1)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.2390 -0.3022 -0.0206  0.2772  1.0039 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  4.34586    0.08096   53.68   <2e-16 ***
#> x_endo_1     0.39848    0.01964   20.29   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.4075 on 148 degrees of freedom
#> Multiple R-Squared: 0.7595,  Adjusted R-squared: 0.7578 
#> Wald test: 411.6 on 1 and 148 DF,  p-value: < 2.2e-16

References

Petrin, Amil, and Kenneth Train. 2010. “A Control Function Approach to Endogeneity in Consumer Choice Models.” Journal of Marketing Research 47 (1): 3–13.
Tchetgen Tchetgen, Eric. 2014. “A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome.” Epidemiologic Methods 3 (1): 107–12.