34.3 Estimation
34.3.1 Two-Stage Least Squares Estimation
Two-Stage Least Squares (2SLS) is the most widely used IV estimator. It is a special case of IV-GMM. Consider the structural equation:
Y_i = X_i \beta + \varepsilon_i,
where X_i is endogenous. We introduce an instrument Z_i satisfying:
- Relevance: Z_i is correlated with X_i.
- Exogeneity: Z_i is uncorrelated with \varepsilon_i.
2SLS Steps
First-Stage Regression: Predict X_i using the instrument: X_i = \pi_0 + \pi_1 Z_i + v_i.
- Obtain fitted values \hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i.
Second-Stage Regression: Use \hat{X}_i in place of X_i: Y_i = \beta_0 + \beta_1 \hat{X}_i + \varepsilon_i.
- The estimated \hat{\beta}_1 is the IV estimator. (Note that running the two stages by hand yields incorrect second-stage standard errors, since they ignore the first-stage estimation error; dedicated IV routines such as feols() below apply the correct formula.)
library(fixest)
base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)
# IV Estimation
est_iv = feols(y ~ x1 | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, base)
summary(est_iv)
#> TSLS estimation - Dep. Var.: y
#> Endo. : x_endo_1, x_endo_2
#> Instr. : x_inst_1, x_inst_2
#> Second stage: Dep. Var.: y
#> Observations: 150
#> Standard-errors: IID
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.831380 0.411435 4.45121 1.6844e-05 ***
#> fit_x_endo_1 0.444982 0.022086 20.14744 < 2.2e-16 ***
#> fit_x_endo_2 0.639916 0.307376 2.08186 3.9100e-02 *
#> x1 0.565095 0.084715 6.67051 4.9180e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.398842 Adj. R2: 0.761653
#> F-test (1st stage), x_endo_1: stat = 903.2 , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (1st stage), x_endo_2: stat = 3.25828, p = 0.041268, on 2 and 146 DoF.
#> Wu-Hausman: stat = 6.79183, p = 0.001518, on 2 and 144 DoF.
Diagnostic Tests
To assess instrument strength and validity:
fitstat(est_iv, type = c("n", "f", "ivf", "ivf1", "ivf2", "ivwald", "cd"))
#> Observations: 150
#> F-test: stat = 132.0 , p < 2.2e-16 , on 3 and 146 DoF.
#> F-test (1st stage), x_endo_1: stat = 903.2 , p < 2.2e-16 , on 2 and 146 DoF.
#> F-test (1st stage), x_endo_2: stat = 3.25828, p = 0.041268, on 2 and 146 DoF.
#> F-test (2nd stage): stat = 194.2 , p < 2.2e-16 , on 2 and 146 DoF.
#> Wald (1st stage), x_endo_1 : stat = 903.2 , p < 2.2e-16 , on 2 and 146 DoF, VCOV: IID.
#> Wald (1st stage), x_endo_2 : stat = 3.25828, p = 0.041268, on 2 and 146 DoF, VCOV: IID.
#> Cragg-Donald: 3.11162
To set the fit statistics printed by default with every estimation, fixest provides setFixest_print(); a sketch, assuming its fitstat argument accepts the same formula syntax as fitstat():
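# Sketch: make the IV statistics part of the default fixest printout
setFixest_print(fitstat = ~ . + ivf + ivwald)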
To see the results from the different stages:
# first-stage
summary(est_iv, stage = 1)
# second-stage
summary(est_iv, stage = 2)
# both stages
etable(summary(est_iv, stage = 1:2), fitstat = ~ . + ivfall + ivwaldall.p)
etable(summary(est_iv, stage = 2:1), fitstat = ~ . + ivfall + ivwaldall.p)
# the suffix .p requests the p-value rather than the statistic
# `all` restricts the statistic to the IV stages only
34.3.2 IV-GMM
The Generalized Method of Moments (GMM) provides a flexible estimation framework that generalizes the Instrumental Variables (IV) approach, including 2SLS as a special case. The key idea behind GMM is to use moment conditions derived from economic models to estimate parameters efficiently, even in the presence of endogeneity.
Consider the standard linear regression model:
Y = X\beta + u, \quad u \sim (0, \Omega)
where:
- Y is an N×1 vector of the dependent variable.
- X is an N×k matrix of endogenous regressors.
- β is a k×1 vector of coefficients.
- u is an N×1 vector of error terms.
- Ω is the variance-covariance matrix of u.
To address endogeneity in X, we introduce an N×l matrix of instruments, Z, where l≥k. The moment conditions are then given by:
E[Z_i' u_i] = E[Z_i'(Y_i - X_i \beta)] = 0.
In practice, these expectations are replaced by their sample analogs. The empirical moment conditions are given by:
\bar{g}(\beta) = \frac{1}{N} \sum_{i=1}^{N} Z_i'(Y_i - X_i \beta) = \frac{1}{N} Z'(Y - X\beta).
GMM estimates β by minimizing a quadratic function of these sample moments.
34.3.2.1 IV and GMM Estimators
- Exactly Identified Case (l=k)
When the number of instruments equals the number of endogenous regressors (l=k), the moment conditions uniquely determine β. In this case, the IV estimator is:
\hat{\beta}_{IV} = (Z'X)^{-1} Z'Y.
This coincides with the classical 2SLS estimator in the just-identified case (see the matrix sketch at the end of this subsection).
- Overidentified Case (l>k)
When there are more instruments than endogenous variables (l>k), the system has more moment conditions than parameters. In this case, we project X onto the instrument space:
\hat{X} = Z(Z'Z)^{-1} Z'X = P_Z X.
The 2SLS estimator is then given by:
\hat{\beta}_{2SLS} = (\hat{X}'X)^{-1} \hat{X}'Y = (X'P_Z X)^{-1} X'P_Z Y.
However, 2SLS does not optimally weight the instruments when l>k. The IV-GMM approach resolves this issue.
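To make the two cases concrete, here is a minimal matrix-algebra sketch on simulated data (the DGP and all object names are ours, purely for illustration):
set.seed(1)
N <- 500
v  <- rnorm(N)                        # shared error: the source of endogeneity
z  <- rnorm(N)                        # instrument
z2 <- 0.5 * z + rnorm(N)              # a second, hypothetical instrument
x  <- z + v + rnorm(N)                # first stage
y  <- 2 * x + v + rnorm(N)            # structural equation (true beta = 2)
X  <- cbind(1, x)                     # k = 2 columns (intercept + regressor)

# Exactly identified (l = k): beta_IV = (Z'X)^{-1} Z'Y
Z <- cbind(1, z)
b_iv <- solve(t(Z) %*% X, t(Z) %*% y)
crossprod(Z, y - X %*% b_iv) / N      # sample moments: numerically zero

# Overidentified (l > k): project X onto the instrument space
Z2 <- cbind(1, z, z2)
P_Z <- Z2 %*% solve(crossprod(Z2)) %*% t(Z2)
b_2sls <- solve(t(X) %*% P_Z %*% X, t(X) %*% P_Z %*% y)
cbind(b_iv, b_2sls)                   # both close to the true coefficients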
34.3.2.2 IV-GMM Estimation
The GMM estimator is obtained by minimizing the objective function:
J(\hat{\beta}_{GMM}) = N \, \bar{g}(\hat{\beta}_{GMM})' W \bar{g}(\hat{\beta}_{GMM}),
where W is an l×l symmetric weighting matrix.
For the IV-GMM estimator, solving the first-order conditions yields:
\hat{\beta}_{GMM} = (X'ZWZ'X)^{-1} X'ZWZ'Y.
For any positive definite weighting matrix W, this estimator is consistent. The optimal (efficient) choice is W = S^{-1}, where S is the covariance matrix of the moment conditions:
S = \frac{1}{N} E[Z' u u' Z] = \lim_{N \to \infty} N^{-1} \, [Z' \Omega Z].
A feasible efficient GMM (FEGMM) estimator replaces S with a sample estimate computed from the 2SLS residuals:
\hat{\beta}_{FEGMM} = (X'Z \hat{S}^{-1} Z' X)^{-1} X'Z \hat{S}^{-1} Z'Y.
When \Omega satisfies the classical assumptions:
- Errors are independently and identically distributed, so E[uu'] = \sigma_u^2 I_N.
- Consequently S = \sigma_u^2 \frac{Z'Z}{N}, and the optimal weighting matrix is proportional to (Z'Z)^{-1}.
Then, the IV-GMM estimator simplifies to the standard IV (2SLS) estimator.
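This reduction is easy to verify numerically: plugging W \propto (Z'Z)^{-1} (optimal under homoskedasticity) into the GMM formula reproduces the 2SLS estimate from the matrix sketch above.
# Continuing the sketch: IV-GMM with W = (Z2'Z2)^{-1} coincides with 2SLS
W <- solve(crossprod(Z2))
b_gmm <- solve(t(X) %*% Z2 %*% W %*% t(Z2) %*% X,
               t(X) %*% Z2 %*% W %*% t(Z2) %*% y)
all.equal(c(b_gmm), c(b_2sls))        # TRUE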
Comparison of 2SLS and IV-GMM
Feature | 2SLS | IV-GMM
---|---|---
Instrument usage | All instruments, combined implicitly via the projection P_Z | All instruments, combined through the weighting matrix W
Weighting | Fixed weighting, optimal only under homoskedasticity | Weights moment conditions for efficiency
Efficiency | Suboptimal in overidentified cases with heteroskedasticity | Efficient when W = S^{-1}
Overidentification test | Sargan test (assumes homoskedasticity) | Hansen's J-test
Key Takeaways:
- Use IV-GMM whenever overidentification is a concern (i.e., l > k).
- 2SLS is a special case of IV-GMM with weighting matrix W \propto (Z'Z)^{-1}, which is optimal under homoskedasticity.
- IV-GMM improves efficiency by optimally weighting the moment conditions.
# Standard approach
library(gmm)
gmm_model <- gmm(y ~ x1, ~ x_inst_1 + x_inst_2, data = base)
summary(gmm_model)
#>
#> Call:
#> gmm(g = y ~ x1, x = ~x_inst_1 + x_inst_2, data = base)
#>
#>
#> Method: twoStep
#>
#> Kernel: Quadratic Spectral(with bw = 0.72368 )
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.4385e+01 1.8960e+00 7.5871e+00 3.2715e-14
#> x1 -2.7506e+00 6.2101e-01 -4.4292e+00 9.4584e-06
#>
#> J-Test: degrees of freedom is 1
#> J-test P-value
#> Test E(g)=0: 7.9455329 0.0048206
#>
#> Initial values of the coefficients
#> (Intercept) x1
#> 16.117875 -3.360622
34.3.2.3 Overidentification Test: Hansen’s J-Statistic
A key advantage of IV-GMM is that it permits a test of instrument validity via Hansen's J-test of the overidentifying restrictions (the GMM counterpart of the Sargan test; the related GMM distance test, Hayashi's C-statistic, tests a subset of the restrictions). The test statistic is:
J = N \bar{g}(\hat{\beta}_{GMM})' \hat{S}^{-1} \bar{g} (\hat{\beta}_{GMM}),
which follows a \chi^2 distribution with degrees of freedom equal to the number of overidentifying restrictions (l - k). A significant J-statistic suggests that the instruments may not be valid.
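In the gmm package, the J-statistic reported by summary() can also be retrieved on its own with specTest():
# Hansen's J-test of the overidentifying restrictions for the model above
specTest(gmm_model)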
34.3.2.4 Cluster-Robust Standard Errors
In empirical applications, errors often exhibit heteroskedasticity or intra-group correlation (clustering), violating the assumption of independently and identically distributed errors. Ignoring clustering leaves the IV-GMM point estimates consistent, but the usual standard errors are invalid and the estimator is no longer efficient.
To address this, we adjust the GMM weighting matrix by incorporating cluster-robust variance estimation. Specifically, the covariance matrix of the moment conditions S is estimated as:
\hat{S} = \frac{1}{N} \sum_{c=1}^{C} \left( \sum_{i \in c} Z_i' u_i \right) \left( \sum_{i \in c} Z_i' u_i \right)',
where:
- C is the number of clusters,
- i \in c indexes the observations belonging to cluster c,
- u_i is the residual for observation i,
- Z_i is the vector of instruments for observation i.
Using this robust weighting matrix, we compute a clustered GMM estimator that remains consistent and improves inference when clustering is present.
# Load required packages
library(gmm)
library(dplyr)
library(MASS) # For generalized inverse if needed
# General IV-GMM function with clustering
gmmcl <- function(formula, instruments, data, cluster_var, lambda = 1e-6) {
# Ensure cluster_var exists in data
if (!(cluster_var %in% colnames(data))) {
stop("Error: Cluster variable not found in data.")
}
# Step 1: Initial GMM estimation (identity weighting matrix)
initial_gmm <- gmm(formula, instruments, data = data, vcov = "TrueFixed",
weightsMatrix = diag(ncol(model.matrix(instruments, data))))
# Extract residuals
u_hat <- residuals(initial_gmm)
# Matrix of instruments
Z <- model.matrix(instruments, data)
# Ensure clusters are treated as a factor
data[[cluster_var]] <- as.factor(data[[cluster_var]])
# Compute clustered weighting matrix
cluster_groups <- split(seq_along(u_hat), data[[cluster_var]])
# Remove empty clusters (if any)
cluster_groups <- cluster_groups[lengths(cluster_groups) > 0]
# Initialize cluster-based covariance matrix
S_cluster <- matrix(0, ncol(Z), ncol(Z)) # Zero matrix
# Compute clustered weight matrix
for (indices in cluster_groups) {
if (length(indices) > 0) { # Ensure valid clusters
u_cluster <- matrix(u_hat[indices], ncol = 1) # Convert to column matrix
Z_cluster <- Z[indices, , drop = FALSE] # Keep matrix form
S_cluster <- S_cluster + t(Z_cluster) %*% (u_cluster %*% t(u_cluster)) %*% Z_cluster
}
}
# Normalize by sample size
S_cluster <- S_cluster / nrow(data)
# Ensure S_cluster is invertible
S_cluster <- S_cluster + lambda * diag(ncol(S_cluster)) # Regularization
# Compute inverse or generalized inverse if needed
if (qr(S_cluster)$rank < ncol(S_cluster)) {
S_cluster_inv <- ginv(S_cluster) # Use generalized inverse (MASS package)
} else {
S_cluster_inv <- solve(S_cluster)
}
# Step 2: GMM estimation using clustered weighting matrix
final_gmm <- gmm(formula, instruments, data = data, vcov = "TrueFixed",
weightsMatrix = S_cluster_inv)
return(final_gmm)
}
# Example: Simulated Data for IV-GMM with Clustering
set.seed(123)
n <- 200 # Total observations
C <- 50 # Number of clusters
data <- data.frame(
cluster = rep(1:C, each = n / C), # Cluster variable
z1 = rnorm(n),
z2 = rnorm(n),
x1 = rnorm(n),
y1 = rnorm(n)
)
data$x1 <- data$z1 + data$z2 + rnorm(n) # Endogenous regressor
data$y1 <- data$x1 + rnorm(n) # Outcome variable
# Run standard IV-GMM (without clustering)
gmm_results_standard <- gmm(y1 ~ x1, ~ z1 + z2, data = data)
# Run IV-GMM with clustering
gmm_results_clustered <- gmmcl(y1 ~ x1, ~ z1 + z2, data = data, cluster_var = "cluster")
# Display results for comparison
summary(gmm_results_standard)
#>
#> Call:
#> gmm(g = y1 ~ x1, x = ~z1 + z2, data = data)
#>
#>
#> Method: twoStep
#>
#> Kernel: Quadratic Spectral(with bw = 1.09893 )
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.4919e-02 6.5870e-02 6.8193e-01 4.9528e-01
#> x1 9.8409e-01 4.4215e-02 2.2257e+01 9.6467e-110
#>
#> J-Test: degrees of freedom is 1
#> J-test P-value
#> Test E(g)=0: 1.6171 0.2035
#>
#> Initial values of the coefficients
#> (Intercept) x1
#> 0.05138658 0.98580796
summary(gmm_results_clustered)
#>
#> Call:
#> gmm(g = formula, x = instruments, vcov = "TrueFixed", weightsMatrix = S_cluster_inv,
#> data = data)
#>
#>
#> Method: One step GMM with fixed W
#>
#> Kernel: Quadratic Spectral
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.9082e-02 7.0878e-05 6.9249e+02 0.0000e+00
#> x1 9.8238e-01 5.2798e-05 1.8606e+04 0.0000e+00
#>
#> J-Test: degrees of freedom is 1
#> J-test P-value
#> Test E(g)=0: 1247099 0
34.3.3 Limited Information Maximum Likelihood
LIML is an alternative to 2SLS that performs better when instruments are weak.
It is a k-class estimator: LIML chooses \beta to minimize the variance ratio
\lambda(\beta) = \frac{(Y - X\beta)'(Y - X\beta)}{(Y - X\beta)' M_Z (Y - X\beta)}, \qquad M_Z = I - Z(Z'Z)^{-1}Z',
whose minimized value \hat{\lambda} (the smallest eigenvalue of an associated matrix) is then used as the k-class parameter:
\hat{\beta}_{LIML} = \left( X'(I - \hat{\lambda} M_Z) X \right)^{-1} X'(I - \hat{\lambda} M_Z) Y.
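Neither fixest nor AER estimates LIML directly; one option is the ivmodel package (assumed installed here), which implements LIML and other k-class estimators. A sketch on the base data from above:
library(ivmodel)
liml_fit <- ivmodel(Y = base$y, D = base$x_endo_1, Z = base$x_inst_1)
LIML(liml_fit)   # LIML point estimate, standard error, and confidence interval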
34.3.4 Jackknife IV
JIVE reduces small-sample bias by leaving each observation out when estimating first-stage fitted values:
\begin{aligned} \hat{X}_i^{(-i)} &= Z_i (Z_{-i}'Z_{-i})^{-1} Z_{-i}'X_{-i}, \\ \hat{\beta}_{JIVE} &= \left(\hat{X}^{(-i)\prime} X\right)^{-1} \hat{X}^{(-i)\prime} Y, \end{aligned}
where \hat{X}^{(-i)} stacks the leave-one-out fitted values \hat{X}_i^{(-i)}.
library(AER)
jive_model = ivreg(y ~ x_endo_1 | x_inst_1, data = base, method = "jive")
summary(jive_model)
#>
#> Call:
#> ivreg(formula = y ~ x_endo_1 | x_inst_1, data = base, method = "jive")
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.2390 -0.3022 -0.0206 0.2772 1.0039
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.34586 0.08096 53.68 <2e-16 ***
#> x_endo_1 0.39848 0.01964 20.29 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.4075 on 148 degrees of freedom
#> Multiple R-Squared: 0.7595, Adjusted R-squared: 0.7578
#> Wald test: 411.6 on 1 and 148 DF, p-value: < 2.2e-16
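The estimates above coincide exactly with the Fuller results in the last subsection, which suggests the method argument may be silently ignored (AER::ivreg() absorbs unrecognized arguments through ...), leaving plain 2SLS. As a cross-check, here is a manual JIVE sketch using the leave-one-out identity \hat{X}_i^{(-i)} = (\hat{X}_i - h_{ii} X_i)/(1 - h_{ii}), where h_{ii} is the i-th diagonal element of the first-stage hat matrix:
# Manual JIVE on the `base` data (illustrative sketch, not a packaged routine)
Z <- cbind(1, base$x_inst_1)                 # first-stage design
X <- cbind(1, base$x_endo_1)
y <- base$y
H <- Z %*% solve(crossprod(Z)) %*% t(Z)      # first-stage hat matrix
h <- diag(H)
X_loo <- (H %*% X - h * X) / (1 - h)         # leave-one-out fitted values
b_jive <- solve(crossprod(X_loo, X), crossprod(X_loo, y))
b_jive                                       # compare with the output above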
34.3.5 Control Function Approach
The Control Function (CF) approach, also known as two-stage residual inclusion (2SRI), is a method used to address endogeneity in regression models. This approach is particularly suited for models with nonadditive errors, such as discrete choice models or cases where both the endogenous variable and the outcome are binary.
The control function approach is particularly useful in:
- Binary outcome and binary endogenous variable models:
- In rare events, the second stage typically uses a logistic model (E. Tchetgen Tchetgen 2014).
- In non-rare events, a risk ratio regression is often more appropriate.
- Marketing applications:
- Used in consumer choice models to account for endogeneity in demand estimation (Petrin and Train 2010).
The general model setup is:
Y = g(X) + U
X = \pi(Z) + V
with two key assumptions:
- Conditional mean independence: E(U|Z, V) = E(U|V). Once we control for V, the instrument Z has no direct effect on U.
- First-stage orthogonality: E(V|Z) = 0. This ensures that Z is a valid instrument for X.
Under the control function approach, the expectation of Y conditional on (Z,V) can be rewritten as:
E(Y|Z,V) = g(X) + E(U|Z,V) = g(X) + E(U|V) = g(X) + h(V).
Here, h(V) is the control function that captures endogeneity through the first-stage residuals.
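For intuition, consider the textbook special case in which (U, V) are jointly normal (an assumption we add here only for illustration). Then
E(U|V) = \frac{Cov(U, V)}{Var(V)} V \equiv \rho V,
so h(V) is linear in V, and controlling for endogeneity amounts to adding the first-stage residual as a regressor with coefficient \rho, which is exactly the implementation described next.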
34.3.5.1 Implementation
Rather than replacing the endogenous variable X_i with its predicted value \hat{X}_i, the CF approach explicitly incorporates the residuals from the first-stage regression:
Stage 1: Estimate First-Stage Residuals
Estimate the endogenous variable using its instrumental variables:
X_i = Z_i \pi + v_i.
Obtain the residuals:
\hat{v}_i = X_i - Z_i \hat{\pi}.
Stage 2: Include Residuals in Outcome Equation
Regress the outcome variable on X_i and the first-stage residuals:
Y_i = X_i \beta + \gamma \hat{v}_i + \varepsilon_i.
If endogeneity is present, \gamma \neq 0; otherwise X is exogenous. A t-test of \gamma = 0 therefore doubles as a test of endogeneity (in the spirit of the Durbin-Wu-Hausman test).
34.3.5.2 Comparison to Two-Stage Least Squares
The control function method differs from 2SLS depending on whether the model is linear or nonlinear:
- Linear Endogenous Variables:
- When both X and Y are continuous, the CF approach is equivalent to 2SLS.
- Nonlinear Endogenous Variables:
- If X is nonlinear (e.g., a binary treatment), CF differs from 2SLS and often performs better.
- Nonlinear in Parameters:
- In models where g(X) is nonlinear (e.g., logit/probit models), CF is typically superior to 2SLS because it explicitly models endogeneity via the control function h(V). A short 2SRI sketch with a probit second stage follows this list.
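As a concrete illustration of the nonlinear case, here is a minimal 2SRI sketch with a binary outcome and a probit second stage (simulated data; the DGP and all names are ours):
# Stage 1: first-stage residuals; Stage 2: probit including those residuals
set.seed(42)
n_cf  <- 1000
z_cf  <- rnorm(n_cf)                                # instrument
v_cf  <- rnorm(n_cf)                                # first-stage error (endogeneity)
x_cf  <- 0.8 * z_cf + v_cf                          # endogenous regressor
y_cf  <- as.numeric(x_cf + v_cf + rnorm(n_cf) > 0)  # binary outcome sharing v_cf
v_hat <- resid(lm(x_cf ~ z_cf))                     # control function
cf_probit <- glm(y_cf ~ x_cf + v_hat, family = binomial(link = "probit"))
summary(cf_probit)   # a significant v_hat coefficient indicates endogeneity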
library(fixest)
library(tidyverse)
library(modelsummary)
# Set the seed for reproducibility
set.seed(123)
n = 1000
# Generate the exogenous variable from a normal distribution
exogenous <- rnorm(n, mean = 5, sd = 1)
# Generate the omitted variable as a function of the exogenous variable
omitted <- rnorm(n, mean = 2, sd = 1)
# Generate the endogenous variable as a function of the omitted variable and the exogenous variable
endogenous <- 5 * omitted + 2 * exogenous + rnorm(n, mean = 0, sd = 1)
# nonlinear endogenous variable
endogenous_nonlinear <- 5 * omitted^2 + 2 * exogenous + rnorm(100, mean = 0, sd = 1) # note: rnorm(100) is silently recycled to length n; rnorm(n) was likely intended (left as is to preserve the outputs below)
unrelated <- rexp(n, rate = 1)
# Generate the response variable as a function of the endogenous variable and the omitted variable
response <- 4 + 3 * endogenous + 6 * omitted + rnorm(n, mean = 0, sd = 1)
response_nonlinear <- 4 + 3 * endogenous_nonlinear + 6 * omitted + rnorm(n, mean = 0, sd = 1)
response_nonlinear_para <- 4 + 3 * endogenous ^ 2 + 6 * omitted + rnorm(n, mean = 0, sd = 1)
# Combine the variables into a data frame
my_data <-
  data.frame(
    exogenous,
    omitted,
    endogenous,
    response,
    unrelated,
    response_nonlinear,
    response_nonlinear_para
  )
# View the first few rows of the data frame
# head(my_data)
wo_omitted <- feols(response ~ endogenous + sw0(unrelated), data = my_data)
w_omitted <- feols(response ~ endogenous + omitted + unrelated, data = my_data)
# ivreg::ivreg(response ~ endogenous + unrelated | exogenous, data = my_data)
iv <- feols(response ~ 1 + sw0(unrelated) | endogenous ~ exogenous, data = my_data)
etable(
wo_omitted,
w_omitted,
iv,
digits = 2
# vcov = list("each", "iid", "hetero")
)
#> wo_omitted.1 wo_omitted.2 w_omitted iv.1
#> Dependent Var.: response response response response
#>
#> Constant -3.8*** (0.30) -3.6*** (0.31) 3.9*** (0.16) 12.0*** (1.4)
#> endogenous 4.0*** (0.01) 4.0*** (0.01) 3.0*** (0.01) 3.2*** (0.07)
#> unrelated -0.14. (0.08) -0.02 (0.03)
#> omitted 6.0*** (0.08)
#> _______________ ______________ ______________ _____________ _____________
#> S.E. type IID IID IID IID
#> Observations 1,000 1,000 1,000 1,000
#> R2 0.98756 0.98760 0.99817 0.94976
#> Adj. R2 0.98755 0.98757 0.99816 0.94971
#>
#> iv.2
#> Dependent Var.: response
#>
#> Constant 12.2*** (1.4)
#> endogenous 3.2*** (0.07)
#> unrelated -0.28. (0.16)
#> omitted
#> _______________ _____________
#> S.E. type IID
#> Observations 1,000
#> R2 0.95010
#> Adj. R2 0.95000
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear in parameter and linear in endogenous variable
# manual
# 2SLS
first_stage = lm(endogenous ~ exogenous, data = my_data)
new_data = cbind(my_data, new_endogenous = predict(first_stage, my_data))
second_stage = lm(response ~ new_endogenous, data = new_data)
summary(second_stage)
#>
#> Call:
#> lm(formula = response ~ new_endogenous, data = new_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -68.126 -14.949 0.608 15.099 73.842
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 11.9910 5.7671 2.079 0.0379 *
#> new_endogenous 3.2097 0.2832 11.335 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 21.49 on 998 degrees of freedom
#> Multiple R-squared: 0.1141, Adjusted R-squared: 0.1132
#> F-statistic: 128.5 on 1 and 998 DF, p-value: < 2.2e-16
# Control function: include the first-stage residuals rather than fitted values
new_data_cf = cbind(my_data, residual = resid(first_stage))
second_stage_cf = lm(response ~ endogenous + residual, data = new_data_cf)
summary(second_stage_cf)
#>
#> Call:
#> lm(formula = response ~ endogenous + residual, data = new_data_cf)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.1039 -1.0065 0.0247 0.9480 4.2521
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 11.99102 0.39849 30.09 <2e-16 ***
#> endogenous 3.20974 0.01957 164.05 <2e-16 ***
#> residual 0.95036 0.02159 44.02 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.485 on 997 degrees of freedom
#> Multiple R-squared: 0.9958, Adjusted R-squared: 0.9958
#> F-statistic: 1.175e+05 on 2 and 997 DF, p-value: < 2.2e-16
modelsummary(list(second_stage, second_stage_cf))
| | (1) | (2) |
|---|---|---|
| (Intercept) | 11.991 | 11.991 |
| | (5.767) | (0.398) |
| new_endogenous | 3.210 | |
| | (0.283) | |
| endogenous | | 3.210 |
| | | (0.020) |
| residual | | 0.950 |
| | | (0.022) |
| Num.Obs. | 1000 | 1000 |
| R2 | 0.114 | 0.996 |
| R2 Adj. | 0.113 | 0.996 |
| AIC | 8977.0 | 3633.5 |
| BIC | 8991.8 | 3653.2 |
| Log.Lik. | -4485.516 | -1812.768 |
| F | 128.483 | 117473.460 |
| RMSE | 21.47 | 1.48 |
Nonlinear in endogenous variable
# 2SLS
first_stage = lm(endogenous_nonlinear ~ exogenous, data = my_data)
new_data = cbind(my_data, new_endogenous_nonlinear = predict(first_stage, my_data))
second_stage = lm(response_nonlinear ~ new_endogenous_nonlinear, data = new_data)
summary(second_stage)
#>
#> Call:
#> lm(formula = response_nonlinear ~ new_endogenous_nonlinear, data = new_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -101.26 -53.01 -13.50 39.33 376.16
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 11.7539 21.6478 0.543 0.587
#> new_endogenous_nonlinear 3.1253 0.5993 5.215 2.23e-07 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 70.89 on 998 degrees of freedom
#> Multiple R-squared: 0.02653, Adjusted R-squared: 0.02555
#> F-statistic: 27.2 on 1 and 998 DF, p-value: 2.234e-07
# Control function
new_data_cf = cbind(my_data, residual = resid(first_stage))
second_stage_cf = lm(response_nonlinear ~ endogenous_nonlinear + residual, data = new_data_cf)
summary(second_stage_cf)
#>
#> Call:
#> lm(formula = response_nonlinear ~ endogenous_nonlinear + residual,
#> data = new_data_cf)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -12.8559 -0.8337 0.4429 1.3432 4.3147
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 11.75395 0.67012 17.540 < 2e-16 ***
#> endogenous_nonlinear 3.12525 0.01855 168.469 < 2e-16 ***
#> residual 0.13577 0.01882 7.213 1.08e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.194 on 997 degrees of freedom
#> Multiple R-squared: 0.9991, Adjusted R-squared: 0.9991
#> F-statistic: 5.344e+05 on 2 and 997 DF, p-value: < 2.2e-16
modelsummary(list(second_stage, second_stage_cf))
| | (1) | (2) |
|---|---|---|
| (Intercept) | 11.754 | 11.754 |
| | (21.648) | (0.670) |
| new_endogenous_nonlinear | 3.125 | |
| | (0.599) | |
| endogenous_nonlinear | | 3.125 |
| | | (0.019) |
| residual | | 0.136 |
| | | (0.019) |
| Num.Obs. | 1000 | 1000 |
| R2 | 0.027 | 0.999 |
| R2 Adj. | 0.026 | 0.999 |
| AIC | 11364.2 | 4414.7 |
| BIC | 11378.9 | 4434.4 |
| Log.Lik. | -5679.079 | -2203.371 |
| F | 27.196 | 534439.006 |
| RMSE | 70.82 | 2.19 |
Nonlinear in parameters
# 2SLS
first_stage = lm(endogenous ~ exogenous, data = my_data)
new_data = cbind(my_data, new_endogenous = predict(first_stage, my_data))
second_stage = lm(response_nonlinear_para ~ new_endogenous, data = new_data)
summary(second_stage)
#>
#> Call:
#> lm(formula = response_nonlinear_para ~ new_endogenous, data = new_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1402.34 -462.21 -64.22 382.35 3090.62
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1137.875 173.811 -6.547 9.4e-11 ***
#> new_endogenous 122.525 8.534 14.357 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 647.7 on 998 degrees of freedom
#> Multiple R-squared: 0.1712, Adjusted R-squared: 0.1704
#> F-statistic: 206.1 on 1 and 998 DF, p-value: < 2.2e-16
# Control function
new_data_cf = cbind(my_data, residual = resid(first_stage))
second_stage_cf = lm(response_nonlinear_para ~ endogenous_nonlinear + residual, data = new_data_cf)
summary(second_stage_cf)
#>
#> Call:
#> lm(formula = response_nonlinear_para ~ endogenous_nonlinear +
#> residual, data = new_data_cf)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -904.77 -154.35 -20.41 143.24 953.04
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 492.2494 32.3530 15.21 < 2e-16 ***
#> endogenous_nonlinear 23.5991 0.8741 27.00 < 2e-16 ***
#> residual 30.5914 3.7397 8.18 8.58e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 245.9 on 997 degrees of freedom
#> Multiple R-squared: 0.8806, Adjusted R-squared: 0.8804
#> F-statistic: 3676 on 2 and 997 DF, p-value: < 2.2e-16
modelsummary(list(second_stage, second_stage_cf))
| | (1) | (2) |
|---|---|---|
| (Intercept) | -1137.875 | 492.249 |
| | (173.811) | (32.353) |
| new_endogenous | 122.525 | |
| | (8.534) | |
| endogenous_nonlinear | | 23.599 |
| | | (0.874) |
| residual | | 30.591 |
| | | (3.740) |
| Num.Obs. | 1000 | 1000 |
| R2 | 0.171 | 0.881 |
| R2 Adj. | 0.170 | 0.880 |
| AIC | 15788.6 | 13853.1 |
| BIC | 15803.3 | 13872.7 |
| Log.Lik. | -7891.307 | -6922.553 |
| F | 206.123 | 3676.480 |
| RMSE | 647.01 | 245.58 |
34.3.6 Fuller and Bias-Reduced IV
Fuller's modification of LIML reduces its bias: the LIML eigenvalue \hat{\lambda} is replaced by k = \hat{\lambda} - \alpha/(N - l), where l is the number of instruments and \alpha is a user-chosen constant (\alpha = 1 is the common choice, yielding an approximately unbiased estimator).
fuller_model = ivreg(y ~ x_endo_1 | x_inst_1, data = base, method = "fuller", k = 1)
summary(fuller_model)
#>
#> Call:
#> ivreg(formula = y ~ x_endo_1 | x_inst_1, data = base, method = "fuller",
#> k = 1)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.2390 -0.3022 -0.0206 0.2772 1.0039
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.34586 0.08096 53.68 <2e-16 ***
#> x_endo_1 0.39848 0.01964 20.29 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.4075 on 148 degrees of freedom
#> Multiple R-Squared: 0.7595, Adjusted R-squared: 0.7578
#> Wald test: 411.6 on 1 and 148 DF, p-value: < 2.2e-16
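The same caveat about the method argument applies here: this output is identical to the JIVE output above. As an alternative, the ivmodel package (assumed installed, as in the LIML sketch) implements Fuller's estimator directly:
# Fuller-adjusted LIML via ivmodel (default Fuller constant)
library(ivmodel)
fuller_fit <- ivmodel(Y = base$y, D = base$x_endo_1, Z = base$x_inst_1)
Fuller(fuller_fit)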