## 30.4 Testing Assumptions

\[ Y = \beta_1 X_1 + \beta_2 X_2 + \epsilon \]

where

\(X_1\) are exogenous variables

\(X_2\) are endogenous variables

\(Z\) are instrumental variables

If \(Z\) satisfies the relevance condition, then \(Cov(Z, X_2) \neq 0\).

Relevance matters because \(\beta_2\) is estimated by dividing by this covariance:

\[ \beta_2 = \frac{Cov(Z,Y)}{Cov(Z, X_2)} \]

If \(Z\) satisfies the exogeneity condition, \(E[Z\epsilon]=0\). This holds when

- \(Z\) has no direct effect on \(Y\) except through \(X_2\)
- \(Z\) is uncorrelated with any omitted variable

If we just want to know the effect of \(Z\) on \(Y\), we can estimate the **reduced form**, where the coefficient of \(Z\) is

\[ \rho = \frac{Cov(Y, Z)}{Var(Z)} \]

and this effect is only through \(X_2\) (by the exclusion restriction assumption).

We can also consistently estimate the effect of \(Z\) on \(X_2\) (**first stage**), where the coefficient of \(Z\) is

\[ \pi = \frac{Cov(X_2, Z)}{Var(Z)} \]

and the IV estimate is

\[ \beta_2 = \frac{Cov(Y,Z)}{Cov(X_2, Z)} = \frac{\rho}{\pi} \]
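As a quick numerical check of the ratio above, the following sketch simulates one endogenous regressor and one instrument in base R. The data-generating process, the coefficient values, and the names `u`, `x2`, `rho_hat`, `pi_hat`, and `beta_iv` are illustrative assumptions, not part of the original example.

```r
set.seed(1)
n  <- 5000
z  <- rnorm(n)                      # instrument
u  <- rnorm(n)                      # unobserved confounder (assumed for illustration)
x2 <- 0.8 * z + u + rnorm(n)        # endogenous regressor: correlated with u
y  <- 1.5 * x2 + 2 * u + rnorm(n)   # true beta_2 = 1.5 (assumed)

rho_hat <- cov(y, z) / var(z)       # reduced form: slope of y on z
pi_hat  <- cov(x2, z) / var(z)      # first stage: slope of x2 on z
beta_iv <- cov(y, z) / cov(x2, z)   # IV estimate

# The lm() slopes reproduce the covariance formulas
rf <- unname(coef(lm(y ~ z))["z"])
fs <- unname(coef(lm(x2 ~ z))["z"])

c(ratio = rho_hat / pi_hat,
  iv    = beta_iv,
  ols   = unname(coef(lm(y ~ x2))["x2"]))
```

The ratio and the IV estimate coincide by construction, while OLS absorbs the confounder and drifts away from the assumed true value.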

### 30.4.1 Relevance Assumption

**Weak instruments**: can explain little variation in the endogenous regressor

- Coefficient estimate of the endogenous variable will be inaccurate.
- For cases where weak instruments are unavoidable, M. J. Moreira (2003) proposes the conditional likelihood ratio test for robust inference. This test is considered approximately optimal for weak instrument scenarios (D. W. Andrews, Moreira, and Stock 2008; D. W. Andrews and Marmer 2008).
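To make the inaccuracy concrete, here is a small Monte Carlo sketch in base R comparing the sampling distribution of a just-identified IV estimate under a strong and a weak first stage. The helper `iv_draws()`, the sample size, and the first-stage coefficients are hypothetical choices for illustration only.

```r
set.seed(3)

# Draw `reps` IV estimates for a given first-stage coefficient `strength`
iv_draws <- function(strength, n = 200, reps = 500) {
  replicate(reps, {
    z <- rnorm(n)
    u <- rnorm(n)                     # confounder that makes x endogenous
    x <- strength * z + u + rnorm(n)
    y <- 1 * x + u + rnorm(n)         # true effect of x on y is 1 (assumed)
    cov(z, y) / cov(z, x)             # just-identified IV estimate
  })
}

strong <- iv_draws(strength = 1)
weak   <- iv_draws(strength = 0.05)

# Robust summaries are used because the weak-instrument estimator has heavy tails
rbind(strong = c(median = median(strong), IQR = IQR(strong)),
      weak   = c(median = median(weak),   IQR = IQR(weak)))
```

With the weak first stage, the estimates are far more dispersed than with the strong one, which is why conventional standard errors and t-tests become unreliable.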

Rule of thumb:

Compute the F-statistic in the first stage; it should be greater than 10. This rule of thumb is now discouraged by Lee et al. (2022). Use `linearHypothesis()` to test only the instrument coefficients.

**First-Stage F-Test**

In the context of a two-stage least squares (2SLS) setup where you are estimating the equation:

\[ Y = X \beta + \epsilon \]

and \(X\) is endogenous, you typically estimate a first-stage regression of:

\[ X = Z \pi + u \]

where \(Z\) is the instrument.

The first-stage F-test evaluates the joint significance of the instruments in this first stage:

\[ F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/ (n - k - 1)} \]

where:

\(SSR_r\) is the sum of squared residuals from the restricted model (instruments excluded; only the constant and any exogenous controls remain).

\(SSR_{ur}\) is the sum of squared residuals from the unrestricted model (with instruments).

\(q\) is the number of instruments excluded from the main equation.

\(n\) is the number of observations.

\(k\) is the number of regressors in the unrestricted first-stage model (instruments plus any exogenous controls), so that \(n - k - 1\) is its residual degrees of freedom.
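The F formula above can be checked by hand against `anova()`. This is a minimal sketch on simulated data; the control `w`, the instruments `z1` and `z2`, and all coefficient values are hypothetical. The denominator uses the residual degrees of freedom that `lm()` reports for the unrestricted model.

```r
set.seed(2)
n  <- 500
w  <- rnorm(n)                       # exogenous control (hypothetical)
z1 <- rnorm(n)                       # excluded instrument 1 (hypothetical)
z2 <- rnorm(n)                       # excluded instrument 2 (hypothetical)
x  <- 0.5 * w + 0.3 * z1 + 0.2 * z2 + rnorm(n)

restricted   <- lm(x ~ w)            # instruments excluded
unrestricted <- lm(x ~ w + z1 + z2)  # instruments included

ssr_r  <- sum(resid(restricted)^2)
ssr_ur <- sum(resid(unrestricted)^2)
q      <- 2                          # number of excluded instruments
df_ur  <- df.residual(unrestricted)  # n minus the number of estimated coefficients

f_manual <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
f_anova  <- anova(restricted, unrestricted)$F[2]

c(manual = f_manual, anova = f_anova)
```

Both numbers agree, which confirms that the first-stage F is simply a nested-model comparison with and without the instruments.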

**Cragg-Donald Test**

The Cragg-Donald statistic is essentially the same as the Wald statistic of the joint significance of the instruments in the first stage, and it’s used specifically when you have multiple endogenous regressors. It’s calculated as:

\[ CD = n \times (R_{ur}^2 - R_r^2) \]

where:

\(R_{ur}^2\) and \(R_r^2\) are the R-squared values from the unrestricted and restricted models respectively.

\(n\) is the number of observations.

For one endogenous variable, the Cragg-Donald test results should align closely with those from Stock and Yogo. The Anderson canonical correlation test, a likelihood ratio test, also works under similar conditions, contrasting with Cragg-Donald’s Wald statistic approach. Both are valid with one endogenous variable and at least one instrument.

**Stock-Yogo Weak IV Test**

The Stock-Yogo test does not directly compute a statistic like the F-test or Cragg-Donald, but rather uses pre-computed critical values to assess the strength of instruments. It often uses the eigenvalues derived from the concentration matrix:

\[ S = \frac{1}{n} (Z' X) (X'Z) \]

where \(Z\) is the matrix of instruments and \(X\) is the matrix of endogenous regressors.

Stock and Yogo provide critical values for different scenarios (bias, size distortion) for a given number of instruments and endogenous regressors, based on the smallest eigenvalue of \(S\). The test compares these eigenvalues against critical values that correspond to thresholds of permissible bias or size distortion in a 2SLS estimator.

**Critical Values and Test Conditions**: The critical values derived by Stock and Yogo depend on the level of acceptable bias, the number of endogenous regressors, and the number of instruments. For example, with a 5% maximum acceptable bias, one endogenous variable, and three instruments, the critical value for a sufficient first-stage F-statistic is 13.91. Note that this framework requires at least two overidentifying degrees of freedom.

**Comparison**

| Test | Description | Focus | Usage |
|---|---|---|---|
| First-Stage F-Test | Evaluates the joint significance of instruments in the first stage. | Predictive power of instruments for the endogenous variable. | Simplest and most direct test, widely used especially with a single endogenous variable. Rule of thumb: F < 10 suggests weak instruments. |
| Cragg-Donald Test | Wald statistic for joint significance of instruments. | Joint strength of multiple instruments with multiple endogenous variables. | More appropriate in complex IV setups with multiple endogenous variables. Compares statistic against critical values for assessing instrument strength. |
| Stock-Yogo Weak IV Test | Compares test statistic to pre-determined critical values. | Minimizing size distortions and bias from weak instruments. | Theoretical evaluation of instrument strength, ensuring the reliability of 2SLS estimates against specific thresholds of bias or size distortion. |

All the mentioned tests (Stock-Yogo, Cragg-Donald, Anderson canonical correlation test) assume errors are independently and identically distributed. If this assumption is violated, the Kleibergen-Paap test is robust against violations of the iid assumption and can be applied even with a single endogenous variable and instrument, provided the model is properly identified (Baum and Schaffer 2021).

#### 30.4.1.1 Cragg-Donald

Similar to the first-stage F-statistic

```
library(cragg)
library(AER) # for dataset
data("WeakInstrument")
cragg_donald(
    # control variables
    X = ~ 1,
    # endogenous variables
    D = ~ x,
    # instrument variables
    Z = ~ z,
    data = WeakInstrument
)
#> Cragg-Donald test for weak instruments:
#>
#> Data: WeakInstrument
#> Controls: ~1
#> Treatments: ~x
#> Instruments: ~z
#>
#> Cragg-Donald Statistic: 4.566136
#> Df: 198
```

A large CD statistic implies that the instruments are strong, which is not the case here. To judge the statistic against a critical value, we turn to Stock-Yogo.

#### 30.4.1.2 Stock-Yogo

J. H. Stock and Yogo (2002) set the critical values such that the bias is less than 10% (default)

\(H_0:\) Instruments are weak

\(H_1:\) Instruments are not weak

```
library(cragg)
library(AER) # for dataset
data("WeakInstrument")
stock_yogo_test(
    # control variables
    X = ~ 1,
    # endogenous variables
    D = ~ x,
    # instrument variables
    Z = ~ z,
    size_bias = "bias",
    data = WeakInstrument
)
```

The CD statistic should be bigger than the set critical value to be considered strong instruments.

### 30.4.2 Exogeneity Assumption

The local average treatment effect (LATE) is defined as:

\[ \text{LATE} = \frac{\text{reduced form}}{\text{first stage}} = \frac{\rho}{\phi} \]

This implies that the reduced form (\(\rho\)) is the product of the first stage (\(\phi\)) and LATE:

\[ \rho = \phi \times \text{LATE} \]

Thus, if the first stage (\(\phi\)) is 0, the reduced form (\(\rho\)) should also be 0.

```
# Load necessary libraries
library(shiny)
library(AER)      # for ivreg
library(ggplot2)  # for visualization
library(dplyr)    # for data manipulation

# Function to simulate the dataset
simulate_iv_data <- function(n, beta, phi, direct_effect) {
  Z <- rnorm(n)
  epsilon_x <- rnorm(n)
  epsilon_y <- rnorm(n)
  X <- phi * Z + epsilon_x
  Y <- beta * X + direct_effect * Z + epsilon_y
  data <- data.frame(Y = Y, X = X, Z = Z)
  return(data)
}

# Function to run the simulations and calculate the effects
run_simulation <- function(n, beta, phi, direct_effect) {
  # Simulate the data
  simulated_data <- simulate_iv_data(n, beta, phi, direct_effect)
  # Estimate first-stage effect (phi)
  first_stage <- lm(X ~ Z, data = simulated_data)
  phi <- coef(first_stage)["Z"]
  phi_ci <- confint(first_stage)["Z", ]
  # Estimate reduced-form effect (rho)
  reduced_form <- lm(Y ~ Z, data = simulated_data)
  rho <- coef(reduced_form)["Z"]
  rho_ci <- confint(reduced_form)["Z", ]
  # Estimate LATE using IV regression
  iv_model <- ivreg(Y ~ X | Z, data = simulated_data)
  iv_late <- coef(iv_model)["X"]
  iv_late_ci <- confint(iv_model)["X", ]
  # Calculate LATE as the ratio of reduced-form and first-stage coefficients
  calculated_late <- rho / phi
  # Delta-method standard error built from the coefficient standard errors
  # (the covariance between rho and phi is ignored in this approximation)
  rho_se <- summary(reduced_form)$coefficients["Z", "Std. Error"]
  phi_se <- summary(first_stage)$coefficients["Z", "Std. Error"]
  calculated_late_se <- sqrt(
    rho_se^2 / phi^2 + (rho * phi_se / phi^2)^2
  )
  calculated_late_ci <- c(calculated_late - 1.96 * calculated_late_se,
                          calculated_late + 1.96 * calculated_late_se)
  # Return a list of results
  list(phi = phi,
       phi_ci = phi_ci,
       rho = rho,
       rho_ci = rho_ci,
       direct_effect = direct_effect,
       direct_effect_ci = c(direct_effect, direct_effect), # Placeholder for direct effect CI
       iv_late = iv_late,
       iv_late_ci = iv_late_ci,
       calculated_late = calculated_late,
       calculated_late_ci = calculated_late_ci,
       true_effect = beta,
       true_effect_ci = c(beta, beta)) # Placeholder for true effect CI
}

# Define UI for the sliders
ui <- fluidPage(
  titlePanel("IV Model Simulation"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("beta", "True Effect of X on Y (beta):",
                  min = 0, max = 1.0, value = 0.5, step = 0.1),
      sliderInput("phi", "First Stage Effect (phi):",
                  min = 0, max = 1.0, value = 0.7, step = 0.1),
      sliderInput("direct_effect", "Direct Effect of Z on Y:",
                  min = -0.5, max = 0.5, value = 0, step = 0.1)
    ),
    mainPanel(
      plotOutput("dotPlot")
    )
  )
)

# Define server logic to run the simulation and generate the plot
server <- function(input, output) {
  output$dotPlot <- renderPlot({
    # Run simulation
    results <- run_simulation(n = 1000,
                              beta = input$beta,
                              phi = input$phi,
                              direct_effect = input$direct_effect)
    # Prepare data for plotting
    plot_data <- data.frame(
      Effect = c("First Stage (phi)", "Reduced Form (rho)", "Direct Effect",
                 "LATE (Ratio)", "LATE (IV)", "True Effect"),
      Value = c(results$phi, results$rho, results$direct_effect,
                results$calculated_late, results$iv_late, results$true_effect),
      CI_Lower = c(results$phi_ci[1], results$rho_ci[1], results$direct_effect_ci[1],
                   results$calculated_late_ci[1], results$iv_late_ci[1], results$true_effect_ci[1]),
      CI_Upper = c(results$phi_ci[2], results$rho_ci[2], results$direct_effect_ci[2],
                   results$calculated_late_ci[2], results$iv_late_ci[2], results$true_effect_ci[2])
    )
    # Create dot plot with confidence intervals
    ggplot(plot_data, aes(x = Effect, y = Value)) +
      geom_point(size = 3) +
      geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper), width = 0.2) +
      labs(title = "IV Model Effects",
           y = "Coefficient Value") +
      coord_cartesian(ylim = c(-1, 1)) + # Limits the y-axis to -1 to 1 but allows CI beyond
      theme_minimal() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
  })
}

# Run the application
shinyApp(ui = ui, server = server)
```

A statistically significant reduced form estimate without a corresponding first stage indicates an issue, suggesting an alternative channel linking instruments to outcomes or a direct effect of the IV on the outcome.

**No Direct Effect**: When the direct effect is 0 and the first stage is 0, the reduced form is 0.

- Note: Extremely rare cases with multiple additional paths that perfectly cancel each other out can also produce this result, but testing for all possible paths is impractical.

**With Direct Effect**: When there is a direct effect of the IV on the outcome, the reduced form can be significantly different from 0, even if the first stage is 0.

- This violates the exogeneity assumption, as the IV should only affect the outcome through the treatment variable.

To test the validity of the exogeneity assumption, we can use a sanity test:

- Identify groups for which the effects of instruments on the treatment variable are small and not significantly different from 0. The reduced form estimate for these groups should also be 0. These “no-first-stage samples” provide evidence of whether the exogeneity assumption is violated.
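The sanity test can be sketched on simulated data in base R. Everything here is a hypothetical setup for illustration: the two groups, the first-stage coefficient of 0.7, the direct effect of 0.3, and the helper `reduced_form()`. The idea is to estimate the reduced form only in the group where the instrument does not move the treatment.

```r
set.seed(4)
n     <- 4000
group <- rep(c("first_stage", "no_first_stage"), each = n / 2)
phi   <- ifelse(group == "first_stage", 0.7, 0)  # instrument moves X in one group only
z     <- rnorm(n)
x     <- phi * z + rnorm(n)

# Two hypothetical worlds: a valid instrument and one with a direct effect on Y
y_valid   <- 0.5 * x + rnorm(n)
y_invalid <- 0.5 * x + 0.3 * z + rnorm(n)

# Reduced form of y on z, restricted to the no-first-stage sample
no_fs <- group == "no_first_stage"
reduced_form <- function(y) {
  fit <- lm(y[no_fs] ~ z[no_fs])
  summary(fit)$coefficients[2, c("Estimate", "Pr(>|t|)")]
}

rbind(valid = reduced_form(y_valid), invalid = reduced_form(y_invalid))
```

In the no-first-stage sample, the reduced form for the valid instrument should be close to 0, whereas the instrument with a direct effect produces a reduced form close to that direct effect.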

#### 30.4.2.1 Overid Tests

Wald test and Hausman test for exogeneity of \(X\) assuming \(Z\) is exogenous

- People might prefer Wald test over Hausman test.

Sargan (for 2SLS) is a simpler version of Hansen’s J test (for IV-GMM)

Modified J test (i.e., regularized jackknife IV): can handle weak instruments and small sample sizes (Carrasco and Doukali 2022), who also propose a regularized F-test of the relevance assumption that is robust to heteroskedasticity.

New advances: endogeneity robust inference in finite sample and sensitivity analysis of inference (Kiviet 2020)

These tests can provide evidence for the validity of the over-identifying restrictions, but passing them is neither sufficient nor necessary for the validity of the moment conditions (i.e., this assumption cannot be tested) (Deaton 2010; Parente and Silva 2012).

The over-identifying restriction can still be valid even when the instruments are correlated with the error terms, but then in this case, what you’re estimating is no longer your parameters of interest.

Rejection of the over-identifying restrictions can also be the result of **parameter heterogeneity** (J. D. Angrist, Graddy, and Imbens 2000)

Why might overid tests hold no value/info?

Overidentifying restrictions can be valid irrespective of the instruments' validity

- Whenever instruments have the same motivation and are on the same scale, the estimated parameters of interest will be very close (Parente and Silva 2012, 316)

Overidentifying restrictions can be invalid even when each instrument is valid

- When the effect of your parameter of interest is heterogeneous (e.g., you have two groups with two different true effects), your first instrument can be correlated with your variable of interest only for the first group and your second instrument can be correlated with your variable of interest only for the second group (i.e., each instrument is valid). If you use each instrument separately, you can still identify the parameter of interest for its group. However, if you use both of them, what you estimate is a mixture of the two groups. Hence, the overidentifying restriction will be invalid (because no single parameter can make the errors of the model orthogonal to both instruments). The result may seem confusing at first because if each subset of overidentifying restrictions is valid, the full set should also be valid. However, this interpretation is flawed because the residual's orthogonality to the instruments depends on the chosen set of instruments, and therefore the set of restrictions tested when using two sets of instruments together is not the same as the union of the sets of restrictions tested when using each set of instruments separately (Parente and Silva 2012, 316)
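The heterogeneity argument can be illustrated with a small base R simulation. The two groups, their assumed true effects (1 and 3), and the names `g`, `z1`, `z2`, and `iv()` are hypothetical; each instrument is exogenous by construction and shifts the regressor in only one group.

```r
set.seed(5)
n    <- 20000
g    <- rbinom(n, 1, 0.5)             # group indicator (1 = first group)
z1   <- rnorm(n)                      # exogenous; relevant only in the first group
z2   <- rnorm(n)                      # exogenous; relevant only in the second group
u    <- rnorm(n)                      # confounder making x endogenous
x    <- g * z1 + (1 - g) * z2 + u + rnorm(n)
beta <- ifelse(g == 1, 1, 3)          # heterogeneous true effects (assumed values)
y    <- beta * x + u + rnorm(n)

# Just-identified IV estimate for a single instrument
iv <- function(z) cov(z, y) / cov(z, x)
iv_z1 <- iv(z1)                       # picks up the first group's effect
iv_z2 <- iv(z2)                       # picks up the second group's effect

# 2SLS with both instruments: a mixture of the two effects
x_hat   <- fitted(lm(x ~ z1 + z2))
iv_both <- unname(coef(lm(y ~ x_hat))["x_hat"])

c(z1_only = iv_z1, z2_only = iv_z2, both = iv_both)
```

Each instrument on its own recovers a different group-specific effect, so no single coefficient can make the residuals orthogonal to both instruments, and an overidentification test would tend to reject even though neither instrument is invalid.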

These tests (of overidentifying restrictions) should be used to check whether different instruments identify the same parameters of interest, not to check their validity (J. A. Hausman 1983; Parente and Silva 2012).

##### 30.4.2.1.1 Wald Test

Assuming that \(Z\) is exogenous (a valid instrument), we want to know whether \(X_2\) is exogenous

1st stage:

\[ X_2 = \hat{\alpha} Z + \hat{\epsilon} \]

2nd stage:

\[ Y = \delta_0 X_1 + \delta_1 X_2 + \delta_2 \hat{\epsilon} + u \]

where

- \(\hat{\epsilon}\) is the residuals from the 1st stage

The Wald test of exogeneity tests

\[ H_0: \delta_2 = 0 \\ H_1: \delta_2 \neq 0 \]

If you have more than one endogenous variable with more than one instrument, \(\delta_2\) is a vector of coefficients on the residuals from all the first-stage equations, and the null hypothesis is that they are jointly equal to 0.

If you reject this hypothesis, it means that \(X_2\) is **not exogenous** (i.e., it is endogenous). Hence, for this test, we do not want to reject the null hypothesis.

If the test is not statistically significant, we might simply not have enough information to reject the null.

When you have a valid instrument \(Z\), whether \(X_2\) is endogenous or exogenous, your coefficient estimates of \(X_2\) should still be consistent. But if \(X_2\) is exogenous, then 2SLS will be inefficient (i.e., larger standard errors).

Intuition:

\(\hat{\epsilon}\) is the supposed endogenous part of \(X_2\). When we regress \(Y\) on \(\hat{\epsilon}\) and observe that its coefficient is not different from 0, it means that the exogenous part of \(X_2\) explains the impact on \(Y\) well, and there is no endogenous part.
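The two-step procedure can be sketched in base R as follows. The simulated data, coefficient values, and object names are illustrative assumptions. One deviation from the equations above is also an assumption of this sketch: the first stage includes the exogenous regressor `x1` alongside the instrument, which is the usual practice.

```r
set.seed(6)
n  <- 2000
x1 <- rnorm(n)                           # exogenous regressor
z  <- rnorm(n)                           # instrument, assumed valid
u  <- rnorm(n)                           # unobserved confounder
x2 <- 0.5 * x1 + 0.8 * z + u + rnorm(n)  # endogenous by construction
y  <- 1 * x1 + 2 * x2 + 1.5 * u + rnorm(n)

# 1st stage: regress the suspect regressor on the instrument and exogenous variables
first_stage <- lm(x2 ~ x1 + z)
e_hat <- resid(first_stage)

# 2nd stage: add the first-stage residuals and test their coefficient (delta_2)
augmented <- lm(y ~ x1 + x2 + e_hat)
delta2 <- summary(augmented)$coefficients["e_hat", ]

# Cross-check: the x2 coefficient equals the manual 2SLS point estimate
x2_hat <- fitted(first_stage)
tsls   <- lm(y ~ x1 + x2_hat)

delta2
c(augmented = unname(coef(augmented)["x2"]),
  tsls      = unname(coef(tsls)["x2_hat"]))
```

Because `x2` is endogenous in this simulated design, the coefficient on `e_hat` is clearly different from 0 and the null of exogeneity is rejected. The coefficient on `x2` in the augmented regression coincides numerically with the 2SLS estimate.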

##### 30.4.2.1.2 Hausman’s Test

Similar to Wald Test and identical to Wald Test when we have homoskedasticity (i.e., homogeneity of variances). Because of this assumption, it’s used less often than Wald Test
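For reference, the contrast form of the Hausman statistic compares the IV and OLS estimates of the coefficients on \(X_2\). This is the standard textbook expression rather than something stated in the original text; \(\hat{V}_{IV}\) and \(\hat{V}_{OLS}\) denote the estimated variance matrices of the two estimators.

\[ H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' (\hat{V}_{IV} - \hat{V}_{OLS})^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS}) \]

Under the null that \(X_2\) is exogenous and the errors are homoskedastic, OLS is efficient, so the variance difference is positive semi-definite and \(H\) is asymptotically \(\chi^2\) with degrees of freedom equal to the number of potentially endogenous regressors.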

##### 30.4.2.1.3 Hansen’s J

J-test (over-identifying restrictions test): tests whether **additional** instruments are exogenous

- Can only be applied in cases where you have more instruments than endogenous variables
  - \(dim(Z) > dim(X_2)\)
- Assume at least one instrument within \(Z\) is exogenous

Procedure IV-GMM:

- Obtain the residuals of the 2SLS estimation
- Regress the residuals on all instruments and exogenous variables.
- Test the joint hypothesis that all coefficients on the instruments in this residual regression are 0 (i.e., this is true when instruments are exogenous).
  - Compute \(J = mF\) where \(m\) is the number of instruments, and \(F\) is the F-statistic for this joint hypothesis (you can use `linearHypothesis()` again).
  - If your exogeneity assumption is true, then \(J \sim \chi^2_{m-k}\) where \(k\) is the number of endogenous variables.
- If you reject this hypothesis, it can be that
  - The first set of instruments is invalid
  - The second set of instruments is invalid
  - Both sets of instruments are invalid

**Note**: This test is only valid when your residuals are homoskedastic.

For a heteroskedasticity-robust \(J\)-statistic, see (Carrasco and Doukali 2022; H. Li et al. 2022)
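The homoskedastic \(J = mF\) procedure can be sketched by hand in base R. The simulated design, the helper `j_test()`, and the size of the direct effect are hypothetical. There are no exogenous controls in this sketch, so the overall F-statistic of the residual regression tests exactly the instrument coefficients.

```r
set.seed(7)
n  <- 5000
z1 <- rnorm(n)
z2 <- rnorm(n)
u  <- rnorm(n)                           # confounder making x endogenous
x  <- 0.6 * z1 + 0.6 * z2 + u + rnorm(n)

j_test <- function(y, m = 2, k = 1) {
  # 2SLS by hand: residuals must use the observed x, not the fitted values
  x_hat <- fitted(lm(x ~ z1 + z2))
  b     <- coef(lm(y ~ x_hat))
  res   <- y - b[1] - b[2] * x
  # Regress residuals on all instruments; the overall F tests both slopes jointly
  f <- summary(lm(res ~ z1 + z2))$fstatistic["value"]
  j <- unname(m * f)
  c(J = j, df = m - k, p_value = pchisq(j, df = m - k, lower.tail = FALSE))
}

y_valid   <- 1 * x + u + rnorm(n)              # both instruments excluded from y
y_invalid <- 1 * x + 0.5 * z2 + u + rnorm(n)   # z2 has a direct effect on y

rbind(valid = j_test(y_valid), invalid = j_test(y_invalid))
```

When `z2` affects the outcome directly, the statistic is very large and the restrictions are rejected. The test alone cannot say which instrument is responsible.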

##### 30.4.2.1.4 Sargan Test

Similar to Hansen’s J, but it assumes homoskedasticity

Care is needed when the sample is not collected exogenously. With a choice-based sampling design, the sampling weights have to be considered to obtain consistent estimates. However, even if we apply sampling weights, the tests are not suitable because the iid assumption on the errors is already violated. Hence, the test is invalid in this case (Pitt 2011).

If the design exhibits heteroskedasticity, the Sargan test is invalid (Pitt 2011).

### References

*Journal of Econometrics* 142 (1): 183–200.

*Journal of Econometrics* 146 (2): 241–54.

*The Review of Economic Studies* 67 (3): 499–527.

*The Econometrics Journal* 25 (1): 71–97.

*Econometric Theory* 9 (2): 222–40.

*Journal of Economic Literature* 48 (2): 424–55.

*Econometrica: Journal of the Econometric Society*, 1029–54.

*Handbook of Econometrics* 1: 391–448.

*Journal of Econometrics* 218 (2): 294–316.

*American Economic Review* 112 (10): 3260–90.

*Entropy* 24 (8): 1076.

*Econometrica* 71 (4): 1027–48.

*Economics Letters* 115 (2): 314–17.

*Access at: Http://Www. Brown. Edu/Research/Projects/Pitt*.

*Econometrica: Journal of the Econometric Society*, 393–415.