## 16.1 Endogenous Treatment

Using the OLS estimates as a reference point

`library(AER)`


`library(REndo)`


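```
set.seed(421)
data("CASchools")
school <- CASchools
# student-teacher ratio
school$stratio <- with(CASchools, students / teachers)
# OLS benchmark, including county dummies
m1.ols <-
    lm(read ~ stratio + english + lunch + grades + income + calworks + county,
       data = school)
# first 7 coefficients only (county dummies are not shown)
summary(m1.ols)$coefficients[1:7, ]
```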

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 683.45305948 9.56214469 71.4748711 3.011667e-218
## stratio -0.30035544 0.25797023 -1.1643027 2.450536e-01
## english -0.20550107 0.03765408 -5.4576041 8.871666e-08
## lunch -0.38684059 0.03700982 -10.4523759 1.427370e-22
## gradesKK-08 -1.91291321 1.35865394 -1.4079474 1.599886e-01
## income 0.71615378 0.09832843 7.2832829 1.986712e-12
## calworks -0.05273312 0.06154758 -0.8567863 3.921191e-01
```

### 16.1.1 Instrumental Variable

A3a requires \(\epsilon_i\) to be uncorrelated with \(\mathbf{x}_i\)

\[ plim(\hat{\beta}_{OLS}) = \beta + [E(\mathbf{x_i'x_i})]^{-1}E(\mathbf{x_i'}\epsilon_i) \]

A3a is the weakest assumption needed for OLS to be **consistent**

A3a fails when \(x_{ik}\) is correlated with \(\epsilon_i\)

- Omitted Variables Bias: \(\epsilon_i\) includes any other factors that may influence the dependent variable (linearly)
- Feedback Effect (Simultaneity): e.g., demand and prices are simultaneously determined.
- Endogenous sample design (sample selection): the sample is not iid
- Measurement Error

**Note**

- Omitted Variable: a variable that is omitted from the model (and is therefore part of \(\epsilon_i\)), is unobserved, and has predictive power for the outcome.

- Omitted Variable Bias: the bias (and inconsistency, when looking at large-sample properties) of the OLS estimator when the omitted variable is correlated with the included regressors.

The **structural equation** is used to emphasize that we are interested in understanding a **causal relationship**

\[ y_{i1} = \beta_0 + \mathbf{z}_{i1} \beta_1 + y_{i2}\beta_2 + \epsilon_i \]

where

- \(y_{i1}\) is the outcome variable (inherently correlated with \(\epsilon_i\))
- \(y_{i2}\) is the endogenous covariate (presumed to be correlated with \(\epsilon_i\))
- \(\beta_2\) represents the causal effect of \(y_{i2}\) on \(y_{i1}\)
- \(\mathbf{z}_{i1}\) are exogenous controls (uncorrelated with \(\epsilon_i\)) (\(E(\mathbf{z}_{i1}'\epsilon_i) = 0\))

OLS is an inconsistent estimator of the causal effect \(\beta_2\)

If there was no endogeneity

- \(E(y_{i2}'\epsilon_i) = 0\)
- the exogenous variation in \(y_{i2}\) is what identifies the causal effect

If there is endogeneity

- Any wiggle in \(y_{i2}\) will shift simultaneously with \(\epsilon_i\)

\[ plim(\hat{\beta}_{OLS}) = \beta + [E(\mathbf{x'_ix_i})]^{-1}E(\mathbf{x'_i}\epsilon_i) \]

where

- \(\beta\) is the causal effect
- \([E(\mathbf{x'_ix_i})]^{-1}E(\mathbf{x'_i}\epsilon_i)\) is the endogenous effect

Hence \(\hat{\beta}_{OLS}\) can be either more positive or more negative than the true causal effect, depending on the sign of the endogenous effect.
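A minimal simulation sketch of the decomposition above, in base R. The data-generating process and all variable names (`a`, `y2`, `eps`) are illustrative assumptions, not part of the `CASchools` application. In a one-regressor model the OLS slope equals the true effect plus the sample analogue of the endogeneity term.

```r
set.seed(1)
n   <- 10000
a   <- rnorm(n)               # unobserved factor (hypothetical, e.g. "ability")
y2  <- 0.8 * a + rnorm(n)     # endogenous regressor: depends on a
eps <- a + rnorm(n)           # structural error also contains a
y1  <- 1 + 2 * y2 + eps       # true causal effect beta_2 = 2

b_ols <- unname(coef(lm(y1 ~ y2))["y2"])

# Sample analogue of the endogenous effect: cov(y2, eps) / var(y2)
endo_effect <- cov(y2, eps) / var(y2)

c(ols = b_ols, truth_plus_endogenous_effect = 2 + endo_effect)
```

Here the endogenous effect is positive, so OLS overstates the causal effect; flipping the sign of `a` in `eps` would make OLS understate it.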

Motivation for **Two Stage Least Squares (2SLS)**

\[ y_{i1}=\beta_0 + \mathbf{z}_{i1}\beta_1 + y_{i2}\beta_2 + \epsilon_i \]

We want to understand how movement in \(y_{i2}\) affects movement in \(y_{i1}\), but whenever we move \(y_{i2}\), \(\epsilon_i\) also moves.

**Solution**

We need a way to move \(y_{i2}\) independently of \(\epsilon_i\), then we can analyze the response in \(y_{i1}\) as a causal effect

Find an **instrumental variable (or variables)** \(z_{i2}\) such that

- **Instrument Relevance**: when \(z_{i2}\) moves then \(y_{i2}\) also moves

- **Instrument Exogeneity**: when \(z_{i2}\) moves then \(\epsilon_i\) does not move.

\(z_{i2}\) is the **exogenous variation that identifies** the causal effect \(\beta_2\)

Finding an Instrumental variable:

- Random Assignment
    - Effect of class size on educational outcomes: instrument is initial random assignment
- Relation’s Choice
    - Effect of education on fertility: instrument is parent’s educational level
- Eligibility
    - Trade-off between IRA and 401K retirement savings: instrument is 401k eligibility

**Example**

Return to College

Education is correlated with ability, so it is endogenous.

**Near 4-year** college as an instrument

- Instrument Relevance: when **near** moves then education also moves

- Instrument Exogeneity: when **near** moves then \(\epsilon_i\) does not move.

Other potential instruments: near a 2-year college, parent’s education, owning a library card.

\[ y_{i1}=\beta_0 + \mathbf{z}_{i1}\beta_1 + y_{i2}\beta_2 + \epsilon_i \]

First Stage (Reduced Form) Equation:

\[ y_{i2} = \pi_0 + \mathbf{z_{i1}\pi_1} + \mathbf{z_{i2}\pi_2} + v_i \]

where

- \(\pi_0 + \mathbf{z_{i1}\pi_1} + \mathbf{z_{i2}\pi_2}\) is exogenous variation
- \(v_i\) is endogenous variation

This is called a **reduced form equation**

- Not interested in the causal interpretation of \(\pi_1\) or \(\pi_2\)
- A linear projection of \(y_{i2}\) on \(z_{i1}\) and \(z_{i2}\) (simple correlations)
- The projection coefficients \(\pi_1\) and \(\pi_2\) guarantee that \(E(z_{i1}'v_i)=0\) and \(E(z_{i2}'v_i)=0\)

Instrumental variable \(z_{i2}\)

- **Instrument Relevance**: \(\pi_2 \neq 0\)
- **Instrument Exogeneity**: \(E(\mathbf{z_{i2}\epsilon_i})=0\)

Moving only the exogenous part of \(y_{i2}\) means moving

\[ \tilde{y}_{i2} = \pi_0 + \mathbf{z_{i1}\pi_1 + z_{i2}\pi_2} \]

**Two Stage Least Squares (2SLS)**

\[ y_{i1} = \beta_0 +\mathbf{z_{i1}\beta_1}+ y_{i2}\beta_2 + \epsilon_i \]

\[ y_{i2} = \pi_0 + \mathbf{z_{i1}\pi_1} + \mathbf{z_{i2}\pi_2} + \mathbf{v_i} \]

Equivalently,

\[\begin{equation} \begin{split} y_{i1} = \beta_0 + \mathbf{z_{i1}}\beta_1 + \tilde{y}_{i2}\beta_2 + u_i \end{split} \tag{16.1} \end{equation}\]

where

- \(\tilde{y}_{i2} =\pi_0 + \mathbf{z_{i1}\pi_1} + \mathbf{z_{i2}\pi_2}\)
- \(u_i = v_i \beta_2+ \epsilon_i\)

- A2 holds if the instrument is relevant \(\pi_2 \neq 0\)
    - \(y_{i1} = \beta_0 + \mathbf{z_{i1}\beta_1 + (\pi_0 + z_{i1}\pi_1 + z_{i2}\pi_2)}\beta_2 + u_i\)
- A3a holds if the instrument is exogenous \(E(\mathbf{z}_{i2}\epsilon_i)=0\)

\[ \begin{aligned} E(\tilde{y}_{i2}'u_i) &= E((\pi_0 + \mathbf{z_{i1}\pi_1+z_{i2}\pi_2})(v_i\beta_2 + \epsilon_i)) \\ &= E((\pi_0 + \mathbf{z_{i1}\pi_1+z_{i2}\pi_2})\epsilon_i) \\ &= E(\epsilon_i)\pi_0 + E(\epsilon_i \mathbf{z}_{i1})\pi_1 + E(\epsilon_i \mathbf{z}_{i2})\pi_2 \\ &=0 \end{aligned} \]

The second equality uses \(E(z_{i1}'v_i)=0\) and \(E(z_{i2}'v_i)=0\), which hold by construction of the linear projection.

Hence, OLS applied to (16.1) is consistent

The 2SLS Estimator

1. Estimate the first stage using OLS

\[ y_{i2} = \pi_0 + \mathbf{z_{i1}\pi_1} + \mathbf{z_{i2}\pi_2} + \mathbf{v_i} \]

and obtain the fitted values \(\hat{y}_{i2}\)

2. Estimate the altered equation using OLS

\[ y_{i1} = \beta_0 +\mathbf{z_{i1}\beta_1}+ \hat{y}_{i2}\beta_2 + error_i \]
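The two steps can be sketched by hand in base R on simulated data. This is only an illustration under an assumed data-generating process (the names `z1`, `z2`, `a` and the coefficient values are hypothetical); it is not the `CASchools` model.

```r
set.seed(2)
n  <- 5000
z1 <- rnorm(n)                                      # exogenous control
z2 <- rnorm(n)                                      # instrument (excluded from the structural equation)
a  <- rnorm(n)                                      # unobserved confounder
y2 <- 0.5 * z1 + 1.0 * z2 + a + rnorm(n)            # endogenous regressor
y1 <- 1 + 0.5 * z1 + 2 * y2 - 1.5 * a + rnorm(n)    # true beta_2 = 2

# Step 1: first stage by OLS, keep the fitted values
first  <- lm(y2 ~ z1 + z2)
y2_hat <- fitted(first)

# Step 2: structural equation with y2 replaced by its fitted values
second <- lm(y1 ~ z1 + y2_hat)

b_ols  <- unname(coef(lm(y1 ~ z1 + y2))["y2"])
b_tsls <- unname(coef(second)["y2_hat"])
c(ols = b_ols, tsls = b_tsls)
```

The point estimate from step 2 is the 2SLS estimate; the standard errors printed by `summary(second)` should not be used as they stand.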

**Properties of the 2SLS Estimator**

- Under A1, A2, A3a (for \(z_{i1}\)), A5 and if the instrument satisfies the following two conditions,
    - **Instrument Relevance**: \(\pi_2 \neq 0\)
    - **Instrument Exogeneity**: \(E(\mathbf{z}_{i2}'\epsilon_i) = 0\)

    then the 2SLS estimator is consistent
- Can handle more than one endogenous variable and more than one instrumental variable

\[ y_{i1} = \beta_0 + z_{i1}\beta_1 + y_{i2}\beta_2 + y_{i3}\beta_3 + \epsilon_i \\ y_{i2} = \pi_0 + z_{i1}\pi_1 + z_{i2}\pi_2 + z_{i3}\pi_3 + z_{i4}\pi_4 + v_{i2} \\ y_{i3} = \gamma_0 + z_{i1}\gamma_1 + z_{i2}\gamma_2 + z_{i3}\gamma_3 + z_{i4}\gamma_4 + v_{i3} \]

- **IV estimator**: one endogenous variable with a single instrument
- **2SLS estimator**: one endogenous variable with multiple instruments
- **GMM estimator**: multiple endogenous variables with multiple instruments

Standard errors produced in the second step are not correct

- Because we do not know \(\tilde{y}\) perfectly and need to estimate it in the first step, we are introducing additional variation

- We did not have this problem with FGLS because “the first stage was orthogonal to the second stage.” This is generally not true for most multi-step procedures.

- If A4 does not hold, need to report robust standard errors.

2SLS is less efficient than OLS and will always have larger standard errors.

- First, \(Var(u_i) = Var(v_i\beta_2 + \epsilon_i) > Var(\epsilon_i)\)

- Second, \(\hat{y}_{i2}\) is generally highly collinear with \(\mathbf{z}_{i1}\)

The number of instruments needs to be at least as large as the number of endogenous variables.
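The point about second-step standard errors can be illustrated with a base R sketch on simulated data (an assumed data-generating process with hypothetical names). The usual homoskedastic correction keeps \((\hat{X}'\hat{X})^{-1}\) from the second step but computes the residual variance from residuals evaluated at the observed \(y_{i2}\) rather than at \(\hat{y}_{i2}\). Functions such as `ivreg()` report corrected standard errors directly.

```r
set.seed(3)
n  <- 2000
z1 <- rnorm(n); z2 <- rnorm(n); a <- rnorm(n)
y2 <- 0.5 * z1 + z2 + a + rnorm(n)
y1 <- 1 + 0.5 * z1 + 2 * y2 - 1.5 * a + rnorm(n)

y2_hat <- fitted(lm(y2 ~ z1 + z2))
second <- lm(y1 ~ z1 + y2_hat)
b      <- coef(second)

X      <- cbind(1, z1, y2)         # observed regressors
X_hat  <- cbind(1, z1, y2_hat)     # regressors used in the second step
e      <- drop(y1 - X %*% b)       # residuals evaluated at the observed y2
sigma2 <- sum(e^2) / (n - ncol(X))

se_corrected <- sqrt(diag(sigma2 * solve(crossprod(X_hat))))
se_naive     <- sqrt(diag(vcov(second)))   # what the second-step lm() reports

rbind(naive = se_naive, corrected = se_corrected)
```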

**Note**

- 2SLS can be combined with FGLS to make the estimator more efficient: You have the same first-stage, and in the second-stage, instead of using OLS, you can use FGLS with the weight matrix \(\hat{w}\)

- Generalized Method of Moments can be more efficient than 2SLS.

- In the second-stage of 2SLS, you can also use MLE, but then you are making assumptions about the distribution of the outcome variable, the endogenous variable, and their relationship (joint distribution).

#### 16.1.1.1 Testing Assumption

Test of Endogeneity: Is \(y_{i2}\) truly endogenous (i.e., can we just use OLS instead of 2SLS)?

Testing Instrument’s assumptions

- Exogeneity: Cannot always test (and when you can it might not be informative)

- Relevancy: Always testable; the goal is to rule out a weak instrument

##### 16.1.1.1.1 Test of Endogeneity

2SLS is generally so inefficient that we may prefer OLS if there is not much endogeneity

Consistent but inefficient (2SLS) vs. efficient but biased (OLS)

Want a sense of “how endogenous” \(y_{i2}\) is

- if “very” endogenous - should use 2SLS

- if not “very” endogenous - perhaps prefer OLS

**Invalid** Test of Endogeneity

- \(y_{i2}\) is endogenous if it is correlated with \(\epsilon_i\),

\[ \epsilon_i = \gamma_0 + y_{i2}\gamma_1 + error_i \]

where \(\gamma_1 \neq 0\) implies that there is endogeneity

- \(\epsilon_i\) is not observed, but using the residuals

\[ e_i = \gamma_0 + y_{i2}\gamma_1 + error_i \]

is **NOT** a valid test of endogeneity

- The OLS residual, \(e_i\), is mechanically uncorrelated with \(y_{i2}\) (by the FOC for OLS)
- In every situation, \(\hat{\gamma}_1\) will be essentially 0 and you will never be able to reject the null of no endogeneity

**Valid** test of endogeneity

- If \(y_{i2}\) is not endogenous then \(\epsilon_i\) and \(v_i\) are uncorrelated

\[ y_{i1} = \beta_0 + \mathbf{z}_{i1}\beta_1 + y_{i2}\beta_2 + \epsilon_i \\ y_{i2} = \pi_0 + \mathbf{z}_{i1}\pi_1 + z_{i2}\pi_2 + v_i \]

**Variable Addition test**: include the first stage residuals as an additional variable,

\[ y_{i1} = \beta_0 + \mathbf{z}_{i1}\beta_1 + y_{i2}\beta_2 + \hat{v}_i \theta + error_i \]

Then the usual t-test of significance is a valid test to evaluate the following hypothesis. **Note**: this test requires your instrument to be a valid instrument.

\[ \begin{aligned} H_0: \theta = 0 && \text{ (not endogenous)} \\ H_1: \theta \neq 0 && \text{ (endogenous)} \end{aligned} \]
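A minimal base R sketch of the variable addition test on simulated data. The data-generating process and names are illustrative assumptions; `y2` is endogenous by construction, so the test should reject.

```r
set.seed(4)
n  <- 2000
z1 <- rnorm(n); z2 <- rnorm(n); a <- rnorm(n)
y2 <- 0.5 * z1 + z2 + a + rnorm(n)                  # endogenous through a
y1 <- 1 + 0.5 * z1 + 2 * y2 - 1.5 * a + rnorm(n)

v_hat <- resid(lm(y2 ~ z1 + z2))      # first-stage residuals
aug   <- lm(y1 ~ z1 + y2 + v_hat)     # structural equation plus v_hat

# t-test of H0: theta = 0 (no endogeneity)
theta_row <- summary(aug)$coefficients["v_hat", ]
theta_row
```

A small p-value on `v_hat` is evidence against \(H_0\), i.e. evidence that `y2` is endogenous and 2SLS is needed.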

##### 16.1.1.1.2 Testing Instrument’s assumptions

The instrumental variable must satisfy both of the following conditions.

###### 16.1.1.1.2.1 Exogeneity

Why does exogeneity matter?

\[ E(\mathbf{z}_{i2}'\epsilon_i) = 0 \]

- If A3a fails - 2SLS is also inconsistent
- If instrument is not exogenous, then we need to find a new one.
- Similar to Test of Endogeneity, when there is a single instrument

\[ e_i = \gamma_0 + \mathbf{z}_{i2}\gamma_1 + error_i \\ H_0: \gamma_1 = 0 \]

is **NOT** a valid test of instrument exogeneity

- the 2SLS residual from a just-identified model, \(e_i\), is mechanically uncorrelated with \(z_{i2}\): \(\hat{\gamma}_1\) will be essentially 0 and you will never be able to determine if the instrument is endogenous.

**Solution**

Testing Instrument Exogeneity in an Over-identified Model

- When there is more than one exogenous instrument (per endogenous variable), we can test for instrument exogeneity.
    - When we have more instruments than endogenous variables, the model is said to be over-identified.
    - Could estimate the same model several ways (i.e., can identify/estimate \(\beta_2\) more than one way)
- Idea behind the test: if the controls and instruments are truly exogenous then OLS estimation of the following regression,

\[ \epsilon_i = \gamma_0 + \mathbf{z}_{i1}\gamma_1 + \mathbf{z}_{i2}\gamma_2 + error_i \]

should have a very low \(R^2\)

- if the model is **just identified** (one instrument per endogenous variable) then \(R^2 = 0\)

Steps:

1. Estimate the structural equation by 2SLS (using all available instruments) and obtain the residuals \(e_i\)
2. Regress \(e_i\) on all controls and instruments and obtain the \(R^2\)
3. Under the null hypothesis (all IVs are uncorrelated with \(\epsilon_i\)), \(nR^2 \sim \chi^2(q)\), where \(q\) is the number of instrumental variables minus the number of endogenous variables
    - if the model is just identified (one instrument per endogenous variable) then \(q = 0\), and the distribution under the null collapses.

A low p-value means you reject the null of exogenous instruments. Hence you would like to have a high p-value in this test.
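The three steps can be sketched in base R on simulated data with two instruments for one endogenous variable, so \(q = 1\). The data-generating process and the names `z2a`, `z2b` are assumptions made for illustration; both instruments are exogenous by construction.

```r
set.seed(5)
n   <- 2000
z1  <- rnorm(n); z2a <- rnorm(n); z2b <- rnorm(n); a <- rnorm(n)
y2  <- 0.5 * z1 + z2a + z2b + a + rnorm(n)
y1  <- 1 + 0.5 * z1 + 2 * y2 - 1.5 * a + rnorm(n)

# Step 1: 2SLS with all instruments; residuals use the observed y2
y2_hat <- fitted(lm(y2 ~ z1 + z2a + z2b))
b      <- coef(lm(y1 ~ z1 + y2_hat))
e      <- drop(y1 - cbind(1, z1, y2) %*% b)

# Step 2: regress the residuals on all controls and instruments
r2 <- summary(lm(e ~ z1 + z2a + z2b))$r.squared

# Step 3: n * R^2 compared with a chi-square(q) distribution
q     <- 2 - 1      # number of instruments minus number of endogenous variables
stat  <- n * r2
p_val <- pchisq(stat, df = q, lower.tail = FALSE)
c(statistic = stat, df = q, p_value = p_val)
```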

**Pitfalls for the Overid test**

The overid test is essentially compiling the following information.

- Conditional on the first instrument being exogenous, is the other instrument exogenous?

- Conditional on the other instrument being exogenous, is the first instrument exogenous?

If all instruments are endogenous then neither test will be valid.

The test is really only useful if one instrument is thought to be truly exogenous (randomly assigned). Even if you do reject the null, the test does not tell you which instrument is exogenous and which is endogenous.

Result | Implication |
---|---|
reject the null | you can be pretty sure there is an endogenous instrument, but don’t know which one. |
fail to reject | could be either (1) they are both exogenous, (2) they are both endogenous. |

###### 16.1.1.1.2.2 Relevancy

Why does relevance matter?

\[ \pi_2 \neq 0 \]

- used to show A2 holds
    - If \(\pi_2 = 0\) (instrument is not relevant) then A2 fails - perfect multicollinearity
    - If \(\pi_2\) is close to 0 (**weak instrument**) then there is near perfect multicollinearity - 2SLS is highly inefficient (large standard errors).
- A weak instrument will exacerbate any inconsistency due to an instrument being (even slightly) endogenous.
    - In the simple case with no controls and a single endogenous variable and single instrumental variable,

\[ plim(\hat{\beta}_{2_{2SLS}}) = \beta_2 + \frac{E(z_{i2}\epsilon_i)}{E(z_{i2}y_{i2})} \]

**Testing Weak Instruments**

We can use a t-test (or an F-test for over-identified models) in the first stage to determine if there is a weak instrument problem.

(Stock and Yogo 2005): a statistical rejection of the null hypothesis in the first stage at the 5% (or even 1%) level is not enough to ensure the instrument is not weak

- Rule of Thumb: need an F-stat of at least 10 (or a t-stat of at least 3.2) to reject the null hypothesis that the instrument is weak.
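A base R sketch of the first-stage F-statistic for the excluded instrument, comparing a strong and a deliberately weak instrument on simulated data. The helper `first_stage_F()` and the data-generating process are hypothetical, introduced only for this illustration.

```r
set.seed(6)
n  <- 500
z1 <- rnorm(n); z2 <- rnorm(n); a <- rnorm(n)
y2_strong <- 0.5 * z1 + 1.00 * z2 + a + rnorm(n)   # pi_2 = 1
y2_weak   <- 0.5 * z1 + 0.02 * z2 + a + rnorm(n)   # pi_2 close to 0

# F-test of H0: pi_2 = 0, comparing the first stage with and without z2
first_stage_F <- function(y2) {
  restricted   <- lm(y2 ~ z1)
  unrestricted <- lm(y2 ~ z1 + z2)
  anova(restricted, unrestricted)$F[2]
}

F_strong <- first_stage_F(y2_strong)
F_weak   <- first_stage_F(y2_weak)
c(strong = F_strong, weak = F_weak)
```

With a single instrument this F-statistic is the square of the first-stage t-statistic on `z2`, which is why the two rule-of-thumb thresholds (10 and about 3.2) agree.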

**Summary of the 2SLS Estimator**

\[ y_{i1}=\beta_0 + \mathbf{z}_{i1}\beta_1 + y_{i2}\beta_2 + \epsilon_i \\ y_{i2} = \pi_0 + \mathbf{z_{i1}\pi_1} + \mathbf{z_{i2}\pi_2} + v_i \]

- when A3a does not hold

\[ E(y_{i2}'\epsilon_i) \neq 0 \]

Then the OLS estimator is no longer unbiased or consistent.

- If we have valid instruments \(\mathbf{z}_{i2}\), i.e.,
    - Exogeneity: \(E(\mathbf{z}_{i2}'\epsilon_i) = 0\)
    - Relevancy: \(\pi_2 \neq 0\)

    then the 2SLS estimator is consistent under A1, A2, A5a, and the above two conditions.
    - If A4 also holds, then the usual standard errors are valid.
    - If A4 does not hold then use the robust standard errors.

\[ y_{i1}=\beta_0 + \mathbf{z}_{i1}\beta_1 + y_{i2}\beta_2 + \epsilon_i \\ y_{i2} = \pi_0 + \mathbf{z_{i1}\pi_1} + \mathbf{z_{i2}\pi_2} + v_i \]

* When A3a does hold

\[ E(y_{i2}'\epsilon_i) = 0 \]

and we have valid instruments, then both the OLS and 2SLS estimators are consistent.

- The OLS estimator is always more efficient
- We can use the variable addition test to determine if 2SLS is needed (A3a does not hold) or if OLS is valid (A3a does hold)

Sometimes we can test the assumptions for an instrument to be valid:

- Exogeneity: Only testable when there are more instruments than endogenous variables.
- Relevancy: Always testable, need the F-stat to be greater than 10 to rule out a weak instrument

Application

Expenditure as observed instrument

```
# 2SLS: stratio is instrumented by expenditure;
# all other regressors serve as their own instruments
m2.2sls <-
    ivreg(
        read ~ stratio + english + lunch + grades + income + calworks +
            county | expenditure + english + lunch + grades + income + calworks +
            county,
        data = school
    )
summary(m2.2sls)$coefficients[1:7, ]
```

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 700.47891593 13.58064436 51.5792106 8.950497e-171
## stratio -1.13674002 0.53533638 -2.1234126 3.438427e-02
## english -0.21396934 0.03847833 -5.5607753 5.162571e-08
## lunch -0.39384225 0.03773637 -10.4366757 1.621794e-22
## gradesKK-08 -1.89227865 1.37791820 -1.3732881 1.704966e-01
## income 0.62487986 0.11199008 5.5797785 4.668490e-08
## calworks -0.04950501 0.06244410 -0.7927892 4.284101e-01
```

### 16.1.2 Internal instrumental variable

(also known as **instrument-free methods**). This section is based on Raluca Gui’s guide.

These methods are an alternative to external instrumental variable approaches.

All approaches here assume a **continuous dependent variable**

**Application**

#### 16.1.2.1 Non-hierarchical Data (Cross-classified)

\[ Y_t = \beta_0 + \beta_1 P_t + \beta_2 X_t + \epsilon_t \]

where

- \(t = 1, .., T\) (indexes either time or cross-sectional units)

- \(Y_t\) is a k x 1 response variable

- \(X_t\) is a k x n exogenous regressor

- \(P_t\) is a k x 1 continuous endogenous regressor

- \(\epsilon_t\) is a structural error term with \(\mu_\epsilon =0\) and \(E(\epsilon^2) = \sigma^2\)
- \(\beta\) are model parameters

The endogeneity problem arises from the correlation of \(P_t\) and \(\epsilon_t\):

\[ P_t = \gamma Z_t + v_t \]

where

- \(Z_t\) is a l x 1 vector of internal instrumental variables

- \(ν_t\) is a random error with \(\mu_{v_t} = 0, E(v^2) = \sigma^2_v, E(\epsilon v) = \sigma_{\epsilon v}\)

- \(Z_t\) is assumed to be stochastic with distribution G

- \(ν_t\) is assumed to have density h(·)

##### 16.1.2.1.1 Latent Instrumental Variable

assume \(Z_t\) (unobserved) to be uncorrelated with \(\epsilon_t\), which is similar to Instrumental Variable. Hence, \(Z_t\) and \(ν_t\) can’t be identified without distributional assumptions

The distributions of \(Z_t\) and \(ν_t\) need to be specified such that:

- endogeneity of \(P_t\) is corrected

- the distribution of \(P_t\) is empirically close to the integral that expresses the amount of overlap of Z as it is shifted over ν (= the convolution between \(Z_t\) and \(ν_t\)).

When the density h(·) = Normal, then G cannot be normal because the parameters would not be identified (Ebbes et al. 2005) .

Hence,

- in the LIV model the distribution of \(Z_t\) is discrete

- in the Higher Moments Method and Joint Estimation Using Copula methods, the distribution of \(Z_t\) is taken to be skewed.

\(Z_t\) are assumed **unobserved, discrete and exogenous**, with

- an unknown number of groups m

- \(\gamma\) is a vector of group means.

Identification of the parameters relies on the distributional assumptions of

- \(P_t\): a non-Gaussian distribution

- \(Z_t\) discrete with \(m \ge 2\)

Note:

- If \(Z_t\) is continuous, the model is unidentified

- If \(P_t \sim N\), you have inefficient estimates.

`m3.liv <- latentIV(read ~ stratio, data = school)`

`## No start parameters were given. The linear model read ~ stratio is fitted to derive them.`

`## The start parameters c((Intercept)=706.449, stratio=-2.621, pi1=19.64, pi2=21.532, theta5=0.5, theta6=1, theta7=0.5, theta8=1) are used for optimization.`

`summary(m3.liv)$coefficients[1:7,]`

```
## Estimate Std. Error z-score Pr(>|z|)
## (Intercept) 6.996014e+02 2.686186e+02 2.604441e+00 9.529597e-03
## stratio -2.272673e+00 1.367757e+01 -1.661605e-01 8.681108e-01
## pi1 -4.896363e+01 5.526907e-08 -8.859139e+08 0.000000e+00
## pi2 1.963920e+01 9.225351e-02 2.128830e+02 0.000000e+00
## theta5 6.939432e-152 3.354672e-160 2.068587e+08 0.000000e+00
## theta6 3.787512e+02 4.249457e+01 8.912932e+00 1.541524e-17
## theta7 -1.227543e+00 4.885276e+01 -2.512741e-02 9.799653e-01
```

It returns a coefficient very different from the other methods because the model here contains only the single endogenous regressor, with no additional controls.

##### 16.1.2.1.2 Joint Estimation Using Copula

assume \(Z_t\) (unobserved) to be uncorrelated with \(\epsilon_t\), which is similar to Instrumental Variable. Hence, \(Z_t\) and \(ν_t\) can’t be identified without distributional assumptions

(Park and Gupta 2012) allows joint estimation of the continuous \(P_t\) and \(\epsilon_t\) using Gaussian copulas, where a copula is a function that maps several marginal cumulative distribution functions (CDFs) into their joint CDF.

The underlying idea is that using information contained in the observed data, one selects marginal distributions for \(P_t\) and \(\epsilon_t\). Then, the copula model constructs a flexible multivariate joint distribution that allows a wide range of correlations between the two marginals.

The method allows both continuous and discrete \(P_t\).

In the special case of **one continuous** \(P_t\), estimation is based on MLE

Otherwise, based on Gaussian copulas, augmented OLS estimation is used.

**Assumptions**:

- skewed \(P_t\) (needed for the recovery of the correct parameter estimates)

- \(\epsilon_t \sim\) normal marginal distribution. The marginal distribution of \(P_t\) is obtained using the **Epanechnikov kernel density estimator**

\[ \hat{h}_p = \frac{1}{T \cdot b} \sum_{t=1}^TK\left(\frac{p - P_t}{b}\right) \]

where

- \(P_t\) = endogenous variables

- \(K(x) = 0.75(1-x^2)I(|x|\le 1)\)

- \(b=0.9T^{-1/5}\times min(s, IQR/1.34)\) suggested by (Silverman 1969)
    - IQR = interquartile range
    - s = sample standard deviation
    - T = number of time periods observed in the data
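The estimator above can be written out directly in base R. This is a sketch on a simulated skewed regressor (the vector `P` and the helper names `K` and `h_hat` are assumptions for illustration), not the internal implementation of any package.

```r
set.seed(7)
P     <- rexp(200)                 # hypothetical skewed endogenous regressor
T_obs <- length(P)

# Bandwidth: b = 0.9 * T^(-1/5) * min(s, IQR / 1.34)
b <- 0.9 * T_obs^(-1/5) * min(sd(P), IQR(P) / 1.34)

# Epanechnikov kernel
K <- function(x) 0.75 * (1 - x^2) * (abs(x) <= 1)

# Kernel density estimate evaluated at each point of p
h_hat <- function(p) sapply(p, function(p0) sum(K((p0 - P) / b)) / (T_obs * b))

grid <- seq(min(P) - b, max(P) + b, length.out = 2000)
dens <- h_hat(grid)

# Riemann-sum check: a density should integrate to approximately 1
area <- sum(dens) * diff(grid)[1]
area
```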

In augmented OLS and MLE, the inference procedure occurs in two stages:

1. the empirical distribution of \(P_t\) is computed

2. it is then used in constructing the likelihood function

Hence, the standard errors would not be correct.

So we use the sampling distributions (from bootstrapping) to get standard errors and the variance-covariance matrix. Since the distribution of the bootstrapped parameters is highly skewed, reporting the percentile confidence intervals is preferable.
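A generic base R sketch of bootstrap standard errors and percentile confidence intervals. A simple OLS slope stands in for the copula estimator here, purely to keep the example self-contained; the function `estimate()` and the simulated data are hypothetical.

```r
set.seed(8)
n   <- 300
x   <- rexp(n)
y   <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x, y)

# Stand-in estimator: any function mapping a data set to a parameter estimate
estimate <- function(d) unname(coef(lm(y ~ x, data = d))["x"])

# Resample rows with replacement and re-estimate
B        <- 1000
boot_est <- replicate(B, estimate(dat[sample(n, replace = TRUE), ]))

boot_se <- sd(boot_est)
ci      <- quantile(boot_est, c(0.025, 0.975))   # 95% percentile interval
list(point = estimate(dat), boot_se = boot_se, percentile_ci = ci)
```

The percentile interval is read directly off the bootstrap distribution, so it does not force symmetry around the point estimate.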

```
set.seed(110)
# stratio is treated as the single continuous endogenous regressor
m4.cc <-
    copulaCorrection(
        read ~ stratio + english + lunch + calworks +
            grades + income + county | continuous(stratio),
        data = school,
        optimx.args = list(method = c("Nelder-Mead"), itnmax = 60000),
        num.boots = 2,   # kept small for speed; see the warning below
        verbose = FALSE
    )
```

`## Warning: It is recommended to run 1000 or more bootstraps.`

`summary(m4.cc)$coefficients[1:7, ]`

```
## Point Estimate Boots SE Lower Boots CI (95%) Upper Boots CI (95%)
## (Intercept) 683.06900891 2.80554212 NA NA
## stratio -0.32434608 0.02075999 NA NA
## english -0.21576110 0.01450666 NA NA
## lunch -0.37087664 0.01902052 NA NA
## calworks -0.05569058 0.02076781 NA NA
## gradesKK-08 -1.92286128 0.25684614 NA NA
## income 0.73595353 0.04725700 NA NA
```

We run this model with only one endogenous continuous regressor (`stratio`). Sometimes, the code will not converge, in which case you can use a different

- optimization algorithm

- starting values

- maximum number of iterations

##### 16.1.2.1.3 Higher Moments Method

Suggested by (Lewbel 1997) to identify the structural parameters when endogeneity is caused by **measurement error**.

Identification is achieved by using third moments of the data, with no restrictions on the distribution of \(\epsilon_t\)

The following instruments can be used with 2SLS estimation to obtain consistent estimates:

\[ \begin{aligned} q_{1t} &= (G_t - \bar{G}) \\ q_{2t} &= (G_t - \bar{G})(P_t - \bar{P}) \\ q_{3t} &= (G_t - \bar{G})(Y_t - \bar{Y})\\ q_{4t} &= (Y_t - \bar{Y})(P_t - \bar{P}) \\ q_{5t} &= (P_t - \bar{P})^2 \\ q_{6t} &= (Y_t - \bar{Y})^2 \\ \end{aligned} \]

where

- \(G_t = G(X_t)\) for any given function G that has finite third own and cross moments

- X = exogenous variable

\(q_{5t}, q_{6t}\) can be used only when the measurement error and \(\epsilon_t\) are symmetrically distributed. The rest of the instruments do not require any distributional assumptions for \(\epsilon_t\).

Since the regressors \(G(X) = X\) are included as instruments, \(G(X)\) can’t be a linear function of X in \(q_{1t}\)
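The six instruments are simple functions of demeaned data, so they can be built by hand. The sketch below only illustrates the construction in base R; the simulated data and the choice \(G(X) = X^2\) are assumptions made for the example.

```r
set.seed(9)
n <- 500
X <- rnorm(n)                  # exogenous regressor
P <- 0.5 * X + rexp(n)         # hypothetical skewed regressor
Y <- 1 + 2 * P + X + rnorm(n)

G  <- X^2                      # assumed G(X); must not be linear in X
dG <- G - mean(G)
dP <- P - mean(P)
dY <- Y - mean(Y)

q <- data.frame(
  q1 = dG,
  q2 = dG * dP,
  q3 = dG * dY,
  q4 = dY * dP,
  q5 = dP^2,    # q5 and q6 additionally require symmetric errors
  q6 = dY^2
)
round(colMeans(q), 3)
```

Any subset of these columns could then be passed as instruments to a 2SLS routine alongside `X`.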

Since this method has very strong assumptions, Higher Moments Method should only be used in case of overidentification

```
set.seed(111)
# stratio is endogenous; internal instruments are built from income and english
m5.hetEr <-
    hetErrorsIV(
        read ~ stratio + english + lunch + calworks + income +
            grades + county | stratio | IIV(income, english),
        data = school
    )
```

`## Residuals were derived by fitting stratio ~ english + lunch + calworks + income + grades + county.`

```
## Warning: A studentized Breusch-Pagan test (stratio ~ english) indicates at a 95%
## confidence level that the assumption of heteroscedasticity for the variable is
## not satisfied (p-value: 0.2428). The instrument built from it therefore is weak.
```

`## The following internal instruments were built: IIV(income), IIV(english).`

`## Fitting an instrumental variable regression with model read ~ stratio + english + lunch + calworks + income + grades + county|english + lunch + calworks + income + grades + county + IIV(income) + IIV(english).`

`summary(m5.hetEr)$coefficients[1:7, ]`

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 662.78791557 27.90173069 23.7543657 2.380436e-76
## stratio 0.71480686 1.31077325 0.5453322 5.858545e-01
## english -0.19522271 0.04057527 -4.8113717 2.188618e-06
## lunch -0.37834232 0.03927793 -9.6324402 9.760809e-20
## calworks -0.05665126 0.06302095 -0.8989273 3.692776e-01
## income 0.82693755 0.17236557 4.7975797 2.335271e-06
## gradesKK-08 -1.93795843 1.38723186 -1.3969968 1.632541e-01
```

We recommend using this approach to create additional instruments to use with external ones for better efficiency.

##### 16.1.2.1.4 Heteroskedastic Error Approach

- using means of variables that are uncorrelated with the product of heteroskedastic errors to identify structural parameters.

- This method can be used either when you don’t have external instruments or when you want to use additional instruments to improve the efficiency of the IV estimator (Lewbel 2012)

- The instruments are constructed as simple functions of data

- Model’s assumptions:

\[ E(X \epsilon) = 0 \\ E(X v ) = 0 \\ cov(Z, \epsilon v) = 0 \\ cov(Z, v^2) \neq 0 \text{ (for identification)} \]

Structural parameters are identified by 2SLS regression of Y on X and P, using X and [Z − E(Z)]ν as instruments.

\[ \text{instrument's strength} \propto cov((Z-\bar{Z})v,v) \]

where \(cov((Z-\bar{Z})v,v)\) is the degree of heteroskedasticity of ν with respect to Z (Lewbel 2012), which can be empirically tested.

If it is zero or close to zero (i.e., the instrument is weak), you might have imprecise estimates, with large standard errors.

- Under homoskedasticity, the parameters of the model are unidentified.

- Under heteroskedasticity related to at least some elements of X, the parameters of the model are identified.
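A base R sketch of the idea on simulated data, where the first-stage error is heteroskedastic in \(X\) and \(Z = X\). The data-generating process is an assumption chosen so that the conditions above hold by construction; it is not what `hetErrorsIV()` does internally in every detail.

```r
set.seed(10)
n   <- 5000
X   <- rnorm(n)                        # exogenous regressor, also used as Z
a   <- rnorm(n)                        # common unobserved factor
v   <- a + exp(0.5 * X) * rnorm(n)     # first-stage error, heteroskedastic in X
P   <- 1 + X + v                       # endogenous regressor
eps <- a + rnorm(n)                    # structural error, correlated with v via a
Y   <- 1 + 2 * P + X + eps             # true coefficient on P is 2

v_hat <- resid(lm(P ~ X))              # estimate of v
iiv   <- (X - mean(X)) * v_hat         # constructed instrument [Z - E(Z)] v

# 2SLS by hand, using X and the constructed instrument
P_hat  <- fitted(lm(P ~ X + iiv))
b_ols  <- unname(coef(lm(Y ~ X + P))["P"])
b_tsls <- unname(coef(lm(Y ~ X + P_hat))["P_hat"])
c(ols = b_ols, tsls = b_tsls)
```

If `exp(0.5 * X)` were replaced by a constant (homoskedasticity), `iiv` would be unrelated to `P` and the second step would be uninformative.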

#### 16.1.2.2 Hierarchical Data

Multiple independent assumptions involving various random components at different levels mean that any moderate correlation between some predictors and a random component or error term can result in a significant bias of the coefficients and of the variance components. (Kim and Frees 2007) proposed a generalized method of moments which uses both, the between and within variations of the exogenous variables, but only assumes the within variation of the variables to be endogenous.

**Assumptions**

- the errors at each level \(\sim iid N\)

- the slope variables are exogenous

- the level-1 \(\epsilon \perp X, P\). If this is not the case, additional, external instruments are necessary

**Hierarchical Model**

\[ Y_{cst} = Z_{cst}^1 \beta_{cs}^1 + X_{cst}^1 \beta_1 + \epsilon_{cst}^1 \\ \beta^1_{cs} = Z_{cs}^2 \beta_{c}^2 + X_{cs}^2 \beta_2 + \epsilon_{cs}^2 \\ \beta^2_{c} = X^3_c \beta_3 + \epsilon_c^3 \]

Bias could stem from:

- errors at the higher two levels (\(\epsilon_c^3,\epsilon_{cs}^2\)) are correlated with some of the regressors

- only third level errors (\(\epsilon_c^3\)) are correlated with some of the regressors

(Kim and Frees 2007) proposed

- When all variables are assumed exogenous, the proposed estimator equals the random effects estimator

- When all variables are assumed endogenous, it equals the fixed effects estimator

- also use omitted variable test (based on the Hausman-test (Hausman 1978) for panel data), which allows the comparison of a robust estimator and an estimator that is efficient under the null hypothesis of no omitted variables or the comparison of two robust estimators at different levels.

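```
set.seed(113)
# indicator: TRUE for KK-06 schools, FALSE for KK-08
school$gr08 <- school$grades == "KK-06"
# two-level model (schools within counties) with stratio endogenous
m7.multilevel <-
    multilevelIV(read ~ stratio + english + lunch + income + gr08 +
                     calworks + (1 | county) | endo(stratio),
                 data = school)
```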

`## Fitting linear mixed-effects model read ~ stratio + english + lunch + income + gr08 + calworks + (1 | county).`

`## Detected multilevel model with 2 levels.`

`## For county (Level 2), 45 groups were found.`

`summary(m7.multilevel)$coefficients[1:7, ]`

```
## Estimate Std. Error z-score Pr(>|z|)
## (Intercept) 675.8228656 5.58008680 121.1133248 0.000000e+00
## stratio -0.4956054 0.23922638 -2.0717005 3.829339e-02
## english -0.2599777 0.03413530 -7.6160948 2.614656e-14
## lunch -0.3692954 0.03560210 -10.3728537 3.295342e-25
## income 0.6723141 0.08862012 7.5864728 3.287314e-14
## gr08TRUE 2.1590333 1.28167222 1.6845440 9.207658e-02
## calworks -0.0570633 0.05711701 -0.9990596 3.177658e-01
```

Another example using simulated data

- level-1 regressors: \(X_{11}, X_{12}, X_{13}, X_{14}, X_{15}\), where \(X_{15}\) is correlated with the level-2 error (i.e., endogenous).

- level-2 regressors: \(X_{21}, X_{22}, X_{23}, X_{24}\)

- level-3 regressors: \(X_{31}, X_{32}, X_{33}\)

We estimate a three-level model with X15 assumed endogenous. Having a three-level hierarchy, `multilevelIV()` returns five estimators, from the most robust to omitted variables (FE_L2), to the most efficient (REF) (i.e. lowest mean squared error).

- The random effects estimator (REF) is efficient assuming no omitted variables
- The fixed effects estimator (FE) is unbiased and asymptotically normal even in the presence of omitted variables.

- Because of its efficiency, the random effects estimator is preferable if you think there are no omitted variables.

- The robust estimator would be preferable if you think there are omitted variables.
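The trade-off can be seen in a two-level base R sketch, where a level-1 regressor is correlated with the level-2 error. An estimator that ignores this (pooled OLS here, as a simple stand-in) is biased, while the within (fixed effects) estimator is not. The simulated data and names are assumptions for illustration and do not use `multilevelIV()`.

```r
set.seed(11)
G <- 200; m <- 10                      # 200 groups with 10 observations each
g <- rep(seq_len(G), each = m)
u <- rnorm(G)[g]                       # level-2 error (group effect)
x <- 0.8 * u + rnorm(G * m)            # level-1 regressor correlated with u
y <- 1 - 1 * x + u + rnorm(G * m)      # true slope is -1

# Pooled OLS ignores the correlation between x and the group effect
b_pooled <- unname(coef(lm(y ~ x))["x"])

# Within (fixed effects) estimator: demean x and y by group
x_w <- x - ave(x, g)
y_w <- y - ave(y, g)
b_within <- unname(coef(lm(y_w ~ x_w - 1))["x_w"])

c(pooled = b_pooled, within = b_within)
```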

```
data(dataMultilevelIV)
set.seed(114)
# three-level model (CID nested in SID) with X15 assumed endogenous
formula1 <-
    y ~ X11 + X12 + X13 + X14 + X15 + X21 + X22 + X23 + X24 +
    X31 + X32 + X33 + (1 | CID) + (1 | SID) | endo(X15)
m8.multilevel <-
    multilevelIV(formula = formula1, data = dataMultilevelIV)
```

`## Fitting linear mixed-effects model y ~ X11 + X12 + X13 + X14 + X15 + X21 + X22 + X23 + X24 + X31 + X32 + X33 + (1 | CID) + (1 | SID).`

`## Detected multilevel model with 3 levels.`

`## For CID (Level 2), 1368 groups were found.`

`## For SID (Level 3), 40 groups were found.`

`coef(m8.multilevel)`

```
## REF FE_L2 FE_L3 GMM_L2 GMM_L3
## (Intercept) 64.3168856 0.0000000 0.0000000 64.3485944 64.3168868
## X11 3.0213405 3.0459605 3.0214255 3.0146686 3.0213403
## X12 8.9522160 8.9839088 8.9524723 8.9747533 8.9522169
## X13 -2.0194178 -2.0145054 -2.0193321 -2.0021426 -2.0194171
## X14 1.9651420 1.9791437 1.9648317 1.9658681 1.9651421
## X15 -0.5647915 -0.9777361 -0.5647621 -0.9750309 -0.5648070
## X21 -2.3316225 0.0000000 -2.2845297 -2.3052516 -2.3316215
## X22 -3.9564944 0.0000000 -3.9553644 -4.0130975 -3.9564966
## X23 -2.9779887 0.0000000 -2.9756848 -2.9488487 -2.9779876
## X24 4.9078293 0.0000000 4.9084694 4.7933756 4.9078250
## X31 2.1142348 0.0000000 0.0000000 2.1164477 2.1142349
## X32 0.3934770 0.0000000 0.0000000 0.3799626 0.3934764
## X33 0.1082086 0.0000000 0.0000000 0.1108386 0.1082087
```

`summary(m8.multilevel, "REF")`

```
##
## Call:
## multilevelIV(formula = formula1, data = dataMultilevelIV)
##
## Number of levels: 3
## Number of observations: 2824
## Number of groups: L2(CID): 1368 L3(SID): 40
##
## Coefficients for model REF:
## Estimate Std. Error z-score Pr(>|z|)
## (Intercept) 64.31689 7.87332 8.169 3.11e-16 ***
## X11 3.02134 0.02576 117.306 < 2e-16 ***
## X12 8.95222 0.02572 348.131 < 2e-16 ***
## X13 -2.01942 0.02409 -83.835 < 2e-16 ***
## X14 1.96514 0.02521 77.937 < 2e-16 ***
## X15 -0.56479 0.01950 -28.962 < 2e-16 ***
## X21 -2.33162 0.16228 -14.368 < 2e-16 ***
## X22 -3.95649 0.13119 -30.160 < 2e-16 ***
## X23 -2.97799 0.06611 -45.044 < 2e-16 ***
## X24 4.90783 0.19796 24.792 < 2e-16 ***
## X31 2.11423 0.10433 20.264 < 2e-16 ***
## X32 0.39348 0.30426 1.293 0.1959
## X33 0.10821 0.05236 2.067 0.0388 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Omitted variable tests for model REF:
## df Chisq p-value
## GMM_L2_vs_REF 7 18.74 0.009040 **
## GMM_L3_vs_REF 13 -12872.98 1.000000
## FE_L2_vs_REF 13 39.99 0.000139 ***
## FE_L3_vs_REF 13 39.99 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The true value is \(\beta_{X_{15}} = -1\). Some estimators are biased because \(X_{15}\) is correlated with the level-two error, to which only FE_L2 and GMM_L2 are robust.
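To make the comparison concrete, the \(X_{15}\) estimates can be set against the true value. The sketch below copies the numbers from the `coef(m8.multilevel)` output above rather than refitting the model; the 0.05 cutoff is an arbitrary threshold chosen only for illustration.

```r
# X15 estimates copied from the coef(m8.multilevel) output
x15_hat <- c(REF = -0.5647915, FE_L2 = -0.9777361, FE_L3 = -0.5647621,
             GMM_L2 = -0.9750309, GMM_L3 = -0.5648070)
true_beta <- -1                  # value used when the data were simulated

bias <- x15_hat - true_beta      # positive: estimate is pulled toward zero
round(bias, 3)

# estimators within the (arbitrary) 0.05 cutoff of the true value
close_to_truth <- names(bias)[abs(bias) < 0.05]
close_to_truth
```

The estimates fall into two clusters: FE_L2 and GMM_L2 sit near -1, while REF, FE_L3, and GMM_L3 sit near -0.56.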

To select the appropriate estimator, we use the omitted variable test.

In a three-level setting, we can have different estimator comparisons:

- Fixed effects vs. random effects estimators: To test for omitted level-two and level-three effects simultaneously, compare FE_L2 to REF. This test does not tell us at which level the omitted variables exist.

- Fixed effects vs. GMM estimators: Once the existence of omitted effects is established but their level is unknown, we test for level-2 omitted effects by comparing FE_L2 vs. GMM_L3. If you reject the null, the omitted variables are at level-2. The same is accomplished by testing FE_L2 vs. GMM_L2, since the latter is consistent only if there are no omitted effects at level-2.

- Fixed effects vs. fixed effects estimators: We can test for omitted level-2 effects, while allowing for omitted level-3 effects, by comparing FE_L2 vs. FE_L3, since FE_L2 is robust against both level-2 and level-3 omitted effects while FE_L3 is only robust to level-3 omitted variables.

In summary, first use the omitted variable test comparing `REF vs. FE_L2`.

If the null hypothesis is rejected, then there are omitted variables at either level-2 or level-3.

Next, test whether there are level-2 omitted effects, since testing for omitted level-three effects relies on the assumption that there are no level-two omitted effects. You can use either of these comparisons:

`FE_L2 vs. FE_L3`

`FE_L2 vs. GMM_L2`

If no omitted variables at level-2 are found, test for omitted level-3 effects by comparing either

- FE_L3 vs. GMM_L3

- GMM_L2 vs. GMM_L3

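The testing sequence above can be sketched as a small helper. The function name `locate_omitted_effects()`, its arguments, and the 0.05 threshold are hypothetical, introduced only for illustration. It takes p-values read off the `summary()` output and does not call `REndo` itself.

```r
# Hypothetical helper: walk through the omitted variable tests in order.
# p_overall: REF vs. FE_L2 (level-2 and level-3 effects jointly)
# p_level2 : FE_L2 vs. FE_L3 (or FE_L2 vs. GMM_L2)
# p_level3 : FE_L3 vs. GMM_L3 (or GMM_L2 vs. GMM_L3); used only if level 2 is clean
locate_omitted_effects <- function(p_overall, p_level2, p_level3 = NA_real_,
                                   alpha = 0.05) {
  if (p_overall >= alpha) return("none detected")
  if (p_level2 < alpha) return("level-2")
  if (!is.na(p_level3) && p_level3 < alpha) return("level-3")
  "unresolved"
}

# p-values taken from the summary() output in this section
result <- locate_omitted_effects(p_overall = 0.000139, p_level2 = 3.92e-05)
result
```

The level-3 test is consulted only after the level-2 test fails to reject, mirroring the requirement that level-3 testing assumes no level-2 omitted effects.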

`summary(m8.multilevel, "REF")`

```
##
## Call:
## multilevelIV(formula = formula1, data = dataMultilevelIV)
##
## Number of levels: 3
## Number of observations: 2824
## Number of groups: L2(CID): 1368 L3(SID): 40
##
## Coefficients for model REF:
## Estimate Std. Error z-score Pr(>|z|)
## (Intercept) 64.31689 7.87332 8.169 3.11e-16 ***
## X11 3.02134 0.02576 117.306 < 2e-16 ***
## X12 8.95222 0.02572 348.131 < 2e-16 ***
## X13 -2.01942 0.02409 -83.835 < 2e-16 ***
## X14 1.96514 0.02521 77.937 < 2e-16 ***
## X15 -0.56479 0.01950 -28.962 < 2e-16 ***
## X21 -2.33162 0.16228 -14.368 < 2e-16 ***
## X22 -3.95649 0.13119 -30.160 < 2e-16 ***
## X23 -2.97799 0.06611 -45.044 < 2e-16 ***
## X24 4.90783 0.19796 24.792 < 2e-16 ***
## X31 2.11423 0.10433 20.264 < 2e-16 ***
## X32 0.39348 0.30426 1.293 0.1959
## X33 0.10821 0.05236 2.067 0.0388 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Omitted variable tests for model REF:
## df Chisq p-value
## GMM_L2_vs_REF 7 18.74 0.009040 **
## GMM_L3_vs_REF 13 -12872.98 1.000000
## FE_L2_vs_REF 13 39.99 0.000139 ***
## FE_L3_vs_REF 13 39.99 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

This output compares REF with all the other estimators. Testing REF (the most efficient estimator) against FE_L2 (the most robust estimator) is equivalent to testing simultaneously for level-2 and level-3 omitted effects.

Since the null hypothesis is rejected (p = 0.000139), there is bias in the random effects estimator.

To test for level-2 omitted effects (regardless of level-3 omitted effects), we compare FE_L2 versus FE_L3:

`summary(m8.multilevel,"FE_L2")`

```
##
## Call:
## multilevelIV(formula = formula1, data = dataMultilevelIV)
##
## Number of levels: 3
## Number of observations: 2824
## Number of groups: L2(CID): 1368 L3(SID): 40
##
## Coefficients for model FE_L2:
## Estimate Std. Error z-score Pr(>|z|)
## (Intercept) 0.000e+00 4.275e-19 0.00 1
## X11 3.046e+00 2.978e-02 102.30 <2e-16 ***
## X12 8.984e+00 3.360e-02 267.41 <2e-16 ***
## X13 -2.015e+00 3.107e-02 -64.83 <2e-16 ***
## X14 1.979e+00 3.203e-02 61.80 <2e-16 ***
## X15 -9.777e-01 3.364e-02 -29.06 <2e-16 ***
## X21 0.000e+00 1.824e-18 0.00 1
## X22 0.000e+00 1.303e-18 0.00 1
## X23 0.000e+00 4.389e-18 0.00 1
## X24 0.000e+00 1.724e-18 0.00 1
## X31 0.000e+00 1.468e-17 0.00 1
## X32 0.000e+00 8.265e-18 0.00 1
## X33 0.000e+00 2.793e-17 0.00 1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Omitted variable tests for model FE_L2:
## df Chisq p-value
## FE_L2_vs_REF 13 39.99 0.000139 ***
## FE_L2_vs_FE_L3 9 36.02 3.92e-05 ***
## FE_L2_vs_GMM_L2 12 39.99 7.21e-05 ***
## FE_L2_vs_GMM_L3 13 39.99 0.000139 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The null hypothesis of no omitted level-2 effects is rejected (\(p = 3.92 \times 10^{-5}\)). Hence, there are omitted effects at level-two. We should use FE_L2, which is consistent with the underlying data that we generated (the level-2 error is correlated with \(X_{15}\), which leads to biased FE_L3 coefficients).

The omitted variable test between FE_L2 and GMM_L2 also rejects the null hypothesis of no omitted level-2 effects (\(p = 7.21 \times 10^{-5}\) in the output above).
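The reported p-values can be checked by hand. The sketch below assumes they are upper-tail probabilities of a chi-square distribution with the listed degrees of freedom, which is what the `df` and `Chisq` column labels suggest. The inputs are the rounded values printed in the FE_L2 output, so the results match only up to rounding.

```r
# df and Chisq copied from "Omitted variable tests for model FE_L2"
tests <- data.frame(
  comparison = c("FE_L2_vs_REF", "FE_L2_vs_FE_L3", "FE_L2_vs_GMM_L2"),
  df         = c(13, 9, 12),
  chisq      = c(39.99, 36.02, 39.99)
)

# upper-tail chi-square probability for each test statistic
tests$p_value <- pchisq(tests$chisq, df = tests$df, lower.tail = FALSE)
tests
```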

If we treat an endogenous variable as exogenous, the RE and GMM estimators will be biased because they use the wrong set of internal instrumental variables. To increase our confidence, we should compare the omitted variable tests when the variable is treated as endogenous vs. exogenous to get a sense of whether the variable is truly endogenous.

### 16.1.3 Proxy Variables

A proxy variable can be used in place of the omitted variable:

- we will not be able to estimate the effect of the omitted variable
- we will be able to reduce some of the endogeneity caused by the omitted variable

Criteria for a proxy variable:

1. The proxy is correlated with the omitted variable.
2. Having the omitted variable in the regression will solve the problem of endogeneity.
3. The variation of the omitted variable unexplained by the proxy is uncorrelated with all independent variables, including the proxy.

An IQ test score can be a proxy for ability in a regression of wage on education.

For the third requirement:

\[ ability = \gamma_0 + \gamma_1 IQ + \epsilon \]

where \(\epsilon\) is uncorrelated with education and the IQ test score.
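A short simulation shows what the proxy buys us. This is a sketch under assumed values: the wage equation \(wage = \beta_0 + \beta_1 educ + \beta_2 ability + u\), all coefficients, and the variable names are hypothetical and not taken from the text. Substituting the proxy equation gives \(wage = (\beta_0 + \beta_2\gamma_0) + \beta_1 educ + \beta_2\gamma_1 IQ + (u + \beta_2\epsilon)\), so \(\beta_1\) is recovered, but the coefficient on IQ is \(\beta_2\gamma_1\) rather than the effect of ability itself.

```r
set.seed(42)
n <- 5000

iq      <- rnorm(n, mean = 100, sd = 15)
# third requirement: the part of ability not explained by IQ (the noise term)
# is independent of both IQ and education
ability <- 0.05 * iq + rnorm(n, sd = 0.5)        # gamma_1 = 0.05
educ    <- 12 + 0.1 * (iq - 100) + rnorm(n)      # related to ability only via IQ
wage    <- 1 + 0.5 * educ + 2 * ability + rnorm(n)  # beta_1 = 0.5, beta_2 = 2

b_omit  <- coef(lm(wage ~ educ))["educ"]          # ability omitted
fit_px  <- lm(wage ~ educ + iq)                   # IQ used as proxy
b_proxy <- coef(fit_px)["educ"]
b_iq    <- coef(fit_px)["iq"]                     # estimates beta_2 * gamma_1

round(c(omitted = unname(b_omit), proxy = unname(b_proxy), iq = unname(b_iq)), 3)
```

With ability omitted, the education coefficient is inflated. Adding IQ brings it back near 0.5, while the IQ coefficient lands near \(2 \times 0.05 = 0.1\), which is not the effect of ability.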