36.1 Endogenous Treatment
Endogenous treatment occurs when the variable of interest (the “treatment”) is not randomly assigned and is correlated with unobserved determinants of the outcome. As discussed earlier, this can arise from omitted variables, simultaneity, or reverse causality. But even if the true variable is theoretically exogenous, measurement error can make it endogenous in practice.
This section focuses on how measurement errors, especially in explanatory variables, introduce bias—typically attenuation bias—and why they are a central concern in applied research.
36.1.1 Measurement Errors
Measurement error refers to the difference between the true value of a variable and its observed (measured) value.
- Sources of measurement error:
- Coding errors: Manual or software-induced data entry mistakes.
- Reporting errors: Self-report bias, recall issues, or strategic misreporting.
Two Broad Types of Measurement Error
- Random (Stochastic) Error — Classical Measurement Error
- Noise is unpredictable and averages out in expectation.
- Error is uncorrelated with the true variable and the regression error.
- Common in survey data and in tracking or logging data.
- Systematic (Non-classical) Error — Non-Random Bias
- Measurement error exhibits consistent patterns across observations.
- Often arises from:
- Instrument error: e.g., faulty sensors, uncalibrated scales.
- Method error: poor sampling, survey design flaws.
- Human error: judgment errors, social desirability bias.
Key insight:
- Random error in a regressor adds noise and pushes the estimated coefficient toward zero (attenuation).
- Systematic error introduces bias that can push estimates either upward or downward.
36.1.1.1 Classical Measurement Error
36.1.1.1.1 Right-Hand Side Variable
Let’s examine the most common and analytically tractable case: classical measurement error in an explanatory variable.
Suppose the true model is:
\[ Y_i = \beta_0 + \beta_1 X_i + u_i \]
But we do not observe \(X_i\) directly. Instead, we observe:
\[ \tilde{X}_i = X_i + e_i \]
where \(e_i\) is the measurement error, assumed classical:
- \(E[e_i] = 0\)
- \(Cov(X_i, e_i) = 0\)
- \(Cov(e_i, u_i) = 0\)
Now, substitute \(\tilde{X}_i\) into the regression:
\[ \begin{aligned} Y_i &= \beta_0 + \beta_1 ( \tilde{X}_i - e_i ) + u_i \\ &= \beta_0 + \beta_1 \tilde{X}_i + (u_i - \beta_1 e_i) \\ &= \beta_0 + \beta_1 \tilde{X}_i + v_i \end{aligned} \]
where \(v_i = u_i - \beta_1 e_i\) is a composite error term.
Since \(\tilde{X}_i\) contains \(e_i\), and \(v_i\) contains \(e_i\), we now have:
\[ Cov(\tilde{X}_i, v_i) \neq 0 \]
This correlation violates the exogeneity assumption and introduces endogeneity.
We can derive the asymptotic bias:
\[ \begin{aligned} E[\tilde{X}_i v_i] &= E[(X_i + e_i)(u_i - \beta_1 e_i)] \\ &= E[X_i u_i] - \beta_1 E[X_i e_i] + E[e_i u_i] - \beta_1 E[e_i^2] \\ &= -\beta_1 Var(e_i) \\ &\neq 0 \end{aligned} \]
since the first three terms vanish under exogeneity of \(X_i\) and the classical error assumptions.
This implies:
- If \(\beta_1 > 0\), then \(\hat{\beta}_1\) is biased downward.
- If \(\beta_1 < 0\), then \(\hat{\beta}_1\) is biased upward.
This is called attenuation bias: the estimated effect is biased toward zero.
As the variance of the error \(Var(e_i)\) increases or \(\frac{Var(e_i)}{Var(\tilde{X}_i)} \to 1\), this bias becomes more severe.
Attenuation Factor
The OLS estimator based on the noisy regressor is
\[ \hat{\beta}_{OLS} = \frac{ \text{cov}(\tilde{X}, Y)}{\text{var}(\tilde{X})} = \frac{\text{cov}(X + e, \beta X + u)}{\text{var}(X + e)}. \]
Using the assumptions of classical measurement error, it follows that:
\[ plim\ \hat{\beta}_{OLS} = \beta \cdot \frac{\sigma_X^2}{\sigma_X^2 + \sigma_e^2} = \beta \cdot \lambda, \]
where:
- \(\sigma_X^2\) is the variance of the true regressor \(X\),
- \(\sigma_e^2\) is the variance of the measurement error \(e\), and
- \(\lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_e^2}\) is called the reliability ratio, signal-to-total variance ratio, or attenuation factor.
Since \(\lambda \in (0, 1]\), the bias always attenuates the estimate toward zero. The asymptotic bias is:
\[ plim\ \hat{\beta}_{OLS} - \beta = - (1 - \lambda)\beta, \]
which implies:
- If \(\lambda = 1\): \(plim\ \hat{\beta}_{OLS} = \beta\), so there is no bias (no measurement error).
- If \(\lambda < 1\): \(|plim\ \hat{\beta}_{OLS}| < |\beta|\), i.e., the estimate is attenuated toward zero.
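To see attenuation in action, here is a minimal simulation sketch (the parameter values are arbitrary choices for illustration): it generates a true regressor, contaminates it with classical noise, and compares the OLS slope on the noisy regressor with the theoretical value \(\lambda \beta_1\).
set.seed(123)

n       <- 10000  # sample size
beta1   <- 2      # true slope
sigma_x <- 1      # sd of the true regressor X
sigma_e <- 1      # sd of the classical measurement error e

x     <- rnorm(n, sd = sigma_x)   # true regressor (unobserved in practice)
e     <- rnorm(n, sd = sigma_e)   # classical measurement error
u     <- rnorm(n)                 # structural disturbance
y     <- 1 + beta1 * x + u        # outcome generated by the true model
x_obs <- x + e                    # observed, noisy regressor

lambda <- sigma_x^2 / (sigma_x^2 + sigma_e^2)  # reliability ratio

coef(lm(y ~ x))[2]       # slope using the true X: approx. beta1 = 2
coef(lm(y ~ x_obs))[2]   # slope using the noisy X: approx. lambda * beta1 = 1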
Important Notes on Measurement Error
Data transformations can magnify measurement error.
Suppose the true model is nonlinear:
\[ y = \beta x + \gamma x^2 + \epsilon, \]
and \(x\) is measured with classical error. Then, the attenuation factor for \(\hat{\gamma}\) is approximately the square of the attenuation factor for \(\hat{\beta}\):
\[ \lambda_{\hat{\gamma}} \approx \lambda_{\hat{\beta}}^2. \]
This shows how nonlinear transformations (e.g., squares, logs) can exacerbate measurement error problems.
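A short simulation sketch of this result, assuming (for simplicity) normally distributed \(x\) and \(e\) and arbitrary parameter values:
set.seed(123)

n <- 100000
beta <- 1; gamma <- 1
sigma_x <- 1; sigma_e <- 1
lambda  <- sigma_x^2 / (sigma_x^2 + sigma_e^2)   # = 0.5 here

x     <- rnorm(n, sd = sigma_x)
x_obs <- x + rnorm(n, sd = sigma_e)              # classical error in x
y     <- beta * x + gamma * x^2 + rnorm(n)

fit <- lm(y ~ x_obs + I(x_obs^2))
coef(fit)["x_obs"]        # approx. lambda   * beta  = 0.5
coef(fit)["I(x_obs^2)"]   # approx. lambda^2 * gamma = 0.25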
Including covariates can increase attenuation bias.
Adding correctly measured covariates that are correlated with the mismeasured variable generally worsens attenuation in its coefficient: the covariates absorb part of the true variation in \(X\), while the measurement-error variance remains, so the effective reliability ratio falls.
Remedies for Measurement Error
To address attenuation bias caused by classical measurement error, consider the following strategies:
- Use validation data or survey information to estimate \(\sigma_X^2\), \(\sigma_e^2\), or \(\lambda\) and apply correction methods (e.g., SIMEX, regression calibration).
- Instrumental Variables Approach
Use an instrument \(Z\) that (see the sketch after this list):
  - Is correlated with the true variable \(X\),
  - Is uncorrelated with the regression error \(u\), and
  - Is uncorrelated with the measurement error \(e\).
- Abandon your project
If no good instruments or validation data exist, and the attenuation bias is too severe, it may be prudent to reconsider the analysis or research question. (Said with love and academic humility.)
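As an illustration of the IV remedy, one common trick (when such data exist) is to use a second, independently mismeasured report of the same variable as the instrument: both measurements contain the true \(X\), but their errors are mutually uncorrelated. A hedged sketch with made-up data:
set.seed(123)

n     <- 10000
beta1 <- 2
x     <- rnorm(n)                # true regressor (never observed)
y     <- 1 + beta1 * x + rnorm(n)

x1 <- x + rnorm(n)               # first noisy measurement (used as regressor)
x2 <- x + rnorm(n)               # second noisy measurement (used as instrument)

coef(lm(y ~ x1))[2]              # OLS on x1: attenuated (approx. 1 here)
cov(y, x2) / cov(x1, x2)         # IV estimate: approx. beta1 = 2
The same point estimate can be obtained with a standard IV routine (e.g., AER::ivreg(y ~ x1 | x2)), which also reports appropriate standard errors.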
36.1.1.1.2 Left-Hand Side Variable
Measurement error in the dependent variable (i.e., the response or outcome) is fundamentally different from measurement error in explanatory variables. Its consequences are often less problematic for consistent estimation of regression coefficients (e.g., the zero conditional mean assumption is not violated), but not necessarily for statistical inference (e.g., higher standard errors) or model fit.
Suppose we are interested in the standard linear regression model:
\[ Y_i = \beta X_i + u_i, \]
but we do not observe \(Y_i\) directly. Instead, we observe:
\[ \tilde{Y}_i = Y_i + v_i, \]
where:
- \(v_i\) is measurement error in the dependent variable,
- \(E[v_i] = 0\) (mean-zero),
- \(v_i\) is uncorrelated with \(X_i\) and \(u_i\),
- \(v_i\) is homoskedastic and independent across observations.
Be extra careful here! These are the classical-error assumptions:
- Mean zero: \(\mathbb{E}[v \mid X] = 0\).
- Exogeneity: \(v\) is uncorrelated with each regressor and with the structural disturbance \(u\) (i.e., \(\operatorname{Cov}(X, v) = \operatorname{Cov}(u, v) = 0\)).
- Homoskedasticity and finite moments, so that the law of large numbers applies.
The regression we actually estimate is:
\[ \tilde{Y}_i = \beta X_i + u_i + v_i. \]
We can define a composite error term:
\[ \tilde{u}_i = u_i + v_i, \]
so that the model becomes:
\[ \tilde{Y}_i = \beta X_i + \tilde{u}_i. \]
Under the classical-error assumptions, the extra noise simply enlarges the composite error term \(\tilde{u}_i\), leaving
\[ \hat\beta^{\text{OLS}} =\beta + ( X' X)^{-1} X'(u+v) \;\xrightarrow{p} \beta , \]
so the estimator remains consistent and only its variance rises.
Key Insights
Unbiasedness and Consistency of \(\hat{\beta}\):
As long as \(E[\tilde{u}_i \mid X_i] = 0\), which holds under the classical assumptions (i.e., \(E[u_i \mid X_i] = 0\) and \(E[v_i \mid X_i] = 0\)), the OLS estimator of \(\beta\) remains unbiased and consistent.
This is because measurement error in the left-hand side does not induce endogeneity. The zero conditional mean assumption is preserved.
Interpretation (Why Econometricians Don’t Panic):
Econometricians and causal researchers often focus on consistent estimation of causal effects under strict exogeneity. Since \(v_i\) just adds noise to the outcome and doesn’t systematically relate to \(X_i\), the slope estimate \(\hat{\beta}\) remains a valid estimate of the causal effect \(\beta\).
Statistical Implications (Why Statisticians Might Worry):
Although \(\hat{\beta}\) is consistent, the variance of the error term increases due to the added noise \(v_i\). Specifically:
\[ \text{Var}(\tilde{u}_i) = \text{Var}(u_i) + \text{Var}(v_i) = \sigma_u^2 + \sigma_v^2. \]
This leads to:
- Higher residual variance \(\Rightarrow\) lower \(R^2\)
- Higher standard errors for coefficient estimates
- Wider confidence intervals, reducing the precision of inference
Thus, even though the point estimate is valid, inference becomes weaker: hypothesis tests are less powerful, and conclusions less precise.
Practical Illustration
- Suppose \(X\) is a marketing investment and \(Y\) is sales revenue.
- If sales are measured with noise (e.g., misrecorded sales data, rounding, reporting delays), the coefficient on marketing is still consistently estimated.
- However, uncertainty around the estimate grows: wider confidence intervals might make it harder to detect statistically significant effects, especially in small samples.
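A quick simulation sketch of this point (with arbitrary numbers): adding noise to the outcome leaves the slope essentially unchanged but inflates its standard error.
set.seed(123)

n     <- 1000
beta1 <- 2
x     <- rnorm(n)
y     <- 1 + beta1 * x + rnorm(n)            # accurately measured outcome
y_obs <- y + rnorm(n, sd = 2)                # outcome measured with noise

summary(lm(y ~ x))$coefficients["x", ]       # slope approx. 2, small SE
summary(lm(y_obs ~ x))$coefficients["x", ]   # slope still approx. 2, larger SE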
Summary Table: Measurement Error Consequences
Location of Measurement Error | Bias in \(\hat{\beta}\) | Consistency | Affects Inference? | Typical Concern |
---|---|---|---|---|
Regressor (\(X\)) | Yes (attenuation) | No | Yes | Econometric & statistical |
Outcome (\(Y\)) | No | Yes | Yes | Mainly statistical |
36.1.1.2 Non-Classical Measurement Error
In the classical measurement error model, we assume that the measurement error \(\epsilon\) is independent of the true variable \(X\) and of the regression disturbance \(u\). However, in many realistic data scenarios, this assumption does not hold. Non-classical measurement error refers to cases where:
- \(\epsilon\) is correlated with \(X\),
- or possibly even correlated with \(u\).
Violating the classical assumptions introduces additional and potentially complex biases in OLS estimation.
Recall that in the classical measurement error model, we observe:
\[ \tilde{X} = X + \epsilon, \]
where:
- \(\epsilon\) is independent of \(X\) and \(u\),
- \(E[\epsilon] = 0\).
The true model is:
\[ Y = \beta X + u. \]
Then, OLS based on the mismeasured regressor gives:
\[ \hat{\beta}_{OLS} = \frac{\text{cov}(\tilde{X}, Y)}{\text{var}(\tilde{X})} = \frac{\text{cov}(X + \epsilon, \beta X + u)}{\text{var}(X + \epsilon)}. \]
With classical assumptions, this simplifies to:
\[ plim\ \hat{\beta}_{OLS} = \beta \cdot \frac{\sigma_X^2}{\sigma_X^2 + \sigma_\epsilon^2} = \beta \cdot \lambda, \]
where \(\lambda\) is the reliability ratio, which attenuates \(\hat{\beta}\) toward zero.
Let us now relax the independence assumption and allow for correlation between \(X\) and \(\epsilon\). In particular, suppose:
- \(\text{cov}(X, \epsilon) = \sigma_{X\epsilon} \ne 0\).
Then the probability limit of the OLS estimator becomes:
\[ \begin{aligned} plim\ \hat{\beta} &= \frac{\text{cov}(X + \epsilon, \beta X + u)}{\text{var}(X + \epsilon)} \\ &= \frac{\beta (\sigma_X^2 + \sigma_{X\epsilon})}{\sigma_X^2 + \sigma_\epsilon^2 + 2 \sigma_{X\epsilon}}. \end{aligned} \]
We can rewrite this as:
\[ \begin{aligned} plim\ \hat{\beta} &= \beta \left(1 - \frac{\sigma_\epsilon^2 + \sigma_{X\epsilon}}{\sigma_X^2 + \sigma_\epsilon^2 + 2 \sigma_{X\epsilon}} \right) \\ &= \beta (1 - b_{\epsilon \tilde{X}}), \end{aligned} \]
where \(b_{\epsilon \tilde{X}}\) is the regression coefficient of \(\epsilon\) on \(\tilde{X}\), or more precisely:
\[ b_{\epsilon \tilde{X}} = \frac{\text{cov}(\epsilon, \tilde{X})}{\text{var}(\tilde{X})}. \]
This makes clear that the bias in \(\hat{\beta}\) depends on how strongly the measurement error is correlated with the observed regressor \(\tilde{X}\). This general formulation nests the classical case as a special case:
- In classical error: \(\sigma_{X\epsilon} = 0 \Rightarrow b_{\epsilon \tilde{X}} = \frac{\sigma^2_\epsilon}{\sigma^2_X + \sigma^2_\epsilon} = 1 - \lambda\).
Implications of Non-Classical Measurement Error
- When \(\sigma_{X\epsilon} > 0\), the attenuation bias can increase or decrease depending on the balance of variances.
- In particular:
- If more than half of the variance in \(\tilde{X}\) is due to measurement error, increasing \(\sigma_{X\epsilon}\) increases attenuation.
- If less than half is due to measurement error, it can actually reduce attenuation.
- This phenomenon is sometimes called mean-reverting measurement error: the measurement error pulls observed values toward the mean, distorting estimates (Bound, Brown, and Mathiowetz 2001).
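The probability-limit formula above can be checked numerically. In the sketch below (arbitrary parameters), the measurement error loads negatively on the true \(X\), a mean-reverting pattern, and the simulated OLS slope lines up with \(\beta (\sigma_X^2 + \sigma_{X\epsilon}) / (\sigma_X^2 + \sigma_\epsilon^2 + 2\sigma_{X\epsilon})\).
set.seed(123)

n       <- 100000
beta1   <- 2
delta   <- -0.3      # error loads negatively on X (mean-reverting)
sigma_w <- 0.5       # sd of the independent part of the error

x     <- rnorm(n)                             # true regressor, variance 1
eps   <- delta * x + rnorm(n, sd = sigma_w)   # non-classical error: cov(x, eps) = delta
y     <- beta1 * x + rnorm(n)
x_obs <- x + eps

# Theoretical plim of the OLS slope, using the formula above
s_x2 <- 1
s_e2 <- delta^2 + sigma_w^2
s_xe <- delta
beta1 * (s_x2 + s_xe) / (s_x2 + s_e2 + 2 * s_xe)   # approx. 1.89

coef(lm(y ~ x_obs))[2]                             # simulated slope, close to 1.89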
36.1.1.2.1 A General Framework for Non-Classical Measurement Error
Bound, Brown, and Mathiowetz (2001) offer a unified matrix framework that accommodates measurement error in both the independent and dependent variables.
Let the true model be:
\[ \mathbf{Y = X \beta + \epsilon}, \]
but we observe \(\tilde{X} = X + U\) and \(\tilde{Y} = Y + v\), where:
- \(U\) is a matrix of measurement error in \(X\),
- \(v\) is a vector of measurement error in \(Y\).
Then, the OLS estimator based on the observed data is:
\[ \hat{\beta} = (\tilde{X}' \tilde{X})^{-1} \tilde{X}' \tilde{Y}. \]
Substituting the observed quantities:
\[ \begin{aligned} \tilde{Y} &= Y + v = X \beta + \epsilon + v, \\ &= \tilde{X} \beta - U \beta + v + \epsilon. \end{aligned} \]
Hence,
\[ \hat{\beta} = (\tilde{X}' \tilde{X})^{-1} \tilde{X}' (\tilde{X} \beta - U \beta + v + \epsilon), \]
which simplifies to:
\[ \hat{\beta} = \beta + (\tilde{X}' \tilde{X})^{-1} \tilde{X}' (-U \beta + v + \epsilon). \]
Taking the probability limit (the term involving \(\epsilon\) drops out because \(\epsilon\) is uncorrelated with \(\tilde{X}\) under the maintained exogeneity assumptions):
\[ plim\ \hat{\beta} = \beta + plim\ [(\tilde{X}' \tilde{X})^{-1} \tilde{X}' (-U \beta + v)]. \]
Now define:
\[ W = [U \quad v], \]
and we can express the bias compactly as:
\[ plim\ \hat{\beta} = \beta + plim\ [(\tilde{X}' \tilde{X})^{-1} \tilde{X}' W \begin{bmatrix} - \beta \\ 1 \end{bmatrix} ]. \]
This formulation highlights a powerful insight:
Bias in \(\hat{\beta}\) arises from the linear projection of the measurement errors onto the observed \(\tilde{X}\).
This expression does not assert that \(v\) necessarily biases \(\hat\beta\); it simply makes explicit that bias arises whenever the linear projection of \((U\beta-v)\) onto \(\tilde X\) is non‑zero. Three cases illustrate the point:
Case | Key correlation | Consequence for \(\hat\beta\) |
---|---|---|
Classical \(Y\)-error only: \(U \equiv 0,\; \operatorname{Cov}(\tilde X, v) = 0\) | Projection term vanishes | Consistent; larger standard errors |
Correlated \(Y\)-error: \(U \equiv 0,\; \operatorname{Cov}(\tilde X, v) \neq 0\) | Projection picks up \(v\) | Biased (attenuation or sign reversal possible) |
Both \(X\)- and \(Y\)-error, independent: \(\operatorname{Cov}(X, U) \neq 0,\; \operatorname{Cov}(\tilde X, v) = 0\) | \(U\beta\) projects onto \(\tilde X\) | Biased because of \(U\), not \(v\) |
Hence, your usual “harmless \(Y\)-noise” result is the special case in the first row.
Practical implications
Check assumptions explicitly. If the dataset was generated by self‑reports, simultaneous proxies, or modelled outcomes, it is rarely safe to assume \(\operatorname{Cov}(X,v)=0\).
Correlated errors in \(Y\) can creep in through:
- Common data‑generating mechanisms (e.g., same survey module records both earnings (\(Y\)) and hours worked (\(X\))).
- Prediction‑generated variables where \(v\) inherits correlation with the features used to build \(\tilde Y\).
Joint mis‑measurement (\(U\) and \(v\) correlated) is common in administrative or sensor data; here, even “classical” \(v\) with respect to \(X\) can correlate with \(\tilde X=X+U\).
Measurement error in \(Y\) is benign only under strong exogeneity and independence conditions. The Bound–Brown–Mathiowetz matrix form (Bound, Brown, and Mathiowetz 2001) simply shows that once those conditions fail—or once \(X\) itself is mis‑measured—the same projection logic that produces attenuation bias for \(X\) can also transmit bias from \(v\) to \(\hat\beta\).
So the rule of thumb you learned is true in its narrow, classical setting, but Bound, Brown, and Mathiowetz (2001) remind us that empirical work often strays outside that safe harbor.
Consequences and Correction
- Non-classical error can lead to over- or underestimation, unlike the always-attenuating classical case.
- The direction and magnitude of bias depend on the correlation structure of \(X\), \(\epsilon\), and \(v\).
- This poses serious problems in many survey and administrative data settings where systematic misreporting occurs.
Practical Solutions
- Instrumental Variables: Use an instrument \(Z\) that is correlated with the true variable \(X\) but uncorrelated with both the measurement error and the regression disturbance. IV can help eliminate both classical and non-classical error-induced biases.
- Validation Studies: Use a subset of the data with accurate measures to estimate the structure of the measurement error and correct estimates via techniques such as regression calibration, multiple imputation, or SIMEX.
- Modeling the Error Process: Explicitly model the measurement error process, especially in longitudinal or panel data (e.g., via state-space models or Bayesian approaches).
- Binary/Dummy Variable Case: Non-classical error in binary regressors (e.g., misclassification) also leads to bias, but IV methods still apply. For example, if education level is misreported in survey data, a valid instrument (e.g., policy-based variation) can correct for misclassification bias.
Summary
Feature | Classical Error | Non-Classical Error |
---|---|---|
\(\text{Cov}(X, \epsilon)\) | 0 | \(\ne 0\) |
Bias in \(\hat{\beta}\) | Always attenuation | Can attenuate or inflate |
Consistency of OLS | No | No |
Effect of Variance Structure | Predictable | Depends on \(\sigma_{X\epsilon}\) |
Fixable with IV | Yes | Yes |
In short, non-classical measurement error breaks the comforting regularity of attenuation bias. It can produce arbitrary biases depending on the nature and structure of the error. Instrumental variables and validation studies are often the only reliable tools for addressing this complex problem.
36.1.1.3 Solution to Measurement Errors in Correlation Estimation
36.1.1.3.1 Bayesian Correction for Correlation Coefficient
We begin by expressing the Bayesian posterior for a correlation coefficient \(\rho\):
\[ \begin{aligned} P(\rho \mid \text{data}) &= \frac{P(\text{data} \mid \rho) P(\rho)}{P(\text{data})} \\ \text{Posterior Probability} &\propto \text{Likelihood} \times \text{Prior Probability} \end{aligned} \]
Where:
- \(\rho\) is the true population correlation coefficient
- \(P(\text{data} \mid \rho)\) is the likelihood function
- \(P(\rho)\) is the prior density of \(\rho\)
- \(P(\text{data})\) is the marginal likelihood (a normalizing constant)
With sample correlation coefficient \(r\):
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \]
According to Schisterman et al. (2003, p. 3), the posterior density of \(\rho\) can be approximated as:
\[ P(\rho \mid x, y) \propto P(\rho) \cdot \frac{(1 - \rho^2)^{(n - 1)/2}}{(1 - \rho r)^{n - 3/2}} \]
This approximation leads to a posterior that can be modeled via the Fisher transformation:
- Let \(\rho = \tanh(\xi)\), where \(\xi \sim N(z, 1/n)\)
- \(r = \tanh(z)\) is the Fisher-transformed correlation
Using conjugate normal approximations, we derive the posterior for the transformed correlation \(\xi\) as:
- Posterior Variance:
\[ \sigma^2_{\text{posterior}} = \frac{1}{n_{\text{prior}} + n_{\text{likelihood}}} \]
- Posterior Mean:
\[ \mu_{\text{posterior}} = \sigma^2_{\text{posterior}} \left(n_{\text{prior}} \cdot \tanh^{-1}(r_{\text{prior}}) + n_{\text{likelihood}} \cdot \tanh^{-1}(r_{\text{likelihood}})\right) \]
To simplify the mathematics, we may assume a prior of the form:
\[ P(\rho) \propto (1 - \rho^2)^c \]
where \(c\) controls the strength of the prior. If no prior information is available, we can set \(c = 0\) so that \(P(\rho) \propto 1\).
Example: Combining Estimates from Two Studies
Let:
- Current study: \(r_{\text{likelihood}} = 0.5\), \(n_{\text{likelihood}} = 200\)
- Prior study: \(r_{\text{prior}} = 0.2765\), \(n_{\text{prior}} = 50205\)
Step 1: Posterior Variance
\[ \sigma^2_{\text{posterior}} = \frac{1}{50205 + 200} = 0.0000198393 \]
Step 2: Posterior Mean
Apply Fisher transformation:
- \(\tanh^{-1}(0.2765) \approx 0.2839\)
- \(\tanh^{-1}(0.5) = 0.5493\)
Then:
\[ \begin{aligned} \mu_{\text{posterior}} &= 0.0000198393 \times (50205 \times 0.2839 + 200 \times 0.5493) \\ &= 0.0000198393 \times (14253.2 + 109.86) \\ &= 0.0000198393 \times 14363.1 \approx 0.2850 \end{aligned} \]
Thus, the posterior distribution of \(\xi = \tanh^{-1}(\rho)\) is:
\[ \xi \sim N(0.2850, 0.0000198393) \]
Transforming back:
- Posterior mean correlation: \(\rho = \tanh(0.2850) \approx 0.2775\)
- 95% CI for \(\xi\): \(0.2850 \pm 1.96 \cdot \sqrt{0.0000198393} = (0.2762, 0.2937)\)
- Transforming endpoints: \(\tanh(0.2762) = 0.2694\), \(\tanh(0.2937) = 0.2855\)
The Bayesian posterior distribution for the correlation coefficient is:
- Mean: \(\hat{\rho}_{\text{posterior}} = 0.2775\)
- 95% CI: \((0.2694,\ 0.2855)\)
This Bayesian adjustment is especially useful when:
- There is high sampling variation due to small sample sizes
- Measurement error attenuates the observed correlation
- Combining evidence from multiple studies (meta-analytic context)
By leveraging prior information and applying the Fisher transformation, researchers can obtain a more stable and accurate estimate of the true underlying correlation.
# Define inputs
n_new <- 200
r_new <- 0.5
alpha <- 0.05
# Bayesian update function for correlation coefficient
update_correlation <- function(n_new, r_new, alpha) {
# Prior (meta-analysis study)
n_meta <- 50205
r_meta <- 0.2765
# Step 1: Posterior variance (in Fisher-z space)
var_xi <- 1 / (n_new + n_meta)
# Step 2: Posterior mean (in Fisher-z space)
mu_xi <- var_xi * (n_meta * atanh(r_meta) + n_new * atanh(r_new))
# Step 3: Confidence interval in Fisher-z space
z_crit <- qnorm(1 - alpha / 2)
upper_xi <- mu_xi + z_crit * sqrt(var_xi)
lower_xi <- mu_xi - z_crit * sqrt(var_xi)
# Step 4: Transform back to correlation scale
mean_rho <- tanh(mu_xi)
upper_rho <- tanh(upper_xi)
lower_rho <- tanh(lower_xi)
# Return all values as a list
list(
mu_xi = mu_xi,
var_xi = var_xi,
upper_xi = upper_xi,
lower_xi = lower_xi,
mean_rho = mean_rho,
upper_rho = upper_rho,
lower_rho = lower_rho
)
}
# Run update
updated <-
update_correlation(n_new = n_new,
r_new = r_new,
alpha = alpha)
# Display updated posterior mean and confidence interval
cat("Posterior mean of rho:", round(updated$mean_rho, 4), "\n")
#> Posterior mean of rho: 0.2775
cat(
"95% CI for rho: (",
round(updated$lower_rho, 4),
",",
round(updated$upper_rho, 4),
")\n"
)
#> 95% CI for rho: ( 0.2694 , 0.2855 )
# For comparison: Classical (frequentist) confidence interval around r_new
se_r <- sqrt(1 / n_new)
z_r <- qnorm(1 - alpha / 2) * se_r
ci_lo <- r_new - z_r
ci_hi <- r_new + z_r
cat("Frequentist 95% CI for r:",
round(ci_lo, 4),
"to",
round(ci_hi, 4),
"\n")
#> Frequentist 95% CI for r: 0.3614 to 0.6386
36.1.2 Simultaneity
Simultaneity arises when at least one of the explanatory variables in a regression model is jointly determined with the dependent variable, violating a critical assumption for causal inference: temporal precedence.
Why Simultaneity Matters
- In classical regression, we assume that regressors are determined exogenously—they are not influenced by the dependent variable.
- Simultaneity introduces endogeneity, where regressors are correlated with the error term, rendering OLS estimators biased and inconsistent.
- This has major implications in fields like economics, marketing, finance, and social sciences, where feedback mechanisms or equilibrium processes are common.
Real-World Examples
- Demand and supply: Price and quantity are determined together in market equilibrium.
- Sales and advertising: Advertising influences sales, but firms also adjust advertising based on current or anticipated sales.
- Productivity and investment: Higher productivity may attract investment, but investment can improve productivity.
36.1.2.1 Simultaneous Equation System
We begin with a basic two-equation structural model:
\[ \begin{aligned} Y_i &= \beta_0 + \beta_1 X_i + u_i \quad \text{(Structural equation for } Y) \\ X_i &= \alpha_0 + \alpha_1 Y_i + v_i \quad \text{(Structural equation for } X) \end{aligned} \]
Here:
- \(Y_i\) and \(X_i\) are endogenous variables — both determined within the system.
- \(u_i\) and \(v_i\) are structural error terms, assumed to be uncorrelated with the exogenous variables (if any).
The equations form a simultaneous system because each endogenous variable appears on the right-hand side of the other’s equation.
To uncover the statistical properties of these equations, we solve for \(Y_i\) and \(X_i\) as functions of the error terms only:
\[ \begin{aligned} Y_i &= \frac{\beta_0 + \beta_1 \alpha_0}{1 - \alpha_1 \beta_1} + \frac{\beta_1 v_i + u_i}{1 - \alpha_1 \beta_1} \\ X_i &= \frac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \frac{v_i + \alpha_1 u_i}{1 - \alpha_1 \beta_1} \end{aligned} \]
These are the reduced-form equations, expressing the endogenous variables as functions of exogenous factors and disturbances.
36.1.2.2 Simultaneity Bias in OLS
If we naïvely estimate the first equation using OLS, assuming \(X_i\) is exogenous, we get:
\[ \text{Bias: } \quad Cov(X_i, u_i) = Cov\left(\frac{v_i + \alpha_1 u_i}{1 - \alpha_1 \beta_1}, u_i\right) = \frac{\alpha_1}{1 - \alpha_1 \beta_1} \cdot Var(u_i) \]
This violates the exogeneity requirement (one of the conditions underlying the Gauss-Markov theorem) that regressors be uncorrelated with the error term. The OLS estimator of \(\beta_1\) is therefore biased and inconsistent.
To allow for identification and estimation, we introduce exogenous variables:
\[ \begin{cases} Y_i = \beta_0 + \beta_1 X_i + \beta_2 T_i + u_i \\ X_i = \alpha_0 + \alpha_1 Y_i + \alpha_2 Z_i + v_i \end{cases} \]
Where:
- \(X_i\), \(Y_i\) — endogenous variables
- \(T_i\), \(Z_i\) — exogenous variables, not influenced by any variable in the system
Solving this system algebraically yields the reduced form model:
\[ \begin{cases}\begin{aligned}Y_i &= \frac{\beta_0 + \beta_1 \alpha_0}{1 - \alpha_1 \beta_1} + \frac{\beta_1 \alpha_2}{1 - \alpha_1 \beta_1} Z_i + \frac{\beta_2}{1 - \alpha_1 \beta_1} T_i + \tilde{u}_i \\&= B_0 + B_1 Z_i + B_2 T_i + \tilde{u}_i\end{aligned}\\\begin{aligned}X_i &= \frac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \frac{\alpha_2}{1 - \alpha_1 \beta_1} Z_i + \frac{\alpha_1\beta_2}{1 - \alpha_1 \beta_1} T_i + \tilde{v}_i \\&= A_0 + A_1 Z_i + A_2 T_i + \tilde{v}_i\end{aligned}\end{cases} \]
The reduced form expresses endogenous variables as functions of exogenous instruments, which we can estimate using OLS.
Using reduced-form estimates \((A_1, A_2, B_1, B_2)\), we can identify (recover) the structural coefficients:
\[ \begin{aligned} \beta_1 &= \frac{B_1}{A_1} \\ \beta_2 &= B_2 \left(1 - \frac{B_1 A_2}{A_1 B_2}\right) \\ \alpha_1 &= \frac{A_2}{B_2} \\ \alpha_2 &= A_1 \left(1 - \frac{B_1 A_2}{A_1 B_2} \right) \end{aligned} \]
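As a sketch of this indirect least squares logic (with arbitrarily chosen structural parameters), one can simulate the system from its reduced form, estimate the reduced-form coefficients by OLS, and recover \(\beta_1 = B_1 / A_1\) and \(\alpha_1 = A_2 / B_2\):
set.seed(123)

n  <- 100000
b0 <- 1; b1 <- 0.5; b2 <- 2     # structural parameters of the Y-equation
a0 <- 2; a1 <- 0.4; a2 <- 1.5   # structural parameters of the X-equation

Z  <- rnorm(n); Tv <- rnorm(n)  # exogenous variables (Tv plays the role of T)
u  <- rnorm(n); v  <- rnorm(n)  # structural disturbances
d  <- 1 - a1 * b1

# Reduced-form solutions of the simultaneous system
Y <- (b0 + b1 * a0 + b1 * a2 * Z + b2 * Tv + b1 * v + u) / d
X <- (a0 + a1 * b0 + a2 * Z + a1 * b2 * Tv + v + a1 * u) / d

B <- coef(lm(Y ~ Z + Tv))   # reduced-form coefficients B_0, B_1, B_2
A <- coef(lm(X ~ Z + Tv))   # reduced-form coefficients A_0, A_1, A_2

B["Z"] / A["Z"]             # recovers beta_1  (approx. 0.5)
A["Tv"] / B["Tv"]           # recovers alpha_1 (approx. 0.4)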
36.1.2.3 Identification Conditions
Estimation of structural parameters is only possible if the model is identified.
Order Condition (Necessary but Not Sufficient)
A structural equation is identified if:
\[ K - k \ge m - 1 \]
Where:
- \(M\) = total number of endogenous variables in the system
- \(m\) = number of endogenous variables in the given equation
- \(K\) = total number of exogenous variables in the system
- \(k\) = number of exogenous variables appearing in the given equation

- Just-identified: \(K - k = m - 1\) (exact identification)
- Over-identified: \(K - k > m - 1\) (more instruments than necessary)
- Under-identified: \(K - k < m - 1\) (cannot be estimated)
Note: The order condition is necessary but not sufficient. The rank condition must also be satisfied for full identification, which we cover in Instrumental Variables.
This simultaneous equations framework provides the foundation for instrumental variable estimation, where:
- Exogenous variables not appearing in a structural equation serve as instruments.
- These instruments allow consistent estimation of endogenous regressors’ effects.
The reduced-form equations are often used to generate fitted values of the endogenous regressors, which are then used in a Two-Stage Least Squares estimation procedure.
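A minimal two-stage least squares sketch on a system of this kind (arbitrary parameters): OLS on the structural equation is biased, while using first-stage fitted values of \(X\) recovers \(\beta_1\). The manual second stage gives correct point estimates but not correct standard errors; in practice one would use a dedicated IV routine.
set.seed(123)

n  <- 100000
b0 <- 1; b1 <- 0.5; b2 <- 2
a0 <- 2; a1 <- 0.4; a2 <- 1.5

Z  <- rnorm(n); Tv <- rnorm(n)
u  <- rnorm(n); v  <- rnorm(n)
d  <- 1 - a1 * b1

Y <- (b0 + b1 * a0 + b1 * a2 * Z + b2 * Tv + b1 * v + u) / d
X <- (a0 + a1 * b0 + a2 * Z + a1 * b2 * Tv + v + a1 * u) / d

coef(lm(Y ~ X + Tv))["X"]           # naive OLS: biased away from beta_1 = 0.5

X_hat <- fitted(lm(X ~ Z + Tv))     # first stage: X on all exogenous variables
coef(lm(Y ~ X_hat + Tv))["X_hat"]   # second stage: approx. beta_1 = 0.5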
36.1.3 Reverse Causality
Reverse causality refers to a situation in which the direction of causation is opposite to what is presumed. Specifically, we may model a relationship where variable \(X\) is assumed to cause \(Y\), but in reality, \(Y\) causes \(X\), or both influence each other in a feedback loop.
This violates a fundamental assumption for causal inference: temporal precedence — the cause must come before the effect. In the presence of reverse causality, the relationship between \(X\) and \(Y\) becomes ambiguous, and statistical estimators such as OLS become biased and inconsistent.
In a standard linear regression model:
\[ Y_i = \beta_0 + \beta_1 X_i + u_i \]
We interpret \(\beta_1\) as the causal effect of \(X\) on \(Y\). However, this interpretation implicitly assumes that:
- \(X_i\) is exogenous (uncorrelated with \(u_i\))
- Changes in \(X_i\) occur prior to or independently of changes in \(Y_i\)
If \(Y_i\) also affects \(X_i\), then \(X_i\) is not exogenous — it is endogenous, because it is correlated with \(u_i\) via the reverse causal path.
Reverse causality is especially problematic in observational data where interventions are not randomly assigned. Some key examples include:
- Health and income: Higher income may improve health outcomes, but healthier individuals may also earn more (e.g., due to better productivity or fewer sick days).
- Education and wages: Education raises wages, but higher-income individuals might afford better education — or individuals with higher innate ability (reflected in \(u\)) pursue more education and also earn more.
- Crime and policing: Increased police presence is often assumed to reduce crime, but high-crime areas are also likely to receive more police resources.
- Advertising and sales: Firms advertise more to boost sales, but high sales may also lead to higher advertising budgets — especially when revenue is reinvested in marketing.
To model reverse causality explicitly, consider:
36.1.3.1 System of Equations
\[ \begin{aligned} Y_i &= \beta_0 + \beta_1 X_i + u_i \quad &\text{(Y depends on X)} \\ X_i &= \gamma_0 + \gamma_1 Y_i + v_i \quad &\text{(X depends on Y)} \end{aligned} \]
This feedback loop represents a simultaneous system, but where the causality direction is unclear. The two equations indicate that both variables are endogenous.
Even if we estimate only the first equation using OLS, the bias becomes apparent:
\[ Cov(X_i, u_i) \ne 0 \quad \Rightarrow \quad \hat{\beta}_1 \text{ is biased} \]
Why? Because \(X_i\) is determined by \(Y_i\), which itself depends on \(u_i\). Thus, \(X_i\) indirectly depends on \(u_i\).
In causal diagram notation (Directed Acyclic Graphs, or DAGs), reverse causality violates the acyclicity assumption. Here’s an example:
- Intended model: \(X \rightarrow Y\)
- Reality: \(X \leftrightarrow Y\) (feedback loop)
This non-directional causality prevents us from interpreting coefficients causally unless additional identification strategies are applied.
OLS assumes:
\[ E[u_i \mid X_i] = 0 \]
Under reverse causality, this condition fails. The resulting estimator \(\hat{\beta}_1\) captures both the effect of \(X\) on \(Y\) and the feedback from \(Y\) to \(X\), leading to:
- Omitted variable bias: \(X_i\) captures unobserved information from \(Y_i\)
- Simultaneity bias: caused by the endogenous nature of \(X_i\)
36.1.3.2 Distinction from Simultaneity
Reverse causality is a special case of endogeneity, often manifesting as simultaneity. However, the key distinction is:
- Simultaneity: Variables are determined together (e.g., in equilibrium models), and both are modeled explicitly in a system.
- Reverse causality: Only one equation is estimated, and the true causal direction is unknown or opposite to what is modeled.
Reverse causality may or may not involve a full simultaneous system — it’s often unrecognized or assumed away, making it especially dangerous in empirical research.
There are no mechanical tests that definitively detect reverse causality, but researchers can:
- Use temporal data (lags): Estimate \(Y_{it} = \beta_0 + \beta_1 X_{i,t-1} + u_{it}\) and examine the temporal precedence of variables (see the sketch after this list).
- Apply Granger causality tests in time series (not strictly causal, but helpful diagnostically).
- Use theoretical reasoning to justify directionality.
- Check robustness across different time frames or instrumental variables.
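As a simple sketch of the lagging idea, using hypothetical advertising-and-sales data generated so that sales respond to last period's advertising, regressing the outcome on the lagged regressor ensures the explanatory variable is measured before the outcome. (The Granger test mentioned above is available, for example, as grangertest() in the lmtest package.)
set.seed(123)

periods <- 200
ad      <- rnorm(periods)                      # advertising in each period
ad_lag  <- c(NA, head(ad, -1))                 # advertising lagged one period (X_{t-1})
sales   <- 5 + 0.8 * ad_lag + rnorm(periods)   # sales respond to last period's advertising

# Y_t on X_{t-1}: the regressor predates the outcome (lm drops the initial NA row)
summary(lm(sales ~ ad_lag))$coefficients       # slope approx. 0.8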
36.1.3.3 Solutions to Reverse Causality
The following methods can mitigate reverse causality:
- Instrumental Variables (Two-Stage Least Squares)
  - Find a variable \(Z\) that affects \(X\) but is not affected by \(Y\), nor correlated with \(u_i\).
  - First stage: \(X_i = \pi_0 + \pi_1 Z_i + e_i\)
  - Second stage: \(\hat{X}_i\) from the first stage is used in the regression for \(Y\).
- Randomized Controlled Trials (RCTs)
- In experiments, the treatment (e.g., \(X\)) is assigned randomly and therefore exogenous by design.
- Natural Experiments / Quasi-Experimental Designs
- Use external shocks or policy changes that affect \(X\) but not \(Y\) directly (e.g., difference-in-differences, regression discontinuity).
- Panel Data Methods
- Use fixed-effects or difference estimators to eliminate time-invariant confounders.
- Lag independent variables to examine delayed effects and improve causal direction inference.
- Structural Equation Modeling
- Estimate a full system of equations to explicitly model feedback.
36.1.4 Omitted Variable Bias
Omitted Variable Bias (OVB) arises when a relevant explanatory variable that influences the dependent variable is left out of the regression model, and the omitted variable is correlated with one or more included regressors. This violates the exogeneity assumption of OLS and leads to biased and inconsistent estimators.
Suppose we are interested in estimating the effect of an independent variable \(X\) on an outcome \(Y\), and the true data-generating process is:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i \]
However, if we omit \(Z_i\) and estimate the model:
\[ Y_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i \]
Then the estimate \(\hat{\gamma}_1\) may be biased because \(X_i\) may be correlated with \(Z_i\), and \(Z_i\) influences \(Y_i\).
Let us derive the bias formally.
True model:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i \quad \text{(1)} \]
Estimated model (with \(Z_i\) omitted):
\[ Y_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i \quad \text{(2)} \]
Now, substitute the true model into the omitted model:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i \]
Comparing both models, the omitted variable becomes part of the new error term:
\[ \varepsilon_i = \beta_2 Z_i + u_i \]
Now, consider the OLS assumption:
\[ E[\varepsilon_i \mid X_i] = 0 \quad \text{(OLS requirement)} \]
But since \(\varepsilon_i = \beta_2 Z_i + u_i\) and \(Z_i\) is correlated with \(X_i\), we have:
\[ Cov(X_i, \varepsilon_i) = \beta_2 Cov(X_i, Z_i) \ne 0 \]
Therefore, OLS assumption fails, and \(\hat{\gamma}_1\) is biased.
Let us calculate the expected value of the OLS estimator \(\hat{\gamma}_1\).
From regression theory, when omitting \(Z_i\), the expected value of \(\hat{\gamma}_1\) is:
\[ E[\hat{\gamma}_1] = \beta_1 + \beta_2 \cdot \frac{Cov(X_i, Z_i)}{Var(X_i)} \]
This is the Omitted Variable Bias formula.
Interpretation:
The bias in \(\hat{\gamma}_1\) depends on:
- \(\beta_2\): the true effect of the omitted variable on \(Y\)
- \(Cov(X_i, Z_i)\): the covariance between \(X\) and the omitted variable \(Z\), scaled by \(Var(X_i)\)
36.1.4.1 Direction of the Bias
- If \(\beta_2 > 0\) and \(Cov(X_i, Z_i) > 0\): \(\hat{\gamma}_1\) is upward biased
- If \(\beta_2 < 0\) and \(Cov(X_i, Z_i) > 0\): \(\hat{\gamma}_1\) is downward biased
- If \(Cov(X_i, Z_i) = 0\): No bias, even if \(Z_i\) is omitted
Note: Uncorrelated omitted variables do not bias the OLS estimator, although they may reduce precision.
36.1.4.2 Practical Example: Education and Earnings
Suppose we model:
\[ \text{Earnings}_i = \gamma_0 + \gamma_1 \cdot \text{Education}_i + \varepsilon_i \]
But the true model includes ability (\(Z_i\)):
\[ \text{Earnings}_i = \beta_0 + \beta_1 \cdot \text{Education}_i + \beta_2 \cdot \text{Ability}_i + u_i \]
Omitting “ability” — a determinant of both education and earnings — leads to bias in the estimated effect of education:
- If more able individuals pursue more education and ability raises earnings (\(\beta_2 > 0\)), then \(\hat{\gamma}_1\) overstates the true return to education.
36.1.4.3 Generalization to Multiple Regression
In models with multiple regressors, omitting a relevant variable that is correlated with at least one included regressor will bias all coefficients affected by the correlation structure.
For example:
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \]
If \(X_2\) is omitted, and \(Cov(X_1, X_2) \ne 0\), then:
\[ E[\hat{\gamma}_1] = \beta_1 + \beta_2 \cdot \frac{Cov(X_1, X_2)}{Var(X_1)} \]
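The OVB formula is easy to verify by simulation (parameter values below are arbitrary): the short-regression slope matches \(\beta_1 + \beta_2 \cdot Cov(X_1, X_2)/Var(X_1)\).
set.seed(123)

n  <- 100000
b1 <- 1; b2 <- 2

x2 <- rnorm(n)                   # the variable that will be omitted
x1 <- 0.5 * x2 + rnorm(n)        # included regressor, correlated with x2
y  <- b1 * x1 + b2 * x2 + rnorm(n)

coef(lm(y ~ x1 + x2))["x1"]      # long regression: approx. b1 = 1
coef(lm(y ~ x1))["x1"]           # short regression: biased upward

b1 + b2 * cov(x1, x2) / var(x1)  # OVB formula: approx. the short-regression slope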
36.1.4.4 Remedies for OVB
- Include the omitted variable
  - If \(Z\) is observed, include it in the regression model.
- Use Instrumental Variables
  - If \(Z\) is unobserved but \(X\) is endogenous, find an instrument \(W\):
    - Relevance: \(Cov(W, X) \ne 0\)
    - Exogeneity: \(Cov(W, u) = 0\)
- Use Panel Data Methods
  - Fixed Effects: eliminate time-invariant omitted variables.
  - Difference-in-Differences: exploit temporal variation to isolate effects.
- Use Randomized Experiments
  - Randomization ensures omitted variables are orthogonal to treatment, avoiding bias.