36.1 Endogenous Treatment
Endogenous treatment occurs when the variable of interest (the “treatment”) is not randomly assigned and is correlated with unobserved determinants of the outcome. As discussed earlier, this can arise from omitted variables, simultaneity, or reverse causality. But even if the true variable is theoretically exogenous, measurement error can make it endogenous in practice.
This section focuses on how measurement errors, especially in explanatory variables, introduce bias—typically attenuation bias—and why they are a central concern in applied research.
36.1.1 Measurement Errors
Measurement error refers to the difference between the true value of a variable and its observed (measured) value.
- Sources of measurement error:
- Coding errors: Manual or software-induced data entry mistakes.
- Reporting errors: Self-report bias, recall issues, or strategic misreporting.
Two Broad Types of Measurement Error
- Random (Stochastic) Error — Classical Measurement Error
- Noise is unpredictable and averages out in expectation.
- Error is uncorrelated with the true variable and the regression error.
- Common in survey data and in tracking errors.
- Systematic (Non-classical) Error — Non-Random Bias
- Measurement error exhibits consistent patterns across observations.
- Often arises from:
- Instrument error: e.g., faulty sensors, uncalibrated scales.
- Method error: poor sampling, survey design flaws.
- Human error: judgment errors, social desirability bias.
Key insight:
- Random error adds noise, pushing estimates toward zero.
- Systematic error introduces bias, pushing estimates either upward or downward.
36.1.1.1 Classical Measurement Error
36.1.1.1.1 Right-Hand Side Variable
Let’s examine the most common and analytically tractable case: classical measurement error in an explanatory variable.
Suppose the true model is:

$$
Y_i = \beta_0 + \beta_1 X_i + u_i
$$

But we do not observe $X_i$ directly. Instead, we observe:

$$
\tilde{X}_i = X_i + e_i
$$

where $e_i$ is the measurement error, assumed classical:

- $E[e_i] = 0$
- $Cov(X_i, e_i) = 0$
- $Cov(e_i, u_i) = 0$

Now, substitute $\tilde{X}_i$ into the regression:

$$
\begin{aligned}
Y_i &= \beta_0 + \beta_1 (\tilde{X}_i - e_i) + u_i \\
&= \beta_0 + \beta_1 \tilde{X}_i + (u_i - \beta_1 e_i) \\
&= \beta_0 + \beta_1 \tilde{X}_i + v_i
\end{aligned}
$$

where $v_i = u_i - \beta_1 e_i$ is a composite error term.
Since $\tilde{X}_i$ contains $e_i$, and $v_i$ also contains $e_i$, we now have:

$$
Cov(\tilde{X}_i, v_i) \neq 0
$$

This correlation violates the exogeneity assumption and introduces endogeneity.

We can derive the asymptotic bias:

$$
E[\tilde{X}_i v_i] = E[(X_i + e_i)(u_i - \beta_1 e_i)] = -\beta_1 Var(e_i) \neq 0
$$
This implies:

- If $\beta_1 > 0$, then $\hat{\beta}_1$ is biased downward.
- If $\beta_1 < 0$, then $\hat{\beta}_1$ is biased upward.

This is called attenuation bias: the estimated effect is biased toward zero.

As the error variance $Var(e_i)$ increases, or as $Var(e_i)/Var(\tilde{X}_i) \to 1$, this bias becomes more severe.
Attenuation Factor

The OLS estimator based on the noisy regressor is

$$
\hat{\beta}_{OLS} = \frac{Cov(\tilde{X}, Y)}{Var(\tilde{X})} = \frac{Cov(X + e, \beta X + u)}{Var(X + e)}.
$$

Using the assumptions of classical measurement error, it follows that:

$$
\text{plim } \hat{\beta}_{OLS} = \beta \cdot \frac{\sigma^2_X}{\sigma^2_X + \sigma^2_e} = \beta \cdot \lambda,
$$

where:

- $\sigma^2_X$ is the variance of the true regressor $X$,
- $\sigma^2_e$ is the variance of the measurement error $e$, and
- $\lambda = \frac{\sigma^2_X}{\sigma^2_X + \sigma^2_e}$ is called the reliability ratio, signal-to-total variance ratio, or attenuation factor.

Since $\lambda \in (0, 1]$, the bias always attenuates the estimate toward zero. The degree of attenuation bias is:

$$
\text{plim } \hat{\beta}_{OLS} - \beta = -(1 - \lambda)\beta,
$$

which implies:

- If $\lambda = 1$, then $\hat{\beta}_{OLS} = \beta$ asymptotically — no bias (no measurement error).
- If $\lambda < 1$, then $|\text{plim } \hat{\beta}_{OLS}| < |\beta|$ — attenuation toward zero (see the simulation below).
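Below is a minimal simulation, with illustrative parameter values that are assumptions rather than anything from the text, showing that regressing on the noisy regressor recovers roughly $\beta \lambda$ instead of $\beta$:

```r
# Minimal illustration of attenuation bias (assumed values: beta = 2,
# sd of true X = 1, sd of measurement error = 0.75, so lambda = 0.64).
set.seed(123)

n       <- 1e5
beta    <- 2
sigma_x <- 1
sigma_e <- 0.75

x       <- rnorm(n, sd = sigma_x)       # true regressor
x_tilde <- x + rnorm(n, sd = sigma_e)   # observed, noisy regressor
y       <- 1 + beta * x + rnorm(n)      # outcome generated by the true model

lambda <- sigma_x^2 / (sigma_x^2 + sigma_e^2)  # theoretical reliability ratio

coef(lm(y ~ x))[2]        # close to 2 (true regressor)
coef(lm(y ~ x_tilde))[2]  # close to beta * lambda = 2 * 0.64 = 1.28
lambda
```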
Important Notes on Measurement Error
Data transformations can magnify measurement error.
Suppose the true model is nonlinear:
$$
y = \beta x + \gamma x^2 + \epsilon,
$$

and $x$ is measured with classical error. Then the attenuation factor for $\hat{\gamma}$ is approximately the square of the attenuation factor for $\hat{\beta}$:

$$
\lambda_{\hat{\gamma}} \approx \lambda_{\hat{\beta}}^2.
$$
This shows how nonlinear transformations (e.g., squares, logs) can exacerbate measurement error problems.
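An illustrative simulation (the values below are assumptions; with normally distributed $x$ and error, the quadratic coefficient's attenuation works out to roughly $\lambda^2$):

```r
# Attenuation of linear vs. quadratic terms under classical error in x
# (assumed values: sd of x = 1, sd of error = 0.75, so lambda = 0.64).
set.seed(123)

n       <- 1e5
sigma_x <- 1
sigma_e <- 0.75
lambda  <- sigma_x^2 / (sigma_x^2 + sigma_e^2)

x       <- rnorm(n, sd = sigma_x)
x_tilde <- x + rnorm(n, sd = sigma_e)
y       <- x + x^2 + rnorm(n)           # true beta = gamma = 1

fit <- lm(y ~ x_tilde + I(x_tilde^2))
coef(fit)["x_tilde"]       # roughly lambda     (about 0.64)
coef(fit)["I(x_tilde^2)"]  # roughly lambda^2   (about 0.41)
```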
Including covariates can increase attenuation bias.
Adding covariates that are correlated with the mismeasured variable can worsen bias in the coefficient of interest, especially if the measurement error is not accounted for in those covariates.
Remedies for Measurement Error
To address attenuation bias caused by classical measurement error, consider the following strategies:
- Use validation data or survey information to estimate $\sigma^2_X$, $\sigma^2_e$, or $\lambda$, and apply correction methods (e.g., SIMEX, regression calibration).
- Instrumental Variables Approach: use an instrument $Z$ that (see the sketch after this list):
  - Is correlated with the true variable $X$,
  - Is uncorrelated with the regression error $u$, and
  - Is uncorrelated with the measurement error $e$.
- Abandon your project: if no good instruments or validation data exist, and the attenuation bias is too severe, it may be prudent to reconsider the analysis or research question. (Said with love and academic humility.)
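As a hedged sketch of the instrumental-variables remedy (the two-measurement design and all parameter values are assumptions, not an example from the text), a second, independently mismeasured report of the same variable can serve as an instrument; `AER::ivreg()` is used here, though any 2SLS routine would do:

```r
# IV fix for classical measurement error: instrument the noisy regressor x1
# with a second noisy measurement x2 whose error is independent of x1's.
library(AER)  # provides ivreg()
set.seed(123)

n  <- 1e5
x  <- rnorm(n)                 # true regressor (unobserved)
x1 <- x + rnorm(n, sd = 0.75)  # noisy measure used as the regressor
x2 <- x + rnorm(n, sd = 0.75)  # second noisy measure used as the instrument
y  <- 1 + 2 * x + rnorm(n)     # true beta = 2

coef(lm(y ~ x1))[2]          # attenuated, roughly 2 * 0.64 = 1.28
coef(ivreg(y ~ x1 | x2))[2]  # close to 2: x2 is relevant for x but
                             # uncorrelated with x1's error and with u
```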
36.1.1.1.2 Left-Hand Side Variable
Measurement error in the dependent variable (i.e., the response or outcome) is fundamentally different from measurement error in explanatory variables. Its consequences are often less problematic for consistent estimation of regression coefficients (the zero conditional mean assumption is not violated), but it does affect statistical inference (higher standard errors) and model fit.
Suppose we are interested in the standard linear regression model:

$$
Y_i = \beta X_i + u_i,
$$

but we do not observe $Y_i$ directly. Instead, we observe:

$$
\tilde{Y}_i = Y_i + v_i,
$$

where:

- $v_i$ is measurement error in the dependent variable,
- $E[v_i] = 0$ (mean-zero),
- $v_i$ is uncorrelated with $X_i$ and $u_i$,
- $v_i$ is homoskedastic and independent across observations.
Be extra careful here!
These are classical-error assumptions:

- Mean zero: $E[v \mid X] = 0$.
- Exogeneity: $v$ is uncorrelated with each regressor and with the structural disturbance $u$ (i.e., $Cov(X, v) = Cov(u, v) = 0$).
- Homoskedasticity / finite moments, so that the law of large numbers applies.
The regression we actually estimate is:

$$
\tilde{Y}_i = \beta X_i + u_i + v_i.
$$

We can define a composite error term:

$$
\tilde{u}_i = u_i + v_i,
$$

so that the model becomes:

$$
\tilde{Y}_i = \beta X_i + \tilde{u}_i.
$$

Under the classical-error assumptions, the extra noise simply enlarges the composite error term $\tilde{u}_i$, leaving

$$
\hat{\beta}_{OLS} = \beta + (X'X)^{-1} X'(u + v) \xrightarrow{p} \beta,
$$

so the estimator remains consistent and only its variance rises.
Key Insights
Unbiasedness and Consistency of $\hat{\beta}$:

As long as $E[\tilde{u}_i \mid X_i] = 0$, which holds under the classical assumptions (i.e., $E[u_i \mid X_i] = 0$ and $E[v_i \mid X_i] = 0$), the OLS estimator of $\beta$ remains unbiased and consistent.

This is because measurement error in the left-hand side does not induce endogeneity. The zero conditional mean assumption is preserved.

Interpretation (Why Econometricians Don't Panic):

Econometricians and causal researchers often focus on consistent estimation of causal effects under strict exogeneity. Since $v_i$ just adds noise to the outcome and doesn't systematically relate to $X_i$, the slope estimate $\hat{\beta}$ remains a valid estimate of the causal effect $\beta$.

Statistical Implications (Why Statisticians Might Worry):

Although $\hat{\beta}$ is consistent, the variance of the error term increases due to the added noise $v_i$. Specifically:

$$
Var(\tilde{u}_i) = Var(u_i) + Var(v_i) = \sigma^2_u + \sigma^2_v.
$$
This leads to:

- Higher residual variance $\Rightarrow$ lower $R^2$
- Higher standard errors for coefficient estimates
- Wider confidence intervals, reducing the precision of inference
Thus, even though the point estimate is valid, inference becomes weaker: hypothesis tests are less powerful, and conclusions less precise.
Practical Illustration
- Suppose X is a marketing investment and Y is sales revenue.
- If sales are measured with noise (e.g., misrecorded sales data, rounding, reporting delays), the coefficient on marketing is still consistently estimated.
- However, uncertainty around the estimate grows: wider confidence intervals might make it harder to detect statistically significant effects, especially in small samples.
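To make this concrete, here is a small simulated version of the marketing example (all numbers are illustrative assumptions): adding noise to the outcome leaves the slope roughly unchanged but inflates its standard error.

```r
# Measurement error in Y: slope stays near the truth, standard error grows.
set.seed(123)

n <- 1e4
x <- rnorm(n)                    # marketing investment (stylized)
y <- 1 + 2 * x + rnorm(n)        # true sales, beta = 2
y_tilde <- y + rnorm(n, sd = 3)  # noisily recorded sales

summary(lm(y ~ x))$coefficients["x", c("Estimate", "Std. Error")]
# estimate near 2, small standard error
summary(lm(y_tilde ~ x))$coefficients["x", c("Estimate", "Std. Error")]
# estimate still near 2, standard error roughly sqrt(1 + 9) ~ 3.2 times larger
```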
Summary Table: Measurement Error Consequences
| Location of Measurement Error | Bias in $\hat{\beta}$ | Consistency | Affects Inference? | Typical Concern |
|---|---|---|---|---|
| Regressor ($X$) | Yes (attenuation) | No | Yes | Econometric & statistical |
| Outcome ($Y$) | No | Yes | Yes | Mainly statistical |
36.1.1.2 Non-Classical Measurement Error
In the classical measurement error model, we assume that the measurement error $\epsilon$ is independent of the true variable $X$ and of the regression disturbance $u$. However, in many realistic data scenarios, this assumption does not hold. Non-classical measurement error refers to cases where:

- $\epsilon$ is correlated with $X$,
- or possibly even correlated with $u$.
Violating the classical assumptions introduces additional and potentially complex biases in OLS estimation.
Recall that in the classical measurement error model, we observe:

$$
\tilde{X} = X + \epsilon,
$$

where:

- $\epsilon$ is independent of $X$ and $u$,
- $E[\epsilon] = 0$.

The true model is:

$$
Y = \beta X + u.
$$

Then, OLS based on the mismeasured regressor gives:

$$
\hat{\beta}_{OLS} = \frac{Cov(\tilde{X}, Y)}{Var(\tilde{X})} = \frac{Cov(X + \epsilon, \beta X + u)}{Var(X + \epsilon)}.
$$

With classical assumptions, this simplifies to:

$$
\text{plim } \hat{\beta}_{OLS} = \beta \cdot \frac{\sigma^2_X}{\sigma^2_X + \sigma^2_\epsilon} = \beta \cdot \lambda,
$$

where $\lambda$ is the reliability ratio, which attenuates $\hat{\beta}$ toward zero.
Let us now relax the independence assumption and allow for correlation between $X$ and $\epsilon$. In particular, suppose:

$$
Cov(X, \epsilon) = \sigma_{X\epsilon} \neq 0.
$$

Then the probability limit of the OLS estimator becomes:

$$
\text{plim } \hat{\beta} = \frac{Cov(X + \epsilon, \beta X + u)}{Var(X + \epsilon)} = \frac{\beta (\sigma^2_X + \sigma_{X\epsilon})}{\sigma^2_X + \sigma^2_\epsilon + 2\sigma_{X\epsilon}}.
$$

We can rewrite this as:

$$
\text{plim } \hat{\beta} = \beta \left(1 - \frac{\sigma^2_\epsilon + \sigma_{X\epsilon}}{\sigma^2_X + \sigma^2_\epsilon + 2\sigma_{X\epsilon}}\right) = \beta (1 - b_{\epsilon \tilde{X}}),
$$

where $b_{\epsilon \tilde{X}}$ is the regression coefficient of $\epsilon$ on $\tilde{X}$, or more precisely:

$$
b_{\epsilon \tilde{X}} = \frac{Cov(\epsilon, \tilde{X})}{Var(\tilde{X})}.
$$

This makes clear that the bias in $\hat{\beta}$ depends on how strongly the measurement error is correlated with the observed regressor $\tilde{X}$. This general formulation nests the classical case as a special case:

- In the classical case: $\sigma_{X\epsilon} = 0 \Rightarrow b_{\epsilon \tilde{X}} = \frac{\sigma^2_\epsilon}{\sigma^2_X + \sigma^2_\epsilon} = 1 - \lambda$.
Implications of Non-Classical Measurement Error
- When $\sigma_{X\epsilon} > 0$, the attenuation bias can increase or decrease depending on the balance of variances.
- In particular:
  - If more than half of the variance in $\tilde{X}$ is due to measurement error, increasing $\sigma_{X\epsilon}$ increases attenuation.
  - If less than half is due to measurement error, it can actually reduce attenuation.
- This phenomenon is sometimes called mean-reverting measurement error: the measurement error pulls observed values toward the mean, distorting estimates (Bound, Brown, and Mathiowetz 2001). The simulation below illustrates the general bias formula.
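The following simulation (assumed values, for illustration only) generates a mean-reverting error that is negatively correlated with the true regressor and checks the OLS estimate against the general probability-limit formula derived above:

```r
# Non-classical (mean-reverting) measurement error: the error is negatively
# correlated with the true X, and the OLS estimate matches the general formula
# beta * (var_X + cov_Xe) / (var_X + var_e + 2 * cov_Xe).
set.seed(123)

n    <- 1e5
beta <- 2
x    <- rnorm(n)                       # true regressor
eps  <- -0.3 * x + rnorm(n, sd = 0.5)  # error pulls observations toward the mean
x_t  <- x + eps                        # observed regressor
y    <- 1 + beta * x + rnorm(n)

var_x  <- var(x)
var_e  <- var(eps)
cov_xe <- cov(x, eps)

beta * (var_x + cov_xe) / (var_x + var_e + 2 * cov_xe)  # theoretical plim
coef(lm(y ~ x_t))[2]                                    # OLS with mismeasured X
```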
36.1.1.2.1 A General Framework for Non-Classical Measurement Error
Bound, Brown, and Mathiowetz (2001) offer a unified matrix framework that accommodates measurement error in both the independent and dependent variables.
Let the true model be:

$$
Y = X\beta + \epsilon,
$$

but we observe $\tilde{X} = X + U$ and $\tilde{Y} = Y + v$, where:

- $U$ is a matrix of measurement error in $X$,
- $v$ is a vector of measurement error in $Y$.
Then, the OLS estimator based on the observed data is:

$$
\hat{\beta} = (\tilde{X}'\tilde{X})^{-1} \tilde{X}'\tilde{Y}.
$$

Substituting the observed quantities:

$$
\tilde{Y} = Y + v = X\beta + \epsilon + v = \tilde{X}\beta - U\beta + v + \epsilon.
$$

Hence,

$$
\hat{\beta} = (\tilde{X}'\tilde{X})^{-1} \tilde{X}'(\tilde{X}\beta - U\beta + v + \epsilon),
$$

which simplifies to:

$$
\hat{\beta} = \beta + (\tilde{X}'\tilde{X})^{-1} \tilde{X}'(-U\beta + v + \epsilon).
$$

Taking the probability limit, and noting that the term in $\epsilon$ vanishes because the structural disturbance is assumed uncorrelated with $\tilde{X}$:

$$
\text{plim } \hat{\beta} = \beta + \text{plim } \left[(\tilde{X}'\tilde{X})^{-1} \tilde{X}'(-U\beta + v)\right].
$$

Now define:

$$
W = [U \quad v],
$$

and we can express the bias compactly as:

$$
\text{plim } \hat{\beta} = \beta + \text{plim } \left[(\tilde{X}'\tilde{X})^{-1} \tilde{X}' W \begin{bmatrix} -\beta \\ 1 \end{bmatrix}\right].
$$
This formulation highlights a powerful insight:

Bias in $\hat{\beta}$ arises from the linear projection of the measurement errors onto the observed $\tilde{X}$.

This expression does not assert that $v$ necessarily biases $\hat{\beta}$; it simply makes explicit that bias arises whenever the linear projection of $(U\beta - v)$ onto $\tilde{X}$ is non-zero. Three cases illustrate the point:
| Case | Key correlation | Consequence for $\hat{\beta}$ |
|---|---|---|
| Classical $Y$-error: $U \equiv 0$, $Cov(\tilde{X}, v) = 0$ | Projection term vanishes | Consistent; larger standard errors |
| Correlated $Y$-error: $U \equiv 0$, $Cov(\tilde{X}, v) \neq 0$ | Projection picks up $v$ | Biased (attenuation or sign reversal possible) |
| Both $X$- and $Y$-error, independent: $Cov(X, U) \neq 0$, $Cov(\tilde{X}, v) = 0$ | $U\beta$ projects onto $\tilde{X}$ | Biased because of $U$, not $v$ |
Hence, your usual “harmless Y-noise” result is the special case in the first row.
Practical implications
Check assumptions explicitly. If the dataset was generated by self-reports, simultaneous proxies, or modelled outcomes, it is rarely safe to assume $Cov(X, v) = 0$.

Correlated errors in $Y$ can creep in through:

- Common data-generating mechanisms (e.g., the same survey module records both earnings ($Y$) and hours worked ($X$)).
- Prediction-generated variables, where $v$ inherits correlation with the features used to build $\tilde{Y}$.

Joint mis-measurement ($U$ and $v$ correlated) is common in administrative or sensor data; here, even a $v$ that is "classical" with respect to $X$ can correlate with $\tilde{X} = X + U$.

Measurement error in $Y$ is benign only under strong exogeneity and independence conditions. The Bound–Brown–Mathiowetz matrix form (Bound, Brown, and Mathiowetz 2001) simply shows that once those conditions fail, or once $X$ itself is mis-measured, the same projection logic that produces attenuation bias for $X$ can also transmit bias from $v$ to $\hat{\beta}$.
So the rule of thumb you learned is true in its narrow, classical setting, but Bound, Brown, and Mathiowetz (2001) remind us that empirical work often strays outside that safe harbor.
Consequences and Correction
- Non-classical error can lead to over- or underestimation, unlike the always-attenuating classical case.
- The direction and magnitude of bias depend on the correlation structure of X, ϵ, and v.
- This poses serious problems in many survey and administrative data settings where systematic misreporting occurs.
Practical Solutions
- Instrumental Variables: Use an instrument $Z$ that is correlated with the true variable $X$ but uncorrelated with both the measurement error and the regression disturbance. IV can help eliminate both classical and non-classical error-induced biases.
- Validation Studies: Use a subset of the data with accurate measures to estimate the structure of the measurement error and correct estimates via techniques such as regression calibration, multiple imputation, or SIMEX.
- Modeling the Error Process: Explicitly model the measurement error process, especially in longitudinal or panel data (e.g., via state-space models or Bayesian approaches).
- Binary/Dummy Variable Case: Non-classical error in binary regressors (misclassification) also leads to bias, but IV methods still apply. For example, if education level is misreported in survey data, a valid instrument (e.g., policy-based variation) can correct for misclassification bias.
Summary
| Feature | Classical Error | Non-Classical Error |
|---|---|---|
| $Cov(X, \epsilon)$ | $0$ | $\neq 0$ |
| Bias in $\hat{\beta}$ | Always attenuation | Can attenuate or inflate |
| Consistency of OLS | No | No |
| Effect of variance structure | Predictable | Depends on $\sigma_{X\epsilon}$ |
| Fixable with IV | Yes | Yes |
In short, non-classical measurement error breaks the comforting regularity of attenuation bias. It can produce arbitrary biases depending on the nature and structure of the error. Instrumental variables and validation studies are often the only reliable tools for addressing this complex problem.
36.1.1.3 Solution to Measurement Errors in Correlation Estimation
36.1.1.3.1 Bayesian Correction for Correlation Coefficient
We begin by expressing the Bayesian posterior for a correlation coefficient $\rho$:

$$
\underbrace{P(\rho \mid \text{data})}_{\text{Posterior}} = \frac{P(\text{data} \mid \rho)\, P(\rho)}{P(\text{data})} \propto \underbrace{P(\text{data} \mid \rho)}_{\text{Likelihood}} \times \underbrace{P(\rho)}_{\text{Prior}}
$$

Where:

- $\rho$ is the true population correlation coefficient
- $P(\text{data} \mid \rho)$ is the likelihood function
- $P(\rho)$ is the prior density of $\rho$
- $P(\text{data})$ is the marginal likelihood (a normalizing constant)

With sample correlation coefficient $r$:

$$
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
$$
According to Schisterman et al. (2003, p. 3), the posterior density of $\rho$ can be approximated as:

$$
P(\rho \mid x, y) \propto P(\rho) \cdot \frac{(1 - \rho^2)^{(n-1)/2}}{(1 - \rho r)^{n - 3/2}}
$$

This approximation leads to a posterior that can be modeled via the Fisher transformation:

- Let $\rho = \tanh(\xi)$, where $\xi \sim N(z, 1/n)$
- $r = \tanh(z)$ is the Fisher-transformed correlation

Using conjugate normal approximations, we derive the posterior for the transformed correlation $\xi$ as:

- Posterior variance:

$$
\sigma^2_{posterior} = \frac{1}{n_{prior} + n_{likelihood}}
$$

- Posterior mean:

$$
\mu_{posterior} = \sigma^2_{posterior} \left( n_{prior} \cdot \tanh^{-1}(r_{prior}) + n_{likelihood} \cdot \tanh^{-1}(r_{likelihood}) \right)
$$

To simplify the mathematics, we may assume a prior of the form:

$$
P(\rho) \propto (1 - \rho^2)^c
$$

where $c$ controls the strength of the prior. If no prior information is available, we can set $c = 0$ so that $P(\rho) \propto 1$.
Example: Combining Estimates from Two Studies

Let:

- Current study: $r_{likelihood} = 0.5$, $n_{likelihood} = 200$
- Prior study: $r_{prior} = 0.2765$, $n_{prior} = 50205$

Step 1: Posterior Variance

$$
\sigma^2_{posterior} = \frac{1}{50205 + 200} = 0.0000198393
$$

Step 2: Posterior Mean

Apply the Fisher transformation:

- $\tanh^{-1}(0.2765) \approx 0.2841$
- $\tanh^{-1}(0.5) \approx 0.5493$

Then:

$$
\begin{aligned}
\mu_{posterior} &= 0.0000198393 \times (50205 \times 0.2841 + 200 \times 0.5493) \\
&= 0.0000198393 \times (14260.7 + 109.86) \\
&= 0.0000198393 \times 14370.56 \\
&= 0.2850
\end{aligned}
$$

Thus, the posterior distribution of $\xi = \tanh^{-1}(\rho)$ is:

$$
\xi \sim N(0.2850, \; 0.0000198393)
$$

Transforming back:

- Posterior mean correlation: $\rho = \tanh(0.2850) = 0.2776$
- 95% CI for $\xi$: $0.2850 \pm 1.96 \cdot \sqrt{0.0000198393} = (0.2762, 0.2937)$
- Transforming the endpoints: $\tanh(0.2762) = 0.2694$, $\tanh(0.2937) = 0.2855$

The Bayesian posterior distribution for the correlation coefficient is:

- Mean: $\hat{\rho}_{posterior} = 0.2776$
- 95% CI: $(0.2694, 0.2855)$
This Bayesian adjustment is especially useful when:
- There is high sampling variation due to small sample sizes
- Measurement error attenuates the observed correlation
- Combining evidence from multiple studies (meta-analytic context)
By leveraging prior information and applying the Fisher transformation, researchers can obtain a more stable and accurate estimate of the true underlying correlation.
```r
# Define inputs
n_new <- 200
r_new <- 0.5
alpha <- 0.05

# Bayesian update function for correlation coefficient
update_correlation <- function(n_new, r_new, alpha) {
  # Prior (meta-analysis study)
  n_meta <- 50205
  r_meta <- 0.2765
  
  # Step 1: Posterior variance (in Fisher-z space)
  var_xi <- 1 / (n_new + n_meta)
  
  # Step 2: Posterior mean (in Fisher-z space)
  mu_xi <- var_xi * (n_meta * atanh(r_meta) + n_new * atanh(r_new))
  
  # Step 3: Confidence interval in Fisher-z space
  z_crit   <- qnorm(1 - alpha / 2)
  upper_xi <- mu_xi + z_crit * sqrt(var_xi)
  lower_xi <- mu_xi - z_crit * sqrt(var_xi)
  
  # Step 4: Transform back to correlation scale
  mean_rho  <- tanh(mu_xi)
  upper_rho <- tanh(upper_xi)
  lower_rho <- tanh(lower_xi)
  
  # Return all values as a list
  list(
    mu_xi     = mu_xi,
    var_xi    = var_xi,
    upper_xi  = upper_xi,
    lower_xi  = lower_xi,
    mean_rho  = mean_rho,
    upper_rho = upper_rho,
    lower_rho = lower_rho
  )
}

# Run update
updated <- update_correlation(n_new = n_new,
                              r_new = r_new,
                              alpha = alpha)

# Display updated posterior mean and confidence interval
cat("Posterior mean of rho:", round(updated$mean_rho, 4), "\n")
#> Posterior mean of rho: 0.2775
cat("95% CI for rho: (",
    round(updated$lower_rho, 4), ",",
    round(updated$upper_rho, 4), ")\n")
#> 95% CI for rho: ( 0.2694 , 0.2855 )

# For comparison: classical (frequentist) confidence interval around r_new
se_r  <- sqrt(1 / n_new)
z_r   <- qnorm(1 - alpha / 2) * se_r
ci_lo <- r_new - z_r
ci_hi <- r_new + z_r

cat("Frequentist 95% CI for r:", round(ci_lo, 4), "to", round(ci_hi, 4), "\n")
#> Frequentist 95% CI for r: 0.3614 to 0.6386
```
36.1.2 Simultaneity
Simultaneity arises when at least one of the explanatory variables in a regression model is jointly determined with the dependent variable, violating a critical assumption for causal inference: temporal precedence.
Why Simultaneity Matters
- In classical regression, we assume that regressors are determined exogenously—they are not influenced by the dependent variable.
- Simultaneity introduces endogeneity, where regressors are correlated with the error term, rendering OLS estimators biased and inconsistent.
- This has major implications in fields like economics, marketing, finance, and social sciences, where feedback mechanisms or equilibrium processes are common.
Real-World Examples
- Demand and supply: Price and quantity are determined together in market equilibrium.
- Sales and advertising: Advertising influences sales, but firms also adjust advertising based on current or anticipated sales.
- Productivity and investment: Higher productivity may attract investment, but investment can improve productivity.
36.1.2.1 Simultaneous Equation System
We begin with a basic two-equation structural model:
$$
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + u_i && \text{(Structural equation for } Y\text{)} \\
X_i &= \alpha_0 + \alpha_1 Y_i + v_i && \text{(Structural equation for } X\text{)}
\end{aligned}
$$

Here:

- $Y_i$ and $X_i$ are endogenous variables — both determined within the system.
- $u_i$ and $v_i$ are structural error terms, assumed to be uncorrelated with the exogenous variables (if any).

The equations form a simultaneous system because each endogenous variable appears on the right-hand side of the other's equation.

To uncover the statistical properties of these equations, we solve for $Y_i$ and $X_i$ as functions of the error terms only:

$$
\begin{aligned}
Y_i &= \frac{\beta_0 + \beta_1 \alpha_0}{1 - \alpha_1 \beta_1} + \frac{\beta_1 v_i + u_i}{1 - \alpha_1 \beta_1} \\
X_i &= \frac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \frac{v_i + \alpha_1 u_i}{1 - \alpha_1 \beta_1}
\end{aligned}
$$
These are the reduced-form equations, expressing the endogenous variables as functions of exogenous factors and disturbances.
36.1.2.2 Simultaneity Bias in OLS
If we naïvely estimate the first equation using OLS, treating $X_i$ as exogenous, the regressor is correlated with the error term:

$$
Cov(X_i, u_i) = Cov\left(\frac{v_i + \alpha_1 u_i}{1 - \alpha_1 \beta_1}, \; u_i\right) = \frac{\alpha_1}{1 - \alpha_1 \beta_1} \cdot Var(u_i) \neq 0
$$

This violates the exogeneity assumption (required for the Gauss–Markov theorem) that regressors be uncorrelated with the error term. The OLS estimator of $\beta_1$ is therefore biased and inconsistent.
To allow for identification and estimation, we introduce exogenous variables:

$$
\begin{cases}
Y_i = \beta_0 + \beta_1 X_i + \beta_2 T_i + u_i \\
X_i = \alpha_0 + \alpha_1 Y_i + \alpha_2 Z_i + v_i
\end{cases}
$$

Where:

- $X_i$, $Y_i$ — endogenous variables
- $T_i$, $Z_i$ — exogenous variables, not influenced by any variable in the system

Solving this system algebraically yields the reduced-form model:

$$
\begin{cases}
Y_i = \dfrac{\beta_0 + \beta_1 \alpha_0}{1 - \alpha_1 \beta_1} + \dfrac{\beta_1 \alpha_2}{1 - \alpha_1 \beta_1} Z_i + \dfrac{\beta_2}{1 - \alpha_1 \beta_1} T_i + \tilde{u}_i = B_0 + B_1 Z_i + B_2 T_i + \tilde{u}_i \\[2ex]
X_i = \dfrac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \dfrac{\alpha_2}{1 - \alpha_1 \beta_1} Z_i + \dfrac{\alpha_1 \beta_2}{1 - \alpha_1 \beta_1} T_i + \tilde{v}_i = A_0 + A_1 Z_i + A_2 T_i + \tilde{v}_i
\end{cases}
$$

The reduced form expresses endogenous variables as functions of exogenous instruments, which we can estimate using OLS.

Using the reduced-form estimates $(A_1, A_2, B_1, B_2)$, we can identify (recover) the structural coefficients:

$$
\beta_1 = \frac{B_1}{A_1}, \qquad
\beta_2 = B_2 \left(1 - \frac{B_1 A_2}{A_1 B_2}\right), \qquad
\alpha_1 = \frac{A_2}{B_2}, \qquad
\alpha_2 = A_1 \left(1 - \frac{B_1 A_2}{A_1 B_2}\right)
$$
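A small simulation (all parameter values are assumptions) illustrates the logic: generate data from the structural system, estimate the two reduced-form equations by OLS, and recover the structural coefficients from the reduced-form estimates (indirect least squares).

```r
# Indirect least squares: recover structural coefficients from reduced forms.
set.seed(123)

n  <- 1e5
a0 <- 1; a1 <- 0.4; a2 <- 1.5   # structural parameters of the X equation
b0 <- 2; b1 <- 0.8; b2 <- -1.0  # structural parameters of the Y equation

Z  <- rnorm(n); Tt <- rnorm(n)  # exogenous variables
u  <- rnorm(n); v  <- rnorm(n)  # structural errors

# Solve the simultaneous system to generate the data (reduced forms)
den <- 1 - a1 * b1
Y <- (b0 + b1 * a0 + b1 * a2 * Z + b2 * Tt + b1 * v + u) / den
X <- (a0 + a1 * b0 + a2 * Z + a1 * b2 * Tt + v + a1 * u) / den

coef(lm(Y ~ X + Tt))["X"]   # naive OLS: biased estimate of beta_1 = 0.8

# Reduced-form regressions
B <- coef(lm(Y ~ Z + Tt))   # B0, B1, B2
A <- coef(lm(X ~ Z + Tt))   # A0, A1, A2

B["Z"] / A["Z"]             # recovers beta_1  (0.8)
A["Tt"] / B["Tt"]           # recovers alpha_1 (0.4)
```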
36.1.2.3 Identification Conditions
Estimation of structural parameters is only possible if the model is identified.
Order Condition (Necessary but Not Sufficient)
A structural equation is identified if:
$$
K - k \geq m - 1
$$

Where:

- $M$ = total number of endogenous variables in the system
- $m$ = number of endogenous variables in the given equation
- $K$ = total number of exogenous variables in the system
- $k$ = number of exogenous variables appearing in the given equation

Identification status:

- Just-identified: $K - k = m - 1$ (exact identification)
- Over-identified: $K - k > m - 1$ (more instruments than necessary)
- Under-identified: $K - k < m - 1$ (cannot be estimated)
Note: The order condition is necessary but not sufficient. The rank condition must also be satisfied for full identification, which we cover in Instrumental Variables.
This simultaneous equations framework provides the foundation for instrumental variable estimation, where:
- Exogenous variables not appearing in a structural equation serve as instruments.
- These instruments allow consistent estimation of endogenous regressors’ effects.
The reduced-form equations are often used to generate fitted values of endogenous regressors, which are then used in a Two-Stage Least Squares estimation process.
36.1.3 Reverse Causality
Reverse causality refers to a situation in which the direction of causation is opposite to what is presumed. Specifically, we may model a relationship where variable X is assumed to cause Y, but in reality, Y causes X, or both influence each other in a feedback loop.
This violates a fundamental assumption for causal inference: temporal precedence — the cause must come before the effect. In the presence of reverse causality, the relationship between X and Y becomes ambiguous, and statistical estimators such as OLS become biased and inconsistent.
In a standard linear regression model:
$$
Y_i = \beta_0 + \beta_1 X_i + u_i
$$

We interpret $\beta_1$ as the causal effect of $X$ on $Y$. However, this interpretation implicitly assumes that:

- $X_i$ is exogenous (uncorrelated with $u_i$)
- Changes in $X_i$ occur prior to or independently of changes in $Y_i$

If $Y_i$ also affects $X_i$, then $X_i$ is not exogenous — it is endogenous, because it is correlated with $u_i$ via the reverse causal path.
Reverse causality is especially problematic in observational data where interventions are not randomly assigned. Some key examples include:
- Health and income: Higher income may improve health outcomes, but healthier individuals may also earn more (e.g., due to better productivity or fewer sick days).
- Education and wages: Education raises wages, but higher-income individuals might afford better education — or individuals with higher innate ability (reflected in u) pursue more education and also earn more.
- Crime and policing: Increased police presence is often assumed to reduce crime, but high-crime areas are also likely to receive more police resources.
- Advertising and sales: Firms advertise more to boost sales, but high sales may also lead to higher advertising budgets — especially when revenue is reinvested in marketing.
To model reverse causality explicitly, consider:
36.1.3.1 System of Equations
$$
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + u_i && (Y \text{ depends on } X) \\
X_i &= \gamma_0 + \gamma_1 Y_i + v_i && (X \text{ depends on } Y)
\end{aligned}
$$

This feedback loop represents a simultaneous system, but one where the direction of causality is unclear. The two equations indicate that both variables are endogenous.

Even if we estimate only the first equation using OLS, the bias becomes apparent:

$$
Cov(X_i, u_i) \neq 0 \quad \Rightarrow \quad \hat{\beta}_1 \text{ is biased}
$$

Why? Because $X_i$ is determined by $Y_i$, which itself depends on $u_i$. Thus, $X_i$ indirectly depends on $u_i$.
In causal diagram notation (Directed Acyclic Graphs, or DAGs), reverse causality violates the acyclicity assumption. Here’s an example:
- Intended model: $X \to Y$
- Reality: $X \leftrightarrow Y$ (feedback loop)
This non-directional causality prevents us from interpreting coefficients causally unless additional identification strategies are applied.
OLS assumes:
$$
E[u_i \mid X_i] = 0
$$

Under reverse causality, this condition fails. The resulting estimator $\hat{\beta}_1$ captures both the effect of $X$ on $Y$ and the feedback from $Y$ to $X$, leading to:

- Omitted variable bias: $X_i$ captures unobserved information from $Y_i$
- Simultaneity bias: caused by the endogenous nature of $X_i$
36.1.3.2 Distinction from Simultaneity
Reverse causality is a special case of endogeneity, often manifesting as simultaneity. However, the key distinction is:
- Simultaneity: Variables are determined together (e.g., in equilibrium models), and both are modeled explicitly in a system.
- Reverse causality: Only one equation is estimated, and the true causal direction is unknown or opposite to what is modeled.
Reverse causality may or may not involve a full simultaneous system — it’s often unrecognized or assumed away, making it especially dangerous in empirical research.
There are no mechanical tests that definitively detect reverse causality, but researchers can:
- Use temporal data (lags): estimate $Y_{it} = \beta_0 + \beta_1 X_{i,t-1} + u_{it}$ and examine the temporal precedence of variables (see the sketch after this list).
- Apply Granger causality tests in time series (not strictly causal, but helpful diagnostically).
- Use theoretical reasoning to justify directionality.
- Check robustness across different time frames or instrumental variables.
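A hedged sketch with simulated data (the data-generating process and values are assumptions): here $Y$ drives $X$, so a contemporaneous regression of $Y$ on $X$ picks up a spurious relationship, while Granger-style tests via `lmtest::grangertest()` point toward the actual temporal ordering. These tests are diagnostic aids, not proof of causality.

```r
# Reverse causality diagnostic sketch: Y drives X, yet y ~ x "finds" an effect.
library(lmtest)  # provides grangertest()
set.seed(123)

periods <- 300
x <- numeric(periods)
y <- numeric(periods)
for (t in 2:periods) {
  y[t] <- 0.8 * y[t - 1] + rnorm(1)  # y evolves on its own (x has no effect)
  x[t] <- 0.5 * y[t - 1] + rnorm(1)  # x responds to lagged y (reverse causality)
}
d <- data.frame(x = x, y = y)

coef(lm(y ~ x, data = d))  # spurious positive "effect" of x on y

grangertest(x ~ y, order = 1, data = d)  # lags of y help predict x
grangertest(y ~ x, order = 1, data = d)  # lags of x add nothing for y
```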
36.1.3.3 Solutions to Reverse Causality
The following methods can mitigate reverse causality:
- Instrumental Variables
  - Find a variable $Z$ that affects $X$ but is not affected by $Y$, nor correlated with $u_i$.
  - First stage: $X_i = \pi_0 + \pi_1 Z_i + e_i$
  - Second stage: $\hat{X}_i$ from the first stage is used in the regression for $Y$.
- Randomized Controlled Trials (RCTs)
- In experiments, the treatment (e.g., X) is assigned randomly and therefore exogenous by design.
- Natural Experiments / Quasi-Experimental Designs
- Use external shocks or policy changes that affect X but not Y directly (e.g., difference-in-differences, regression discontinuity).
- Panel Data Methods
- Use fixed-effects or difference estimators to eliminate time-invariant confounders.
- Lag independent variables to examine delayed effects and improve causal direction inference.
- Structural Equation Modeling
- Estimate a full system of equations to explicitly model feedback.
36.1.4 Omitted Variable Bias
Omitted Variable Bias (OVB) arises when a relevant explanatory variable that influences the dependent variable is left out of the regression model, and the omitted variable is correlated with one or more included regressors. This violates the exogeneity assumption of OLS and leads to biased and inconsistent estimators.
Suppose we are interested in estimating the effect of an independent variable X on an outcome Y, and the true data-generating process is:
$$
Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i
$$

However, if we omit $Z_i$ and estimate the model:

$$
Y_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i
$$

then the estimate $\hat{\gamma}_1$ may be biased, because $X_i$ may be correlated with $Z_i$, and $Z_i$ influences $Y_i$.
Let us derive the bias formally.
True model:

$$
Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i \tag{1}
$$

Estimated model (with $Z_i$ omitted):

$$
Y_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i \tag{2}
$$

Now, substitute the true model into the estimated model:

$$
Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i
$$

Comparing both models, the omitted variable becomes part of the new error term:

$$
\varepsilon_i = \beta_2 Z_i + u_i
$$

Now, consider the OLS assumption:

$$
E[\varepsilon_i \mid X_i] = 0 \quad \text{(OLS requirement)}
$$

But since $\varepsilon_i = \beta_2 Z_i + u_i$ and $Z_i$ is correlated with $X_i$, we have:

$$
Cov(X_i, \varepsilon_i) = \beta_2 Cov(X_i, Z_i) \neq 0
$$

Therefore, the OLS exogeneity assumption fails, and $\hat{\gamma}_1$ is biased.
Let us calculate the expected value of the OLS estimator $\hat{\gamma}_1$.

From regression theory, when omitting $Z_i$, the expected value of $\hat{\gamma}_1$ is:

$$
E[\hat{\gamma}_1] = \beta_1 + \beta_2 \cdot \frac{Cov(X_i, Z_i)}{Var(X_i)}
$$

This is the Omitted Variable Bias formula.

Interpretation: the bias in $\hat{\gamma}_1$ depends on

- $\beta_2$: the true effect of the omitted variable on $Y$
- $Cov(X_i, Z_i)$: the covariance between $X$ and the omitted variable $Z$
36.1.4.1 Direction of the Bias
- If $\beta_2 > 0$ and $Cov(X_i, Z_i) > 0$: $\hat{\gamma}_1$ is biased upward
- If $\beta_2 < 0$ and $Cov(X_i, Z_i) > 0$: $\hat{\gamma}_1$ is biased downward
- If $Cov(X_i, Z_i) = 0$: no bias, even if $Z_i$ is omitted
Note: Uncorrelated omitted variables do not bias the OLS estimator, although they may reduce precision.
36.1.4.2 Practical Example: Education and Earnings
Suppose we model:
$$
\text{Earnings}_i = \gamma_0 + \gamma_1 \cdot \text{Education}_i + \varepsilon_i
$$

But the true model includes ability ($Z_i$):

$$
\text{Earnings}_i = \beta_0 + \beta_1 \cdot \text{Education}_i + \beta_2 \cdot \text{Ability}_i + u_i
$$

Omitting ability — a determinant of both education and earnings — leads to bias in the estimated effect of education:

- If more able individuals pursue more education and ability raises earnings ($\beta_2 > 0$), then $\hat{\gamma}_1$ overstates the true return to education (the simulation below illustrates this).
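A quick simulation (assumed coefficients, purely illustrative) verifies the OVB formula in this setting: the short regression overstates the return to education by approximately $\beta_2 \cdot Cov(X, Z)/Var(X)$.

```r
# Omitted variable bias: omitting "ability" inflates the return to education.
set.seed(123)

n         <- 1e5
ability   <- rnorm(n)
education <- 0.6 * ability + rnorm(n)                  # able people study more
earnings  <- 1 + 0.5 * education + 0.8 * ability + rnorm(n)

coef(lm(earnings ~ education))["education"]            # short regression (biased up)

0.5 + 0.8 * cov(education, ability) / var(education)   # OVB formula prediction

coef(lm(earnings ~ education + ability))["education"]  # long regression: ~0.5
```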
36.1.4.3 Generalization to Multiple Regression
In models with multiple regressors, omitting a relevant variable that is correlated with at least one included regressor will bias all coefficients affected by the correlation structure.
For example:
$$
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i
$$

If $X_2$ is omitted, and $Cov(X_1, X_2) \neq 0$, then:

$$
E[\hat{\gamma}_1] = \beta_1 + \beta_2 \cdot \frac{Cov(X_1, X_2)}{Var(X_1)}
$$
36.1.4.4 Remedies for OVB
- Include the omitted variable
  - If $Z$ is observed, include it in the regression model.
- Use Instrumental Variables
  - If $Z$ is unobserved but $X$ is endogenous, find an instrument $W$:
    - Relevance: $Cov(W, X) \neq 0$
    - Exogeneity: $Cov(W, u) = 0$
- Use Panel Data Methods
  - Fixed Effects: eliminate time-invariant omitted variables.
  - Difference-in-Differences: exploit temporal variation to isolate effects.
- Use Randomized Experiments
  - Randomization ensures omitted variables are orthogonal to treatment, avoiding bias.