36.1 Endogenous Treatment

Endogenous treatment occurs when the variable of interest (the “treatment”) is not randomly assigned and is correlated with unobserved determinants of the outcome. As discussed earlier, this can arise from omitted variables, simultaneity, or reverse causality. But even if the true variable is theoretically exogenous, measurement error can make it endogenous in practice.

This section focuses on how measurement errors, especially in explanatory variables, introduce bias—typically attenuation bias—and why they are a central concern in applied research.


36.1.1 Measurement Errors

Measurement error refers to the difference between the true value of a variable and its observed (measured) value.

  • Sources of measurement error:
    • Coding errors: Manual or software-induced data entry mistakes.
    • Reporting errors: Self-report bias, recall issues, or strategic misreporting.

Two Broad Types of Measurement Error

  1. Random (Stochastic) Error: Classical Measurement Error
    • Noise is unpredictable and averages out in expectation.
    • Error is uncorrelated with the true variable and the regression error.
    • Common in survey data, tracking errors.
  2. Systematic (Non-classical) Error: Non-Random Bias
    • Measurement error exhibits consistent patterns across observations.
    • Often arises from:
      • Instrument error: e.g., faulty sensors, uncalibrated scales.
      • Method error: poor sampling, survey design flaws.
      • Human error: judgment errors, social desirability bias.

Key insight:

  • Random error in a regressor adds noise, pushing coefficient estimates toward zero.
  • Systematic error introduces bias, pushing estimates either upward or downward.

36.1.1.1 Classical Measurement Error

36.1.1.1.1 Right-Hand Side Variable

Let’s examine the most common and analytically tractable case: classical measurement error in an explanatory variable.

Suppose the true model is:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

But we do not observe $X_i$ directly. Instead, we observe:

$$\tilde{X}_i = X_i + e_i$$

where $e_i$ is the measurement error, assumed classical:

  • $E[e_i] = 0$
  • $\text{Cov}(X_i, e_i) = 0$
  • $\text{Cov}(e_i, u_i) = 0$

Now, substitute $\tilde{X}_i$ into the regression:

$$Y_i = \beta_0 + \beta_1(\tilde{X}_i - e_i) + u_i = \beta_0 + \beta_1 \tilde{X}_i + (u_i - \beta_1 e_i) = \beta_0 + \beta_1 \tilde{X}_i + v_i$$

where $v_i = u_i - \beta_1 e_i$ is a composite error term.

Since both $\tilde{X}_i$ and $v_i$ contain $e_i$, we now have:

$$\text{Cov}(\tilde{X}_i, v_i) \neq 0$$

This correlation violates the exogeneity assumption and introduces endogeneity.


We can derive the asymptotic bias:

$$E[\tilde{X}_i v_i] = E[(X_i + e_i)(u_i - \beta_1 e_i)] = -\beta_1 \text{Var}(e_i) \neq 0$$

This implies:

  • If $\beta_1 > 0$, then $\hat{\beta}_1$ is biased downward.
  • If $\beta_1 < 0$, then $\hat{\beta}_1$ is biased upward.

This is called attenuation bias: the estimated effect is biased toward zero.

As the variance of the error $\text{Var}(e_i)$ increases, or equivalently as $\frac{\text{Var}(e_i)}{\text{Var}(\tilde{X}_i)} \to 1$, this bias becomes more severe.


Attenuation Factor

The OLS estimator based on the noisy regressor is

$$\hat{\beta}_{OLS} = \frac{\text{cov}(\tilde{X}, Y)}{\text{var}(\tilde{X})} = \frac{\text{cov}(X + e, \beta X + u)}{\text{var}(X + e)}.$$

Using the assumptions of classical measurement error, it follows that:

$$\text{plim}\ \hat{\beta}_{OLS} = \beta \frac{\sigma_X^2}{\sigma_X^2 + \sigma_e^2} = \beta \lambda,$$

where:

  • $\sigma_X^2$ is the variance of the true regressor $X$,
  • $\sigma_e^2$ is the variance of the measurement error $e$, and
  • $\lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_e^2}$ is called the reliability ratio, signal-to-total variance ratio, or attenuation factor.

Since $\lambda \in (0, 1]$, the bias always attenuates the estimate toward zero. The degree of attenuation bias is:

$$\text{plim}\ \hat{\beta}_{OLS} - \beta = -(1 - \lambda)\beta,$$

which implies:

  • If $\lambda = 1$, then $\text{plim}\ \hat{\beta}_{OLS} = \beta$ — no bias (no measurement error).
  • If $\lambda < 1$, then $|\text{plim}\ \hat{\beta}_{OLS}| < |\beta|$ — attenuation toward zero.
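
A quick simulation makes the attenuation factor concrete. This is a minimal sketch (all parameter values are illustrative), assuming $\sigma_X^2 = \sigma_e^2 = 1$, so $\lambda = 0.5$ and the OLS slope on the noisy regressor should converge to $\beta/2$:

# Minimal sketch of attenuation bias under classical measurement error
set.seed(1)
n     <- 1e5
beta  <- 2
x     <- rnorm(n)               # true regressor, variance 1
e     <- rnorm(n)               # classical measurement error, variance 1
x_obs <- x + e                  # observed (noisy) regressor
y     <- beta * x + rnorm(n)    # true model

coef(lm(y ~ x))["x"]            # close to 2: no measurement error
coef(lm(y ~ x_obs))["x_obs"]    # close to 1: attenuated by lambda = 0.5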

Important Notes on Measurement Error

  • Data transformations can magnify measurement error.

    Suppose the true model is nonlinear:

    $$y = \beta x + \gamma x^2 + \epsilon,$$

    and $x$ is measured with classical error. Then, the attenuation factor for $\hat{\gamma}$ is approximately the square of the attenuation factor for $\hat{\beta}$:

    $$\lambda_{\hat{\gamma}} \approx \lambda_{\hat{\beta}}^2.$$

    This shows how nonlinear transformations (e.g., squares, logs) can exacerbate measurement error problems (a short simulation follows these notes).

  • Including covariates can increase attenuation bias.

    Adding covariates that are correlated with the mismeasured variable can worsen bias in the coefficient of interest, especially if the measurement error is not accounted for in those covariates.
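
The $\lambda^2$ result for the squared term can be checked by simulation. A minimal sketch (values illustrative), assuming a standard normal regressor and $\lambda = 0.5$:

# With lambda = 0.5, the quadratic coefficient attenuates to roughly
# lambda^2 = 0.25 of its true value, versus 0.5 for the linear term
set.seed(2)
n     <- 1e5
x     <- rnorm(n)
x_obs <- x + rnorm(n)             # lambda = 1 / (1 + 1) = 0.5
y     <- x + x^2 + rnorm(n)       # true model: beta = gamma = 1

coef(lm(y ~ x_obs + I(x_obs^2)))  # slopes near 0.5 and 0.25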


Remedies for Measurement Error

To address attenuation bias caused by classical measurement error, consider the following strategies:

  1. Use validation data or survey information to estimate $\sigma_X^2$, $\sigma_e^2$, or $\lambda$, and apply correction methods (e.g., SIMEX, regression calibration); a minimal calibration sketch follows this list.
  2. Instrumental Variables Approach
    Use an instrument $Z$ that:
    • Is correlated with the true variable $X$,
    • Is uncorrelated with the regression error $u$, and
    • Is uncorrelated with the measurement error $e$.
  3. Abandon your project
    If no good instruments or validation data exist, and the attenuation bias is too severe, it may be prudent to reconsider the analysis or research question. (Said with love and academic humility.)
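
As a minimal sketch of the first strategy, suppose a (hypothetical) validation subsample also records the true regressor, so the reliability ratio $\lambda$ can be estimated and the naive slope rescaled; this is the simplest form of regression calibration, and all values below are illustrative:

# Regression-calibration sketch: undo attenuation with an estimate of lambda
set.seed(3)
n     <- 5000
x     <- rnorm(n)               # true regressor
x_obs <- x + rnorm(n)           # noisy measure
y     <- 2 * x + rnorm(n)       # true slope = 2

# Hypothetical validation sample: first 500 rows also record the true x
val        <- 1:500
lambda_hat <- var(x[val]) / var(x_obs[val])

beta_naive     <- unname(coef(lm(y ~ x_obs))["x_obs"])
beta_corrected <- beta_naive / lambda_hat   # divide out the attenuation
c(naive = beta_naive, corrected = beta_corrected)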

36.1.1.1.2 Left-Hand Side Variable

Measurement error in the dependent variable (i.e., the response or outcome) is fundamentally different from measurement error in explanatory variables. Its consequences are usually benign for consistent estimation of regression coefficients (the zero conditional mean assumption is not violated), but it does weaken statistical inference (higher standard errors) and model fit.


Suppose we are interested in the standard linear regression model:

$$Y_i = \beta X_i + u_i,$$

but we do not observe $Y_i$ directly. Instead, we observe:

$$\tilde{Y}_i = Y_i + v_i,$$

where:

  • $v_i$ is measurement error in the dependent variable,
  • $E[v_i] = 0$ (mean-zero),
  • $v_i$ is uncorrelated with $X_i$ and $u_i$,
  • $v_i$ is homoskedastic and independent across observations.

Be extra careful here!

These are classical-error assumptions:

  1. Mean zero: $E[v \mid X] = 0$.
  2. Exogeneity: $v$ is uncorrelated with each regressor and with the structural disturbance $u$ (i.e., $\text{Cov}(X, v) = \text{Cov}(u, v) = 0$).
  3. Homoskedasticity and finite moments, so the law of large numbers applies.

The regression we actually estimate is:

$$\tilde{Y}_i = \beta X_i + u_i + v_i.$$

We can define a composite error term:

$$\tilde{u}_i = u_i + v_i,$$

so that the model becomes:

$$\tilde{Y}_i = \beta X_i + \tilde{u}_i.$$

Under the classical-error assumptions, the extra noise simply enlarges the composite error term $\tilde{u}_i$, leaving

$$\hat{\beta}_{OLS} = \beta + (X'X)^{-1}X'(u + v) \overset{p}{\to} \beta,$$

so the estimator remains consistent and only its variance rises.


Key Insights

  • Unbiasedness and Consistency of $\hat{\beta}$:

    As long as $E[\tilde{u}_i \mid X_i] = 0$, which holds under the classical assumptions (i.e., $E[u_i \mid X_i] = 0$ and $E[v_i \mid X_i] = 0$), the OLS estimator of $\beta$ remains unbiased and consistent.

    This is because measurement error in the left-hand side does not induce endogeneity. The zero conditional mean assumption is preserved.

  • Interpretation (Why Econometricians Don’t Panic):

    Econometricians and causal researchers often focus on consistent estimation of causal effects under strict exogeneity. Since $v_i$ just adds noise to the outcome and does not systematically relate to $X_i$, the slope estimate $\hat{\beta}$ remains a valid estimate of the causal effect $\beta$.

  • Statistical Implications (Why Statisticians Might Worry):

    Although $\hat{\beta}$ is consistent, the variance of the error term increases due to the added noise $v_i$. Specifically:

    $$\text{Var}(\tilde{u}_i) = \text{Var}(u_i) + \text{Var}(v_i) = \sigma_u^2 + \sigma_v^2.$$

    This leads to:

    • Higher residual variance, and hence a lower $R^2$
    • Higher standard errors for coefficient estimates
    • Wider confidence intervals, reducing the precision of inference

    Thus, even though the point estimate is valid, inference becomes weaker: hypothesis tests are less powerful, and conclusions less precise.


Practical Illustration

  • Suppose $X$ is a marketing investment and $Y$ is sales revenue.
  • If sales are measured with noise (e.g., misrecorded sales data, rounding, reporting delays), the coefficient on marketing is still consistently estimated.
  • However, uncertainty around the estimate grows: wider confidence intervals might make it harder to detect statistically significant effects, especially in small samples.
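
A small simulation illustrates the contrast. This sketch reuses the marketing/sales framing with made-up parameter values:

# Noise in Y leaves the slope consistent but inflates its standard error
set.seed(4)
n         <- 500
marketing <- rnorm(n, mean = 10, sd = 2)
sales     <- 5 + 1.5 * marketing + rnorm(n)   # true outcome
sales_obs <- sales + rnorm(n, sd = 3)         # noisy measure of Y

summary(lm(sales     ~ marketing))$coefficients["marketing", ]
summary(lm(sales_obs ~ marketing))$coefficients["marketing", ]
# Both slopes are near 1.5; the second has a much larger std. error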

Summary Table: Measurement Error Consequences

Location of Measurement Error | Bias in $\hat{\beta}$ | Consistent? | Affects Inference? | Typical Concern
---|---|---|---|---
Regressor ($X$) | Yes (attenuation) | No | Yes | Econometric & statistical
Outcome ($Y$) | No | Yes | Yes | Mainly statistical

36.1.1.2 Non-Classical Measurement Error

In the classical measurement error model, we assume that the measurement error $\epsilon$ is independent of the true variable $X$ and of the regression disturbance $u$. However, in many realistic data scenarios, this assumption does not hold. Non-classical measurement error refers to cases where:

  • $\epsilon$ is correlated with $X$,
  • or possibly even correlated with $u$.

Violating the classical assumptions introduces additional and potentially complex biases in OLS estimation.


Recall that in the classical measurement error model, we observe:

$$\tilde{X} = X + \epsilon,$$

where:

  • $\epsilon$ is independent of $X$ and $u$,
  • $E[\epsilon] = 0$.

The true model is:

$$Y = \beta X + u.$$

Then, OLS based on the mismeasured regressor gives:

$$\hat{\beta}_{OLS} = \frac{\text{cov}(\tilde{X}, Y)}{\text{var}(\tilde{X})} = \frac{\text{cov}(X + \epsilon, \beta X + u)}{\text{var}(X + \epsilon)}.$$

With classical assumptions, this simplifies to:

$$\text{plim}\ \hat{\beta}_{OLS} = \beta \frac{\sigma_X^2}{\sigma_X^2 + \sigma_\epsilon^2} = \beta \lambda,$$

where $\lambda$ is the reliability ratio, which attenuates $\hat{\beta}$ toward zero.


Let us now relax the independence assumption and allow for correlation between $X$ and $\epsilon$. In particular, suppose:

  • $\text{cov}(X, \epsilon) = \sigma_{X\epsilon} \neq 0$.

Then the probability limit of the OLS estimator becomes:

$$\text{plim}\ \hat{\beta} = \frac{\text{cov}(X + \epsilon, \beta X + u)}{\text{var}(X + \epsilon)} = \frac{\beta(\sigma_X^2 + \sigma_{X\epsilon})}{\sigma_X^2 + \sigma_\epsilon^2 + 2\sigma_{X\epsilon}}.$$

We can rewrite this as:

$$\text{plim}\ \hat{\beta} = \beta\left(1 - \frac{\sigma_\epsilon^2 + \sigma_{X\epsilon}}{\sigma_X^2 + \sigma_\epsilon^2 + 2\sigma_{X\epsilon}}\right) = \beta(1 - b_{\epsilon\tilde{X}}),$$

where $b_{\epsilon\tilde{X}}$ is the regression coefficient of $\epsilon$ on $\tilde{X}$, or more precisely:

$$b_{\epsilon\tilde{X}} = \frac{\text{cov}(\epsilon, \tilde{X})}{\text{var}(\tilde{X})}.$$

This makes clear that the bias in $\hat{\beta}$ depends on how strongly the measurement error is correlated with the observed regressor $\tilde{X}$. This general formulation nests the classical case as a special case:

  • In classical error: $\sigma_{X\epsilon} = 0 \Rightarrow b_{\epsilon\tilde{X}} = \frac{\sigma_\epsilon^2}{\sigma_X^2 + \sigma_\epsilon^2} = 1 - \lambda$.

Implications of Non-Classical Measurement Error

  • When $\sigma_{X\epsilon} > 0$, the attenuation bias can increase or decrease depending on the balance of variances.
  • In particular:
    • If more than half of the variance in $\tilde{X}$ is due to measurement error, increasing $\sigma_{X\epsilon}$ increases attenuation.
    • If less than half is due to measurement error, it can actually reduce attenuation.
  • This phenomenon is sometimes called mean-reverting measurement error: the measurement error pulls observed values toward the mean, distorting estimates (Bound, Brown, and Mathiowetz 2001).
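
The probability-limit formula above is easy to verify numerically. A minimal sketch with mean-reverting error, assuming $\epsilon = -0.3X + \text{noise}$ so that $\sigma_{X\epsilon} < 0$ (all values illustrative):

# Mean-reverting measurement error: error correlated with the true X
set.seed(5)
n     <- 1e5
beta  <- 2
x     <- rnorm(n)
eps   <- -0.3 * x + rnorm(n, sd = 0.5)       # cov(x, eps) = -0.3
x_obs <- x + eps
y     <- beta * x + rnorm(n)

coef(lm(y ~ x_obs))["x_obs"]                 # empirical slope
beta * (var(x) + cov(x, eps)) / var(x_obs)   # theoretical plim (~1.89 here)
# Note: less attenuation than a classical error of the same variance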

36.1.1.2.1 A General Framework for Non-Classical Measurement Error

Bound, Brown, and Mathiowetz (2001) offer a unified matrix framework that accommodates measurement error in both the independent and dependent variables.

Let the true model be:

$$Y = X\beta + \epsilon,$$

but we observe $\tilde{X} = X + U$ and $\tilde{Y} = Y + v$, where:

  • $U$ is a matrix of measurement error in $X$,
  • $v$ is a vector of measurement error in $Y$.

Then, the OLS estimator based on the observed data is:

$$\hat{\beta} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{Y}.$$

Substituting the observed quantities:

$$\tilde{Y} = Y + v = X\beta + \epsilon + v = \tilde{X}\beta - U\beta + v + \epsilon.$$

Hence,

$$\hat{\beta} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'(\tilde{X}\beta - U\beta + v + \epsilon),$$

which simplifies to:

$$\hat{\beta} = \beta + (\tilde{X}'\tilde{X})^{-1}\tilde{X}'(-U\beta + v + \epsilon).$$

Taking the probability limit (the structural disturbance $\epsilon$ is uncorrelated with $\tilde{X}$, so its term vanishes):

$$\text{plim}\ \hat{\beta} = \beta + \text{plim}\left[(\tilde{X}'\tilde{X})^{-1}\tilde{X}'(-U\beta + v)\right].$$

Now define:

$$W = \begin{bmatrix} U & v \end{bmatrix},$$

and we can express the bias compactly as:

$$\text{plim}\ \hat{\beta} = \beta + \text{plim}\left[(\tilde{X}'\tilde{X})^{-1}\tilde{X}'W\begin{pmatrix} -\beta \\ 1 \end{pmatrix}\right].$$

This formulation highlights a powerful insight:

Bias in $\hat{\beta}$ arises from the linear projection of the measurement errors onto the observed $\tilde{X}$.

This expression does not assert that $v$ necessarily biases $\hat{\beta}$; it simply makes explicit that bias arises whenever the linear projection of $(-U\beta + v)$ onto $\tilde{X}$ is non-zero. Three cases illustrate the point:

Case | Key correlation | Consequence for $\hat{\beta}$
---|---|---
Classical $Y$-error only | $U = 0$, $\text{Cov}(\tilde{X}, v) = 0$: projection term vanishes | Consistent; larger standard errors
Correlated $Y$-error | $U = 0$, $\text{Cov}(\tilde{X}, v) \neq 0$: projection picks up $v$ | Biased (attenuation or sign reversal possible)
Both $X$- and $Y$-error, independent | $\text{Cov}(X, U) \neq 0$, $\text{Cov}(\tilde{X}, v) = 0$: $U\beta$ projects onto $\tilde{X}$ | Biased because of $U$, not $v$

Hence, your usual “harmless $Y$-noise” result is the special case in the first row.
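
A compact simulation of the middle row, assuming no error in $X$ and a $Y$-error that loads on $X$ (coefficients illustrative):

# Correlated Y-error: x is measured perfectly, but the error in y
# is correlated with x, so OLS is biased
set.seed(6)
n     <- 1e5
x     <- rnorm(n)
y     <- 2 * x + rnorm(n)     # true model, beta = 2
v     <- 0.5 * x + rnorm(n)   # Y-error correlated with the regressor
y_obs <- y + v

coef(lm(y_obs ~ x))["x"]      # ~2.5: biased upward by cov(x, v)/var(x)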


Practical implications

  1. Check assumptions explicitly. If the dataset was generated by self-reports, simultaneous proxies, or modeled outcomes, it is rarely safe to assume $\text{Cov}(X, v) = 0$.

  2. Correlated errors in $Y$ can creep in through:

    • Common data-generating mechanisms (e.g., the same survey module records both earnings ($Y$) and hours worked ($X$)).
    • Prediction-generated variables, where $v$ inherits correlation with the features used to build $\tilde{Y}$.
  3. Joint mis-measurement ($U$ and $v$ correlated) is common in administrative or sensor data; here, even a $v$ that is “classical” with respect to $X$ can correlate with $\tilde{X} = X + U$.

Measurement error in $Y$ is benign only under strong exogeneity and independence conditions. The Bound–Brown–Mathiowetz matrix form (Bound, Brown, and Mathiowetz 2001) simply shows that once those conditions fail—or once $X$ itself is mis-measured—the same projection logic that produces attenuation bias for $X$ can also transmit bias from $v$ to $\hat{\beta}$.

So the rule of thumb you learned is true in its narrow, classical setting, but Bound, Brown, and Mathiowetz (2001) remind us that empirical work often strays outside that safe harbor.


Consequences and Correction

  • Non-classical error can lead to over- or underestimation, unlike the always-attenuating classical case.
  • The direction and magnitude of bias depend on the correlation structure of $X$, $\epsilon$, and $v$.
  • This poses serious problems in many survey and administrative data settings where systematic misreporting occurs.

Practical Solutions

  1. Instrumental Variables
    Use an instrument $Z$ that is correlated with the true variable $X$, but uncorrelated with both the measurement error and the regression disturbance. IV can help eliminate both classical and non-classical error-induced biases.

  2. Validation Studies
    Use a subset of the data with accurate measures to estimate the structure of measurement error and correct estimates via techniques such as regression calibration, multiple imputation, or SIMEX.

  3. Modeling the Error Process
    Explicitly model the measurement error process, especially in longitudinal or panel data (e.g., via state-space models or Bayesian approaches).

  4. Binary/Dummy Variable Case
    Non-classical error in binary regressors (e.g., misclassification) also leads to bias, but IV methods still apply. For example, if education level is misreported in survey data, a valid instrument (e.g., policy-based variation) can correct for misclassification bias.


Summary

Feature | Classical Error | Non-Classical Error
---|---|---
$\text{Cov}(X, \epsilon)$ | $= 0$ | $\neq 0$
Bias in $\hat{\beta}$ | Always attenuation | Can attenuate or inflate
Consistency of OLS | No | No
Effect of variance structure | Predictable | Depends on $\sigma_{X\epsilon}$
Fixable with IV | Yes | Yes

In short, non-classical measurement error breaks the comforting regularity of attenuation bias. It can produce arbitrary biases depending on the nature and structure of the error. Instrumental variables and validation studies are often the only reliable tools for addressing this complex problem.


36.1.1.3 Solution to Measurement Errors in Correlation Estimation

36.1.1.3.1 Bayesian Correction for Correlation Coefficient

We begin by expressing the Bayesian posterior for a correlation coefficient $\rho$:

$$\underbrace{P(\rho \mid \text{data})}_{\text{Posterior}} = \frac{\overbrace{P(\text{data} \mid \rho)}^{\text{Likelihood}} \times \overbrace{P(\rho)}^{\text{Prior}}}{P(\text{data})}$$

Where:

  • $\rho$ is the true population correlation coefficient
  • $P(\text{data} \mid \rho)$ is the likelihood function
  • $P(\rho)$ is the prior density of $\rho$
  • $P(\text{data})$ is the marginal likelihood (a normalizing constant)

With sample correlation coefficient $r$:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

According to Schisterman et al. (2003, p. 3), the posterior density of $\rho$ can be approximated as:

$$P(\rho \mid x, y) \propto P(\rho) \, \frac{(1 - \rho^2)^{(n-1)/2}}{(1 - \rho r)^{n - 3/2}}$$

This approximation leads to a posterior that can be modeled via the Fisher transformation:

  • Let $\rho = \tanh(\xi)$, where $\xi \sim N(z, 1/n)$
  • $r = \tanh(z)$ is the Fisher-transformed correlation

Using conjugate normal approximations, we derive the posterior for the transformed correlation $\xi$ as:

  • Posterior Variance:

$$\sigma^2_{\text{posterior}} = \frac{1}{n_{\text{prior}} + n_{\text{likelihood}}}$$

  • Posterior Mean:

$$\mu_{\text{posterior}} = \sigma^2_{\text{posterior}} \left( n_{\text{prior}} \tanh^{-1}(r_{\text{prior}}) + n_{\text{likelihood}} \tanh^{-1}(r_{\text{likelihood}}) \right)$$

To simplify the mathematics, we may assume a prior of the form:

$$P(\rho) \propto (1 - \rho^2)^{c}$$

where $c$ controls the strength of the prior. If no prior information is available, we can set $c = 0$ so that $P(\rho) \propto 1$.


Example: Combining Estimates from Two Studies

Let:

  • Current study: $r_{\text{likelihood}} = 0.5$, $n_{\text{likelihood}} = 200$
  • Prior study: $r_{\text{prior}} = 0.2765$, $n_{\text{prior}} = 50205$

Step 1: Posterior Variance

$$\sigma^2_{\text{posterior}} = \frac{1}{50205 + 200} = 0.0000198393$$

Step 2: Posterior Mean

Apply the Fisher transformation:

  • $\tanh^{-1}(0.2765) \approx 0.2839$
  • $\tanh^{-1}(0.5) \approx 0.5493$

Then:

$$\mu_{\text{posterior}} = 0.0000198393 \times (50205 \times 0.2839 + 200 \times 0.5493) = 0.0000198393 \times (14253.20 + 109.86) = 0.0000198393 \times 14363.06 \approx 0.2850$$

Thus, the posterior distribution of $\xi = \tanh^{-1}(\rho)$ is:

$$\xi \sim N(0.2850, 0.0000198393)$$

Transforming back:

  • Posterior mean correlation: $\rho = \tanh(0.2850) \approx 0.2775$
  • 95% CI for $\xi$: $0.2850 \pm 1.96\sqrt{0.0000198393} = (0.2762, 0.2937)$
  • Transforming the endpoints: $\tanh(0.2762) \approx 0.2694$, $\tanh(0.2937) \approx 0.2855$

The Bayesian posterior distribution for the correlation coefficient is:

  • Mean: $\hat{\rho}_{\text{posterior}} \approx 0.2775$
  • 95% CI: $(0.2694, 0.2855)$

This Bayesian adjustment is especially useful when:

  1. There is high sampling variation due to small sample sizes
  2. Measurement error attenuates the observed correlation
  3. Combining evidence from multiple studies (meta-analytic context)

By leveraging prior information and applying the Fisher transformation, researchers can obtain a more stable and accurate estimate of the true underlying correlation.

# Define inputs
n_new  <- 200
r_new  <- 0.5
alpha  <- 0.05

# Bayesian update function for correlation coefficient
update_correlation <- function(n_new, r_new, alpha) {
  
  # Prior (meta-analysis study)
  n_meta <- 50205
  r_meta <- 0.2765
  
  # Step 1: Posterior variance (in Fisher-z space)
  var_xi <- 1 / (n_new + n_meta)
  
  # Step 2: Posterior mean (in Fisher-z space)
  mu_xi <- var_xi * (n_meta * atanh(r_meta) + n_new * atanh(r_new))
  
  # Step 3: Confidence interval in Fisher-z space
  z_crit    <- qnorm(1 - alpha / 2)
  upper_xi  <- mu_xi + z_crit * sqrt(var_xi)
  lower_xi  <- mu_xi - z_crit * sqrt(var_xi)
  
  # Step 4: Transform back to correlation scale
  mean_rho  <- tanh(mu_xi)
  upper_rho <- tanh(upper_xi)
  lower_rho <- tanh(lower_xi)
  
  # Return all values as a list
  list(
    mu_xi     = mu_xi,
    var_xi    = var_xi,
    upper_xi  = upper_xi,
    lower_xi  = lower_xi,
    mean_rho  = mean_rho,
    upper_rho = upper_rho,
    lower_rho = lower_rho
  )
}


# Run update
updated <-
    update_correlation(n_new = n_new,
                       r_new = r_new,
                       alpha = alpha)

# Display updated posterior mean and confidence interval
cat("Posterior mean of rho:", round(updated$mean_rho, 4), "\n")
#> Posterior mean of rho: 0.2775
cat(
    "95% CI for rho: (",
    round(updated$lower_rho, 4),
    ",",
    round(updated$upper_rho, 4),
    ")\n"
)
#> 95% CI for rho: ( 0.2694 , 0.2855 )

# For comparison: Classical (frequentist) confidence interval around r_new
se_r  <- sqrt(1 / n_new)
z_r   <- qnorm(1 - alpha / 2) * se_r
ci_lo <- r_new - z_r
ci_hi <- r_new + z_r

cat("Frequentist 95% CI for r:",
    round(ci_lo, 4),
    "to",
    round(ci_hi, 4),
    "\n")
#> Frequentist 95% CI for r: 0.3614 to 0.6386

36.1.2 Simultaneity

Simultaneity arises when at least one of the explanatory variables in a regression model is jointly determined with the dependent variable, violating a critical assumption for causal inference: temporal precedence.

Why Simultaneity Matters

  • In classical regression, we assume that regressors are determined exogenously—they are not influenced by the dependent variable.
  • Simultaneity introduces endogeneity, where regressors are correlated with the error term, rendering OLS estimators biased and inconsistent.
  • This has major implications in fields like economics, marketing, finance, and social sciences, where feedback mechanisms or equilibrium processes are common.

Real-World Examples

  • Demand and supply: Price and quantity are determined together in market equilibrium.
  • Sales and advertising: Advertising influences sales, but firms also adjust advertising based on current or anticipated sales.
  • Productivity and investment: Higher productivity may attract investment, but investment can improve productivity.

36.1.2.1 Simultaneous Equation System

We begin with a basic two-equation structural model:

$$\begin{aligned} Y_i &= \beta_0 + \beta_1 X_i + u_i && \text{(structural equation for } Y\text{)} \\ X_i &= \alpha_0 + \alpha_1 Y_i + v_i && \text{(structural equation for } X\text{)} \end{aligned}$$

Here:

  • $Y_i$ and $X_i$ are endogenous variables — both determined within the system.
  • $u_i$ and $v_i$ are structural error terms, assumed to be uncorrelated with the exogenous variables (if any).

The equations form a simultaneous system because each endogenous variable appears on the right-hand side of the other’s equation.


To uncover the statistical properties of these equations, we solve for $Y_i$ and $X_i$ as functions of the error terms only:

$$Y_i = \frac{\beta_0 + \beta_1 \alpha_0}{1 - \alpha_1 \beta_1} + \frac{\beta_1 v_i + u_i}{1 - \alpha_1 \beta_1}, \qquad X_i = \frac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \frac{v_i + \alpha_1 u_i}{1 - \alpha_1 \beta_1}$$

These are the reduced-form equations, expressing the endogenous variables as functions of exogenous factors and disturbances.


36.1.2.2 Simultaneity Bias in OLS

If we naïvely estimate the first equation using OLS, assuming $X_i$ is exogenous, the regressor is correlated with the error term:

$$\text{Cov}(X_i, u_i) = \text{Cov}\left(\frac{v_i + \alpha_1 u_i}{1 - \alpha_1 \beta_1},\ u_i\right) = \frac{\alpha_1}{1 - \alpha_1 \beta_1}\,\text{Var}(u_i)$$

This violates the exogeneity requirement (a condition underlying the Gauss–Markov Theorem) that regressors be uncorrelated with the error term. The OLS estimator for $\beta_1$ is biased and inconsistent.


To allow for identification and estimation, we introduce exogenous variables:

$$\begin{cases} Y_i = \beta_0 + \beta_1 X_i + \beta_2 T_i + u_i \\ X_i = \alpha_0 + \alpha_1 Y_i + \alpha_2 Z_i + v_i \end{cases}$$

Where:

  • $X_i$, $Y_i$ — endogenous variables
  • $T_i$, $Z_i$ — exogenous variables, not influenced by any variable in the system

Solving this system algebraically yields the reduced-form model:

$$\begin{aligned} Y_i &= \frac{\beta_0 + \beta_1 \alpha_0}{1 - \alpha_1 \beta_1} + \frac{\beta_1 \alpha_2}{1 - \alpha_1 \beta_1} Z_i + \frac{\beta_2}{1 - \alpha_1 \beta_1} T_i + \tilde{u}_i = B_0 + B_1 Z_i + B_2 T_i + \tilde{u}_i \\ X_i &= \frac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \frac{\alpha_2}{1 - \alpha_1 \beta_1} Z_i + \frac{\alpha_1 \beta_2}{1 - \alpha_1 \beta_1} T_i + \tilde{v}_i = A_0 + A_1 Z_i + A_2 T_i + \tilde{v}_i \end{aligned}$$

The reduced form expresses endogenous variables as functions of exogenous instruments, which we can estimate using OLS.


Using the reduced-form estimates $(A_1, A_2, B_1, B_2)$, we can identify (recover) the structural coefficients:

$$\beta_1 = \frac{B_1}{A_1}, \qquad \beta_2 = B_2\left(1 - \frac{B_1 A_2}{A_1 B_2}\right), \qquad \alpha_1 = \frac{A_2}{B_2}, \qquad \alpha_2 = A_1\left(1 - \frac{B_1 A_2}{A_1 B_2}\right)$$


36.1.2.3 Identification Conditions

Estimation of structural parameters is only possible if the model is identified.

Order Condition (Necessary but Not Sufficient)

A structural equation is identified if:

$$K - k \geq m - 1$$

Where:

  • $M$ = total number of endogenous variables in the system
  • $m$ = number of endogenous variables in the given equation
  • $K$ = total number of exogenous variables in the system
  • $k$ = number of exogenous variables appearing in the given equation

The three cases are:

  • Just-identified: $K - k = m - 1$ (exact identification)
  • Over-identified: $K - k > m - 1$ (more instruments than necessary)
  • Under-identified: $K - k < m - 1$ (cannot be estimated)

Note: The order condition is necessary but not sufficient. The rank condition must also be satisfied for full identification, which we cover in Instrumental Variables.

This simultaneous equations framework provides the foundation for instrumental variable estimation, where:

  • Exogenous variables not appearing in a structural equation serve as instruments.
  • These instruments allow consistent estimation of endogenous regressors’ effects.

The reduced-form equations are often used to generate fitted values of endogenous regressors, which are then used in a Two-Stage Least Squares estimation process.
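
The following sketch simulates the two-equation system from its reduced form and recovers $\beta_1$ with a hand-rolled two-stage procedure. Parameter values are illustrative, and the manual second stage yields correct point estimates but invalid standard errors, so use a dedicated IV routine in practice:

# Simultaneity: naive OLS vs. two-stage least squares by hand
set.seed(7)
n  <- 1e5
b0 <- 1; b1 <- 0.5; b2 <- 1     # structural equation for Y
a0 <- 1; a1 <- 0.4; a2 <- 1     # structural equation for X
Ti <- rnorm(n); Zi <- rnorm(n)
u  <- rnorm(n); v  <- rnorm(n)
D  <- 1 - a1 * b1

# Generate the endogenous variables from the reduced form
Y <- (b0 + b1 * a0 + b1 * a2 * Zi + b2 * Ti + b1 * v + u) / D
X <- (a0 + a1 * b0 + a2 * Zi + a1 * b2 * Ti + v + a1 * u) / D

coef(lm(Y ~ X + Ti))["X"]          # naive OLS: biased away from 0.5

X_hat <- fitted(lm(X ~ Zi + Ti))   # first stage: Z instruments X
coef(lm(Y ~ X_hat + Ti))["X_hat"]  # second stage: close to 0.5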


36.1.3 Reverse Causality

Reverse causality refers to a situation in which the direction of causation is opposite to what is presumed. Specifically, we may model a relationship where variable X is assumed to cause Y, but in reality, Y causes X, or both influence each other in a feedback loop.

This violates a fundamental assumption for causal inference: temporal precedence — the cause must come before the effect. In the presence of reverse causality, the relationship between X and Y becomes ambiguous, and statistical estimators such as OLS become biased and inconsistent.


In a standard linear regression model:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

We interpret $\beta_1$ as the causal effect of $X$ on $Y$. However, this interpretation implicitly assumes that:

  • $X_i$ is exogenous (uncorrelated with $u_i$)
  • Changes in $X_i$ occur prior to or independently of changes in $Y_i$

If $Y_i$ also affects $X_i$, then $X_i$ is not exogenous — it is endogenous, because it is correlated with $u_i$ via the reverse causal path.


Reverse causality is especially problematic in observational data where interventions are not randomly assigned. Some key examples include:

  • Health and income: Higher income may improve health outcomes, but healthier individuals may also earn more (e.g., due to better productivity or fewer sick days).
  • Education and wages: Education raises wages, but higher-income individuals might afford better education — or individuals with higher innate ability (reflected in u) pursue more education and also earn more.
  • Crime and policing: Increased police presence is often assumed to reduce crime, but high-crime areas are also likely to receive more police resources.
  • Advertising and sales: Firms advertise more to boost sales, but high sales may also lead to higher advertising budgets — especially when revenue is reinvested in marketing.

To model reverse causality explicitly, consider:

36.1.3.1 System of Equations

$$\begin{aligned} Y_i &= \beta_0 + \beta_1 X_i + u_i && (Y \text{ depends on } X) \\ X_i &= \gamma_0 + \gamma_1 Y_i + v_i && (X \text{ depends on } Y) \end{aligned}$$

This feedback loop represents a simultaneous system in which the direction of causality is unclear. The two equations indicate that both variables are endogenous.

Even if we estimate only the first equation using OLS, the bias becomes apparent:

$$\text{Cov}(X_i, u_i) \neq 0 \quad \Rightarrow \quad \hat{\beta}_1 \text{ is biased}$$

Why? Because $X_i$ is determined by $Y_i$, which itself depends on $u_i$. Thus, $X_i$ indirectly depends on $u_i$.


In causal diagram notation (Directed Acyclic Graphs, or DAGs), reverse causality violates the acyclicity assumption. Here’s an example:

  • Intended model: $X \to Y$
  • Reality: $X \leftrightarrow Y$ (feedback loop)

This non-directional causality prevents us from interpreting coefficients causally unless additional identification strategies are applied.


OLS assumes:

$$E[u_i \mid X_i] = 0$$

Under reverse causality, this condition fails. The resulting estimator $\hat{\beta}_1$ captures both the effect of $X$ on $Y$ and the feedback from $Y$ to $X$, yielding a biased and inconsistent estimate.


36.1.3.2 Distinction from Simultaneity

Reverse causality is a special case of endogeneity, often manifesting as simultaneity. However, the key distinction is:

  • Simultaneity: Variables are determined together (e.g., in equilibrium models), and both are modeled explicitly in a system.
  • Reverse causality: Only one equation is estimated, and the true causal direction is unknown or opposite to what is modeled.

Reverse causality may or may not involve a full simultaneous system — it’s often unrecognized or assumed away, making it especially dangerous in empirical research.


There are no mechanical tests that definitively detect reverse causality, but researchers can:

  • Use temporal data (lags): Estimate $Y_{it} = \beta_0 + \beta_1 X_{i,t-1} + u_{it}$ and examine the temporal precedence of variables (see the sketch after this list).
  • Apply Granger causality tests in time series (not strictly causal, but helpful diagnostically).
  • Use theoretical reasoning to justify directionality.
  • Check robustness across different time frames or instrumental variables.
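
As a minimal illustration of why a significant slope alone cannot establish direction, the sketch below generates data in which causality runs only from $Y$ to $X$, yet regressing $Y$ on $X$ still produces a strong coefficient (all values illustrative):

# True causality runs Y -> X, but the naive regression "finds" X -> Y
set.seed(8)
n <- 1e4
Y <- rnorm(n)               # Y is generated first
X <- 0.8 * Y + rnorm(n)     # X responds to Y (reverse causality)

coef(summary(lm(Y ~ X)))["X", ]   # significant slope despite no X -> Y effect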

36.1.3.3 Solutions to Reverse Causality

The following methods can mitigate reverse causality:

  1. Instrumental Variables
  • Find a variable $Z$ that affects $X$ but is not affected by $Y$, nor correlated with $u_i$.
  • First stage: $X_i = \pi_0 + \pi_1 Z_i + e_i$
  • Second stage: use $\hat{X}_i$ from the first stage in the regression for $Y$.
  2. Randomized Controlled Trials (RCTs)
  • In experiments, the treatment (e.g., $X$) is assigned randomly and is therefore exogenous by design.
  3. Natural Experiments / Quasi-Experimental Designs
  • Use external shocks or policy changes that affect $X$ but not $Y$ directly (e.g., difference-in-differences, regression discontinuity).
  4. Panel Data Methods
  • Use fixed-effects or difference estimators to eliminate time-invariant confounders.
  • Lag independent variables to examine delayed effects and improve causal direction inference.
  5. Structural Equation Modeling
  • Estimate a full system of equations to explicitly model feedback.

36.1.4 Omitted Variable Bias

Omitted Variable Bias (OVB) arises when a relevant explanatory variable that influences the dependent variable is left out of the regression model, and the omitted variable is correlated with one or more included regressors. This violates the exogeneity assumption of OLS and leads to biased and inconsistent estimators.

Suppose we are interested in estimating the effect of an independent variable $X$ on an outcome $Y$, and the true data-generating process is:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$$

However, if we omit $Z_i$ and estimate the model:

$$Y_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i$$

Then the estimate $\hat{\gamma}_1$ may be biased because $X_i$ may be correlated with $Z_i$, and $Z_i$ influences $Y_i$.


Let us derive the bias formally.

True model:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i \tag{1}$$

Estimated model (with $Z_i$ omitted):

$$Y_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i \tag{2}$$

Now, substitute the true model into the estimated model:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i$$

Comparing both models, the omitted variable becomes part of the new error term:

$$\varepsilon_i = \beta_2 Z_i + u_i$$

Now, consider the OLS assumption:

$$E[\varepsilon_i \mid X_i] = 0 \quad \text{(OLS requirement)}$$

But since $\varepsilon_i = \beta_2 Z_i + u_i$ and $Z_i$ is correlated with $X_i$, we have:

$$\text{Cov}(X_i, \varepsilon_i) = \beta_2 \text{Cov}(X_i, Z_i) \neq 0$$

Therefore, the OLS assumption fails, and $\hat{\gamma}_1$ is biased.


Let us calculate the expected value of the OLS estimator $\hat{\gamma}_1$.

From regression theory, when omitting $Z_i$, the expected value of $\hat{\gamma}_1$ is:

$$E[\hat{\gamma}_1] = \beta_1 + \beta_2 \frac{\text{Cov}(X_i, Z_i)}{\text{Var}(X_i)}$$

This is the Omitted Variable Bias formula.

Interpretation: The bias in $\hat{\gamma}_1$ depends on:

  • $\beta_2$: the true effect of the omitted variable on $Y$
  • $\text{Cov}(X_i, Z_i)$: the covariance between $X$ and the omitted variable $Z$


36.1.4.1 Direction of the Bias

  • If $\beta_2 > 0$ and $\text{Cov}(X_i, Z_i) > 0$: $\hat{\gamma}_1$ is upward biased
  • If $\beta_2 < 0$ and $\text{Cov}(X_i, Z_i) > 0$: $\hat{\gamma}_1$ is downward biased
  • If $\text{Cov}(X_i, Z_i) = 0$: no bias, even if $Z_i$ is omitted

Note: Uncorrelated omitted variables do not bias the OLS estimator, although they may reduce precision.
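
The OVB formula is straightforward to verify by simulation. A minimal sketch with illustrative values $\beta_1 = 1$, $\beta_2 = 2$, and $X = 0.5Z + \text{noise}$:

# Omitted variable bias: short regression vs. OVB formula vs. long regression
set.seed(9)
n  <- 1e5
b1 <- 1; b2 <- 2
Z  <- rnorm(n)
X  <- 0.5 * Z + rnorm(n)          # X correlated with the omitted Z
Y  <- b1 * X + b2 * Z + rnorm(n)

coef(lm(Y ~ X))["X"]              # short regression: ~1.8 (biased)
b1 + b2 * cov(X, Z) / var(X)      # OVB formula: 1 + 2 * 0.5 / 1.25 = 1.8
coef(lm(Y ~ X + Z))["X"]          # long regression: ~1 (unbiased)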


36.1.4.2 Practical Example: Education and Earnings

Suppose we model:

$$\text{Earnings}_i = \gamma_0 + \gamma_1 \text{Education}_i + \varepsilon_i$$

But the true model includes ability ($Z_i$):

$$\text{Earnings}_i = \beta_0 + \beta_1 \text{Education}_i + \beta_2 \text{Ability}_i + u_i$$

Omitting ability — a determinant of both education and earnings — leads to bias in the estimated effect of education:

  • If more able individuals pursue more education and ability raises earnings ($\beta_2 > 0$), then $\hat{\gamma}_1$ overstates the true return to education.

36.1.4.3 Generalization to Multiple Regression

In models with multiple regressors, omitting a relevant variable that is correlated with at least one included regressor will bias all coefficients affected by the correlation structure.

For example:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$

If $X_2$ is omitted and $\text{Cov}(X_1, X_2) \neq 0$, then:

$$E[\hat{\gamma}_1] = \beta_1 + \beta_2 \frac{\text{Cov}(X_1, X_2)}{\text{Var}(X_1)}$$


36.1.4.4 Remedies for OVB

  1. Include the omitted variable
  • If Z is observed, include it in the regression model.
  1. Use Instrumental Variables
  • If Z is unobserved but X is endogenous, find an instrument W:

    • Relevance: Cov(W,X)0
    • Exogeneity: Cov(W,u)=0
  1. Use Panel Data Methods
  • Fixed Effects: eliminate time-invariant omitted variables.
  • Difference-in-Differences: exploit temporal variation to isolate effects.
  1. Experimental Designs
  • Randomization ensures omitted variables are orthogonal to treatment, avoiding bias.

References

Bound, John, Charles Brown, and Nancy Mathiowetz. 2001. “Measurement Error in Survey Data.” In Handbook of Econometrics, 5:3705–3843. Elsevier.
Schisterman, Enrique F., Kirsten B. Moysich, Lucinda J. England, and Malla Rao. 2003. “Estimation of the Correlation Coefficient Using the Bayesian Approach and Its Applications for Epidemiologic Research.” BMC Medical Research Methodology 3: 1–4.