34.8 Types of IV

34.8.1 Treatment Intensity

Two-Stage Least Squares is a powerful method for estimating the average causal effect when treatment intensity varies across units. Rather than simple binary treatment (treated vs. untreated), many empirical applications involve treatments that can take on a range of values. TSLS can identify causal effects in these settings, capturing “a weighted average of per-unit treatment effects along the length of a causal response function” (J. D. Angrist and Imbens 1995, 431).

Common examples of treatment intensity include:

  • Years of schooling (J. D. Angrist and Imbens 1995)
  • Class size (J. D. Angrist and Lavy 1999)
  • Number of children in a family (Angrist, Lavy, and Schlosser 2010)
  • Cigarettes smoked during pregnancy (Permutt and Hebel 1989)
  • Hours of self-study for an exam (Powers and Swinton 1984)

The average causal effect here refers to the conditional expectation of the difference in outcomes between the treated unit (at a given treatment intensity) and what would have happened in the counterfactual scenario (at a different treatment intensity). Importantly:

  • Linearity is not required in the relationships between the dependent variable, treatment intensities, and instruments. TSLS can accommodate nonlinear causal response functions, provided the assumptions of the method hold.

34.8.1.1 Example: Schooling and Earnings

In their seminal paper, J. D. Angrist and Imbens (1995) estimate the causal effect of years of schooling on earnings, using quarter of birth as an instrumental variable. The intuition is that individuals born in different quarters are subject to different compulsory schooling laws, which affect educational attainment but are plausibly unrelated to unobserved ability or motivation (the typical omitted variables in this context).

The structural outcome equation is:

$$Y = \gamma_0 + \gamma_1 X_1 + \rho S + \varepsilon$$

where:

  • $Y$ is the log of earnings (the dependent variable)
  • $S$ is years of schooling (the endogenous regressor)
  • $X_1$ is a vector (or matrix) of exogenous covariates (e.g., demographic characteristics)
  • $\rho$ is the causal return to schooling we wish to estimate
  • $\varepsilon$ is the error term, capturing unobserved factors

Because schooling $S$ may be endogenous (e.g., correlated with $\varepsilon$), we model its first-stage relationship with the exogenous variables and instruments:

$$S = \delta_0 + X_1 \delta_1 + X_2 \delta_2 + \eta$$

where:

  • $X_2$ represents the instrumental variables (e.g., quarter of birth)
  • $\delta_2$ is the coefficient on the instrument
  • $\eta$ is the first-stage error term

The Two-Stage Procedure

  1. First-Stage Regression
    Regress $S$ on $X_1$ and $X_2$ to obtain the predicted (fitted) values $\hat{S}$:

$$\hat{S} = \hat{\delta}_0 + X_1 \hat{\delta}_1 + X_2 \hat{\delta}_2$$

  2. Second-Stage Regression
    Replace $S$ with $\hat{S}$ in the structural equation and estimate:

$$Y = \gamma_0 + \gamma_1 X_1 + \rho \hat{S} + \nu$$

where $\nu$ is the new error term (different from $\varepsilon$ because $\hat{S}$ is constructed to be exogenous).

Under the standard IV assumptions, the resulting estimator $\hat{\rho}$ is consistent for the causal effect of schooling on earnings.
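To make the mechanics concrete, below is a minimal sketch of the two-stage procedure in R on simulated data. All variable names and parameter values (the instrument `z`, the true return of 0.1, and so on) are hypothetical, chosen only to mirror the schooling example; in applied work a packaged estimator such as `AER::ivreg` should be used so the second-stage standard errors are computed correctly.

# Minimal 2SLS sketch on simulated data (all names and values hypothetical)
set.seed(1)
n       <- 5000
x1      <- rnorm(n)                        # exogenous covariate
z       <- sample(1:4, n, replace = TRUE)  # instrument (e.g., quarter of birth)
ability <- rnorm(n)                        # unobserved confounder
s <- 10 + 0.3 * x1 + 0.5 * (z == 1) + ability + rnorm(n)  # years of schooling
y <- 1 + 0.2 * x1 + 0.1 * s + ability + rnorm(n)          # log earnings

# First stage: regress S on X1 and the instrument, save fitted values
stage1 <- lm(s ~ x1 + factor(z))
s_hat  <- fitted(stage1)

# Second stage: replace S with its fitted values
stage2 <- lm(y ~ x1 + s_hat)
coef(stage2)["s_hat"]  # close to the true rho = 0.1

# Equivalent one-step estimator with correct standard errors:
# AER::ivreg(y ~ x1 + s | x1 + factor(z))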


34.8.1.2 Causal Interpretation of ρ

For ρ to have a valid causal interpretation, two key assumptions are essential:

  1. SUTVA (Stable Unit Treatment Value Assumption)
    • The potential outcomes of each individual are not affected by the treatment assignments of other units.
    • There are no hidden variations of the treatment; “one year of schooling” means the same treatment type across individuals.
    • While important, SUTVA is often assumed without extensive defense in empirical work, though violations (e.g., spillovers in education settings) should be acknowledged when plausible.
  2. Local Average Treatment Effect (LATE)
    • TSLS identifies a weighted average of marginal effects at the points where the instrument induces variation in treatment intensity.
    • Formally, $\hat{\rho}$ converges in probability to a weighted average of causal increments:

$$\text{plim} \; \hat{\rho} = \sum_j w_j \, E\left[Y_j - Y_{j-1} \mid \text{compliers at level } j\right]$$

where $w_j$ are weights determined by the distribution of the instrument and treatment intensity.

  • This LATE interpretation means that TSLS estimates apply to compliers whose treatment intensity changes in response to the instrument. If there is heterogeneity in treatment effects across units, the interpretation of $\rho$ becomes instrument-dependent and may not generalize to the entire population.
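A small simulation illustrates this weighting. In the assumed setup below (not from the paper), a binary instrument moves each complier's treatment intensity up one level, but the causal increment differs across levels: 1.0 going from level 1 to 2, and 3.0 going from level 2 to 3. The Wald/2SLS estimate lands on a weighted average of the two increments, not on either one.

# Assumed setup: heterogeneous per-unit effects along the intensity scale
set.seed(123)
n <- 100000
z <- rbinom(n, 1, 0.5)           # binary instrument
s <- 1 + rbinom(n, 1, 0.5) + z   # treatment intensity in {1, 2, 3}
# Increment 1 -> 2 is worth 1.0; increment 2 -> 3 is worth 3.0
y <- 1.0 * (s >= 2) + 3.0 * (s >= 3) + rnorm(n)

# Wald / 2SLS estimate: Cov(Y, Z) / Cov(S, Z)
cov(y, z) / cov(s, z)  # about 2.0, a weighted average of 1.0 and 3.0

Here roughly half the compliers are shifted across the 1-to-2 margin and half across the 2-to-3 margin, so the weights are about equal; a different instrument would induce different weights and hence a different estimate.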

34.8.2 Decision-Maker IV

Examiner designs, judge IV designs, and leniency IV refer to a family of instrumental variable strategies that exploit quasi-random assignment of decision-makers (such as judges or examiners) to observational units. These designs are used to identify causal effects in settings where controlled experiments are not feasible.

Examiner/judge IV design is an approach where the instrument is the identity or behavior of an assigned decision-maker (an “examiner” or a judge). The classic setup arises in courts: cases are typically assigned to judges in a manner that is as good as random (often conditional on timing or location), and different judges have systematically different propensities to rule harshly or leniently. This means that, purely by the luck of the draw, otherwise-similar individuals may receive different treatments (e.g. a longer vs. shorter sentence) depending on which judge they happen to get. In such a design, the judge assignment (or a function of it) serves as an instrumental variable for the treatment of interest (like sentence length). The key insight is that who the examiner/judge is can be treated as an exogenous shock that influences the treatment but is (ideally) unrelated to the person’s own characteristics.

The term judge IV design specifically refers to using judges in legal settings as instruments. This approach rose to prominence through studies of the criminal justice system; a well-known early example is Kling (2006), who used randomly assigned judges to instrument for incarceration length when studying its effect on later earnings. More generally, the literature often calls this the “judge leniency” design, because it leverages differences in judges’ leniency/harshness. Importantly, the same idea extends beyond literal judges. Examiners in various administrative or medical contexts can play an analogous role. For instance, bureaucrats evaluating benefit claims, patent examiners reviewing applications, or physicians making discretionary treatment decisions can all act like “judges” whose assignment is as-good-as-random and whose leniency varies. In these non-court contexts, researchers sometimes use the term examiner design as a more general label, but it is essentially the same IV strategy. In summary, whether we say examiner design, judge IV, or judge leniency IV, we are usually referring to the same identification strategy – using the quasi-random assignment of a decision-maker with varying tendencies as an instrument.

How the design is structured: In practice, one can implement this IV in a couple of ways. One method is to include dummy variables for each judge/examiner as instruments (since each judge is a distinct source of variation). Another common approach is to construct a leniency measure for each decision-maker – for example, the judge’s historical rate of granting the treatment – and use that as a single continuous instrument. The latter approach (using a summary measure of leniency) is popular because it reduces dimensionality and mitigates weak-instrument concerns when there are many judges. For instance, instead of having 50 separate judge dummies, one can calculate each judge’s leave-one-out approval or sentencing rate and use that number as the instrument. This “leave-one-out” or jackknife approach ensures the measure for each judge is calculated excluding the case in question (avoiding mechanical endogeneity). Overall, the examiner/judge IV design turns the naturally occurring randomness in examiner assignment into a source of exogenous variation: who you were randomly assigned to becomes the instrument for the treatment you received.
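Below is a sketch of how such a leave-one-out leniency measure might be computed; the judge IDs, propensities, and data layout are all hypothetical. The key step subtracts each case's own decision before averaging, so the instrument for a case depends only on the judge's other cases.

# Sketch: leave-one-out (jackknife) leniency measure (hypothetical data)
set.seed(7)
judge_ids <- paste0("J", 1:50)
lenience  <- setNames(runif(50, 0.2, 0.8), judge_ids)  # true propensities
cases <- data.frame(judge = sample(judge_ids, 5000, replace = TRUE))
cases$decision <- rbinom(nrow(cases), 1, lenience[cases$judge])

# For case i, average the judge's decisions over all OTHER cases:
tot <- ave(cases$decision, cases$judge, FUN = sum)
cnt <- ave(cases$decision, cases$judge, FUN = length)
cases$leniency_loo <- (tot - cases$decision) / (cnt - 1)

# leniency_loo now serves as a single continuous instrument in place of
# 50 separate judge dummies.
head(cases)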

34.8.2.1 Achieving Identification with a Leniency IV

The examiner/judge design is a powerful way to achieve identification in observational data. It rests on the core requirements for a valid instrumental variable:

  • Quasi-Random Assignment (Exogeneity): Because examiners or judges are assigned to cases essentially at random (often by rotation, scheduling, or lottery), the particular decision-maker an individual gets is independent of that individual’s characteristics. This approximates the randomness of an experiment. As long as assignment is truly random (or as-good-as-random after conditioning on any known factors like time or location), the examiner identity is uncorrelated with unobserved confounders. In other words, which judge you draw should have no direct bearing on your outcome except through the judge’s decision. This satisfies the exogeneity condition for an IV.

  • Instrument Relevance: Different examiners have different propensities to deliver the treatment. Some judges are more severe (more likely to incarcerate or give long sentences), while others are more lenient; some doctors are more likely to prescribe an intensive treatment, etc. This translates into substantial variation in the probability of treatment based solely on who the case was assigned to. For example, in the patent context, being assigned a lenient patent examiner vs. a strict one can significantly change the probability of a patent grant (Farre-Mensa, Hegde, and Ljungqvist 2020).

  • Exclusion Restriction: The IV assumption is that the assigned examiner affects the outcome only through the treatment itself. In a judge design, this means the “type of judge you are assigned” should impact the defendant’s future outcomes solely via the judge’s decision (e.g. incarceration or release), not through any other channel. For instance, a harsh judge might send you to prison; a lenient judge might not – that difference can affect your future, but we assume that it’s only the incarceration that matters for your future outcome, not any direct effect of interacting with a harsh vs. nice judge per se. This exclusion restriction is more plausible when the decision-maker has no direct interaction with the individual beyond making the decision. Researchers take care to argue that conditional on the controls and the treatment itself, the identity of the examiner has no independent effect on outcomes. If these conditions (relevance and exogeneity/exclusion) hold, then the variation in treatment induced by examiner assignment can be used to consistently estimate the causal effect of the treatment.

By meeting these conditions, examiner/judge IV designs create a natural experiment. Essentially, they compare outcomes between individuals who, by random luck, received different treatment assignments (e.g. one was incarcerated, another not) due to differing examiner leniency, despite those individuals being comparable in expectation. This helps isolate the causal impact of the treatment from confounding factors. Notably, the estimates from such designs often correspond to a local average treatment effect (LATE) for those cases whose treatment status is swayed by the examiner’s leniency – for example, the “marginal” defendants who would be incarcerated by a strict judge but released by a lenient judge. In sum, these designs allow researchers to mimic a randomized experiment within observational data by leveraging institutional randomness (who gets assigned to whom) as an instrument.

34.8.2.2 Leniency IV: Clarifying the Terminology

The term leniency IV refers to this same instrumental variable strategy, emphasizing the role of the examiner’s leniency (or strictness). In many studies, the instrument is literally a measure of how lenient the assigned judge or examiner tends to be. For example, in a Social Security Disability study, researchers “exploit variation in examiners’ allowance rates as an instrument for benefit receipt” (Maestas, Mullen, and Strand 2013). Here, an examiner’s allowance rate (the fraction of cases they approve) is a direct quantification of their leniency, and this serves as the instrumental variable. Similarly, one can define a judge’s leniency as the percentage of past defendants that judge jailed or the average sentence length they give, and use that as the IV. The phrase “leniency design” or leniency instrument simply underscores that it’s the lenient vs. strict tendencies of the decision-maker that provide the exogenous variation.

A leniency IV design typically involves constructing an instrument like “the leave-out mean decision rate of the assigned examiner.” This could be, for instance, the fraction of previous similar cases that the examiner approved (excluding the current case). That number captures how lenient or strict they generally are. Because assignment is random, some individuals get a high-leniency examiner and others a low-leniency examiner, creating exogenous variation in treatment. By comparing outcomes across these, one can identify the causal effect of the treatment. The term “leniency” highlights that it’s the discretionary toughness of the examiner that we’re leveraging.
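Continuing the hypothetical leniency sketch from above, the constructed instrument then enters a standard IV pipeline: check that leniency predicts treatment (relevance), then estimate 2SLS. The outcome model and effect size below are assumptions for illustration.

# Continuing the sketch above (hypothetical treatment and outcome)
cases$treated <- cases$decision                        # treatment received
cases$outcome <- 1 - 0.5 * cases$treated + rnorm(nrow(cases))

# First stage: leniency should strongly predict treatment (relevance)
summary(lm(treated ~ leniency_loo, data = cases))

# 2SLS using the leniency instrument (AER package assumed installed):
# library(AER)
# fit <- ivreg(outcome ~ treated | leniency_loo, data = cases)
# summary(fit, diagnostics = TRUE)  # includes a weak-instrument F test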

34.8.2.3 Examples of Examiner/Judge IV Designs in Research

Many influential studies across economics and related fields have employed examiner or judge IV designs to answer causal questions. Below are several prominent examples illustrating the range of applications and findings:

  • Criminal Sentencing and Recidivism: In his seminal study, Kling (2006) examined the effect of incarceration length on ex-prisoners’ labor market outcomes. He used the random assignment of judges as an instrument, capitalizing on the fact that some judges are harsher (give longer sentences) and others more lenient. This judge IV strategy has since been used extensively to study how prison time impacts future criminal behavior and employment.

  • Pre-Trial Detention Decisions: The leniency design is also applied to bail and pre-trial release. Dobbie, Goldin, and Yang (2018) use the fact that arraignment judges vary in their tendency to set bail (versus release defendants) as an instrument to study the impact of pre-trial detention on defendants’ case outcomes and future behavior. Because defendants are quasi-randomly assigned to bail judges, this approach isolates how being jailed before trial causally affects outcomes like conviction or re-offense. These authors and others find, for example, that having a more lenient bail judge (who releases you pre-trial) leads to better long-run outcomes compared to a strict judge, indicating that pre-trial detention can have harmful causal effects.

  • Juvenile Incarceration and Life Outcomes: In a related vein, Aizer and Doyle Jr (2015) studied the effect of juvenile detention on high school completion and adult crime. They leveraged the random assignment of juvenile court judges, where some judges were more likely to incarcerate young offenders than others. This judge IV design revealed large negative causal impacts of juvenile incarceration on educational attainment and an increase in adult crime, evidence that sentencing leniency in youth can dramatically alter life trajectories (results consistent with the general pattern found in other judge IV studies of incarceration). This application illustrates how judicial decisions in youth have been treated as natural experiments.

  • Disability Insurance and Labor Supply: In the realm of social insurance, Maestas, Mullen, and Strand (2013) used an examiner design to determine whether receiving disability benefits discourages work. Each disability claim is assigned to a disability examiner, and some examiners approve benefits at higher rates than others. By using the quasi-random examiner assignment as an instrument, they found that for applicants on the margin of eligibility, receiving Disability Insurance caused a significant reduction in employment compared to if they had been denied. They report that about 23% of applicants are affected by which examiner they get, and those who were allowed benefits due to a lenient examiner would have had substantially higher employment rates had they instead been assigned a stricter examiner (and thus been denied). This study is a prime example of using medical or administrative examiner assignments to identify a policy’s effect.

  • Patent Grants and Innovation: Examiner designs are not limited to courts or social programs; they have been applied in innovation economics as well. Farre-Mensa, Hegde, and Ljungqvist (2020) analyze the value of obtaining a patent for startups by exploiting the U.S. Patent Office’s quasi-random assignment of applications to patent examiners. Some patent examiners are much more lenient (more likely to grant a patent) than others, effectively creating a “patent lottery”. The authors use examiner leniency as an instrument for whether a startup’s patent is approved. They find striking results: startups that “won” the lottery by drawing a lenient examiner had 55% higher employment growth and 80% higher sales five years later on average, compared to similar startups that ended up with a strict examiner and thus didn’t get the patent. This suggests that patent grants have a large causal impact on firm growth. This study showcases an examiner design in a regulatory/innovation setting – the term leniency IV in this case refers to the examiner’s propensity to allow patents.

  • Business Accelerators and Firm Growth: In an entrepreneurial finance context, González-Uribe and Reyes (2021) evaluate the impact of getting accepted into a business accelerator. Admission to the accelerator was determined by panels of judges scoring startup applicants, and the judges’ scoring leniency varied randomly across groups. The researchers exploit this by constructing an instrument based on the generosity of the judges’ scores for each applicant. They find that participating in the accelerator had a dramatic effect: startups that just made it in (thanks to generous-scoring judges) grew about 166% more in revenue than those that just missed the cutoff. This is an example of a “judge leniency” design outside of a courtroom – here the “judges” were competition evaluators, and their leniency in scoring provided the exogenous variation in program entry. It demonstrates that the examiner/judge IV approach can be applied to settings like business program evaluations or any scenario with selection committees.

These examples illustrate how examiner/judge (leniency) IV designs have been used in a wide array of empirical settings: from judicial decisions about bail, sentencing, and juvenile detention, to administrative adjudications on disability benefits, to regulatory approvals like patents, to evaluation panels in business contexts. In each case, the randomness of assignment and the differing “strictness” of the decision-makers create a natural experiment that researchers harness to estimate causal effects.

Why are these designs so valuable? They allow analysts to address the problem of unobserved heterogeneity or selection bias in observational data. Normally, people who receive a treatment (go to prison, get a benefit, win an award) may differ systematically from those who don’t, confounding simple comparisons. But if an outside examiner’s quasi-random decision determines who gets the treatment, we have a credible instrument to break that link. This approach has become popular as a comparatively credible way to recover causal effects, even as many other candidate instruments face skepticism. The trade-off is that one must have a context where such random examiner assignment occurs and must carefully check the assumptions (e.g. truly random assignment, no direct effect of the examiner on outcomes aside from via treatment). When those conditions are met, examiner and judge IV designs provide compelling evidence on causal relationships that would be hard to identify otherwise.

34.8.2.4 Examples in Marketing

In marketing research, analogous setups can be constructed by identifying quasi-random sources of variation in decision-makers’ behaviors—such as sales representatives, regional managers, or customer service agents—who differ systematically in their tendency to approve discounts, upgrade customers, or resolve complaints favorably. These agents’ “leniency” can serve as an instrument for treatment assignment, enabling researchers to isolate causal effects in observational data where randomization is infeasible.

This analogical use of judge leniency introduces a powerful framework for addressing endogeneity in business contexts, allowing us to disentangle the effect of marketing actions (e.g., discounts, loyalty offers) from the confounding influence of customer selection or targeting bias.

| Judge Analog | Case Analog | Instrument / Causal Variation | Use Case / Research Question | Potential Outcomes |
|---|---|---|---|---|
| Ad reviewer | Submitted ad | Reviewer identity, shift rotation | Effect of ad rejection or delay on sales | CTR, sales, acquisition |
| Search ranker | Product view/visit | Random tie-breaking in rank | Impact of product ranking on behavior | Purchases, engagement |
| Sales rep | Customer inquiry | Agent assignment variation | Salesperson influence on conversion | Conversion, satisfaction |
| CSR rep | Complaint or service issue | Shift schedule, escalation rules | Does service response tone affect churn? | Retention, NPS |
| Matching algorithm | Influencer-brand pairing | Batch assignment randomness | Does match quality affect campaign ROI? | ROI, awareness |
| Moderator | User post / ad | Moderator stringency variation | Enforcement effect on trust and activity | Engagement, advertiser trust |
| Grant reviewer | Startup or proposal | Panel assignment, reviewer fixed effects | Causal effect of grant approval on growth | Marketing scaling, performance |

34.8.3 Proxy Variables

In applied business and economic analysis, we often confront a frustrating reality: the variables we truly care about—like brand loyalty, employee ability, or investor sentiment—are not directly observable. Instead, we rely on proxy variables, which are observable measures that stand in for these latent or omitted variables. Though useful, proxy variables must be used with care, as they introduce their own risks, most notably measurement error and incomplete control of endogeneity.

A proxy variable is an observed variable used in place of a variable that is either unobservable or omitted from a model. It is typically used under the assumption that it is correlated with the latent variable and explains some of its variation.

Let:

  • $X^*$ be the latent (unobserved) variable,
  • $X$ be the observed proxy,
  • $Y$ be the outcome.

We may wish to estimate

$$Y = \beta_0 + \beta_1 X^* + \varepsilon,$$

but since $X^*$ is unavailable, we instead estimate:

$$Y = \beta_0 + \beta_1 X + u.$$

The effectiveness of this approach hinges on whether $X$ can validly stand in for $X^*$.

34.8.3.1 Proxy Use and Omitted Variable Bias

Proxy variables are sometimes used as substitutes for omitted variables that cause endogeneity. Including a proxy can reduce endogeneity, but it will not generally eliminate bias, unless strict conditions are met.

Key Insight: Including a proxy does not allow us to estimate the effect of the omitted variable; rather, it helps mitigate the bias introduced by its omission.

To be more precise, let’s consider a classic omitted variable setup. Suppose the true model is

$$Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon,$$

but $Z$ is omitted from the estimation. If $Z$ is correlated with $X$, the OLS estimate of $\beta_1$ will be biased.

Now, suppose we have a proxy $Z_p$ for $Z$. Including $Z_p$ in the regression

$$Y = \beta_0 + \beta_1 X + \beta_2 Z_p + u$$

can help reduce the bias if $Z_p$ meets the following criteria.


Let $Z$ be the unobserved variable and $Z_p$ be the proxy. Then $Z_p$ is a valid proxy if:

  1. Correlation: $Z_p$ is correlated with $Z$ (i.e., $\text{Cov}(Z_p, Z) \neq 0$).
  2. Residual Independence: The residual variation in $Z$ unexplained by $Z_p$ is uncorrelated with all regressors (including $Z_p$ and $X$): $Z = \gamma_0 + \gamma_1 Z_p + \nu$, where $\text{Cov}(\nu, X) = \text{Cov}(\nu, Z_p) = 0$.
  3. No direct effect: $Z_p$ affects $Y$ only through $Z$ (or at least not directly).

Violation of these conditions can lead to biased or inconsistent estimates.


34.8.3.2 Example: IQ as a Proxy for Ability in Wage Regressions

In labor economics, researchers often study the effect of education on wages. But ability—an unobservable factor—also affects both education and wages, leading to omitted variable bias.

Let:

  • $Y$ = wage,
  • $X$ = education,
  • $Z$ = ability (unobserved),
  • $Z_p$ = IQ test score (proxy for ability).

Suppose the true model is

$$\text{wage} = \beta_0 + \beta_1 \, \text{education} + \beta_2 \, \text{ability} + \varepsilon.$$

Since ability is unobserved, we estimate

$$\text{wage} = \beta_0 + \beta_1 \, \text{education} + \beta_2 \, \text{IQ} + u,$$

under the assumption

$$\text{ability} = \gamma_0 + \gamma_1 \, \text{IQ} + \nu, \quad \text{with } \text{Cov}(\nu, \text{education}) = \text{Cov}(\nu, \text{IQ}) = 0.$$

Including IQ helps reduce endogeneity, but it does not identify the pure effect of ability unless IQ captures all of the variation in ability.


34.8.3.3 Pros and Cons of Proxy Variables

Advantages

  • Make latent variables measurable: Allows analysis of constructs that cannot be directly observed.
  • Practicality: Makes use of available data to address endogeneity.
  • Improved specification: Can reduce omitted variable bias if proxies are well chosen.

Disadvantages

  • Measurement error: Proxies usually include noise, causing attenuation bias (i.e., coefficients biased toward zero).

    If $X = X^* + \nu$, with $\nu$ classical measurement error (zero mean, uncorrelated with $X^*$ and $\varepsilon$), then

    $$\text{plim}(\hat{\beta}_1) = \lambda \beta_1, \quad \text{where } \lambda = \frac{\sigma^2_{X^*}}{\sigma^2_{X^*} + \sigma^2_\nu} < 1.$$

  • Interpretation issues: Coefficients on proxies conflate the causal effect with proxy quality.

  • Insufficient control: Proxies only partially reduce omitted variable bias unless they meet strict independence conditions.

34.8.3.4 Empirical Illustration: Simulating Attenuation Bias

set.seed(2025)
n <- 1000
ability <- rnorm(n)                   # latent variable
IQ <- ability + rnorm(n, sd = 0.5)    # proxy variable
education <- 12 + 0.5 * ability + rnorm(n)  # correlated regressor
wage <- 20 + 1.5 * education + 2 * ability + rnorm(n)  # true model

# Model using education only (omitted variable bias)
mod1 <- lm(wage ~ education)

# Model using education and proxy
mod2 <- lm(wage ~ education + IQ)

summary(mod1)
#> 
#> Call:
#> lm(formula = wage ~ education)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -7.4949 -1.3590 -0.0082  1.3766  6.6601 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 10.51325    0.71353   14.73   <2e-16 ***
#> education    2.28903    0.05918   38.68   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.061 on 998 degrees of freedom
#> Multiple R-squared:  0.5999, Adjusted R-squared:  0.5995 
#> F-statistic:  1496 on 1 and 998 DF,  p-value: < 2.2e-16
summary(mod2)
#> 
#> Call:
#> lm(formula = wage ~ education + IQ)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.3224 -0.9052  0.0523  0.9370  4.5822 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 17.96426    0.49599   36.22   <2e-16 ***
#> education    1.67098    0.04114   40.62   <2e-16 ***
#> IQ           1.55953    0.04096   38.07   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.317 on 997 degrees of freedom
#> Multiple R-squared:  0.8369, Adjusted R-squared:  0.8366 
#> F-statistic:  2558 on 2 and 997 DF,  p-value: < 2.2e-16

Observe how including the proxy reduces the bias in the coefficient on education: the true value is 1.5, the omitted-variable model estimates about 2.29, and adding the IQ proxy pulls the estimate down to about 1.67. The bias shrinks but is not eliminated, because IQ measures ability with error.

34.8.3.5 Example: Marketing — Brand Loyalty

Suppose you’re modeling the effect of brand loyalty ($X^*$) on repeat purchase ($Y$). Since loyalty is latent, we might use:

  • Number of prior purchases,
  • Duration of current brand use,
  • Membership in loyalty programs.

These proxies are likely to be correlated with true loyalty, but none is a perfect substitute.

# Simulating attenuation bias with a proxy
set.seed(42)
n <- 1000
X_star <- rnorm(n)  # true unobserved brand loyalty
proxy <- X_star + rnorm(n, sd = 0.6)  # proxy with measurement error
error <- rnorm(n)
Y <- 3 + 2 * X_star + error  # true model

# Model using the proxy variable
model_proxy <- lm(Y ~ proxy)
summary(model_proxy)
#> 
#> Call:
#> lm(formula = Y ~ proxy)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.3060 -1.0130 -0.0018  0.9131  4.5493 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  2.98737    0.04584   65.17   <2e-16 ***
#> proxy        1.45513    0.03921   37.11   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.449 on 998 degrees of freedom
#> Multiple R-squared:  0.5798, Adjusted R-squared:  0.5794 
#> F-statistic:  1377 on 1 and 998 DF,  p-value: < 2.2e-16

Observe that the estimated coefficient on proxy (about 1.46) is less than the true coefficient (2), due to measurement error. This matches the attenuation formula: with $\sigma^2_{X^*} = 1$ and $\sigma^2_\nu = 0.36$, $\lambda = 1/1.36 \approx 0.74$, so $\text{plim}(\hat{\beta}_1) \approx 0.74 \times 2 \approx 1.47$.


34.8.3.6 Example: Finance — Investor Sentiment

Investor sentiment affects market movements but cannot be directly measured. Proxies include:

  • Put-call ratios,
  • Bullish/bearish sentiment surveys,
  • Volume of IPO activity,
  • Retail investor trading flows.

These capture different dimensions of sentiment, and their effectiveness varies by context.


34.8.3.7 Strategies to Improve Proxy Use

  • Multiple proxies: Use several proxies and combine them via factor analysis or PCA (see the sketch after this list).

  • Instrumental variables: If a valid instrument exists for the proxy (for example, a second, independently measured proxy), use two-stage least squares to correct for measurement error.

  • Latent variable models: Structural Equation Modeling (SEM) allows explicit estimation of models with latent variables.
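As a sketch of the first two strategies (the latent construct, proxy names, and noise levels below are all hypothetical): the first principal component pools three noisy proxies into a less noisy composite, and using a second proxy as an instrument for the first removes the attenuation from classical measurement error.

# Sketch: combining proxies via PCA, and IV with a second proxy
set.seed(99)
n       <- 10000
loyalty <- rnorm(n)                  # latent construct (hypothetical)
p1 <- loyalty + rnorm(n, sd = 0.8)   # proxy 1
p2 <- loyalty + rnorm(n, sd = 1.0)   # proxy 2 (independent noise)
p3 <- loyalty + rnorm(n, sd = 1.2)   # proxy 3
Y  <- 3 + 2 * loyalty + rnorm(n)     # true coefficient = 2

# (1) Multiple proxies: first principal component as a composite
pc1 <- prcomp(cbind(p1, p2, p3), scale. = TRUE)$x[, 1]
summary(lm(Y ~ p1))$r.squared   # single noisy proxy
summary(lm(Y ~ pc1))$r.squared  # composite tracks the latent variable better

# (2) IV fix: the second proxy instruments for the first, since its
# measurement error is independent of p1's; this undoes attenuation
cov(Y, p2) / cov(p1, p2)  # ~2, recovering the true coefficient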

Proxy variables are valuable tools in empirical research when used with caution. They offer a bridge between theory and data when important variables are unobservable. However, this bridge is built on assumptions—especially regarding correlation, measurement error, and residual independence—that must be carefully justified.

Key Takeaway: A proxy can reduce bias from omitted variables but introduces its own risks—especially measurement error and interpretive ambiguity. The best practice is to use proxies transparently, test assumptions when possible, and consider alternative solutions such as instruments or structural models.

References

Aizer, Anna, and Joseph J Doyle Jr. 2015. “Juvenile Incarceration, Human Capital, and Future Crime: Evidence from Randomly Assigned Judges.” The Quarterly Journal of Economics 130 (2): 759–803.
Angrist, Joshua D, and Guido W Imbens. 1995. “Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity.” Journal of the American Statistical Association 90 (430): 431–42.
Angrist, Joshua D, and Victor Lavy. 1999. “Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement.” The Quarterly Journal of Economics 114 (2): 533–75.
Angrist, Joshua, Victor Lavy, and Analia Schlosser. 2010. “Multiple Experiments for the Causal Link Between the Quantity and Quality of Children.” Journal of Labor Economics 28 (4): 773–824.
Dobbie, Will, Jacob Goldin, and Crystal S Yang. 2018. “The Effects of Pre-Trial Detention on Conviction, Future Crime, and Employment: Evidence from Randomly Assigned Judges.” American Economic Review 108 (2): 201–40.
Farre-Mensa, Joan, Deepak Hegde, and Alexander Ljungqvist. 2020. “What Is a Patent Worth? Evidence from the US Patent ‘Lottery’.” The Journal of Finance 75 (2): 639–82.
González-Uribe, Juanita, and Santiago Reyes. 2021. “Identifying and Boosting ‘Gazelles’: Evidence from Business Accelerators.” Journal of Financial Economics 139 (1): 260–87.
Kling, Jeffrey R. 2006. “Incarceration Length, Employment, and Earnings.” American Economic Review 96 (3): 863–76.
Maestas, Nicole, Kathleen J Mullen, and Alexander Strand. 2013. “Does Disability Insurance Receipt Discourage Work? Using Examiner Assignment to Estimate Causal Effects of SSDI Receipt.” American Economic Review 103 (5): 1797–1829.
Permutt, Thomas, and J Richard Hebel. 1989. “Simultaneous-Equation Estimation in a Clinical Trial of the Effect of Smoking on Birth Weight.” Biometrics, 619–22.
Powers, Donald E, and Spencer S Swinton. 1984. “Effects of Self-Study for Coachable Test Item Types.” Journal of Educational Psychology 76 (2): 266.