Chapter 4 Causality and Bias
In the last chapter, we worked with the following model,
\[\begin{gather} Wage_i = \alpha_0 + \alpha_1 Schooling_i + e_i. \tag{4.1} \end{gather}\]
Implicitly, the following things are true, too.
\[\begin{gather} E[Wage|Schooling] = \alpha_0 + \alpha_1 Schooling \\ E[e|Schooling] = 0 \end{gather}\]
Now, we’re going to write down a causal model. This model will also be linear and look very similar to the model in (4.1). However, since the causal effect of schooling may differ from the associated effect of schooling, we’ll use different Greek letters for the coefficients and a different letter for the error term.
\[\begin{gather} Wage_i = \beta_0 + \beta_1 Schooling_i + u_i. \tag{4.2} \end{gather}\]
If \(E[u|Schooling]=0\), then (4.1) and (4.2) are equivalent and \(\alpha_1\) equals \(\beta_1\). In this case, the OLS estimate of (4.2) provides us with an unbiased estimate of the causal effect of schooling on wages.
However, in the causal model, \(E[u|Schooling]\) may not equal zero, but might instead depend on the value of \(Schooling\). In this case, the OLS estimate of \(\beta_1\) in (4.2) is biased.
Another way of thinking about this is the following. We estimate (4.1) with OLS and obtain an unbiased estimate of the associated effect of schooling on wages. If the zero conditional mean assumption holds in the causal model, then this estimate is also an unbiased estimate of the causal effect of schooling on wages.
The zero conditional mean (ZCM) assumption is that \(E[u|Schooling]=0\).
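The case where ZCM holds can be illustrated with a small simulation. This is only a sketch under assumed values: the parameters `beta0` and `beta1`, the sample size, and the distributions of schooling and the error term are hypothetical choices made for illustration, not estimates from any data set. The error term is drawn independently of schooling, so \(E[u|Schooling]=0\) holds by construction.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


rng = random.Random(0)      # fixed seed so the run is reproducible
n = 20_000
beta0, beta1 = 5.0, 1.5     # hypothetical causal parameters

# Years of schooling, drawn uniformly from 8 to 20 (an assumption).
schooling = [rng.randint(8, 20) for _ in range(n)]
# Error term drawn independently of schooling, so ZCM holds by construction.
u = [rng.gauss(0, 4) for _ in range(n)]
wage = [beta0 + beta1 * s + e for s, e in zip(schooling, u)]

slope_zcm = ols_slope(schooling, wage)
print(f"true beta1 = {beta1}, OLS slope = {slope_zcm:.3f}")
```

With a sample this large, the printed OLS slope should land close to the assumed `beta1`, which is what we expect when the associated effect and the causal effect coincide.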
4.1 Example
In words, the ZCM assumption asks: what is the average value of \(u_i\) if we know the value of schooling? To address this question, we need to think about whether other determinants of wages might be correlated with schooling. Remember, \(u_i\) captures the contribution to wages of all other variables that influence wages.
Consider the income of the parents of the workers in our sample. We’ll refer to this variable as parental income. There are two questions we need to ask:
- Is parental income a determinant of wages?
- Is parental income correlated with years of schooling?
Answering these questions requires some contextual knowledge about wages and schooling in the population (among US workers).
Let’s suppose that we believe that parental income is a determinant of wages, and that the relationship is positive: higher parental income causes higher own wages. Suppose we also believe that parental income is positively correlated with years of schooling. Then, if we know that a person has a lot of schooling, we know it is more likely (than otherwise) that their parents’ income was also high. And if their parents’ income was high, then their own wages will tend to be higher (other things equal).
In (4.2), the only way parental income can affect wages directly is through the error term, \(u\). Thus, we would infer that a person with a higher schooling level is also likely to have \(u>0\): high schooling \(\rightarrow\) high parental income \(\rightarrow\) higher error term \(\rightarrow\) higher wages.
Now, how does this threaten a causal interpretation of the estimate of \(\beta_1\) in (4.2)? The problem is that we observe that workers with more schooling have higher wages, but we don’t know whether it is because they have more schooling or because their parents had higher incomes. More specifically, we don’t know how much of the estimated effect of schooling on wages is due to parental income.
At this point, it is useful to introduce two terms.
- Omitted Variable. An omitted variable is a determinant of \(Y\) that is not included as a regressor in the model, and is therefore subsumed in the error term.
- Omitted Variable Bias. Omitted variable bias arises when an omitted variable is correlated with an included regressor.
Thus, when ZCM fails, we consider the OLS estimate of \(\beta_1\) to be biased. However, we can always treat it as an estimate of the associated effect of \(X\) on \(Y\) (i.e., an estimate of \(\alpha_1\)).
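The parental income story can be sketched as a simulation. Everything numeric here is an assumption made for illustration: the causal parameters `beta0` and `beta1`, the hypothetical coefficient `gamma` on parental income, and the way schooling is generated from parental income. Schooling is treated as a continuous variable to keep the sketch short. Parental income is left out of the regression, so it ends up in the error term \(u\).

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


rng = random.Random(1)      # fixed seed so the run is reproducible
n = 20_000
beta0, beta1 = 5.0, 1.5     # hypothetical causal parameters
gamma = 0.2                 # hypothetical effect of parental income on wages

# Parental income in arbitrary units (an assumed distribution).
parental_income = [rng.gauss(50, 10) for _ in range(n)]
# Schooling is positively correlated with parental income by construction.
schooling = [8 + 0.1 * p + rng.gauss(0, 2) for p in parental_income]
# Error term of the causal model: the omitted variable plus unrelated noise.
u = [gamma * (p - 50) + rng.gauss(0, 3) for p in parental_income]
wage = [beta0 + beta1 * s + e for s, e in zip(schooling, u)]

# Regress wages on schooling only; parental income is the omitted variable.
slope_omitted = ols_slope(schooling, wage)
print(f"true beta1 = {beta1}, OLS slope = {slope_omitted:.3f}")
```

Under these assumed parameters the OLS slope should come out noticeably above `beta1`, because part of the wage difference between workers with more and less schooling is really due to parental income. The slope is still a reasonable estimate of the associated effect, \(\alpha_1\).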
4.2 Direction of Bias
Sometimes it is helpful to think about whether a particular omitted variable generates upward or downward bias. Consider the conditional expectation of (4.2):
\[ E[Wage|Schooling] = \beta_0 + \beta_1 Schooling + E[u|Schooling] \]
We are interested in the marginal effect of schooling:
\[\begin{gather} \frac{\Delta E[Wage|Schooling]}{\Delta Schooling} = \beta_1 + \frac{\Delta E[u|Schooling]}{\Delta Schooling} \tag{4.3} \end{gather}\]
The causal parameter, or causal effect of schooling on wages, is \(\beta_1\). However, OLS yields an estimate of the entire right-hand side of (4.3). In other words, the right-hand side of (4.3) equals \(\alpha_1\) in the associative model.
The term \(\frac{\Delta E[u|Schooling]}{\Delta Schooling}\) is the bias term. It is zero if changes in \(Schooling\) are not associated with changes in the average value of the error term. In our example above, we suggested that workers with more schooling likely had higher parental income, and that higher parental income tends to increase the error term. So in this case, the bias term is plausibly positive. If we are right about the relationship between these three variables, then OLS overestimates \(\beta_1\).
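To make the sign argument concrete, here is a sketch of the bias term under two simplifying assumptions that are not part of the model above. First, \(E[u|Schooling]\) is linear in \(Schooling\). Second, the error term can be written as \(u_i = \gamma ParentalIncome_i + v_i\), where \(\gamma\) and \(v_i\) are hypothetical symbols introduced only for this illustration and \(v_i\) is uncorrelated with schooling. Then
\[\begin{gather} \frac{\Delta E[u|Schooling]}{\Delta Schooling} = \frac{Cov(Schooling, u)}{Var(Schooling)} = \gamma \frac{Cov(Schooling, ParentalIncome)}{Var(Schooling)} \end{gather}\]
Since \(Var(Schooling)\) is positive, the sign of the bias under these assumptions is the sign of \(\gamma\) times the sign of the covariance between schooling and parental income. If both are positive, as in our example, the bias is upward. If the omitted variable raised wages but was negatively correlated with schooling (or the reverse), the bias would be downward.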
4.3 Notation
From now on, we will write down population models using \(\beta\) coefficients. However, we will often interpret estimates as capturing associated effects. We do this to err on the side of caution, so to speak. We will learn techniques that help us address omitted variable bias, but in many cases it will be difficult to know whether there remain additional omitted variables that are correlated with the regressor in the model.