9.5 Exercises
1. Simulation Study: Misspecification of Functional Form
As stated in Chapter 9.2, misspecification of the regression function violates assumption 1 of Key Concept 6.3, so the OLS estimator will be biased and inconsistent. We have illustrated the bias of \(\hat{\beta}_0\) for the example of the quadratic population regression function \[Y_i = X_i^2 \] and the linear model \[Y_i = \beta_0 + \beta_1 X_i + u_i, \, u_i \sim \mathcal{N}(0,1)\] using 100 randomly generated observations. Strictly speaking, this finding could be a mere coincidence because it rests on a single estimate obtained from a single data set.
In this exercise, you have to generate simulation evidence for the bias of \(\hat{\beta}_0\) in the model \[Y_i = \beta_0 + \beta_1 X_i + u_i\] if the population regression function is \[Y_i = X_i^2.\]
Instructions:
Make sure to use the definitions suggested in the skeleton code in script.R to complete the following tasks:
Using a for() loop, generate 1000 OLS estimates of \(\beta_0\) in the model above, where \(X_i \sim \mathcal{U}[-5,5]\) and \(u_i \sim \mathcal{N}(0,1)\), based on samples of size \(100\). Save the estimates in beta_hats.
Compare the sample mean of the estimates to the true parameter using the == operator.
Hint:
You can generate random numbers from a uniform distribution using runif().
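The following is a minimal sketch of one possible solution. It assumes the data generating process \(Y_i = X_i^2 + u_i\), so that the population regression function is \(E(Y_i \vert X_i) = X_i^2\), and takes the true intercept to be \(\beta_0 = 0\); names other than beta_hats (such as n and reps) are illustrative and may differ from those in script.R.

```r
# a sketch under the assumptions stated above; n and reps are illustrative names
set.seed(1)

n <- 100      # sample size
reps <- 1000  # number of estimates

beta_hats <- numeric(reps)

for (i in 1:reps) {
  X <- runif(n, min = -5, max = 5)     # X_i ~ U[-5, 5]
  u <- rnorm(n)                        # u_i ~ N(0, 1)
  Y <- X^2 + u                         # E(Y_i | X_i) = X_i^2
  beta_hats[i] <- coef(lm(Y ~ X))[1]   # OLS estimate of beta_0
}

# compare the sample mean of the estimates to the true parameter (here 0)
mean(beta_hats) == 0
```

Because \(E(X_i^2) = 25/3 \approx 8.33\) for \(X_i \sim \mathcal{U}[-5,5]\), the estimates center around 8.33 rather than 0, so the comparison returns FALSE, which is simulation evidence for the bias.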
2. Simulation Study: Errors-in-Variables Bias
Consider again the application of the classical measurement error model introduced in Chapter 9.2:
The single regressor \(X_i\) is measured with error so that \(\overset{\sim}{X}_i\) is observed instead. Thus one estimates \(\beta_1\) in \[\begin{align*} Y_i =& \, \beta_0 + \beta_1 \overset{\sim}{X}_i + \underbrace{\beta_1 (X_i -\overset{\sim}{X}_i) + u_i}_{=v_i} \\ Y_i =& \, \beta_0 + \beta_1 \overset{\sim}{X}_i + v_i \end{align*}\] instead of \[Y_i = \beta_0 + \beta_1 X_i + u_i,\]
with the zero-mean measurement error \(w_i = \overset{\sim}{X}_i - X_i\) being uncorrelated with \(X_i\) and \(u_i\). Then \(\beta_1\) is inconsistently estimated by OLS: \[\widehat{\beta}_1 \xrightarrow{p} \frac{\sigma_{X}^2}{\sigma_{X}^2 + \sigma_{w}^2} \beta_1.\] Let \[(X, Y) \sim \mathcal{N}\left[\begin{pmatrix}50\\ 100\end{pmatrix},\begin{pmatrix}10 & 5 \\ 5 & 10 \end{pmatrix}\right].\] Recall from (9.2) that \(E(Y_i\vert X_i) = 75 + 0.5 X_i\) in this case. Further assume that \(\overset{\sim}{X}_i = X_i + w_i\) with \(w_i \overset{i.i.d.}{\sim} \mathcal{N}(0,10)\). Since \(\sigma_X^2 = 10\) and \(\sigma_w^2 = 10\), the probability limit of \(\widehat{\beta}_1\) is \(\frac{10}{10 + 10} \cdot 0.5 = 0.25\), half the true slope.
As mentioned in Exercise 1, Chapter 9.2 discusses the consequences of the measurement error for the OLS estimator of \(\beta_1\) in this setting based on a single sample and thus just one estimate. Strictly speaking, the conclusion drawn there could be wrong because the observed bias may be due to random variation. A Monte Carlo simulation is more appropriate here.
Instructions:
Show that \(\beta_1\) is estimated with a bias using a simulation study. Make sure to use the definitions suggested in the skeleton code in script.R to complete the following tasks:
Generate 1000 estimates of \(\beta_1\) by regressing \(Y_i\) on the mismeasured regressor \(\overset{\sim}{X}_i\), i.e., by estimating \[Y_i = \beta_0 + \beta_1 \overset{\sim}{X}_i + v_i.\] Use rmvnorm() to generate samples of 100 random observations from the bivariate normal distribution stated above.
Save the estimates in beta_hats.
Compute the sample mean of the estimates.
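A minimal sketch of one possible solution follows. It assumes the mvtnorm package is available; names other than beta_hats (such as n, reps, mu, and Sigma) are illustrative and may differ from those in script.R.

```r
# a sketch under the assumptions stated above; n, reps, mu, Sigma are illustrative names
library(mvtnorm)

set.seed(1)

n <- 100      # sample size
reps <- 1000  # number of estimates

mu <- c(50, 100)                    # mean vector of (X, Y)
Sigma <- matrix(c(10, 5,
                  5, 10), ncol = 2) # covariance matrix of (X, Y)

beta_hats <- numeric(reps)

for (i in 1:reps) {
  d <- rmvnorm(n, mean = mu, sigma = Sigma)          # draw (X_i, Y_i)
  X <- d[, 1]
  Y <- d[, 2]
  X_tilde <- X + rnorm(n, mean = 0, sd = sqrt(10))   # X~_i = X_i + w_i, w_i ~ N(0, 10)
  beta_hats[i] <- coef(lm(Y ~ X_tilde))[2]           # OLS estimate of beta_1
}

# sample mean of the estimates
mean(beta_hats)
```

The sample mean should lie close to 0.25, the probability limit computed above, rather than close to the true slope 0.5, providing simulation evidence for the errors-in-variables bias.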