10 Model Specification
Test whether underlying assumptions hold true
 Nested Model (A1/A3)
 Non-Nested Model (A1/A3)
 Heteroskedasticity (A4)
10.1 Nested Model
\[ \begin{aligned} y &= \beta_0 + x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \epsilon & \text{unrestricted model} \\ y &= \beta_0 + x_1\beta_1 + \epsilon & \text{restricted model} \end{aligned} \]
The unrestricted model is always longer (it contains more regressors) than the restricted model
The restricted model is “nested” within the unrestricted model
To determine which variables should be included or excluded, we can use the same Wald test
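As a rough illustration of testing the exclusion restrictions, the sketch below computes the F-statistic that compares the restricted and unrestricted sums of squared residuals (under homoskedasticity, the Wald statistic for these restrictions is this F-statistic times the number of restrictions). The simulated data, the seed, and the helper names `ssr` and `nested_f_stat` are assumptions made for this example only.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from OLS of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def nested_f_stat(X_restricted, X_unrestricted, y):
    """F statistic for H0: the extra coefficients in the unrestricted model are zero."""
    n, k_u = X_unrestricted.shape
    q = k_u - X_restricted.shape[1]            # number of restrictions
    ssr_r = ssr(X_restricted, y)
    ssr_u = ssr(X_unrestricted, y)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u))

# Simulated data (assumption): x2 and x3 are irrelevant by construction.
rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_r = np.column_stack([np.ones(n), x1])            # restricted model
X_u = np.column_stack([np.ones(n), x1, x2, x3])    # unrestricted model
f_stat = nested_f_stat(X_r, X_u, y)
print(f"F = {f_stat:.3f} with (2, {n - 4}) degrees of freedom")
```

A p-value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(f_stat, 2, n - 4)` if SciPy is available.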
Adjusted \(R^2\)
 \(R^2\) never decreases when more variables are included
 Adjusted \(R^2\) tries to correct by penalizing inclusion of unnecessary variables.
\[ \begin{aligned} {R}^2 &= 1 - \frac{SSR/n}{SST/n} \\ {R}^2_{adj} &= 1 - \frac{SSR/(n-k)}{SST/(n-1)} \\ &= 1 - \frac{(n-1)(1-R^2)}{(n-k)} \end{aligned} \]
 \({R}^2_{adj}\) increases if and only if the t-statistic on the additional variable is greater than 1 in absolute value.
 \({R}^2_{adj}\) is valid only in models where there is no heteroskedasticity
 Therefore, it should not be used to determine which variables should be included in the model (the t- or F-tests are more appropriate)
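The relationship between \({R}^2_{adj}\) and the t-statistic can be checked numerically. The following is a minimal sketch on simulated data (an assumption), using a hypothetical helper `fit_summary` and the classical (homoskedastic) standard errors, with k counting the intercept as in the formula above.

```python
import numpy as np

def fit_summary(X, y):
    """OLS fit returning R^2, adjusted R^2, and classical t-statistics.
    X must already contain the intercept column, so k counts the intercept."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    ssr = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ssr / sst
    r2_adj = 1 - (n - 1) * (1 - r2) / (n - k)
    se = np.sqrt(ssr / (n - k) * np.diag(XtX_inv))
    return r2, r2_adj, beta / se

# Simulated data (assumption): x2 is irrelevant by construction.
rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, x2])       # add one extra variable

r2_s, adj_s, _ = fit_summary(X_small, y)
r2_b, adj_b, t_b = fit_summary(X_big, y)

print(f"R^2:      {r2_s:.4f} -> {r2_b:.4f}")
print(f"adj. R^2: {adj_s:.4f} -> {adj_b:.4f}")
print(f"t-statistic on the added variable: {t_b[-1]:.3f}")
```

Whatever the simulated draw, the adjusted \(R^2\) should rise exactly when the printed t-statistic exceeds 1 in absolute value, while the plain \(R^2\) does not fall.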
10.2 Non-Nested Model
Compare models with different non-nested specifications
10.2.1 Davidson-MacKinnon test
10.2.1.1 Independent Variable
Should the independent variables be logged? (decide between non-nested alternatives)
\[ \begin{aligned} y = \beta_0 + x_1\beta_1 + x_2\beta_2 + \epsilon && \text{(level eq)} \\ y = \beta_0 + ln(x_1)\beta_1 + x_2\beta_2 + \epsilon && \text{(log eq)} \end{aligned} \]
 (1) Obtain the predicted outcome \(\check{y}\) from estimating the log equation, then estimate the following auxiliary equation,
\[ y = \beta_0 + x_1\beta_1 + x_2\beta_2 + \check{y}\gamma + error \]
and evaluate the t-statistic for the null hypothesis \(H_0: \gamma = 0\)
 (2) Obtain the predicted outcome \(\hat{y}\) from estimating the level equation, then estimate the following auxiliary equation,
\[ y = \beta_0 + ln(x_1)\beta_1 + x_2\beta_2 + \hat{y}\gamma + error \]
and evaluate the t-statistic for the null hypothesis \(H_0: \gamma = 0\)
 If you reject the null in step (1) but fail to reject the null in step (2), then the log equation is preferred.
 If you fail to reject the null in step (1) but reject the null in step (2), then the level equation is preferred.
 If you reject in both steps, then you have statistical evidence that neither model should be used, and you should reevaluate the functional form of your model.
 If you fail to reject in both steps, you do not have sufficient evidence to prefer one model over the other. You can compare the \(R^2_{adj}\) to choose between the two models.
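The two auxiliary regressions and the decision rule can be sketched as follows. This is only an illustration on simulated data in which the log equation is true by construction; the helper `ols`, the seed, and the use of 1.96 as an approximate large-sample 5% critical value are assumptions, and the t-statistics use classical standard errors.

```python
import numpy as np

def ols(X, y):
    """Return coefficients, classical t-statistics, and fitted values."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    fitted = X @ beta
    resid = y - fitted
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return beta, beta / se, fitted

# Simulated data (assumption): the log equation generates y.
rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(1, 10, size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * np.log(x1) + 0.5 * x2 + rng.normal(scale=0.3, size=n)

ones = np.ones(n)
X_level = np.column_stack([ones, x1, x2])
X_log = np.column_stack([ones, np.log(x1), x2])

_, _, y_hat = ols(X_level, y)      # predictions from the level equation
_, _, y_check = ols(X_log, y)      # predictions from the log equation

# Step (1): level equation augmented with the log-equation prediction.
_, t1, _ = ols(np.column_stack([X_level, y_check]), y)
# Step (2): log equation augmented with the level-equation prediction.
_, t2, _ = ols(np.column_stack([X_log, y_hat]), y)
t_step1, t_step2 = t1[-1], t2[-1]

crit = 1.96                        # approximate 5% two-sided critical value
reject1, reject2 = abs(t_step1) > crit, abs(t_step2) > crit
if reject1 and not reject2:
    verdict = "log equation preferred"
elif reject2 and not reject1:
    verdict = "level equation preferred"
elif reject1 and reject2:
    verdict = "neither model is adequate; reconsider the functional form"
else:
    verdict = "no clear preference; compare adjusted R^2"
print(f"t (step 1) = {t_step1:.2f}, t (step 2) = {t_step2:.2f}: {verdict}")
```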
\[ \begin{aligned} y &= \beta_0 + ln(x)\beta_1 + \epsilon \\ y &= \beta_0 + x\beta_1 + x^2\beta_2 + \epsilon \end{aligned} \]
 Compare which better fits the data
 Comparing the standard \(R^2\) is unfair because the second model is less parsimonious (more parameters to estimate)
 The \(R_{adj}^2\) will penalize the second model for being less parsimonious (only valid when there is no heteroskedasticity, i.e., A4 holds)
 Should only compare after a Davidson-MacKinnon test
10.2.1.2 Dependent Variable
\[ \begin{aligned} y &= \beta_0 + x_1\beta_1 + \epsilon & \text{level eq} \\ ln(y) &= \beta_0 + x_1\beta_1 + \epsilon & \text{log eq} \\ \end{aligned} \]
 In the level model, regardless of how big y is, x has a constant effect (i.e., one unit change in \(x_1\) results in a \(\beta_1\) unit change in y)
 In the log model, the larger y is, the stronger the effect of x (i.e., a one-unit change in \(x_1\) could increase y from 1 to roughly \(1+\beta_1\), or from 100 to roughly \(100+100\beta_1\))
 Cannot compare \(R^2\) or \(R^2_{adj}\) because the outcomes are completely different and the scaling is different (SST is different)
We need to “untransform” the \(ln(y)\) back to the same scale as y and then compare,
 Estimate the model in the log equation to obtain the predicted outcome \(\hat{ln(y)}\)
 “Untransform” the predicted outcome
\[ \hat{m} = exp(\hat{ln(y)}) \]
 Estimate the following model (without an intercept)
\[ y = \alpha\hat{m} + error \]
and obtain predicted outcome \(\hat{y}\)
 Then take the square of the correlation between \(\hat{y}\) and y as a scaled version of the \(R^2\) from the log model, which can now be compared with the usual \(R^2\) in the level model.
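The steps above can be sketched as follows. The simulated positive outcome, the seed, and all variable names are assumptions for illustration; the code only mirrors the four steps and computes the usual \(R^2\) of the level model for comparison.

```python
import numpy as np

# Simulated data (assumption): a positive outcome generated from a log-linear model.
rng = np.random.default_rng(2)
n = 300
x1 = rng.uniform(0, 2, size=n)
y = np.exp(0.5 + 0.8 * x1 + rng.normal(scale=0.2, size=n))
X = np.column_stack([np.ones(n), x1])

# Usual R^2 from the level equation.
beta_level = np.linalg.lstsq(X, y, rcond=None)[0]
resid_level = y - X @ beta_level
r2_level = 1 - resid_level @ resid_level / ((y - y.mean()) ** 2).sum()

# Step 1: estimate the log equation and obtain the predicted ln(y).
beta_log = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
log_y_hat = X @ beta_log

# Step 2: "untransform" the predicted outcome.
m_hat = np.exp(log_y_hat)

# Step 3: regress y on m_hat without an intercept and obtain the predictions.
alpha = (m_hat @ y) / (m_hat @ m_hat)
y_hat = alpha * m_hat

# Step 4: squared correlation between y_hat and y.
r2_log_scaled = np.corrcoef(y_hat, y)[0, 1] ** 2

print(f"R^2 (level eq):          {r2_level:.4f}")
print(f"scaled R^2 (log eq):     {r2_log_scaled:.4f}")
```

Both numbers are now on the scale of y, so they can be compared directly.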
10.3 Heteroskedasticity
Using robust standard errors is always valid

If there is significant evidence of heteroskedasticity, implying that A4 does not hold:
 The Gauss-Markov Theorem no longer holds; OLS is not BLUE.
 Should consider using a better linear unbiased estimator (Weighted Least Squares or Generalized Least Squares)
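To make the idea of robust standard errors concrete, here is a minimal sketch of one common variant, the White (HC0) sandwich estimator, computed next to the classical standard errors. The choice of HC0, the helper name `ols_with_se`, and the simulated data (error standard deviation proportional to x) are assumptions for illustration, not a prescription from the text.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients with classical and HC0 (White) robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    # Classical: sigma^2 (X'X)^{-1}, valid only under homoskedasticity (A4).
    se_classical = np.sqrt(e @ e / (n - k) * np.diag(XtX_inv))
    # HC0 sandwich: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}.
    meat = X.T @ (X * e[:, None] ** 2)
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_classical, se_robust

# Simulated heteroskedastic data (assumption): error spread grows with x.
rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta, se_c, se_r = ols_with_se(X, y)
print("coefficients:       ", np.round(beta, 3))
print("classical std. err.:", np.round(se_c, 3))
print("robust std. err.:   ", np.round(se_r, 3))
```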
10.3.1 Breusch-Pagan test
A4 implies
\[ E(\epsilon_i^2|\mathbf{x}_i)=\sigma^2 \]
This can be checked by considering the regression
\[ \epsilon_i^2 = \gamma_0 + x_{i1}\gamma_1 + ... + x_{ik-1}\gamma_{k-1} + error \]
and determining whether or not \(\mathbf{x}_i\) has any predictive value
 if \(\mathbf{x}_i\) has predictive value, then the variance changes over the levels of \(\mathbf{x}_i\) which is evidence of heteroskedasticity
 if \(\mathbf{x}_i\) does not have predictive value, the variance is constant for all levels of \(\mathbf{x}_i\)
The Breusch-Pagan test for heteroskedasticity computes the F-test of overall significance for the following model, in which the unobserved \(\epsilon_i^2\) is replaced by the squared OLS residual \(e_i^2\)
\[ e_i^2 = \gamma_0 + x_{i1}\gamma_1 + ... + x_{ik-1}\gamma_{k-1} + error \]
A low p-value means we reject the null of homoskedasticity
However, the Breusch-Pagan test cannot detect heteroskedasticity that takes a nonlinear form
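A minimal sketch of the Breusch-Pagan procedure is given below: fit the original model, square the residuals, regress them on the same regressors, and compute the F-statistic of overall significance. The helper names, the seed, and the simulated data (error standard deviation proportional to x) are assumptions made for illustration.

```python
import numpy as np

def overall_f_and_r2(X, y):
    """F statistic of overall significance and R^2 (X includes the intercept)."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) ** 2).sum()
    f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    return f_stat, r2

def breusch_pagan_f(X, y):
    """Regress squared OLS residuals on X and return (F, R^2) of that regression."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = (y - X @ beta) ** 2
    return overall_f_and_r2(X, e2)

# Simulated heteroskedastic data (assumption): error spread grows linearly with x.
rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

f_bp, r2_bp = breusch_pagan_f(X, y)
print(f"Breusch-Pagan F = {f_bp:.2f} with (1, {n - 2}) degrees of freedom")
```

The corresponding p-value could be computed with, for example, `scipy.stats.f.sf(f_bp, 1, n - 2)` if SciPy is available.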
10.3.2 White test
The White test for heteroskedasticity allows for a nonlinear relationship by computing the F-test of overall significance for the following model (assume there are three independent variables)
\[ \begin{aligned} e_i^2 &= \gamma_0 + x_{i1} \gamma_1 + x_{i2}\gamma_2 + x_{i3}\gamma_3 \\ &+ x_{i1}^2\gamma_4 + x_{i2}^2\gamma_5 + x_{i3}^2\gamma_6 \\ &+ (x_{i1} \times x_{i2})\gamma_7 + (x_{i1} \times x_{i3})\gamma_8 + (x_{i2} \times x_{i3})\gamma_9 + error \end{aligned} \]
A low p-value means we reject the null of homoskedasticity
Equivalently, we can compute LM as \(LM = nR^2_{e^2}\) where the \(R^2_{e^2}\) comes from the regression with the squared residual as the outcome
 The LM statistic has a \(\chi_k^2\) distribution
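The LM form of the White test can be sketched as below for three regressors. The data are simulated (an assumption) so that the error variance depends on \(x_{i1}^2\), a nonlinear pattern; the helper `white_design` is hypothetical, and the comparison value 16.92 is the 5% critical value of a chi-squared distribution with 9 degrees of freedom, taking the degrees of freedom to be the number of slope terms in the auxiliary regression.

```python
import numpy as np
from itertools import combinations

def white_design(V):
    """Intercept, levels, squares, and pairwise cross-products of the columns of V."""
    n, p = V.shape
    cols = [np.ones(n)]
    cols += [V[:, j] for j in range(p)]
    cols += [V[:, j] ** 2 for j in range(p)]
    cols += [V[:, i] * V[:, j] for i, j in combinations(range(p), 2)]
    return np.column_stack(cols)

# Simulated data (assumption): the error variance equals x1^2.
rng = np.random.default_rng(5)
n = 500
V = rng.normal(size=(n, 3))
y = 1.0 + V.sum(axis=1) + V[:, 0] * rng.normal(size=n)

# Original regression and squared residuals.
X = np.column_stack([np.ones(n), V])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta) ** 2

# Auxiliary regression of e^2 on levels, squares, and cross-products.
Z = white_design(V)
gamma = np.linalg.lstsq(Z, e2, rcond=None)[0]
u = e2 - Z @ gamma
r2_aux = 1 - u @ u / ((e2 - e2.mean()) ** 2).sum()

lm = n * r2_aux
print(f"LM = {lm:.2f}; compare with 16.92 (chi-squared, 9 df, 5% level)")
```

A p-value could be obtained with, for example, `scipy.stats.chi2.sf(lm, 9)` if SciPy is available.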