10 Model Specification

Test whether the model's underlying assumptions hold

10.1 Nested Model

\[ \begin{aligned} y &= \beta_0 + x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \epsilon & \text{unrestricted model} \\ y &= \beta_0 + x_1\beta_1 + \epsilon & \text{restricted model} \end{aligned} \]

The unrestricted model always contains more variables than the restricted model
The restricted model is “nested” within the unrestricted model
To determine which variables should be included or excluded, we can use a Wald test (e.g., the F-test of the exclusion restrictions), as sketched below
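
A minimal sketch of this exclusion test, assuming simulated data and the statsmodels library (the variable names and data-generating process are illustrative, not from the notes):

```python
# Sketch: F-test of exclusion restrictions in a nested model
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)  # x3 is irrelevant

X_ur = sm.add_constant(np.column_stack([x1, x2, x3]))  # unrestricted
X_r = sm.add_constant(x1)                              # restricted (nested)

fit_ur = sm.OLS(y, X_ur).fit()
fit_r = sm.OLS(y, X_r).fit()

# F-test of H0: beta_2 = beta_3 = 0 (restricted vs. unrestricted)
f_stat, p_value, df_diff = fit_ur.compare_f_test(fit_r)
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}, df diff = {df_diff:.0f}")
```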

Adjusted \(R^2\)

  • \(R^2\) will always increase with more variables included
  • Adjusted \(R^2\) tries to correct by penalizing inclusion of unnecessary variables.

\[ \begin{aligned} {R}^2 &= 1 - \frac{SSR/n}{SST/n} \\ {R}^2_{adj} &= 1 - \frac{SSR/(n-k)}{SST/(n-1)} \\ &= 1 - \frac{(n-1)(1-R^2)}{(n-k)} \end{aligned} \]

  • \({R}^2_{adj}\) increases if and only if the t-statistic on the additional variable is greater than 1 in absolute value.
  • \({R}^2_{adj}\) is only valid in models with no heteroskedasticity
  • therefore it should not be used to determine which variables should be included in the model (the t- or F-tests are more appropriate); a quick check of the identity above is sketched below
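
A minimal sketch verifying \(R^2_{adj} = 1 - (n-1)(1-R^2)/(n-k)\), assuming statsmodels and simulated data, with k counting all estimated parameters including the intercept (this matches statsmodels' convention):

```python
# Sketch: verify the adjusted R^2 identity against statsmodels
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
X = sm.add_constant(rng.normal(size=(n, 2)))  # intercept + 2 regressors, k = 3
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

fit = sm.OLS(y, X).fit()
k = X.shape[1]
r2_adj_manual = 1 - (n - 1) * (1 - fit.rsquared) / (n - k)
print(np.isclose(r2_adj_manual, fit.rsquared_adj))  # True
```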

10.1.1 Chow test

Should we run two separate regressions for two groups? The Chow test answers this with an F-test of the restricted (pooled) model against the unrestricted model that lets every coefficient differ across the two groups, as sketched below.
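
A minimal sketch of the Chow test, assuming simulated data, a known group split, and equal error variances across groups (names are illustrative):

```python
# Sketch: Chow test for coefficient equality across two groups
# F = [(SSR_pooled - SSR_1 - SSR_2) / k] / [(SSR_1 + SSR_2) / (n - 2k)]
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 200
group = rng.integers(0, 2, size=n).astype(bool)
x = rng.normal(size=n)
y = 1 + np.where(group, 1.0, 0.4) * x + rng.normal(size=n)  # slope differs

X = sm.add_constant(x)
k = X.shape[1]  # parameters per group (intercept + slope)

ssr_pooled = sm.OLS(y, X).fit().ssr
ssr_1 = sm.OLS(y[group], X[group]).fit().ssr
ssr_2 = sm.OLS(y[~group], X[~group]).fit().ssr

f_stat = ((ssr_pooled - ssr_1 - ssr_2) / k) / ((ssr_1 + ssr_2) / (n - 2 * k))
p_value = stats.f.sf(f_stat, k, n - 2 * k)
print(f"Chow F = {f_stat:.3f}, p-value = {p_value:.4f}")
```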

10.2 Non-Nested Model

Compare models with different, non-nested specifications

10.2.1 Davidson-MacKinnon test

10.2.1.1 Independent Variable

Should the independent variables be logged? (decide between non-nested alternatives)

\[ \begin{aligned} y = \beta_0 + x_1\beta_1 + x_2\beta_2 + \epsilon && \text{(level eq)} \\ y = \beta_0 + \ln(x_1)\beta_1 + x_2\beta_2 + \epsilon && \text{(log eq)} \end{aligned} \]

  1. Obtain the predicted outcome \(\check{y}\) from estimating the log equation, then estimate the following auxiliary equation,

\[ y = \beta_0 + x_1\beta_1 + x_2\beta_2 + \check{y}\gamma + error \]

and evaluate the t-statistic for the null hypothesis \(H_0: \gamma = 0\)

  2. Obtain the predicted outcome \(\hat{y}\) from estimating the level equation, then estimate the following auxiliary equation,

\[ y = \beta_0 + \ln(x_1)\beta_1 + x_2\beta_2 + \hat{y}\gamma + error \]

and evaluate the t-statistic for the null hypothesis \(H_0: \gamma = 0\)

  • If you reject the null in the first step but fail to reject it in the second step, the log equation is preferred.
  • If you fail to reject the null in the first step but reject it in the second step, the level equation is preferred.
  • If you reject in both steps, you have statistical evidence that neither model is adequate and should re-evaluate the functional form of your model.
  • If you fail to reject in both steps, you do not have sufficient evidence to prefer one model over the other. You can compare the \(R^2_{adj}\) to choose between the two models. A sketch of the full procedure follows this list.
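
A minimal sketch of both steps, assuming statsmodels and simulated data with a positive regressor so that \(\ln(x_1)\) is defined (all names illustrative):

```python
# Sketch: Davidson-MacKinnon test of level vs. log specification of x1
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
x1 = rng.uniform(0.5, 5.0, size=n)  # positive so ln(x1) is defined
x2 = rng.normal(size=n)
y = 1 + 0.8 * np.log(x1) + 0.5 * x2 + rng.normal(scale=0.5, size=n)

X_level = sm.add_constant(np.column_stack([x1, x2]))
X_log = sm.add_constant(np.column_stack([np.log(x1), x2]))

y_check = sm.OLS(y, X_log).fit().fittedvalues    # fitted values, log eq
y_hat = sm.OLS(y, X_level).fit().fittedvalues    # fitted values, level eq

# Step 1: add the log-equation fit to the level equation; test gamma = 0
aux1 = sm.OLS(y, np.column_stack([X_level, y_check])).fit()
# Step 2: add the level-equation fit to the log equation; test gamma = 0
aux2 = sm.OLS(y, np.column_stack([X_log, y_hat])).fit()

print("step 1 t-stat on gamma:", aux1.tvalues[-1])  # expect rejection here
print("step 2 t-stat on gamma:", aux2.tvalues[-1])  # expect no rejection
```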

For example, consider choosing between

\[ \begin{aligned} y &= \beta_0 + \ln(x)\beta_1 + \epsilon \\ y &= \beta_0 + x\beta_1 + x^2\beta_2 + \epsilon \end{aligned} \]

  • Compare which model better fits the data
  • Comparing the standard \(R^2\) is unfair because the second model is less parsimonious (more parameters to estimate)
  • The \(R_{adj}^2\) penalizes the second model for being less parsimonious, but it is only valid when there is no heteroskedasticity (A4 holds)
  • You should only compare them after a Davidson-MacKinnon test

10.2.1.2 Dependent Variable

\[ \begin{aligned} y &= \beta_0 + x_1\beta_1 + \epsilon & \text{level eq} \\ \ln(y) &= \beta_0 + x_1\beta_1 + \epsilon & \text{log eq} \end{aligned} \]

  • In the level model, x has a constant effect regardless of how big y is (i.e., a one-unit change in \(x_1\) results in a \(\beta_1\)-unit change in y)
  • In the log model, the larger y is, the stronger the effect of x (i.e., a one-unit change in \(x_1\) could increase y from 1 to \(1+\beta_1\), or from 100 to \(100 + 100\beta_1\))
  • We cannot compare \(R^2\) or \(R^2_{adj}\) because the outcomes are completely different; the scaling is different (SST is different)

We need to “un-transform” \(\ln(y)\) back to the same scale as y and then compare:

  1. Estimate the log equation to obtain the predicted outcome \(\widehat{\ln(y)}\)
  2. “Un-transform” the predicted outcome,

\[ \hat{m} = \exp(\widehat{\ln(y)}) \]

  3. Estimate the following model (without an intercept),

\[ y = \alpha\hat{m} + error \]

and obtain the predicted outcome \(\hat{y}\)

  4. Take the square of the correlation between \(\hat{y}\) and y as a scaled version of the \(R^2\) from the log model, which can now be compared with the usual \(R^2\) from the level model; a sketch follows.
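
A minimal sketch of the four steps, assuming statsmodels and a simulated positive outcome (names illustrative):

```python
# Sketch: compare R^2 of a log-outcome model with a level-outcome model
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
y = np.exp(0.2 + 0.3 * x1 + rng.normal(scale=0.3, size=n))  # positive outcome

X = sm.add_constant(x1)

# Step 1: estimate the log equation, get predicted ln(y)
ln_y_hat = sm.OLS(np.log(y), X).fit().fittedvalues
# Step 2: un-transform the predicted outcome
m_hat = np.exp(ln_y_hat)
# Step 3: regress y on m_hat without an intercept
y_hat = sm.OLS(y, m_hat).fit().fittedvalues
# Step 4: squared correlation, comparable to the level model's R^2
r2_log_scaled = np.corrcoef(y, y_hat)[0, 1] ** 2
r2_level = sm.OLS(y, X).fit().rsquared

print(f"scaled R^2 (log model): {r2_log_scaled:.3f}")
print(f"R^2 (level model):      {r2_level:.3f}")
```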

10.3 Heteroskedasticity

10.3.1 Breusch-Pagan test

A4 (homoskedasticity) implies

\[ E(\epsilon_i^2|\mathbf{x}_i)=\sigma^2 \]

so we can regress the squared error on the regressors,

\[ \epsilon_i^2 = \gamma_0 + x_{i1}\gamma_1 + ... + x_{ik-1}\gamma_{k-1} + error \]

and determine whether \(\mathbf{x}_i\) has any predictive value

  • if \(\mathbf{x}_i\) has predictive value, then the variance changes over the levels of \(\mathbf{x}_i\) which is evidence of heteroskedasticity
  • if \(\mathbf{x}_i\) does not have predictive value, the variance is constant for all levels of \(\mathbf{x}_i\)

The Breusch-Pagan test for heteroskedasticity computes the F-test of overall significance for the following model, with the unobserved errors replaced by the OLS residuals \(e_i\):

\[ e_i^2 = \gamma_0 + x_{i1}\gamma_1 + ... + x_{ik-1}\gamma_{k-1} + error \]

A low p-value means we reject the null of homoskedasticity

However, the Breusch-Pagan test cannot detect heteroskedasticity of a non-linear form; a sketch of the test follows.
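
A minimal sketch using statsmodels' het_breuschpagan, assuming simulated data whose error variance rises with \(x_1\):

```python
# Sketch: Breusch-Pagan test via statsmodels (simulated heteroskedastic data)
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
n = 300
x1 = rng.uniform(1, 5, size=n)
x2 = rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2]))
y = 1 + 0.5 * x1 + 0.5 * x2 + rng.normal(scale=x1, size=n)  # var rises with x1

resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"LM = {lm_stat:.2f} (p = {lm_pvalue:.4f})")
print(f"F  = {f_stat:.2f} (p = {f_pvalue:.4f})")
```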

10.3.2 White test

The White test for heteroskedasticity allows for a non-linear relationship by computing the F-test of overall significance for the following model (assuming there are three independent variables)

\[ \begin{aligned} e_i^2 &= \gamma_0 + x_{i1}\gamma_1 + x_{i2}\gamma_2 + x_{i3}\gamma_3 \\ &+ x_{i1}^2\gamma_4 + x_{i2}^2\gamma_5 + x_{i3}^2\gamma_6 \\ &+ (x_{i1} \times x_{i2})\gamma_7 + (x_{i1} \times x_{i3})\gamma_8 + (x_{i2} \times x_{i3})\gamma_9 + error \end{aligned} \]

A low p-value means we reject the null of homoskedasticity

Equivalently, we can compute the LM statistic as \(LM = nR^2_{e^2}\), where \(R^2_{e^2}\) comes from the regression with the squared residuals as the outcome; both statistics appear in the sketch below.
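
A minimal sketch using statsmodels' het_white, which reports both the LM statistic \(nR^2_{e^2}\) and the F statistic, assuming simulated data whose variance depends on \(x_1^2\):

```python
# Sketch: White test via statsmodels (simulated non-linear heteroskedasticity)
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
X = sm.add_constant(np.column_stack([x1, x2, x3]))
y = 1 + x1 + x2 + x3 + rng.normal(scale=np.sqrt(1 + x1**2), size=n)

resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(f"LM = nR^2 = {lm_stat:.2f} (p = {lm_pvalue:.4f})")
print(f"F = {f_stat:.2f} (p = {f_pvalue:.4f})")
```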