14.2 Non-Nested Model Tests

Comparing models is essential to identify which specification best explains the data. While nested model comparisons rely on one model being a restricted version of another, non-nested models do not share such a hierarchical structure. This situation commonly arises when comparing models with:

  • Different functional forms (e.g., linear vs. logarithmic relationships),
  • Different sets of explanatory variables,
  • Competing theoretical frameworks.

To compare non-nested models, we rely on specialized statistical tests designed to handle these complexities. The two most widely used approaches are:

  • Vuong Test: a likelihood-based test that compares the fit of two non-nested models to determine which better explains the data.

  • Davidson–MacKinnon J-Test: a regression-based approach that evaluates which model better fits the data by including the predicted values from the competing model as an additional regressor.


Consider two competing models:

  • Model A:
    \[ y = \alpha_0 + \alpha_1 f(X) + \epsilon_A \]
  • Model B:
    \[ y = \beta_0 + \beta_1 g(Z) + \epsilon_B \]

Where:

  • \(f(X)\) and \(g(Z)\) represent different functional forms or different sets of explanatory variables.

  • The models are non-nested because one cannot be obtained from the other by restricting parameters.

Our goal is to determine which model better explains the data.


14.2.1 Vuong Test

The Vuong Test is a likelihood-ratio-based approach for comparing non-nested models (Vuong 1989). It is particularly useful when both models are estimated via Maximum Likelihood Estimation.

Hypotheses

  • Null Hypothesis (\(H_0\)): Both models are equally close to the true data-generating process (i.e., they have equal predictive power).
  • Alternative Hypothesis (\(H_1\)): One model is closer to the true data-generating process:
    • Positive Test Statistic (\(V > 0\)): Model A is preferred.
    • Negative Test Statistic (\(V < 0\)): Model B is preferred.

Vuong Test Statistic

The Vuong test is based on the difference in the log-likelihood contributions of each observation under the two models. Let:

  • \(\ell_{A,i}\) = log-likelihood of observation \(i\) under Model A,
  • \(\ell_{B,i}\) = log-likelihood of observation \(i\) under Model B.

Define the difference in log-likelihoods:

\[ m_i = \ell_{A,i} - \ell_{B,i} \]

The Vuong test statistic is:

\[ V = \frac{\sqrt{n} \, \bar{m}}{s_m} \]

Where:

  • \(\bar{m} = \frac{1}{n} \sum_{i=1}^n m_i\) is the sample mean of the log-likelihood differences,

  • \(s_m = \sqrt{\frac{1}{n} \sum_{i=1}^n (m_i - \bar{m})^2}\) is the sample standard deviation of \(m_i\),

  • \(n\) is the sample size.
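As a sketch, the statistic can be computed by hand for two Gaussian linear models from each observation's log-likelihood contribution (the simulated data and variable names below are illustrative):

```r
# Per-observation Gaussian log-likelihood contributions of a fitted lm model
loglik_contrib <- function(model) {
  res <- residuals(model)
  sigma2 <- sum(res^2) / length(res)  # MLE of the error variance
  dnorm(res, mean = 0, sd = sqrt(sigma2), log = TRUE)
}

set.seed(1)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)  # Model A is the true specification

model_A <- lm(y ~ x)
model_B <- lm(y ~ z)

m <- loglik_contrib(model_A) - loglik_contrib(model_B)  # m_i
V <- sqrt(n) * mean(m) / sd(m)  # sd() divides by n - 1; asymptotically equivalent
p_value <- 2 * pnorm(-abs(V))   # two-sided p-value under V ~ N(0, 1)
```

A large positive \(V\) favors Model A, consistent with how these data were generated.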


Distribution and Decision Rule

  • Under \(H_0\), the Vuong statistic asymptotically follows a standard normal distribution:

\[ V \sim N(0, 1) \]

  • Decision Rule:
    • If \(|V| > z_{\alpha/2}\) (critical value from the standard normal distribution), reject \(H_0\).
      • If \(V > 0\): Prefer Model A.
      • If \(V < 0\): Prefer Model B.
    • If \(|V| \leq z_{\alpha/2}\): Fail to reject \(H_0\); no significant difference between models.

Corrections for Model Complexity

When comparing models with different numbers of parameters, a penalized version of the Vuong test can be used, similar to adjusting for model complexity in criteria like AIC or BIC. The corrected statistic is:

\[ V_{\text{adjusted}} = V - \frac{(k_A - k_B) \ln(n)}{2 s_m \sqrt{n}} \]

Where \(k_A\) and \(k_B\) are the number of parameters in Models A and B, respectively.
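A minimal sketch of applying this correction (the input values below are purely illustrative):

```r
# BIC-style adjustment of the Vuong statistic for unequal parameter counts
vuong_adjusted <- function(V, k_A, k_B, s_m, n) {
  V - (k_A - k_B) * log(n) / (2 * s_m * sqrt(n))
}

# Illustrative inputs: Model A uses 4 parameters, Model B uses 2
vuong_adjusted(V = 2.10, k_A = 4, k_B = 2, s_m = 1.5, n = 100)
```

The penalty pulls the statistic toward the more parsimonious model: when \(k_A > k_B\), the adjustment shrinks \(V\), and when the parameter counts are equal it vanishes.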


Limitations of the Vuong Test

  • Requires models to be estimated via Maximum Likelihood.
  • Sensitive to model misspecification and heteroskedasticity.
  • Assumes independent and identically distributed (i.i.d.) errors.

14.2.2 Davidson–MacKinnon J-Test

The Davidson–MacKinnon J-Test provides a flexible, regression-based approach for comparing non-nested models (Davidson and MacKinnon 1981). It evaluates whether the predictions from one model contain information not captured by the competing model. The test is suited to models that differ in their independent variables (e.g., transformed regressors), in contrast to Section 14.2.4, which handles models with transformed dependent variables.

Hypotheses

  • Null Hypothesis (\(H_0\)): The competing model does not provide additional explanatory power beyond the current model.
  • Alternative Hypothesis (\(H_1\)): The competing model provides additional explanatory power.

Procedure

Consider two competing models:

  • Model A:
    \[ y = \alpha_0 + \alpha_1 x + \epsilon_A \]
  • Model B:
    \[ y = \beta_0 + \beta_1 \ln(x) + \epsilon_B \]

Step 1: Testing Model A Against Model B

  1. Estimate Model B and obtain predicted values \(\hat{y}_B\).
  2. Run the auxiliary regression:

\[ y = \alpha_0 + \alpha_1 x + \gamma \hat{y}_B + u \]

  3. Test the null hypothesis:

\[ H_0: \gamma = 0 \]

  • If \(\gamma\) is significant, Model B adds explanatory power beyond Model A.
  • If \(\gamma\) is not significant, Model A sufficiently explains the data.

Step 2: Testing Model B Against Model A

  1. Estimate Model A and obtain predicted values \(\hat{y}_A\).
  2. Run the auxiliary regression:

\[ y = \beta_0 + \beta_1 \ln(x) + \gamma \hat{y}_A + u \]

  3. Test the null hypothesis:

\[ H_0: \gamma = 0 \]


Decision Rules

  • Reject \(H_0\) in Step 1, Fail to Reject in Step 2: Prefer Model B.
  • Fail to Reject \(H_0\) in Step 1, Reject in Step 2: Prefer Model A.
  • Reject \(H_0\) in Both Steps: Neither model is adequate; reconsider the functional form.
  • Fail to Reject \(H_0\) in Both Steps: No strong evidence to prefer one model; rely on other criteria such as theory, simplicity, or adjusted \(R^2\).


14.2.3 Adjusted \(R^2\)

The coefficient of determination (\(R^2\)) measures the proportion of the variance in the dependent variable that is explained by the model. However, a key limitation of \(R^2\) is that it always increases (or at least stays the same) when additional explanatory variables are added to the model, even if those variables are not statistically significant.

To address this issue, the adjusted \(R^2\) introduces a penalty for including unnecessary variables, making it a more reliable measure when comparing models with different numbers of predictors.


Formulas

Unadjusted \(R^2\):

\[ R^2 = 1 - \frac{SSR}{SST} \]

Where:

  • \(SSR\) = Sum of Squared Residuals (measures unexplained variance),

  • \(SST\) = Total Sum of Squares (measures total variance in the dependent variable).

Adjusted \(R^2\):

\[ R^2_{\text{adj}} = 1 - \frac{SSR / (n - k)}{SST / (n - 1)} \]

Alternatively, it can be expressed in terms of \(R^2\) as:

\[ R^2_{\text{adj}} = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k} \right) \]

Where:

  • \(n\) = Number of observations,

  • \(k\) = Number of estimated parameters in the model (including the intercept).
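Both formulas can be checked against R's built-in values (a minimal sketch with simulated data):

```r
set.seed(42)
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + x1 + rnorm(n)  # x2 is irrelevant by construction

fit <- lm(y ~ x1 + x2)
k <- length(coef(fit))  # number of estimated parameters, intercept included
SSR <- sum(residuals(fit)^2)
SST <- sum((y - mean(y))^2)

R2 <- 1 - SSR / SST
R2_adj <- 1 - (SSR / (n - k)) / (SST / (n - 1))

all.equal(R2, summary(fit)$r.squared)          # TRUE
all.equal(R2_adj, summary(fit)$adj.r.squared)  # TRUE
```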


Key Insights

  • Penalty for Complexity: Unlike \(R^2\), the adjusted \(R^2\) can decrease when irrelevant variables are added to the model because it adjusts for the number of predictors relative to the sample size.
  • Interpretation: It represents the proportion of variance explained after accounting for model complexity.
  • Comparison Across Models: Adjusted \(R^2\) is useful for comparing models with different numbers of predictors, as it discourages overfitting.

When Does Adjusted \(R^2\) Increase?

  • The adjusted \(R^2\) will increase if and only if the inclusion of a new variable improves the model more than expected by chance.
  • Mathematically: This typically occurs when the absolute value of the \(t\)-statistic for the new variable is greater than 1 (assuming large samples and standard model assumptions).
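When a single variable is added, this relationship between the \(t\)-statistic and adjusted \(R^2\) is an exact algebraic identity, which is easy to verify (a sketch; the data are simulated):

```r
# Adding one regressor raises adjusted R^2 exactly when its |t| exceeds 1
set.seed(7)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)

base <- lm(y ~ x1)
full <- lm(y ~ x1 + x2)

t_x2 <- summary(full)$coefficients["x2", "t value"]
adj_up <- summary(full)$adj.r.squared > summary(base)$adj.r.squared

adj_up == (abs(t_x2) > 1)  # TRUE: the two criteria agree
```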

Limitations of Adjusted \(R^2\)

  • Sensitive to Assumptions: Adjusted \(R^2\) assumes homoskedasticity (constant variance of errors) and no autocorrelation. In the presence of heteroskedasticity, its interpretation may be misleading.
  • Not a Substitute for Hypothesis Testing: It should not be the primary criterion for deciding which variables to include in a model.
    • Use \(t\)-tests to evaluate the significance of individual coefficients.
    • Use \(F\)-tests for assessing the joint significance of multiple variables.

14.2.4 Comparing Models with Transformed Dependent Variables

When comparing regression models with different transformations of the dependent variable, such as level and log-linear models, direct comparisons using traditional goodness-of-fit metrics like \(R^2\) or adjusted \(R^2\) are invalid. This is because the transformation changes the scale of the dependent variable, affecting the calculation of the Total Sum of Squares (SST), which is the denominator in \(R^2\) calculations.


Model Specifications

  1. Level Model (Linear):

\[ y = \beta_0 + \beta_1 x_1 + \epsilon \]

  2. Log-Linear Model:

\[ \ln(y) = \beta_0 + \beta_1 x_1 + \epsilon \]

Where:

  • \(y\) is the dependent variable,

  • \(x_1\) is an independent variable,

  • \(\epsilon\) represents the error term.


Interpretation of Coefficients

  • In the Level Model:
    The effect of \(x_1\) on \(y\) is constant, regardless of the magnitude of \(y\). Specifically, a one-unit increase in \(x_1\) results in a change of \(\beta_1\) units in \(y\). This implies:

    \[ \Delta y = \beta_1 \cdot \Delta x_1 \]

  • In the Log Model:
    The effect of \(x_1\) on \(y\) is proportional to the current level of \(y\). A one-unit increase in \(x_1\) leads to a percentage change in \(y\), approximately equal to \(100 \times \beta_1\%\). Specifically:

    \[ \Delta \ln(y) = \beta_1 \cdot \Delta x_1 \quad \Rightarrow \quad \% \Delta y \approx 100 \times \beta_1 \]

    • For small values of \(y\), the absolute change is small.
    • For large values of \(y\), the absolute change is larger, reflecting the multiplicative nature of the model.
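For example, if \(\hat{\beta}_1 = 0.05\) in the log model, a one-unit increase in \(x_1\) raises \(y\) by approximately 5%:

\[ \% \Delta y \approx 100 \times 0.05 = 5\%, \qquad \text{exact: } 100 \times \left(e^{0.05} - 1\right) \approx 5.13\% \]

The approximation is accurate for small \(\hat{\beta}_1\) and deteriorates as \(|\hat{\beta}_1|\) grows.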

Why We Cannot Compare \(R^2\) or Adjusted \(R^2\) Directly

  • The level model explains variance in the original scale of \(y\), while the log model explains variance in the logarithmic scale of \(y\).
  • The SST (Total Sum of Squares) differs across the models because the dependent variable is transformed, making direct comparisons of \(R^2\) invalid.
  • Adjusted \(R^2\) does not resolve this issue because it also depends on the scale of the dependent variable.

Approach to Compare Model Fit Across Transformations

To compare models on the same scale as the original dependent variable (\(y\)), we need to “un-transform” the predictions from the log model. Here’s the step-by-step procedure:


Step-by-Step Procedure

  1. Estimate the Log Model
    Fit the log-linear model and obtain the predicted values:

    \[ \widehat{\ln(y)} = \hat{\beta}_0 + \hat{\beta}_1 x_1 \]

  2. Un-Transform the Predictions
    Convert the predicted values back to the original scale of \(y\) using the exponential function:

    \[ \hat{m} = \exp(\widehat{\ln(y)}) \]

    • This transformation assumes that the errors are homoskedastic in the log model.
    • Note: To correct for potential bias due to Jensen’s inequality, a smearing estimator can be applied, but for simplicity, we use the basic exponential transformation here.
  3. Fit a Regression Without an Intercept
    Regress the actual \(y\) on the un-transformed predictions \(\hat{m}\) without an intercept:

    \[ y = \alpha \hat{m} + u \]

    • The coefficient \(\alpha\) adjusts for any scaling differences between the predicted and actual values.
    • The residual term \(u\) captures the unexplained variance.
  4. Compute the Scaled \(R^2\)
    Calculate the squared correlation between the observed \(y\) and the predicted values \(\hat{y}\) from the above regression:

    \[ R^2_{\text{scaled}} = \left( \text{Corr}(y, \hat{y}) \right)^2 \]

    • This scaled \(R^2\) represents how well the log-transformed model predicts the original \(y\) on its natural scale.
    • Now, you can compare \(R^2_{\text{scaled}}\) from the log model with the regular \(R^2\) from the level model.
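The four steps can be sketched in R (the data-generating process below is illustrative, with \(y\) generated on the log scale so the log model is correctly specified):

```r
set.seed(123)
n <- 200
x1 <- runif(n, 1, 10)
y <- exp(0.5 + 0.2 * x1 + rnorm(n, sd = 0.3))

# Level model for comparison
level_fit <- lm(y ~ x1)
R2_level <- summary(level_fit)$r.squared

# Step 1: estimate the log model
log_fit <- lm(log(y) ~ x1)

# Step 2: un-transform the predictions
m_hat <- exp(fitted(log_fit))

# Step 3: regress y on m_hat without an intercept
scale_fit <- lm(y ~ 0 + m_hat)

# Step 4: squared correlation between observed y and the fitted values
R2_scaled <- cor(y, fitted(scale_fit))^2

c(level = R2_level, log_scaled = R2_scaled)
```

`R2_scaled` is now on the same scale as the level model's \(R^2\), so the two numbers are directly comparable.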

Key Insights

  • If \(R^2_{\text{scaled}}\) (from the log model) > \(R^2\) (from the level model): The log model fits the data better.
  • If \(R^2_{\text{scaled}}\) < \(R^2\) (from the level model): The level model provides a better fit.
  • If both are similar: Consider other model diagnostics, theoretical justification, or model simplicity.

Caveats and Considerations

  • Heteroskedasticity: If heteroskedasticity is present, the un-transformation may introduce bias.

  • Error Distribution: Log-transformed models assume multiplicative errors, which may not be appropriate in all contexts.

  • Smearing Estimator (Advanced Correction): To adjust for bias in the back-transformation, apply the smearing estimator:

    \[ \hat{y} = \exp(\widehat{\ln(y)}) \cdot \hat{S} \]

    Where \(\hat{S}\) is the mean of the exponentiated residuals from the log model.
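A sketch of the smearing correction, using the same kind of simulated log-scale data (names are illustrative):

```r
set.seed(123)
n <- 200
x1 <- runif(n, 1, 10)
y <- exp(0.5 + 0.2 * x1 + rnorm(n, sd = 0.3))

log_fit <- lm(log(y) ~ x1)

# Duan's smearing factor: mean of the exponentiated log-scale residuals
S_hat <- mean(exp(residuals(log_fit)))

y_hat_naive <- exp(fitted(log_fit))   # biased low due to Jensen's inequality
y_hat_smeared <- y_hat_naive * S_hat  # smearing-corrected predictions
```

Because OLS residuals average to zero on the log scale, \(\hat{S} \geq 1\) by Jensen's inequality, so the correction scales the naive predictions upward.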

# Install and load necessary libraries
# install.packages("nonnest2")  # Uncomment if not already installed
library(nonnest2)    # For Vuong Test
library(lmtest)      # Provides jtest(); the J-test below is also run manually via lm()

# Simulated dataset
set.seed(123)
n <- 100
x <- rnorm(n, mean = 50, sd = 10)
z <- rnorm(n, mean = 100, sd = 20)
epsilon <- rnorm(n)

# Competing models (non-nested)
# Model A: Linear relationship with x
y <- 5 + 0.3 * x + epsilon
model_A <- lm(y ~ x)

# Model B: Log-linear relationship with z
model_B <- lm(y ~ log(z))

# ----------------------------------------------------------------------
# Vuong Test (via nonnest2::vuongtest)
# ----------------------------------------------------------------------
vuong_test <- vuongtest(model_A, model_B)
print(vuong_test)
#> 
#> Model 1 
#>  Class: lm 
#>  Call: lm(formula = y ~ x)
#> 
#> Model 2 
#>  Class: lm 
#>  Call: lm(formula = y ~ log(z))
#> 
#> Variance test 
#>   H0: Model 1 and Model 2 are indistinguishable 
#>   H1: Model 1 and Model 2 are distinguishable 
#>     w2 = 0.681,   p = 2.35e-08
#> 
#> Non-nested likelihood ratio test 
#>   H0: Model fits are equal for the focal population 
#>   H1A: Model 1 fits better than Model 2 
#>     z = 13.108,   p = <2e-16
#>   H1B: Model 2 fits better than Model 1 
#>     z = 13.108,   p = 1

# ----------------------------------------------------------------------
# Davidson–MacKinnon J-Test
# ----------------------------------------------------------------------

# Step 1: Testing Model A against Model B
# Obtain fitted values from Model B
fitted_B <- fitted(model_B)

# Auxiliary regression: Add fitted_B to Model A
j_test_A_vs_B <- lm(y ~ x + fitted_B)
summary(j_test_A_vs_B)
#> 
#> Call:
#> lm(formula = y ~ x + fitted_B)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.8717 -0.6573 -0.1223  0.6154  2.0952 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 14.70881   25.98307   0.566    0.573    
#> x            0.28671    0.01048  27.358   <2e-16 ***
#> fitted_B    -0.43702    1.27500  -0.343    0.733    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.951 on 97 degrees of freedom
#> Multiple R-squared:  0.8854, Adjusted R-squared:  0.883 
#> F-statistic: 374.5 on 2 and 97 DF,  p-value: < 2.2e-16

# Step 2: Testing Model B against Model A
# Obtain fitted values from Model A
fitted_A <- fitted(model_A)

# Auxiliary regression: Add fitted_A to Model B
j_test_B_vs_A <- lm(y ~ log(z) + fitted_A)
summary(j_test_B_vs_A)
#> 
#> Call:
#> lm(formula = y ~ log(z) + fitted_A)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.8717 -0.6573 -0.1223  0.6154  2.0952 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -0.77868    2.39275  -0.325    0.746    
#> log(z)       0.16829    0.49097   0.343    0.733    
#> fitted_A     1.00052    0.03657  27.358   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.951 on 97 degrees of freedom
#> Multiple R-squared:  0.8854, Adjusted R-squared:  0.883 
#> F-statistic: 374.5 on 2 and 97 DF,  p-value: < 2.2e-16

References

Davidson, Russell, and James G MacKinnon. 1981. “Several Tests for Model Specification in the Presence of Alternative Hypotheses.” Econometrica: Journal of the Econometric Society, 781–93.
Vuong, Quang H. 1989. “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses.” Econometrica: Journal of the Econometric Society, 307–33.