14.2 Non-Nested Model Tests
Comparing models is essential to identify which specification best explains the data. While nested model comparisons rely on one model being a restricted version of another, non-nested models do not share such a hierarchical structure. This situation commonly arises when comparing models with:
- Different functional forms (e.g., linear vs. logarithmic relationships),
- Different sets of explanatory variables,
- Competing theoretical frameworks.
To compare non-nested models, we rely on specialized statistical tests designed to handle these complexities. The two most widely used approaches are:
- Vuong Test: a likelihood-based test that compares the fit of two non-nested models to determine which better explains the data.
- Davidson–MacKinnon J-Test: a regression-based approach that evaluates which model better fits the data by adding the predicted values from the competing model as an additional regressor.
Consider two competing models:
- Model A: $y = \alpha_0 + \alpha_1 f(X) + \epsilon_A$
- Model B: $y = \beta_0 + \beta_1 g(Z) + \epsilon_B$
Where:
$f(X)$ and $g(Z)$ represent different functional forms or different sets of explanatory variables.
The models are non-nested because one cannot be obtained from the other by restricting parameters.
Our goal is to determine which model better explains the data.
14.2.1 Vuong Test
The Vuong Test is a likelihood-ratio-based approach for comparing non-nested models (Vuong 1989). It is particularly useful when both models are estimated via Maximum Likelihood Estimation.
Hypotheses
- Null Hypothesis ($H_0$): Both models are equally close to the true data-generating process (i.e., the models have equal predictive power).
- Alternative Hypothesis ($H_1$): One model is closer to the true data-generating process.
  - Positive test statistic ($V > 0$): Model A is preferred.
  - Negative test statistic ($V < 0$): Model B is preferred.
Vuong Test Statistic
The Vuong test is based on the difference in the log-likelihood contributions of each observation under the two models. Let:
- $\ell_{A,i}$ = log-likelihood of observation $i$ under Model A,
- $\ell_{B,i}$ = log-likelihood of observation $i$ under Model B.
Define the difference in log-likelihoods:
$$m_i = \ell_{A,i} - \ell_{B,i}$$
The Vuong test statistic is:
$$V = \frac{\sqrt{n}\,\bar{m}}{s_m}$$
Where:
$\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$ is the sample mean of the log-likelihood differences,
$s_m = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (m_i - \bar{m})^2}$ is the sample standard deviation of $m_i$,
$n$ is the sample size.
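For concreteness, here is a minimal R sketch of how the statistic can be computed by hand for two fitted lm() models; model_A and model_B are assumed to be objects like the ones created in the code example at the end of this section.
# Per-observation Gaussian log-likelihoods for a fitted lm() model,
# using the ML (1/n) variance estimate
ll_contrib <- function(fit) {
  res   <- residuals(fit)
  sigma <- sqrt(mean(res^2))
  dnorm(res, mean = 0, sd = sigma, log = TRUE)
}
m_i <- ll_contrib(model_A) - ll_contrib(model_B)  # log-likelihood differences m_i
n   <- length(m_i)
s_m <- sqrt(mean((m_i - mean(m_i))^2))            # 1/n standard deviation, as in the formula
V   <- sqrt(n) * mean(m_i) / s_m                  # Vuong statistic
2 * pnorm(-abs(V))                                # two-sided p-value under N(0, 1)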
Distribution and Decision Rule
- Under H0, the Vuong statistic asymptotically follows a standard normal distribution:
$$V \sim N(0, 1)$$
- Decision Rule:
- If $|V| > z_{\alpha/2}$ (the critical value from the standard normal distribution), reject $H_0$:
  - If $V > 0$: Prefer Model A.
  - If $V < 0$: Prefer Model B.
- If $|V| \le z_{\alpha/2}$: Fail to reject $H_0$; there is no significant difference between the models.
Corrections for Model Complexity
When comparing models with different numbers of parameters, a penalized version of the Vuong test can be used, similar to adjusting for model complexity in criteria like AIC or BIC. The corrected statistic is:
$$V_{\text{adjusted}} = V - \frac{(k_A - k_B)\,\ln(n)}{2\, s_m \sqrt{n}}$$
Where $k_A$ and $k_B$ are the numbers of parameters in Models A and B, respectively.
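Continuing the sketch above (again using the hypothetical model_A and model_B), the complexity correction is a one-line adjustment.
# BIC-style correction; k counts estimated coefficients plus the error variance
k_A   <- length(coef(model_A)) + 1
k_B   <- length(coef(model_B)) + 1
V_adj <- V - (k_A - k_B) * log(n) / (2 * s_m * sqrt(n))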
Limitations of the Vuong Test
- Requires models to be estimated via Maximum Likelihood.
- Sensitive to model misspecification and heteroskedasticity.
- Assumes independent and identically distributed (i.i.d.) errors.
14.2.2 Davidson–MacKinnon J-Test
The Davidson–MacKinnon J-Test provides a flexible, regression-based approach for comparing non-nested models (Davidson and MacKinnon 1981). It evaluates whether the predictions from one model contain information not captured by the competing model. This test can be thought of as comparing models with transformed independent variables, in contrast to Section 14.2.4, which compares models with transformed dependent variables.
Hypotheses
- Null Hypothesis ($H_0$): The competing model does not provide additional explanatory power beyond the current model.
- Alternative Hypothesis ($H_1$): The competing model provides additional explanatory power.
Procedure
Consider two competing models:
- Model A: $y = \alpha_0 + \alpha_1 x + \epsilon_A$
- Model B: $y = \beta_0 + \beta_1 \ln(x) + \epsilon_B$
Step 1: Testing Model A Against Model B
- Estimate Model B and obtain the predicted values $\hat{y}_B$.
- Run the auxiliary regression:
$$y = \alpha_0 + \alpha_1 x + \gamma \hat{y}_B + u$$
- Test the null hypothesis:
$$H_0: \gamma = 0$$
- If $\gamma$ is significant, Model B adds explanatory power beyond Model A.
- If $\gamma$ is not significant, Model A sufficiently explains the data.
Step 2: Testing Model B Against Model A
- Estimate Model A and obtain the predicted values $\hat{y}_A$.
- Run the auxiliary regression:
$$y = \beta_0 + \beta_1 \ln(x) + \gamma \hat{y}_A + u$$
- Test the null hypothesis:
$$H_0: \gamma = 0$$
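Both auxiliary regressions can be run manually, as in the code example at the end of this section, or in a single call with lmtest::jtest(); a minimal sketch, assuming model_A and model_B are the fitted lm() objects from that example:
library(lmtest)
# Reports both directions: the fitted values of each model added to the rival specification
jtest(model_A, model_B)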
Decision Rules
- Reject $H_0$ in Step 1, fail to reject in Step 2: Prefer Model B.
- Fail to reject $H_0$ in Step 1, reject in Step 2: Prefer Model A.
- Reject $H_0$ in both steps: Neither model is adequate; reconsider the functional form.
- Fail to reject $H_0$ in both steps: No strong evidence to prefer one model; rely on other criteria (e.g., theory, simplicity). Alternatively, $R^2_{\text{adj}}$ can also be used to choose between the two.
Adjusted R2
- $R^2$ always increases (or stays the same) as more variables are included.
- Adjusted $R^2$ tries to correct for this by penalizing the inclusion of unnecessary variables.
$$R^2 = 1 - \frac{SSR/n}{SST/n}, \qquad R^2_{\text{adj}} = 1 - \frac{SSR/(n-k)}{SST/(n-1)} = 1 - \frac{(n-1)(1-R^2)}{n-k}$$
- $R^2_{\text{adj}}$ increases if and only if the $t$-statistic on the additional variable is greater than 1 in absolute value.
- $R^2_{\text{adj}}$ is only valid in models without heteroskedasticity.
- Therefore, it should not be used to determine which variables to include in the model (the $t$- or $F$-tests are more appropriate).
14.2.3 Adjusted R2
The coefficient of determination ($R^2$) measures the proportion of the variance in the dependent variable that is explained by the model. However, a key limitation of $R^2$ is that it always increases (or at least stays the same) when additional explanatory variables are added to the model, even if those variables are not statistically significant.
To address this issue, the adjusted $R^2$ introduces a penalty for including unnecessary variables, making it a more reliable measure when comparing models with different numbers of predictors.
Formulas
Unadjusted $R^2$:
$$R^2 = 1 - \frac{SSR}{SST}$$
Where:
SSR = Sum of Squared Residuals (measures unexplained variance),
SST = Total Sum of Squares (measures total variance in the dependent variable).
Adjusted $R^2$:
$$R^2_{\text{adj}} = 1 - \frac{SSR/(n-k)}{SST/(n-1)}$$
Alternatively, it can be expressed in terms of $R^2$ as:
$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n-1)}{n-k}$$
Where:
$n$ = number of observations,
$k$ = number of estimated parameters in the model (including the intercept).
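As a quick worked example (with made-up numbers): for $n = 100$, $k = 5$, and $R^2 = 0.40$, $R^2_{\text{adj}} = 1 - \frac{(1 - 0.40)(100 - 1)}{100 - 5} = 1 - \frac{59.4}{95} \approx 0.375$.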
Key Insights
- Penalty for Complexity: Unlike $R^2$, the adjusted $R^2$ can decrease when irrelevant variables are added to the model because it adjusts for the number of predictors relative to the sample size.
- Interpretation: It represents the proportion of variance explained after accounting for model complexity.
- Comparison Across Models: Adjusted $R^2$ is useful for comparing models with different numbers of predictors, as it discourages overfitting.
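A small simulated illustration of this penalty (hypothetical data; the exact numbers depend on the seed):
set.seed(1)
n     <- 100
x     <- rnorm(n)
y     <- 1 + 2 * x + rnorm(n)
noise <- rnorm(n)                      # regressor unrelated to y
fit1  <- lm(y ~ x)
fit2  <- lm(y ~ x + noise)
c(summary(fit1)$r.squared,     summary(fit2)$r.squared)      # R^2 never decreases
c(summary(fit1)$adj.r.squared, summary(fit2)$adj.r.squared)  # adjusted R^2 can fall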
When Does Adjusted R2 Increase?
- The adjusted $R^2$ will increase if and only if the inclusion of a new variable improves the model by more than would be expected by chance.
- Mathematically: adding a single variable raises $R^2_{\text{adj}}$ exactly when the absolute value of that variable's $t$-statistic exceeds 1.
Limitations of Adjusted R2
- Sensitive to Assumptions: Adjusted $R^2$ assumes homoskedasticity (constant error variance) and no autocorrelation. In the presence of heteroskedasticity, its interpretation may be misleading.
- Not a Substitute for Hypothesis Testing: It should not be the primary criterion for deciding which variables to include in a model.
- Use t-tests to evaluate the significance of individual coefficients.
- Use F-tests for assessing the joint significance of multiple variables.
14.2.4 Comparing Models with Transformed Dependent Variables
When comparing regression models with different transformations of the dependent variable, such as level and log-linear models, direct comparisons using traditional goodness-of-fit metrics like $R^2$ or adjusted $R^2$ are invalid. This is because the transformation changes the scale of the dependent variable, affecting the calculation of the Total Sum of Squares (SST), which is the denominator in the $R^2$ calculation.
Model Specifications
- Level Model (Linear):
$$y = \beta_0 + \beta_1 x_1 + \epsilon$$
- Log-Linear Model:
$$\ln(y) = \beta_0 + \beta_1 x_1 + \epsilon$$
Where:
$y$ is the dependent variable,
$x_1$ is an independent variable,
$\epsilon$ represents the error term.
Interpretation of Coefficients
In the Level Model:
The effect of $x_1$ on $y$ is constant, regardless of the magnitude of $y$. Specifically, a one-unit increase in $x_1$ results in a change of $\beta_1$ units in $y$. This implies:
$$\Delta y = \beta_1 \cdot \Delta x_1$$
In the Log Model:
The effect of $x_1$ on $y$ is proportional to the current level of $y$. A one-unit increase in $x_1$ leads to a percentage change in $y$ of approximately $100 \times \beta_1$ percent. Specifically:
$$\Delta \ln(y) = \beta_1 \cdot \Delta x_1 \quad \Rightarrow \quad \%\Delta y \approx 100 \times \beta_1$$
- For small values of $y$, the absolute change is small.
- For large values of $y$, the absolute change is larger, reflecting the multiplicative nature of the model.
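For instance (with made-up numbers), if $\beta_1 = 0.05$, a one-unit increase in $x_1$ raises $y$ by 0.05 units in the level model, but by roughly 5% of its current value in the log model.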
Why We Cannot Compare R2 or Adjusted R2 Directly
- The level model explains variance in the original scale of $y$, while the log model explains variance in the logarithmic scale of $y$.
- The SST (Total Sum of Squares) differs across the models because the dependent variable is transformed, making direct comparisons of $R^2$ invalid.
- Adjusted $R^2$ does not resolve this issue because it also depends on the scale of the dependent variable.
Approach to Compare Model Fit Across Transformations
To compare models on the same scale as the original dependent variable (y), we need to “un-transform” the predictions from the log model. Here’s the step-by-step procedure:
Step-by-Step Procedure
Estimate the Log Model
Fit the log-linear model and obtain the predicted values:
$$\widehat{\ln(y)} = \hat{\beta}_0 + \hat{\beta}_1 x_1$$
Un-Transform the Predictions
Convert the predicted values back to the original scale of $y$ using the exponential function:
$$\hat{m} = \exp\left(\widehat{\ln(y)}\right)$$
- This transformation assumes that the errors are homoskedastic in the log model.
- Note: To correct for potential bias due to Jensen’s inequality, a smearing estimator can be applied, but for simplicity, we use the basic exponential transformation here.
Fit a Regression Without an Intercept
Regress the actual $y$ on the un-transformed predictions $\hat{m}$ without an intercept:
$$y = \alpha \hat{m} + u$$
- The coefficient α adjusts for any scaling differences between the predicted and actual values.
- The residual term u captures the unexplained variance.
Compute the Scaled R2
Calculate the squared correlation between the observed $y$ and the predicted values $\hat{y}$ from the above regression:
$$R^2_{\text{scaled}} = \left(\operatorname{Corr}(y, \hat{y})\right)^2$$
- This scaled $R^2$ represents how well the log-transformed model predicts the original $y$ on its natural scale.
- You can then compare $R^2_{\text{scaled}}$ from the log model with the regular $R^2$ from the level model (see the R sketch below).
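A minimal R sketch of the full procedure, assuming a data frame dat with a strictly positive outcome y and a regressor x1 (hypothetical names):
level_fit <- lm(y ~ x1, data = dat)
log_fit   <- lm(log(y) ~ x1, data = dat)
m_hat     <- exp(fitted(log_fit))          # Step 2: un-transform the log-model predictions
scale_fit <- lm(dat$y ~ 0 + m_hat)         # Step 3: regression through the origin
y_hat     <- fitted(scale_fit)
R2_scaled <- cor(dat$y, y_hat)^2           # Step 4: squared correlation with observed y
c(level = summary(level_fit)$r.squared, log_scaled = R2_scaled)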
Key Insights
- If $R^2_{\text{scaled}}$ (from the log model) $>$ $R^2$ (from the level model): the log model fits the data better.
- If $R^2_{\text{scaled}}$ $<$ $R^2$ (from the level model): the level model provides a better fit.
- If both are similar: consider other model diagnostics, theoretical justification, or model simplicity.
Caveats and Considerations
Heteroskedasticity: If heteroskedasticity is present, the un-transformation may introduce bias.
Error Distribution: Log-transformed models assume multiplicative errors, which may not be appropriate in all contexts.
Smearing Estimator (Advanced Correction): To adjust for bias in the back-transformation, apply the smearing estimator:
$$\hat{y} = \exp\left(\widehat{\ln(y)}\right) \cdot \hat{S}$$
Where $\hat{S}$ is the mean of the exponentiated residuals from the log model.
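A minimal sketch of this correction, continuing from the hypothetical log_fit above:
S_hat      <- mean(exp(residuals(log_fit)))   # Duan's smearing factor
y_hat_corr <- exp(fitted(log_fit)) * S_hat    # bias-corrected predictions on the original scale
The complete worked example below simulates data and applies both the Vuong Test and the Davidson–MacKinnon J-Test: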
# Install and load necessary libraries
# install.packages("nonnest2") # Uncomment if not already installed
library(nonnest2) # For Vuong Test
library(lmtest) # Provides jtest(); the J-Test below is run manually for clarity
# Simulated dataset
set.seed(123)
n <- 100
x <- rnorm(n, mean = 50, sd = 10)
z <- rnorm(n, mean = 100, sd = 20)
epsilon <- rnorm(n)
# Competing models (non-nested)
# Model A: Linear relationship with x
y <- 5 + 0.3 * x + epsilon
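# (y depends only on x, so Model A is the true data-generating process)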
model_A <- lm(y ~ x)
# Model B: Log-linear relationship with z
model_B <- lm(y ~ log(z))
# ----------------------------------------------------------------------
# Vuong Test (using nonnest2::vuongtest)
# ----------------------------------------------------------------------
vuong_test <- vuongtest(model_A, model_B)
print(vuong_test)
#>
#> Model 1
#> Class: lm
#> Call: lm(formula = y ~ x)
#>
#> Model 2
#> Class: lm
#> Call: lm(formula = y ~ log(z))
#>
#> Variance test
#> H0: Model 1 and Model 2 are indistinguishable
#> H1: Model 1 and Model 2 are distinguishable
#> w2 = 0.681, p = 2.35e-08
#>
#> Non-nested likelihood ratio test
#> H0: Model fits are equal for the focal population
#> H1A: Model 1 fits better than Model 2
#> z = 13.108, p = <2e-16
#> H1B: Model 2 fits better than Model 1
#> z = 13.108, p = 1
# ----------------------------------------------------------------------
# Davidson–MacKinnon J-Test
# ----------------------------------------------------------------------
# Step 1: Testing Model A against Model B
# Obtain fitted values from Model B
fitted_B <- fitted(model_B)
# Auxiliary regression: Add fitted_B to Model A
j_test_A_vs_B <- lm(y ~ x + fitted_B)
summary(j_test_A_vs_B)
#>
#> Call:
#> lm(formula = y ~ x + fitted_B)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.8717 -0.6573 -0.1223 0.6154 2.0952
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 14.70881 25.98307 0.566 0.573
#> x 0.28671 0.01048 27.358 <2e-16 ***
#> fitted_B -0.43702 1.27500 -0.343 0.733
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.951 on 97 degrees of freedom
#> Multiple R-squared: 0.8854, Adjusted R-squared: 0.883
#> F-statistic: 374.5 on 2 and 97 DF, p-value: < 2.2e-16
# Step 2: Testing Model B against Model A
# Obtain fitted values from Model A
fitted_A <- fitted(model_A)
# Auxiliary regression: Add fitted_A to Model B
j_test_B_vs_A <- lm(y ~ log(z) + fitted_A)
summary(j_test_B_vs_A)
#>
#> Call:
#> lm(formula = y ~ log(z) + fitted_A)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.8717 -0.6573 -0.1223 0.6154 2.0952
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.77868 2.39275 -0.325 0.746
#> log(z) 0.16829 0.49097 0.343 0.733
#> fitted_A 1.00052 0.03657 27.358 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.951 on 97 degrees of freedom
#> Multiple R-squared: 0.8854, Adjusted R-squared: 0.883
#> F-statistic: 374.5 on 2 and 97 DF, p-value: < 2.2e-16