14.4 Functional Form Tests
Functional form misspecification occurs when the chosen regression model does not correctly represent the true relationship between the dependent and independent variables. This can happen due to:
- Omitted variables (important predictors not included),
- Incorrect transformations of variables (e.g., missing nonlinear relationships),
- Incorrect interaction terms (missing interaction effects between variables),
- Inappropriate linearity assumptions.
Functional form errors can lead to biased and inconsistent estimators, undermining the validity of statistical inferences. To detect such issues, several diagnostic tests are available.
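To make the consequence concrete, the following is a minimal sketch in base R, assuming a hypothetical data-generating process with a quadratic term (the variable names and coefficient values are illustrative and not taken from the text). It fits a linear model that omits the curvature and compares it with the correctly specified model.

```r
# Hypothetical example: the true relationship is quadratic in x.
set.seed(42)
n <- 200
x <- runif(n, min = 0, max = 10)
y <- 2 + 1.5 * x + 0.5 * x^2 + rnorm(n)

# Misspecified model: the quadratic term is left out.
fit_linear <- lm(y ~ x)

# Correctly specified model: the quadratic term is included.
fit_quad <- lm(y ~ x + I(x^2))

# In the linear fit, the slope on x absorbs part of the omitted curvature,
# so it lands far from the true value of 1.5.
slope_linear <- unname(coef(fit_linear)["x"])
slope_quad   <- unname(coef(fit_quad)["x"])
print(c(linear = slope_linear, quadratic = slope_quad))

# Nested-model F-test: does adding x^2 reduce the residual sum of squares?
comparison <- anova(fit_linear, fit_quad)
p_curvature <- comparison[["Pr(>F)"]][2]
print(p_curvature)
```

The tests in this section formalize the same idea for situations where the omitted term is not known in advance.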
Key Functional Form Tests:
- Ramsey RESET Test
- Harvey–Collier Test
- Rainbow Test
Each test focuses on a different aspect of potential model misspecification.
14.4.1 Ramsey RESET Test (Regression Equation Specification Error Test)
The Ramsey RESET Test is one of the most widely used tests to detect functional form misspecification (Ramsey 1969). It examines whether adding nonlinear transformations of the fitted values (or regressors) improves the model fit.
Hypotheses
- Null Hypothesis (H0): The model is correctly specified.
- Alternative Hypothesis (H1): The model suffers from omitted variables, incorrect functional form, or other specification errors.
Procedure
Estimate the original regression model:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \epsilon_i$$
Obtain the fitted values:
$$\hat{y}_i$$
Augment the model with powers of the fitted values (squared, cubed, etc.):
$$y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + u_i$$
Test the joint significance of the added terms:
$$H_0: \gamma_1 = \gamma_2 = 0$$
Compute the F-statistic:
$$F = \frac{(SSR_{\text{restricted}} - SSR_{\text{unrestricted}})/q}{SSR_{\text{unrestricted}}/(n - k - q - 1)}$$
Where:
- $SSR_{\text{restricted}}$ = Sum of Squared Residuals from the original model,
- $SSR_{\text{unrestricted}}$ = Sum of Squared Residuals from the augmented model,
- $q$ = Number of additional terms (e.g., 2 if adding $\hat{y}^2$ and $\hat{y}^3$),
- $n$ = Sample size,
- $k$ = Number of predictors in the original model.
Decision Rule
- Under H0, the F-statistic follows an F-distribution with $(q, n - k - q - 1)$ degrees of freedom.
- Reject H0 if the F-statistic exceeds the critical value, indicating functional form misspecification.
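The procedure can be traced by hand in base R. The sketch below assumes a hypothetical simulated dataset in which the true model contains a squared term that the fitted model omits; the data and object names are illustrative only. It computes the F-statistic from the two sums of squared residuals and cross-checks it against the nested-model comparison from `anova()`.

```r
# Hypothetical data: the true model includes x1^2, the fitted model does not.
set.seed(1)
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + 0.8 * x1^2 + rnorm(n)

# Original (restricted) model and its fitted values.
restricted <- lm(y ~ x1 + x2)
y_hat <- fitted(restricted)

# Augmented (unrestricted) model with squared and cubed fitted values.
unrestricted <- lm(y ~ x1 + x2 + I(y_hat^2) + I(y_hat^3))

# F-statistic built from the sums of squared residuals.
ssr_r <- sum(resid(restricted)^2)
ssr_u <- sum(resid(unrestricted)^2)
q   <- 2                      # added terms: y_hat^2 and y_hat^3
k   <- 2                      # predictors in the original model
df2 <- n - k - q - 1
f_stat  <- ((ssr_r - ssr_u) / q) / (ssr_u / df2)
p_value <- pf(f_stat, df1 = q, df2 = df2, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))

# Cross-check: anova() on the nested models gives the same F-statistic.
f_anova <- anova(restricted, unrestricted)$F[2]

# Optional comparison with lmtest, if the package is installed.
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::resettest(restricted, power = 2:3, type = "fitted"))
}
```

Because the augmented model nests the original one, the RESET statistic is an ordinary F-test for the joint significance of the added powers.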
Advantages and Limitations
- Advantage: Simple to implement; detects omitted variables and incorrect functional forms.
- Limitation: Does not identify which variable or functional form is incorrect—only indicates the presence of an issue.
14.4.2 Harvey–Collier Test
The Harvey–Collier Test evaluates whether the model's residuals display systematic patterns, which would indicate functional form misspecification (Harvey and Collier 1977). It applies a t-test to the recursive residuals: if the true relationship is convex or concave rather than linear, their mean differs systematically from zero.
Hypotheses
- Null Hypothesis (H0): The model is correctly specified (residuals are random noise with zero mean).
- Alternative Hypothesis (H1): The model is misspecified (residuals contain systematic patterns).
Procedure
Order the observations (for example, by a regressor suspected of having a nonlinear effect) and estimate the original regression model.
Compute the recursive residuals $w_j$: for each $j = k+2, \dots, n$, fit the model to the first $j-1$ observations, predict observation $j$, and standardize the one-step-ahead prediction error.
Calculate the Harvey–Collier statistic:
$$t = \frac{\bar{w}}{SE(\bar{w})}$$
Where:
- $\bar{w}$ is the mean of the recursive residuals,
- $SE(\bar{w})$ is the standard error of that mean.
Decision Rule:
- Under H0, the test statistic follows a t-distribution with $n - k - 2$ degrees of freedom, where $k$ is the number of predictors.
- Reject H0 if the t-statistic is significantly different from zero.
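As an illustration, the sketch below computes recursive residuals by hand in base R and applies a one-sample t-test to them. It assumes a hypothetical single-regressor dataset with observations already ordered by `x`; the refit-at-every-step loop is written for readability, not efficiency, and is not the implementation used by any particular package.

```r
# Hypothetical data, ordered by x, with a truly linear relationship.
set.seed(7)
n <- 80
x <- seq(0, 10, length.out = n)
y <- 1 + 0.5 * x + rnorm(n)

X <- cbind(1, x)   # design matrix with intercept
p <- ncol(X)       # number of coefficients (k + 1)

# Recursive residuals: standardized one-step-ahead prediction errors.
w <- numeric(n - p)
for (j in (p + 1):n) {
  X_prev  <- X[1:(j - 1), , drop = FALSE]
  y_prev  <- y[1:(j - 1)]
  XtX_inv <- solve(crossprod(X_prev))
  b_prev  <- XtX_inv %*% crossprod(X_prev, y_prev)
  x_j     <- X[j, ]
  pred_err <- y[j] - sum(x_j * b_prev)
  scale_j  <- sqrt(1 + drop(t(x_j) %*% XtX_inv %*% x_j))
  w[j - p] <- pred_err / scale_j
}

# t-test of H0: the recursive residuals have mean zero.
hc <- t.test(w, mu = 0)
print(hc)

# Optional comparison with lmtest, if installed. harvtest() is expected to
# report the same statistic in absolute value.
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::harvtest(y ~ x))
}
```

The result depends on how the observations are ordered, so the ordering variable should be chosen deliberately.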
Advantages and Limitations
- Advantage: Simple to apply and interpret; good for detecting subtle misspecifications.
- Limitation: Sensitive to outliers; may have reduced power in small samples.
14.4.3 Rainbow Test
The Rainbow Test is a general-purpose diagnostic tool for functional form misspecification (Utts 1982). It compares the performance of the model on the full sample versus a central subsample, where the central subsample contains observations near the median of the independent variables.
Hypotheses
- Null Hypothesis (H0): The model is correctly specified.
- Alternative Hypothesis (H1): The model is misspecified.
Procedure
Estimate the regression model on the full dataset and record the residuals.
Identify a central subsample (e.g., observations near the median of key predictors).
Estimate the model again on the central subsample.
Compare the predictive accuracy between the full sample and subsample using an F-statistic:
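$$F = \frac{(SSR_{\text{full}} - SSR_{\text{subsample}})/q}{SSR_{\text{subsample}}/(n - k - q - 1)}$$
Where $q$ is the number of observations excluded from the central subsample (so the subsample contains $n - q$ observations) and $k$ is the number of predictors.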
Decision Rule
- Under H0, the test statistic follows an F-distribution.
- Reject H0 if the F-statistic is significant, indicating model misspecification.
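The comparison can be sketched in base R as follows. The example assumes a hypothetical dataset with a convex relationship, orders the observations by the regressor, and takes the middle half as the central subsample. That choice of subsample is an assumption made for illustration. The denominator uses the residual degrees of freedom of the subsample fit, $m - k - 1$ for a subsample of size $m$.

```r
# Hypothetical data with a convex relationship, ordered by the regressor.
set.seed(11)
n <- 200
x <- rnorm(n)
y <- 1 + x + x^2 + rnorm(n)
dat <- data.frame(x = x, y = y)
dat <- dat[order(dat$x), ]

k <- 1                              # predictors in the model
m <- n / 2                          # size of the central subsample
idx <- (n / 4 + 1):(n / 4 + m)      # middle half of the ordered data

# Fit on the full sample and on the central subsample.
full    <- lm(y ~ x, data = dat)
central <- lm(y ~ x, data = dat[idx, ])

ssr_full <- sum(resid(full)^2)
ssr_sub  <- sum(resid(central)^2)

# Rainbow F-statistic.
q   <- n - m                        # observations outside the subsample
df2 <- m - k - 1                    # equals n - k - q - 1
f_stat  <- ((ssr_full - ssr_sub) / q) / (ssr_sub / df2)
p_value <- pf(f_stat, df1 = q, df2 = df2, lower.tail = FALSE)
print(c(F = f_stat, df1 = q, df2 = df2, p = p_value))
```

A linear fit describes the central observations well even when the true relationship is curved, so a large gap between the two fits points to misspecification in the tails.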
Advantages and Limitations
- Advantage: Has power against a broad range of departures from linearity.
- Limitation: Choice of subsample may influence results; less informative about the specific nature of the misspecification.
14.4.4 Summary of Functional Form Tests
Test | Type | Key Statistic | Purpose | When to Use |
---|---|---|---|---|
Ramsey RESET Test | Augmented regression | F-test | Detects omitted variables, nonlinearities | General model specification testing |
Harvey–Collier Test | Residual-based | t-test | Detects systematic patterns in residuals | Subtle misspecifications in linear models |
Rainbow Test | Subsample comparison | F-test | Tests model stability across subsamples | Comparing central vs. full sample |
Functional form misspecification can severely distort regression results, leading to biased estimates and invalid inferences. While no single test can detect all types of misspecification, using a combination of tests provides a robust framework for model diagnostics.
```r
# Install and load the necessary library
# install.packages("lmtest") # Provides resettest(), harvtest(), and raintest()
library(lmtest)

# Simulated dataset (the true relationship is linear)
set.seed(123)
n <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
epsilon <- rnorm(n)
y <- 5 + 0.4 * x1 - 0.3 * x2 + epsilon

# Original regression model
model <- lm(y ~ x1 + x2)

# ----------------------------------------------------------------------
# 1. Ramsey RESET Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified
reset_test <- resettest(model, power = 2:3, type = "fitted") # Adds ŷ² and ŷ³
print(reset_test)
#>
#> RESET test
#>
#> data: model
#> RESET = 0.1921, df1 = 2, df2 = 95, p-value = 0.8255

# ----------------------------------------------------------------------
# 2. Harvey–Collier Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified (residuals have zero mean)
hc_test <- harvtest(model)
print(hc_test)
#>
#> Harvey-Collier test
#>
#> data: model
#> HC = 0.041264, df = 96, p-value = 0.9672

# ----------------------------------------------------------------------
# 3. Rainbow Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified
rainbow_test <- raintest(model)
print(rainbow_test)
#>
#> Rainbow test
#>
#> data: model
#> Rain = 1.1857, df1 = 50, df2 = 47, p-value = 0.279
```
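For contrast, the sketch below runs the same three tests on a deliberately misspecified model. It assumes a hypothetical variant of the simulation in which `y` depends on a squared term in the first regressor; the `_nl` object names and the coefficient 0.05 are illustrative choices, not part of the original example. The observations are ordered by the first regressor for the Harvey–Collier and Rainbow tests, since both depend on the ordering.

```r
library(lmtest)

# Hypothetical variant of the simulation: y is convex in x1_nl.
set.seed(123)
n <- 100
x1_nl <- rnorm(n, mean = 50, sd = 10)
x2_nl <- rnorm(n, mean = 30, sd = 5)
y_nl  <- 5 + 0.4 * x1_nl + 0.05 * (x1_nl - 50)^2 - 0.3 * x2_nl + rnorm(n)

# Misspecified model: linear in both regressors.
model_nl <- lm(y_nl ~ x1_nl + x2_nl)

# Same tests as above; order.by sorts the observations by x1_nl.
reset_nl <- resettest(model_nl, power = 2:3, type = "fitted")
hc_nl    <- harvtest(model_nl, order.by = x1_nl)
rain_nl  <- raintest(model_nl, order.by = x1_nl)

# Collect the p-values for a side-by-side comparison.
p_values <- c(
  RESET   = unname(reset_nl$p.value),
  HC      = unname(hc_nl$p.value),
  Rainbow = unname(rain_nl$p.value)
)
print(p_values)
```

With curvature this pronounced, all three tests should reject the null hypothesis, in contrast to the correctly specified model.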
Interpretation of the Results
Ramsey RESET Test (Regression Equation Specification Error Test)
Null Hypothesis (H0): The model is correctly specified.
Alternative Hypothesis (H1): The model suffers from omitted variables, incorrect functional form, or other specification errors.
Decision Rule:
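- Reject H0 if p-value < 0.05 → Evidence of model misspecification (e.g., missing nonlinear terms).
- Fail to reject H0 if p-value ≥ 0.05 → No strong evidence of misspecification.
In this example, the p-value is 0.8255, so H0 is not rejected. This is expected, because y was simulated as a linear function of x1 and x2.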
Harvey–Collier Test
Null Hypothesis (H0): The model is correctly specified (residuals are random noise with zero mean).
Alternative Hypothesis (H1): The model is misspecified (residuals contain systematic patterns).
Decision Rule:
- Reject H0 if p-value < 0.05 → Model misspecification detected (non-random residual patterns).
- Fail to reject H0 if p-value ≥ 0.05 → No evidence of misspecification.
In this example, the p-value is 0.9672, so H0 is not rejected.
Rainbow Test
Null Hypothesis (H0): The model is correctly specified.
Alternative Hypothesis (H1): The model is misspecified.
Decision Rule:
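- Reject H0 if p-value < 0.05 → Evidence of model misspecification (the fit differs between the central subsample and the full sample).
- Fail to reject H0 if p-value ≥ 0.05 → No evidence of misspecification from this test.
In this example, the p-value is 0.279, so H0 is not rejected. All three tests therefore agree with the linear data-generating process used in the simulation.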