14.4 Functional Form Tests

Functional form misspecification occurs when the chosen regression model does not correctly represent the true relationship between the dependent and independent variables. This can happen due to:

  • Omitted variables (important predictors not included),
  • Incorrect transformations of variables (e.g., missing nonlinear relationships),
  • Incorrect interaction terms (missing interaction effects between variables),
  • Inappropriate linearity assumptions.

Functional form errors can lead to biased and inconsistent estimators, undermining the validity of statistical inferences. To detect such issues, several diagnostic tests are available.
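
To make the consequence concrete, here is a minimal sketch on simulated data. The variable names and the quadratic data-generating process are illustrative assumptions, not part of the text above. It fits a straight line to a relationship that is actually quadratic and shows that the residuals of the misspecified fit stay strongly correlated with the omitted squared term, which is the kind of systematic pattern the tests in this section look for.

```r
# Illustrative simulation (assumed setup): the true relationship is quadratic in x
set.seed(42)
n <- 200
x <- runif(n, min = -3, max = 3)
y <- 1 + 2 * x + x^2 + rnorm(n)

# Misspecified model: omits the squared term
fit_linear <- lm(y ~ x)
# Correctly specified model
fit_quad <- lm(y ~ x + I(x^2))

# Residuals of the linear fit still carry the omitted curvature
cor_linear <- cor(resid(fit_linear), x^2)
# Residuals of the quadratic fit are orthogonal to x^2 by construction
cor_quad <- cor(resid(fit_quad), x^2)

print(c(linear = cor_linear, quadratic = cor_quad))
```

A residual-versus-regressor correlation like this is only visible when the omitted term is known. The formal tests below are useful precisely because they do not require naming it in advance.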

Key Functional Form Tests:

  1. Ramsey RESET Test (Regression Equation Specification Error Test)
  2. Harvey–Collier Test
  3. Rainbow Test

Each test focuses on identifying different aspects of potential model misspecification.


14.4.1 Ramsey RESET Test (Regression Equation Specification Error Test)

The Ramsey RESET Test is one of the most widely used tests to detect functional form misspecification (Ramsey 1969). It examines whether adding nonlinear transformations of the fitted values (or regressors) improves the model fit.

Hypotheses

  • Null Hypothesis (H0): The model is correctly specified.
  • Alternative Hypothesis (H1): The model suffers from omitted variables, incorrect functional form, or other specification errors.

Procedure

  1. Estimate the original regression model:

    $$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \epsilon_i$$

  2. Obtain the fitted values:

    $$\hat{y}_i$$

  3. Augment the model with powers of the fitted values (squared, cubed, etc.):

    $$y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + u_i$$

  4. Test the joint significance of the added terms:

    $$H_0: \gamma_1 = \gamma_2 = 0$$

  5. Compute the F-statistic:

    $$F = \frac{(SSR_{\text{restricted}} - SSR_{\text{unrestricted}}) / q}{SSR_{\text{unrestricted}} / (n - k - q - 1)}$$

    Where:

    • $SSR_{\text{restricted}}$ = Sum of Squared Residuals from the original model,
    • $SSR_{\text{unrestricted}}$ = SSR from the augmented model,
    • q = Number of additional terms (e.g., 2 if adding $\hat{y}^2$ and $\hat{y}^3$),
    • n = Sample size,
    • k = Number of predictors in the original model.

Decision Rule

  • Under H0, the F-statistic follows an F-distribution with $(q,\ n - k - q - 1)$ degrees of freedom.
  • Reject H0 if the F-statistic exceeds the critical value, indicating functional form misspecification.
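
The procedure and decision rule above can be reproduced by hand in base R. The sketch below is illustrative only: the simulated data and all object names (`restricted`, `unrestricted`, `f_stat`, and so on) are assumptions made for this example. The data are deliberately generated with an omitted squared term so that the test has something to detect, and the hand-computed F-statistic is cross-checked against the nested-model F-test from `anova()`.

```r
# Illustrative data (assumed): the true model contains x1^2, which the fit omits
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + 0.8 * x1^2 + rnorm(n)

# Steps 1-2: original (restricted) model and its fitted values
restricted <- lm(y ~ x1 + x2)
y_hat <- fitted(restricted)

# Step 3: augment with squared and cubed fitted values
unrestricted <- lm(y ~ x1 + x2 + I(y_hat^2) + I(y_hat^3))

# Steps 4-5: F-statistic for H0: gamma1 = gamma2 = 0
ssr_r <- sum(resid(restricted)^2)
ssr_u <- sum(resid(unrestricted)^2)
q   <- 2                     # number of added terms
k   <- 2                     # predictors in the original model
df2 <- n - k - q - 1
f_stat  <- ((ssr_r - ssr_u) / q) / (ssr_u / df2)
p_value <- pf(f_stat, df1 = q, df2 = df2, lower.tail = FALSE)

# Cross-check against the nested-model F-test from anova()
f_anova <- anova(restricted, unrestricted)$F[2]
print(c(F = f_stat, F_anova = f_anova, p_value = p_value))
```

Because the data-generating process includes a squared term that the restricted model leaves out, the p-value here should fall well below conventional significance levels.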

Advantages and Limitations

  • Advantage: Simple to implement; detects omitted variables and incorrect functional forms.
  • Limitation: Does not identify which variable or functional form is incorrect—only indicates the presence of an issue.

14.4.2 Harvey–Collier Test

The Harvey–Collier Test checks for functional form misspecification by testing whether the recursive residuals of the model have a mean of zero (Harvey and Collier 1977). Recursive residuals are standardized one-step-ahead prediction errors. If the true relationship is convex or concave rather than linear, they tend to share the same sign, so their mean departs from zero.

Hypotheses

  • Null Hypothesis (H0): The model is correctly specified (residuals are random noise with zero mean).
  • Alternative Hypothesis (H1): The model is misspecified (residuals contain systematic patterns).

Procedure

  1. Order the observations (for example, by one of the regressors) and estimate the original regression model.

  2. Compute the recursive residuals $w_i$: for each observation after the first k + 1, fit the model on all preceding observations, predict the current observation, and standardize the prediction error. This yields n − k − 1 recursive residuals.

  3. Calculate the Harvey–Collier statistic:

    $$t = \frac{\bar{w}}{SE(\bar{w})}$$

    Where:

    • $\bar{w}$ is the mean of the recursive residuals,
    • $SE(\bar{w})$ is the standard error of that mean.

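Decision Rule

  • Under H0, the test statistic follows a t-distribution with n − k − 2 degrees of freedom (one fewer than the number of recursive residuals, where k is the number of predictors).
  • Reject H0 if the absolute value of the t-statistic exceeds the critical value, i.e., if the mean recursive residual is significantly different from zero.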
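
The following base-R sketch computes the test by hand from recursive residuals (standardized one-step-ahead prediction errors), which is the quantity the Harvey–Collier statistic is built on. Everything in it is an illustrative assumption: the simulated convex data, the choice to order observations by `x`, and the object names. It is meant to show the mechanics, not to replace a packaged implementation such as `harvtest()` in lmtest.

```r
# Illustrative data (assumed): convex truth, observations ordered by x
set.seed(7)
n <- 100
x <- seq(0, 4, length.out = n)
y <- 1 + x + x^2 + rnorm(n)

X <- cbind(1, x)             # design matrix of the (misspecified) linear model
p <- ncol(X)                 # number of coefficients, including the intercept

# Recursive residuals: predict observation i from a fit on observations 1..(i-1)
w <- numeric(n - p)
for (i in (p + 1):n) {
  X_prev    <- X[1:(i - 1), , drop = FALSE]
  XtX_inv   <- solve(crossprod(X_prev))
  beta_prev <- XtX_inv %*% crossprod(X_prev, y[1:(i - 1)])
  x_i       <- X[i, ]
  pred_err  <- y[i] - sum(x_i * beta_prev)
  # Standardize by the prediction-error scale factor sqrt(1 + x_i' (X'X)^-1 x_i)
  w[i - p]  <- pred_err / sqrt(1 + sum(x_i * (XtX_inv %*% x_i)))
}

# One-sample t-test that the mean recursive residual is zero
t_stat  <- mean(w) / (sd(w) / sqrt(length(w)))
df_t    <- length(w) - 1     # equals n - p - 1
p_value <- 2 * pt(abs(t_stat), df = df_t, lower.tail = FALSE)
print(c(t = t_stat, df = df_t, p_value = p_value))
```

With a convex relationship and observations ordered by `x`, a line fitted to the earlier observations tends to under-predict the next one, so the recursive residuals are mostly positive and the test rejects. Note that the result depends on how the observations are ordered.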

Advantages and Limitations

  • Advantage: Simple to apply and interpret; good for detecting subtle misspecifications.
  • Limitation: Sensitive to outliers; may have reduced power in small samples.

14.4.3 Rainbow Test

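The Rainbow Test is a general-purpose diagnostic for functional form misspecification (Utts 1982). It compares the fit of the model on the full sample with its fit on a central subsample, that is, the observations near the center of the regressor space (for example, near the median of the independent variables). Even a misspecified linear model can fit well over a narrow central region, so a much worse fit on the full sample than on the center points to lack of fit.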

Hypotheses

  • Null Hypothesis (H0): The model is correctly specified.
  • Alternative Hypothesis (H1): The model is misspecified.

Procedure

  1. Estimate the regression model on the full dataset of n observations and record the sum of squared residuals, $SSR_{\text{full}}$.

  2. Identify a central subsample of m observations (e.g., observations near the median of key predictors).

  3. Estimate the model again on the central subsample and record $SSR_{\text{subsample}}$.

  4. Compare the fit on the full sample and on the subsample using an F-statistic:

    $$F = \frac{(SSR_{\text{full}} - SSR_{\text{subsample}}) / q}{SSR_{\text{subsample}} / (m - k - 1)}$$

    Where q = n − m is the number of observations outside the central subsample and k is the number of predictors.


Decision Rule

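  • Under H0, the test statistic follows an F-distribution with (n − m, m − k − 1) degrees of freedom, where m is the size of the central subsample and k is the number of predictors.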
  • Reject H0 if the F-statistic is significant, indicating model misspecification.
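
As a rough illustration of the procedure, the base-R sketch below computes a Rainbow-type F-statistic by hand. The simulated data, the choice to define the central subsample as the middle 50% of observations sorted by `x`, and all object names are assumptions made for this example. The degrees of freedom follow Utts's formulation: n − m in the numerator and m − p in the denominator, where m is the central-subsample size and p counts the coefficients including the intercept.

```r
# Illustrative data (assumed): quadratic truth, linear model fitted
set.seed(99)
n <- 200
x <- runif(n, min = -3, max = 3)
y <- 1 + x + x^2 + rnorm(n)
dat <- data.frame(x = x, y = y)

# Step 1: full-sample fit
fit_full <- lm(y ~ x, data = dat)
ssr_full <- sum(resid(fit_full)^2)

# Step 2: central subsample = middle 50% of observations when sorted by x
dat_sorted <- dat[order(dat$x), ]
m       <- n %/% 2
start   <- (n - m) %/% 2 + 1
central <- dat_sorted[start:(start + m - 1), ]

# Step 3: refit on the central subsample
fit_sub <- lm(y ~ x, data = central)
ssr_sub <- sum(resid(fit_sub)^2)

# Step 4: F-statistic with (n - m, m - p) degrees of freedom
p   <- length(coef(fit_full))   # coefficients, including the intercept
df1 <- n - m
df2 <- m - p
f_stat  <- ((ssr_full - ssr_sub) / df1) / (ssr_sub / df2)
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
print(c(F = f_stat, df1 = df1, df2 = df2, p_value = p_value))
```

The line fits the central region far better than the full range here, so the statistic is large and the test rejects. A different ordering variable or subsample fraction can change the result, which is the sensitivity noted under the limitations below.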

Advantages and Limitations

  • Advantage: Robust to various forms of misspecification.
  • Limitation: Choice of subsample may influence results; less informative about the specific nature of the misspecification.

14.4.4 Summary of Functional Form Tests

| Test | Type | Key Statistic | Purpose | When to Use |
|---|---|---|---|---|
| Ramsey RESET Test | Augmented regression | F-test | Detects omitted variables, nonlinearities | General model specification testing |
| Harvey–Collier Test | Residual-based | t-test | Detects systematic patterns in residuals | Subtle misspecifications in linear models |
| Rainbow Test | Subsample comparison | F-test | Tests model stability across subsamples | Comparing central vs. full sample |

Functional form misspecification can severely distort regression results, leading to biased estimates and invalid inferences. While no single test can detect all types of misspecification, using a combination of tests provides a robust framework for model diagnostics.

# Install and load the necessary library
# install.packages("lmtest")  # Provides resettest(), harvtest(), and raintest()

library(lmtest)

# Simulated dataset
set.seed(123)
n <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
epsilon <- rnorm(n)
y <- 5 + 0.4 * x1 - 0.3 * x2 + epsilon

# Original regression model
model <- lm(y ~ x1 + x2)

# ----------------------------------------------------------------------
# 1. Ramsey RESET Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified
reset_test <-
    resettest(model, power = 2:3, type = "fitted")  # Adds ŷ² and ŷ³
print(reset_test)
#> 
#>  RESET test
#> 
#> data:  model
#> RESET = 0.1921, df1 = 2, df2 = 95, p-value = 0.8255

# ----------------------------------------------------------------------
# 2. Harvey–Collier Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified (recursive residuals have zero mean)
hc_test <- harvtest(model)
print(hc_test)
#> 
#>  Harvey-Collier test
#> 
#> data:  model
#> HC = 0.041264, df = 96, p-value = 0.9672

# ----------------------------------------------------------------------
# 3. Rainbow Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified
rainbow_test <- lmtest::raintest(model)
print(rainbow_test)
#> 
#>  Rainbow test
#> 
#> data:  model
#> Rain = 1.1857, df1 = 50, df2 = 47, p-value = 0.279

Interpretation of the Results

  1. Ramsey RESET Test (Regression Equation Specification Error Test)

    • Null Hypothesis (H0): The model is correctly specified.

    • Alternative Hypothesis (H1): The model suffers from omitted variables, incorrect functional form, or other specification errors.

    • Decision Rule:

      • Reject H0 if p-value < 0.05 → Evidence of model misspecification (e.g., missing nonlinear terms).
      • Fail to reject H0 if p-value ≥ 0.05 → No strong evidence of misspecification.
  2. Harvey–Collier Test

    • Null Hypothesis (H0): The model is correctly specified (residuals are random noise with zero mean).

    • Alternative Hypothesis (H1): The model is misspecified (residuals contain systematic patterns).

    • Decision Rule:

      • Reject H0 if p-value < 0.05 → Model misspecification detected (non-random residual patterns).
      • Fail to reject H0 if p-value ≥ 0.05 → No evidence of misspecification.
  3. Rainbow Test

    • Null Hypothesis (H0): The model is correctly specified.

    • Alternative Hypothesis (H1): The model is misspecified.

    • Decision Rule:

      • Reject H0 if p-value < 0.05 → Evidence of model misspecification (model performs differently on subsets).
      • Fail to reject H0 if p-value ≥ 0.05 → Model specification appears valid.
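
These decision rules can be applied directly to the p-values printed in the example output above. The short sketch below does so. The 0.05 threshold is the one used in the rules, and the object names are illustrative.

```r
# p-values copied from the RESET, Harvey-Collier, and Rainbow output above
p_values <- c(RESET = 0.8255, `Harvey-Collier` = 0.9672, Rainbow = 0.279)
alpha <- 0.05

# Apply the decision rule to each test
decision <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
print(data.frame(p_value = p_values, decision = decision))
```

None of the three tests rejects H0, which is what one would expect here: the simulated y was generated as an exactly linear function of x1 and x2 plus noise, so the fitted model is correctly specified.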

References

Harvey, Andrew C., and Patrick Collier. 1977. “Testing for Functional Misspecification in Regression Analysis.” Journal of Econometrics 6 (1): 103–19.
Ramsey, James Bernard. 1969. “Tests for Specification Errors in Classical Linear Least-Squares Regression Analysis.” Journal of the Royal Statistical Society Series B: Statistical Methodology 31 (2): 350–71.
Utts, Jessica M. 1982. “The Rainbow Test for Lack of Fit in Regression.” Communications in Statistics - Theory and Methods 11 (24): 2801–15.