14.4 Functional Form Tests

Functional form misspecification occurs when the chosen regression model does not correctly represent the true relationship between the dependent and independent variables. This can happen due to:

  • Omitted variables (important predictors not included),
  • Incorrect transformations of variables (e.g., missing nonlinear relationships),
  • Incorrect interaction terms (missing interaction effects between variables),
  • Inappropriate linearity assumptions.

Functional form errors can lead to biased and inconsistent estimators, undermining the validity of statistical inferences. To detect such issues, several diagnostic tests are available.
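
As a minimal illustration of this point, the sketch below simulates data whose true relationship includes a squared term and compares a linear fit with the correctly specified one. The data-generating process, sample size, and object names are assumptions chosen for illustration only.

```r
# Illustrative simulation (assumed DGP, not taken from the text):
# the true model contains x^2, but the first fit omits it.
set.seed(42)
n <- 500
x <- runif(n, min = 0, max = 10)
y <- 1 + 0.5 * x + 0.3 * x^2 + rnorm(n)

fit_linear    <- lm(y ~ x)            # misspecified: omits x^2
fit_quadratic <- lm(y ~ x + I(x^2))   # matches the assumed DGP

# Coefficient on x: far from 0.5 when x^2 is omitted, because
# x and x^2 are strongly correlated on the interval [0, 10]
coef_linear    <- unname(coef(fit_linear)["x"])
coef_quadratic <- unname(coef(fit_quadratic)["x"])
print(c(linear = coef_linear, quadratic = coef_quadratic))

# The residuals of the misspecified fit still carry the curvature
cor_resid_x2 <- cor(resid(fit_linear), x^2)
print(cor_resid_x2)
```

In this setup the linear fit absorbs part of the omitted curvature into the slope on `x`, which is the kind of distortion the tests in this section are meant to flag.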

Key Functional Form Tests:

  1. Ramsey RESET Test (Regression Equation Specification Error Test)
  2. Harvey–Collier Test
  3. Rainbow Test

Each test focuses on identifying different aspects of potential model misspecification.


14.4.1 Ramsey RESET Test (Regression Equation Specification Error Test)

The Ramsey RESET Test is one of the most widely used tests to detect functional form misspecification (Ramsey 1969). It examines whether adding nonlinear transformations of the fitted values (or regressors) improves the model fit.

Hypotheses

  • Null Hypothesis (\(H_0\)): The model is correctly specified.
  • Alternative Hypothesis (\(H_1\)): The model suffers from omitted variables, incorrect functional form, or other specification errors.

Procedure

  1. Estimate the original regression model:

    \[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \epsilon_i \]

  2. Obtain the fitted values:

    \[ \hat{y}_i \]

  3. Augment the model with powers of the fitted values (squared, cubed, etc.):

    \[ y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + u_i \]

  4. Test the joint significance of the added terms:

    \[ H_0: \gamma_1 = \gamma_2 = 0 \]

  5. Compute the F-statistic:

    \[ F = \frac{(SSR_{\text{restricted}} - SSR_{\text{unrestricted}}) / q}{SSR_{\text{unrestricted}} / (n - k - q - 1)} \]

    Where:

    • \(SSR_{\text{restricted}}\) = Sum of Squared Residuals from the original model,
    • \(SSR_{\text{unrestricted}}\) = SSR from the augmented model,
    • \(q\) = Number of additional terms (e.g., 2 if adding \(\hat{y}^2\) and \(\hat{y}^3\)),
    • \(n\) = Sample size,
    • \(k\) = Number of predictors in the original model.

Decision Rule

  • Under \(H_0\), the F-statistic follows an \(F\)-distribution with \((q, n - k - q - 1)\) degrees of freedom.
  • Reject \(H_0\) if the F-statistic exceeds the critical value, indicating functional form misspecification.
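
The procedure and decision rule above can be sketched by hand in base R. The simulated data (a quadratic relationship fitted with a linear model) and all object names are assumptions made for this example.

```r
# Sketch of the RESET F-test by hand (base R only).
# Assumed DGP: the true relationship is quadratic, the fitted model is linear.
set.seed(1)
n <- 200
x <- runif(n, min = 0, max = 10)
y <- 1 + 0.5 * x + 0.3 * x^2 + rnorm(n)

# Steps 1-2: original (restricted) model and its fitted values
restricted <- lm(y ~ x)
y_hat <- fitted(restricted)

# Step 3: augment the model with squared and cubed fitted values
unrestricted <- lm(y ~ x + I(y_hat^2) + I(y_hat^3))

# Steps 4-5: F-statistic from the two sums of squared residuals
ssr_r <- sum(resid(restricted)^2)
ssr_u <- sum(resid(unrestricted)^2)
k   <- 1                    # predictors in the original model
q   <- 2                    # added terms: y_hat^2 and y_hat^3
df2 <- n - k - q - 1
f_stat  <- ((ssr_r - ssr_u) / q) / (ssr_u / df2)
p_value <- pf(f_stat, df1 = q, df2 = df2, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))

# The same statistic from a nested-model comparison
f_anova <- anova(restricted, unrestricted)$F[2]
```

Because the augmented model nests the original one, the hand-computed statistic coincides with the F-statistic from `anova()` on the two fits.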

Advantages and Limitations

  • Advantage: Simple to implement; has power against neglected nonlinearities and against omitted variables whose effect is correlated with powers of the fitted values.
  • Limitation: Does not identify which variable or functional form is incorrect; it only indicates the presence of an issue.

14.4.2 Harvey–Collier Test

The Harvey–Collier Test checks whether the recursive residuals of a linear model have a mean of zero (Harvey and Collier 1977). Recursive residuals are standardized one-step-ahead prediction errors: the model is fitted to the first \(t - 1\) observations and used to predict observation \(t\). If the true relationship is convex or concave rather than linear, these prediction errors tend to share the same sign, so their mean differs systematically from zero.

Hypotheses

  • Null Hypothesis (\(H_0\)): The model is correctly specified (recursive residuals have zero mean).
  • Alternative Hypothesis (\(H_1\)): The model is misspecified (recursive residuals have a non-zero mean).

Procedure

  1. Order the observations (in their sample order, or by a regressor suspected of entering nonlinearly) and estimate the original regression model.

  2. Compute the recursive residuals for \(t = k + 2, \dots, n\), where \(\hat{\beta}_{t-1}\) and \(\mathbf{X}_{t-1}\) are the coefficient estimates and design matrix based on the first \(t - 1\) observations:

    \[ w_t = \frac{y_t - \mathbf{x}_t^\top \hat{\beta}_{t-1}}{\sqrt{1 + \mathbf{x}_t^\top (\mathbf{X}_{t-1}^\top \mathbf{X}_{t-1})^{-1} \mathbf{x}_t}} \]

  3. Calculate the Harvey–Collier statistic:

    \[ t = \frac{\bar{w}}{\text{SE}(\bar{w})} = \frac{\bar{w}}{s_w / \sqrt{n - k - 1}} \]

    Where:

    • \(\bar{w}\) is the mean of the \(n - k - 1\) recursive residuals,
    • \(s_w\) is their sample standard deviation,
    • \(k\) is the number of predictors in the original model.

Decision Rule

  • Under \(H_0\), the test statistic follows a \(t\)-distribution with \(n - k - 2\) degrees of freedom.
  • Reject \(H_0\) if the \(t\)-statistic is significantly different from zero.
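
A minimal sketch of the test as it is usually implemented: a one-sample \(t\)-test on recursive residuals, that is, standardized one-step-ahead prediction errors from fitting the model to the first \(t - 1\) observations and predicting observation \(t\). The helper `recursive_residuals()` is a hypothetical function written for this example, and the simulated convex data are an assumption.

```r
# Recursive residuals: standardized one-step-ahead prediction errors.
# X is the design matrix (including the intercept column), y the response.
# `recursive_residuals` is a hypothetical helper written for this sketch.
recursive_residuals <- function(X, y) {
  n <- nrow(X)
  p <- ncol(X)                      # number of coefficients (k + 1)
  w <- numeric(n - p)
  for (i in (p + 1):n) {
    X_prev  <- X[seq_len(i - 1), , drop = FALSE]
    XtX_inv <- solve(crossprod(X_prev))
    b_prev  <- XtX_inv %*% crossprod(X_prev, y[seq_len(i - 1)])
    x_i     <- X[i, ]
    pred_error <- y[i] - sum(x_i * b_prev)
    w[i - p] <- pred_error / sqrt(1 + drop(t(x_i) %*% XtX_inv %*% x_i))
  }
  w
}

# Assumed DGP: convex relationship, observations ordered by x
set.seed(2)
n <- 100
x <- seq(0, 10, length.out = n)
y <- 1 + 0.5 * x + 0.3 * x^2 + rnorm(n)

fit <- lm(y ~ x)
w <- recursive_residuals(model.matrix(fit), y)

# Harvey-Collier statistic: t-test of H0 "mean of recursive residuals = 0"
hc <- t.test(w, mu = 0)
print(hc$statistic)
print(hc$p.value)
```

Because the recursive residuals are computed sequentially, the outcome depends on how the observations are ordered; here they are sorted by `x`, which makes the convexity show up as predominantly positive prediction errors. The statistic should agree, up to sign, with the one reported by `harvtest()` in lmtest for the same data and ordering.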

Advantages and Limitations

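  • Advantage: Simple to apply and interpret; has power against convex or concave departures from linearity.
  • Limitation: The result depends on the ordering of the observations, because recursive residuals are computed sequentially; sensitive to outliers; may have reduced power in small samples.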

14.4.3 Rainbow Test

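The Rainbow Test is a general-purpose diagnostic tool for functional form misspecification (Utts 1982). It compares the fit of the model on the full sample with its fit on a central subsample, where the central subsample contains observations near the center of the data (for example, close to the mean or median of the independent variables). The idea is that even if the true relationship is nonlinear, a linear model can fit well in the center of the data, so a markedly worse fit on the full sample signals misspecification.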

Hypotheses

  • Null Hypothesis (\(H_0\)): The model is correctly specified.
  • Alternative Hypothesis (\(H_1\)): The model is misspecified.

Procedure

  1. Estimate the regression model on the full dataset and record the residuals.

  2. Identify a central subsample (e.g., observations near the median of key predictors).

  3. Estimate the model again on the central subsample.

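  4. Compare the residual sums of squares from the two fits using an F-statistic:

    \[ F = \frac{(SSR_{\text{full}} - SSR_{\text{subsample}}) / (n - m)}{SSR_{\text{subsample}} / (m - k - 1)} \]

    Where:

    • \(SSR_{\text{full}}\) = Sum of Squared Residuals from the fit on all \(n\) observations,
    • \(SSR_{\text{subsample}}\) = SSR from the fit on the central subsample of \(m\) observations,
    • \(k\) = Number of predictors in the model.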


Decision Rule

  • Under \(H_0\), the test statistic follows an \(F\)-distribution with \((n - m, m - k - 1)\) degrees of freedom, where \(m\) is the size of the central subsample.
  • Reject \(H_0\) if the F-statistic is significant, indicating model misspecification.
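
The steps above can be sketched by hand in base R, using the usual form of the statistic with \((n - m, m - k - 1)\) degrees of freedom, where \(m\) is the size of the central subsample. Defining "central" as the half of the sample closest to the regressor means in Mahalanobis distance is one possible choice and an assumption of this sketch, as are the simulated data and object names.

```r
# Sketch of the Rainbow test by hand (base R only).
# Assumption: "central" = the half of the sample closest to the
# regressor means in Mahalanobis distance.
set.seed(3)
n  <- 200
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
# Assumed DGP with curvature in x1 that the linear model ignores
y  <- 5 + 0.4 * x1 - 0.3 * x2 + 0.02 * (x1 - 50)^2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Step 1: full-sample fit
fit_full <- lm(y ~ x1 + x2, data = dat)

# Step 2: central subsample of size m
Z <- as.matrix(dat[, c("x1", "x2")])
d <- mahalanobis(Z, center = colMeans(Z), cov = cov(Z))
m <- n / 2
central <- order(d)[seq_len(m)]

# Step 3: refit on the central subsample
fit_central <- lm(y ~ x1 + x2, data = dat[central, ])

# Step 4: F-statistic and p-value
ssr_full    <- sum(resid(fit_full)^2)
ssr_central <- sum(resid(fit_central)^2)
k   <- 2                    # predictors in the model
df1 <- n - m
df2 <- m - k - 1
f_stat  <- ((ssr_full - ssr_central) / df1) / (ssr_central / df2)
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

The linear model fits the central observations well, while the curvature inflates the residuals of the outer observations, which is what drives the statistic upward in this setup.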

Advantages and Limitations

  • Advantage: Robust to various forms of misspecification.
  • Limitation: Choice of subsample may influence results; less informative about the specific nature of the misspecification.

14.4.4 Summary of Functional Form Tests

| Test | Type | Key Statistic | Purpose | When to Use |
|---|---|---|---|---|
| Ramsey RESET Test | Augmented regression | \(F\)-test | Detects omitted variables, nonlinearities | General model specification testing |
| Harvey–Collier Test | Residual-based | \(t\)-test | Detects systematic patterns in residuals | Subtle misspecifications in linear models |
| Rainbow Test | Subsample comparison | \(F\)-test | Tests model stability across subsamples | Comparing central vs. full sample |

Functional form misspecification can severely distort regression results, leading to biased estimates and invalid inferences. While no single test can detect all types of misspecification, using a combination of tests provides a robust framework for model diagnostics.

# Install and load the required library
# install.packages("lmtest")  # Provides resettest(), harvtest(), and raintest()

library(lmtest)

# Simulated dataset
set.seed(123)
n <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
epsilon <- rnorm(n)
y <- 5 + 0.4 * x1 - 0.3 * x2 + epsilon

# Original regression model
model <- lm(y ~ x1 + x2)

# ----------------------------------------------------------------------
# 1. Ramsey RESET Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified
reset_test <-
    resettest(model, power = 2:3, type = "fitted")  # Adds ŷ² and ŷ³
print(reset_test)
#> 
#>  RESET test
#> 
#> data:  model
#> RESET = 0.1921, df1 = 2, df2 = 95, p-value = 0.8255

# ----------------------------------------------------------------------
# 2. Harvey–Collier Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified (recursive residuals have zero mean)
hc_test <- harvtest(model)
print(hc_test)
#> 
#>  Harvey-Collier test
#> 
#> data:  model
#> HC = 0.041264, df = 96, p-value = 0.9672

# ----------------------------------------------------------------------
# 3. Rainbow Test
# ----------------------------------------------------------------------
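# Null Hypothesis: The model is correctly specified
# By default, raintest() takes the middle 50% of observations in sample order;
# use order.by (e.g., order.by = "mahalanobis") to order by closeness to the regressor means
rainbow_test <- lmtest::raintest(model)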
print(rainbow_test)
#> 
#>  Rainbow test
#> 
#> data:  model
#> Rain = 1.1857, df1 = 50, df2 = 47, p-value = 0.279

Interpretation of the Results

All three tests share the same decision rule at the 5% significance level: reject \(H_0\) if the p-value is below 0.05, and fail to reject \(H_0\) otherwise. Failing to reject does not prove that the model is correct; it means only that the test found no evidence against it.

  1. Ramsey RESET Test (Regression Equation Specification Error Test)

    • Null Hypothesis (\(H_0\)): The model is correctly specified.

    • Alternative Hypothesis (\(H_1\)): The model suffers from omitted variables, incorrect functional form, or other specification errors.

    • Result: RESET = 0.1921 with (2, 95) degrees of freedom, p-value = 0.8255. We fail to reject \(H_0\): adding \(\hat{y}^2\) and \(\hat{y}^3\) does not significantly improve the fit, so there is no evidence of missing nonlinear terms.

  2. Harvey–Collier Test

    • Null Hypothesis (\(H_0\)): The model is correctly specified (residuals are random noise with zero mean).

    • Alternative Hypothesis (\(H_1\)): The model is misspecified (residuals contain systematic patterns).

    • Result: HC = 0.041264 with 96 degrees of freedom, p-value = 0.9672. We fail to reject \(H_0\): there is no evidence of systematic patterns in the residuals.

  3. Rainbow Test

    • Null Hypothesis (\(H_0\)): The model is correctly specified.

    • Alternative Hypothesis (\(H_1\)): The model is misspecified.

    • Result: Rain = 1.1857 with (50, 47) degrees of freedom, p-value = 0.279. We fail to reject \(H_0\): the model does not fit the full sample significantly worse than the central subsample.

These results are what we would expect here, because the data were simulated from a model that is linear in \(x_1\) and \(x_2\), which is exactly the model that was fitted.

References

Harvey, Andrew C., and Patrick Collier. 1977. “Testing for Functional Misspecification in Regression Analysis.” Journal of Econometrics 6 (1): 103–19.
Ramsey, James Bernard. 1969. “Tests for Specification Errors in Classical Linear Least-Squares Regression Analysis.” Journal of the Royal Statistical Society Series B: Statistical Methodology 31 (2): 350–71.
Utts, Jessica M. 1982. “The Rainbow Test for Lack of Fit in Regression.” Communications in Statistics-Theory and Methods 11 (24): 2801–15.