14.4 Functional Form Tests
Functional form misspecification occurs when the chosen regression model does not correctly represent the true relationship between the dependent and independent variables. This can happen due to:
- Omitted variables (important predictors not included),
- Incorrect transformations of variables (e.g., missing nonlinear relationships),
- Incorrect interaction terms (missing interaction effects between variables),
- Inappropriate linearity assumptions.
Functional form errors can lead to biased and inconsistent estimators, undermining the validity of statistical inferences. To detect such issues, several diagnostic tests are available.
Key Functional Form Tests:
- Ramsey RESET Test,
- Harvey–Collier Test,
- Rainbow Test.
Each test focuses on identifying different aspects of potential model misspecification.
14.4.1 Ramsey RESET Test (Regression Equation Specification Error Test)
The Ramsey RESET Test is one of the most widely used tests to detect functional form misspecification (Ramsey 1969). It examines whether adding nonlinear transformations of the fitted values (or regressors) improves the model fit.
Hypotheses
- Null Hypothesis (\(H_0\)): The model is correctly specified.
- Alternative Hypothesis (\(H_1\)): The model suffers from omitted variables, incorrect functional form, or other specification errors.
Procedure
Estimate the original regression model:
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \epsilon_i \]
Obtain the fitted values:
\[ \hat{y}_i \]
Augment the model with powers of the fitted values (squared, cubed, etc.):
\[ y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + u_i \]
Test the joint significance of the added terms:
\[ H_0: \gamma_1 = \gamma_2 = 0 \]
Compute the F-statistic:
\[ F = \frac{(SSR_{\text{restricted}} - SSR_{\text{unrestricted}}) / q}{SSR_{\text{unrestricted}} / (n - k - q - 1)} \]
Where:
- \(SSR_{\text{restricted}}\) = Sum of Squared Residuals from the original model,
- \(SSR_{\text{unrestricted}}\) = SSR from the augmented model,
- \(q\) = Number of additional terms (e.g., 2 if adding \(\hat{y}^2\) and \(\hat{y}^3\)),
- \(n\) = Sample size,
- \(k\) = Number of predictors in the original model.
Decision Rule
- Under \(H_0\), the F-statistic follows an \(F\)-distribution with \((q, n - k - q - 1)\) degrees of freedom.
- Reject \(H_0\) if the F-statistic exceeds the critical value, indicating functional form misspecification.
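The procedure above can be sketched by hand in base R. This is a minimal illustration on simulated data, not a reference implementation; the variable names (`restricted`, `unrestricted`, `F_stat`) and the seed are assumptions chosen for the example.

```r
# Sketch: manual RESET F-statistic in base R (simulated data, illustrative names)
set.seed(1)
n  <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
y  <- 5 + 0.4 * x1 - 0.3 * x2 + rnorm(n)

# Step 1-2: original (restricted) model and its fitted values
restricted <- lm(y ~ x1 + x2)
yhat <- fitted(restricted)

# Step 3: augmented (unrestricted) model with powers of the fitted values
unrestricted <- lm(y ~ x1 + x2 + I(yhat^2) + I(yhat^3))

# Step 4-5: F-statistic for H0: gamma_1 = gamma_2 = 0
k <- 2  # predictors in the original model
q <- 2  # added terms
ssr_r <- sum(resid(restricted)^2)
ssr_u <- sum(resid(unrestricted)^2)
F_stat  <- ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - q - 1))
p_value <- pf(F_stat, df1 = q, df2 = n - k - q - 1, lower.tail = FALSE)

# Cross-check: anova() on the nested models gives the same F-statistic
anova_tab <- anova(restricted, unrestricted)
c(manual = F_stat, anova = anova_tab$F[2], p_value = p_value)
```

Comparing the hand-computed value with `anova()` is a quick way to confirm that the degrees of freedom in the formula are being counted correctly.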
Advantages and Limitations
- Advantage: Simple to implement; detects neglected nonlinearities, and omitted variables to the extent that they are correlated with powers of the fitted values.
- Limitation: Does not identify which variable or functional form is incorrect; it only indicates the presence of an issue.
14.4.2 Harvey–Collier Test
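The Harvey–Collier Test checks whether the model's recursive residuals have mean zero (Harvey and Collier 1977). Recursive residuals are standardized one-step-ahead prediction errors: the model is fitted to the first observations and used to predict the next one. If the true relationship is convex or concave rather than linear, these prediction errors tend to share a sign, so their mean differs systematically from zero.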
Hypotheses
- Null Hypothesis (\(H_0\)): The model is correctly specified (recursive residuals have zero mean).
- Alternative Hypothesis (\(H_1\)): The model is misspecified (residuals contain systematic patterns).
Procedure
Order the observations (for example, by time or by a key regressor); the test result depends on this ordering.
Compute the recursive residuals \(w_j\): for each observation after the first \(k + 1\), fit the model to all preceding observations, predict the current observation, and standardize the one-step-ahead prediction error.
Calculate the Harvey–Collier statistic as a one-sample \(t\)-statistic on the recursive residuals:
\[ t = \frac{\bar{w}}{\text{SE}(\bar{w})} \]
Where:
- \(\bar{w}\) is the mean of the \(n - k - 1\) recursive residuals,
- \(\text{SE}(\bar{w})\) is the standard error of that mean.
Decision Rule
- Under \(H_0\), the test statistic follows a \(t\)-distribution with \(n - k - 2\) degrees of freedom.
- Reject \(H_0\) if the \(t\)-statistic is significantly different from zero.
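The Harvey–Collier statistic is usually described as a \(t\)-test on recursive residuals, that is, standardized one-step-ahead prediction errors. The base R sketch below computes them with an explicit loop on simulated data. It is an illustration under stated assumptions (the observations are taken in their existing order, and all names are hypothetical), not the code used by any particular package.

```r
# Sketch: recursive residuals and the Harvey-Collier t-statistic in base R
set.seed(1)
n  <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
y  <- 5 + 0.4 * x1 - 0.3 * x2 + rnorm(n)

X <- cbind(1, x1, x2)  # design matrix including the intercept
p <- ncol(X)           # number of estimated coefficients (k + 1)

w <- numeric(n - p)    # one recursive residual per observation after the first p
for (i in (p + 1):n) {
  X_prev  <- X[1:(i - 1), , drop = FALSE]
  XtX_inv <- solve(crossprod(X_prev))
  b_prev  <- XtX_inv %*% crossprod(X_prev, y[1:(i - 1)])  # OLS on earlier rows
  x_i     <- X[i, ]
  # Prediction error scaled by its (relative) standard deviation
  scale_i  <- sqrt(1 + drop(t(x_i) %*% XtX_inv %*% x_i))
  w[i - p] <- (y[i] - sum(x_i * b_prev)) / scale_i
}

# One-sample t-test that the recursive residuals have mean zero
t_stat  <- mean(w) / (sd(w) / sqrt(length(w)))
df      <- length(w) - 1
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
c(t = t_stat, df = df, p_value = p_value)
```

With \(n = 100\) and three coefficients there are 97 recursive residuals and 96 degrees of freedom. Reordering the rows before the loop (for instance by `x1`) changes the recursive residuals and therefore the test result.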
Advantages and Limitations
- Advantage: Simple to apply and interpret; good for detecting subtle misspecifications.
- Limitation: Depends on how the observations are ordered; sensitive to outliers; may have reduced power in small samples.
14.4.3 Rainbow Test
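The Rainbow Test is a general-purpose diagnostic tool for functional form misspecification (Utts 1982). It compares the fit of the model on the full sample with its fit on a central subsample, that is, observations near the center of the data (for example, near the median of a key predictor). If the linear specification is correct, a fit based on the central observations should describe the full sample about equally well; a nonlinear relationship tends to fit well in the center but poorly toward the edges.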
Hypotheses
- Null Hypothesis (\(H_0\)): The model is correctly specified.
- Alternative Hypothesis (\(H_1\)): The model is misspecified.
Procedure
Estimate the regression model on the full dataset and record the residuals.
Identify a central subsample (e.g., observations near the median of key predictors).
Estimate the model again on the central subsample.
Compare the predictive accuracy between the full sample and subsample using an F-statistic:
\[ F = \frac{(SSR_{\text{full}} - SSR_{\text{subsample}}) / (n - m)}{SSR_{\text{subsample}} / (m - k - 1)} \]
Where \(m\) is the number of observations in the central subsample, \(n\) is the full sample size, and \(k\) is the number of predictors.
Decision Rule
- Under \(H_0\), the test statistic follows an \(F\)-distribution with \((n - m, m - k - 1)\) degrees of freedom.
- Reject \(H_0\) if the F-statistic is significant, indicating model misspecification.
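The subsample comparison can be sketched in base R as follows. Here \(m\) denotes the size of the central subsample, and the degrees of freedom used are \(n - m\) and \(m - k - 1\). Taking the middle half of the rows in their existing order is an assumption made for illustration; in practice the rows would typically be ordered by a relevant variable first. All object names are hypothetical.

```r
# Sketch: Rainbow-type F-statistic from a full fit and a central-subsample fit
set.seed(1)
n  <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
y  <- 5 + 0.4 * x1 - 0.3 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

k <- 2                        # predictors (excluding the intercept)
m <- floor(0.5 * n)           # central subsample size: middle 50% (assumed)
from <- ceiling((n - m) / 2) + 1
central <- dat[from:(from + m - 1), ]

full_fit <- lm(y ~ x1 + x2, data = dat)
sub_fit  <- lm(y ~ x1 + x2, data = central)

ssr_full <- sum(resid(full_fit)^2)
ssr_sub  <- sum(resid(sub_fit)^2)

df1 <- n - m
df2 <- m - k - 1
F_stat  <- ((ssr_full - ssr_sub) / df1) / (ssr_sub / df2)
p_value <- pf(F_stat, df1 = df1, df2 = df2, lower.tail = FALSE)
c(F = F_stat, df1 = df1, df2 = df2, p_value = p_value)
```

Because the subsample fit minimizes the sum of squares over the central rows, `ssr_sub` can never exceed `ssr_full`, so the statistic is non-negative by construction.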
Advantages and Limitations
- Advantage: Robust to various forms of misspecification.
- Limitation: Choice of subsample may influence results; less informative about the specific nature of the misspecification.
14.4.4 Summary of Functional Form Tests
Test | Type | Key Statistic | Purpose | When to Use |
---|---|---|---|---|
Ramsey RESET Test | Augmented regression | \(F\)-test | Detects omitted variables, nonlinearities | General model specification testing |
Harvey–Collier Test | Recursive residuals | \(t\)-test | Detects systematic (convex or concave) patterns in recursive residuals | Subtle misspecifications in linear models |
Rainbow Test | Subsample comparison | \(F\)-test | Tests model stability across subsamples | Comparing central vs. full sample |
Functional form misspecification can severely distort regression results, leading to biased estimates and invalid inferences. While no single test can detect all types of misspecification, using a combination of tests provides a robust framework for model diagnostics.
# Install and load the necessary library
# install.packages("lmtest") # Provides resettest(), harvtest(), and raintest()
library(lmtest)
# Simulated dataset
set.seed(123)
n <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
epsilon <- rnorm(n)
y <- 5 + 0.4 * x1 - 0.3 * x2 + epsilon
# Original regression model
model <- lm(y ~ x1 + x2)
# ----------------------------------------------------------------------
# 1. Ramsey RESET Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified
reset_test <-
resettest(model, power = 2:3, type = "fitted") # Adds ŷ² and ŷ³
print(reset_test)
#>
#> RESET test
#>
#> data: model
#> RESET = 0.1921, df1 = 2, df2 = 95, p-value = 0.8255
# ----------------------------------------------------------------------
# 2. Harvey–Collier Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified (recursive residuals have zero mean)
hc_test <- harvtest(model)
print(hc_test)
#>
#> Harvey-Collier test
#>
#> data: model
#> HC = 0.041264, df = 96, p-value = 0.9672
# ----------------------------------------------------------------------
# 3. Rainbow Test
# ----------------------------------------------------------------------
# Null Hypothesis: The model is correctly specified
rainbow_test <- lmtest::raintest(model)
print(rainbow_test)
#>
#> Rainbow test
#>
#> data: model
#> Rain = 1.1857, df1 = 50, df2 = 47, p-value = 0.279
Interpretation of the Results
Ramsey RESET Test (Regression Equation Specification Error Test)
Null Hypothesis (\(H_0\)): The model is correctly specified.
Alternative Hypothesis (\(H_1\)): The model suffers from omitted variables, incorrect functional form, or other specification errors.
Decision Rule:
- Reject \(H_0\) if p-value \(< 0.05\) → Evidence of model misspecification (e.g., missing nonlinear terms).
- Fail to reject \(H_0\) if p-value \(\ge 0.05\) → No strong evidence of misspecification. In the simulated example, RESET = 0.1921 with p-value 0.8255, so \(H_0\) is not rejected.
Harvey–Collier Test
Null Hypothesis (\(H_0\)): The model is correctly specified (recursive residuals have zero mean).
Alternative Hypothesis (\(H_1\)): The model is misspecified (residuals contain systematic patterns).
Decision Rule:
- Reject \(H_0\) if p-value \(< 0.05\) → Model misspecification detected (non-random residual patterns).
- Fail to reject \(H_0\) if p-value \(\ge 0.05\) → No evidence of misspecification. In the simulated example, HC = 0.041264 with p-value 0.9672, so \(H_0\) is not rejected.
Rainbow Test
Null Hypothesis (\(H_0\)): The model is correctly specified.
Alternative Hypothesis (\(H_1\)): The model is misspecified.
Decision Rule:
- Reject \(H_0\) if p-value \(< 0.05\) → Evidence of model misspecification (model performs differently on subsets).
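- Fail to reject \(H_0\) if p-value \(\ge 0.05\) → No evidence of misspecification. In the simulated example, Rain = 1.1857 with p-value 0.279, so \(H_0\) is not rejected, which is consistent with the linear data-generating process used in the simulation.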
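For contrast, it can help to see how the tests behave when the fitted model is deliberately misspecified. The sketch below simulates a response with a quadratic effect of `x1` but fits a model that is linear in `x1`. The data-generating process, the seed, and the object names are assumptions made for illustration. Ordering by `x1` through `order.by` matters for the Harvey–Collier and Rainbow tests, because both depend on how the observations are arranged.

```r
# Sketch: the three tests applied to a deliberately misspecified linear model
library(lmtest)

set.seed(42)
n  <- 200
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)

# True relationship is quadratic in x1 (assumed for this illustration)
y_curved <- 5 + 0.4 * x1 + 0.05 * (x1 - 50)^2 - 0.3 * x2 + rnorm(n)
dat <- data.frame(y_curved, x1, x2)

# Fit-and-test with a formula that omits the quadratic term
reset_bad   <- resettest(y_curved ~ x1 + x2, power = 2:3,
                         type = "fitted", data = dat)
hc_bad      <- harvtest(y_curved ~ x1 + x2, order.by = ~ x1, data = dat)
rainbow_bad <- raintest(y_curved ~ x1 + x2, order.by = ~ x1, data = dat)

# Collect the p-values; small values indicate evidence against H0
p_values <- c(RESET   = unname(reset_bad$p.value),
              HC      = unname(hc_bad$p.value),
              Rainbow = unname(rainbow_bad$p.value))
print(p_values)
```

With curvature this pronounced relative to the noise, all three p-values should fall well below 0.05. Weaker nonlinearities, or an ordering unrelated to the source of the curvature, can leave the Harvey–Collier and Rainbow tests with little power.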