14.3 Heteroskedasticity Tests

Heteroskedasticity occurs when the variance of the error terms ($\epsilon_i$) in a regression model is not constant across observations. This violates one of the classical OLS assumptions, namely homoskedasticity (Assumption A4 of the Gauss–Markov theorem), which states:

$$\text{Var}(\epsilon_i) = \sigma^2 \quad \text{for all } i$$

When heteroskedasticity is present:

  • Ordinary Least Squares (OLS) estimators remain unbiased but become inefficient; they are no longer the Best Linear Unbiased Estimator (BLUE).

  • The standard errors of the estimates are biased, leading to unreliable hypothesis tests (e.g., t-tests and F-tests).

Detecting heteroskedasticity is crucial for ensuring the validity of regression results. This section covers five key tests used to identify it: the Breusch–Pagan test, the White test, the Goldfeld–Quandt test, the Park test, and the Glejser test.


14.3.1 Breusch–Pagan Test

The Breusch–Pagan (BP) Test is one of the most widely used tests for detecting heteroskedasticity (Breusch and Pagan 1979). It examines whether the variance of the residuals depends on the independent variables.

Hypotheses

  • Null Hypothesis ($H_0$): Homoskedasticity ($\text{Var}(\epsilon_i) = \sigma^2$ is constant).
  • Alternative Hypothesis ($H_1$): Heteroskedasticity exists; the variance of $\epsilon_i$ depends on the independent variables.

Procedure

  1. Estimate the original regression model:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \epsilon_i$$

Obtain the residuals $\hat{\epsilon}_i$ from this regression.

  2. Compute the squared residuals, $\hat{\epsilon}_i^2$.

  3. Auxiliary Regression: Regress the squared residuals on the independent variables:

$$\hat{\epsilon}_i^2 = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \dots + \alpha_k x_{ki} + u_i$$

  4. Calculate the Test Statistic:

The BP test statistic is:

$$BP = n \cdot R^2_{\text{aux}}$$

Where:

  • $n$ is the sample size,

  • $R^2_{\text{aux}}$ is the $R^2$ from the auxiliary regression.

  5. Decision Rule:
  • Under $H_0$, the BP statistic follows a chi-squared distribution with $k$ degrees of freedom (where $k$ is the number of independent variables):

$$BP \sim \chi^2_k$$

  • Reject $H_0$ if the BP statistic exceeds the critical value from the chi-squared distribution; a manual computation is sketched below.
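
To make the mechanics concrete, here is a minimal sketch of the manual BP computation in R. The simulated data, the object names (fit, aux), and the choice of two regressors are illustrative assumptions, not tied to any particular dataset:

# Illustrative running example reused in the sketches throughout this section
set.seed(1)
n  <- 200
x1 <- runif(n, min = 1, max = 10)   # kept positive so log(x1) exists later
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.5 * x1)  # error sd grows with x1
fit <- lm(y ~ x1 + x2)

# Steps 2-4: squared residuals, auxiliary regression, BP = n * R^2_aux
aux <- lm(residuals(fit)^2 ~ x1 + x2)
bp  <- n * summary(aux)$r.squared
pchisq(bp, df = 2, lower.tail = FALSE)  # step 5: chi-squared with k = 2 df

This reproduces the statistic that `lmtest::bptest(fit)` reports in its default (studentized) form.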

Advantages and Limitations

  • Advantage: Simple to implement; directly tests the relationship between residual variance and regressors.
  • Limitation: Sensitive to non-normality; less effective when heteroskedasticity is not linearly related to independent variables.

14.3.2 White Test

The White Test is a more general heteroskedasticity test that does not require specifying the form of heteroskedasticity (White 1980). It can detect both linear and nonlinear forms.

Hypotheses

  • Null Hypothesis ($H_0$): Homoskedasticity.
  • Alternative Hypothesis ($H_1$): Heteroskedasticity (of any form).

Procedure

  1. Estimate the original regression model and obtain residuals $\hat{\epsilon}_i$.

  2. Auxiliary Regression: Regress the squared residuals on:

    • The original independent variables ($x_{1i}, x_{2i}, \dots, x_{ki}$),
    • Their squares ($x_{1i}^2, x_{2i}^2, \dots, x_{ki}^2$),
    • Their cross-products (e.g., $x_{1i} x_{2i}$).

    The auxiliary regression is:

    $$\hat{\epsilon}_i^2 = \alpha_0 + \alpha_1 x_{1i} + \dots + \alpha_k x_{ki} + \alpha_{k+1} x_{1i}^2 + \dots + \alpha_{2k} x_{ki}^2 + \alpha_{2k+1} (x_{1i} x_{2i}) + \dots + u_i$$

  3. Calculate the Test Statistic:

    $$\text{White} = n \cdot R^2_{\text{aux}}$$

  4. Decision Rule:

  • Under $H_0$, the statistic follows a chi-squared distribution with degrees of freedom equal to the number of auxiliary regressors:

    $$\text{White} \sim \chi^2_{df}$$

  • Reject $H_0$ if the statistic exceeds the critical chi-squared value. In R, the test can be run through `bptest()` as sketched below.
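
Rather than building the auxiliary regression by hand, one common shortcut is to pass the squares and cross-products to `lmtest::bptest()` as the variance formula. A sketch continuing the running example from the BP section (the specific formula terms are the assumption here):

# White test via bptest(): same n * R^2 statistic, with squares and
# cross-products supplied explicitly as the variance regressors
lmtest::bptest(fit, ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))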


Advantages and Limitations

  • Advantage: Can detect a wide range of heteroskedasticity patterns.
  • Limitation: May suffer from overfitting in small samples due to many auxiliary regressors.

14.3.3 Goldfeld–Quandt Test

The Goldfeld–Quandt Test is a simple test that detects heteroskedasticity by comparing the variance of residuals in two different subsets of the data (Goldfeld and Quandt 1965).

Hypotheses

  • Null Hypothesis ($H_0$): Homoskedasticity.
  • Alternative Hypothesis ($H_1$): Heteroskedasticity; variances differ between groups.

Procedure

  1. Sort the data based on an independent variable suspected to cause heteroskedasticity.

  2. Split the data into three groups:

    • Group 1: Lower values,
    • Group 2: Middle values (often omitted),
    • Group 3: Higher values.
  3. Estimate the regression model separately for Groups 1 and 3. Obtain the residual sums of squares ($SSR_1$ from Group 1 and $SSR_2$ from Group 3).

  4. Calculate the Test Statistic:

    $$F = \frac{SSR_2 / (n_2 - k)}{SSR_1 / (n_1 - k)}$$

    Where:

    • $n_1$ and $n_2$ are the numbers of observations in Groups 1 and 3, respectively,
    • $k$ is the number of estimated parameters.
  5. Decision Rule:

  • Under $H_0$, the test statistic follows an $F$-distribution with $(n_2 - k,\ n_1 - k)$ degrees of freedom:

    $$F \sim F_{(n_2 - k,\ n_1 - k)}$$

  • Reject $H_0$ if $F$ exceeds the critical value; a manual version of this computation is sketched below.
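
A minimal manual sketch of the split-and-compare computation, continuing the running example (sorting on x1 and omitting the middle 20% are illustrative choices):

# Manual Goldfeld-Quandt: sort by x1, drop the middle 20%, compare tail variances
ord    <- order(x1)
m      <- floor(n * 0.4)                  # observations in each tail group
fit_lo <- lm(y ~ x1 + x2, subset = ord[1:m])
fit_hi <- lm(y ~ x1 + x2, subset = ord[(n - m + 1):n])
F_gq   <- (sum(residuals(fit_hi)^2) / df.residual(fit_hi)) /
    (sum(residuals(fit_lo)^2) / df.residual(fit_lo))
pf(F_gq, df.residual(fit_hi), df.residual(fit_lo), lower.tail = FALSE)

`lmtest::gqtest(fit, order.by = ~ x1, fraction = 0.2)` automates these steps, as in the demonstration later in this section.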


Advantages and Limitations

  • Advantage: Simple to apply when heteroskedasticity is suspected to vary systematically with an independent variable.
  • Limitation: Requires arbitrary splitting of data and assumes the error variance changes abruptly between groups.

14.3.4 Park Test

The Park Test identifies heteroskedasticity by modeling the error variance as a function of an independent variable (R. E. Park 1966).

Hypotheses

  • Null Hypothesis ($H_0$): Homoskedasticity.
  • Alternative Hypothesis ($H_1$): Heteroskedasticity; variance depends on an independent variable.

Procedure

  1. Estimate the original regression and obtain residuals $\hat{\epsilon}_i$.

  2. Transform the residuals: Take the natural logarithm of the squared residuals, $\ln(\hat{\epsilon}_i^2)$.

  3. Auxiliary Regression: Park's specification models the error variance as $\sigma_i^2 = \sigma^2 x_i^{\alpha_1} e^{u_i}$, which becomes linear after taking logs. Regress $\ln(\hat{\epsilon}_i^2)$ on the log of the independent variable:

    $$\ln(\hat{\epsilon}_i^2) = \alpha_0 + \alpha_1 \ln(x_i) + u_i$$

  4. Decision Rule:

  • Test whether $\alpha_1 = 0$ using a t-test.
  • Reject $H_0$ if $\alpha_1$ is statistically significant, indicating heteroskedasticity (a one-line version in R is sketched below).
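
Continuing the running example, the Park regression is a single `lm()` call (this assumes $x_1 > 0$, which the simulated x1 satisfies):

# Park test: log squared residuals on log(x1), then t-test the slope
park <- lm(log(residuals(fit)^2) ~ log(x1))
summary(park)$coefficients["log(x1)", ]   # estimate, std. error, t value, p value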

Advantages and Limitations

  • Advantage: Simple to implement; works well when the variance follows a log-linear relationship.
  • Limitation: Assumes a specific functional form for the variance, which may not hold in practice.

14.3.5 Glejser Test

The Glejser Test detects heteroskedasticity by regressing the absolute value of residuals on the independent variables (Glejser 1969).

Hypotheses

  • Null Hypothesis ($H_0$): Homoskedasticity.
  • Alternative Hypothesis ($H_1$): Heteroskedasticity exists.

Procedure

  1. Estimate the original regression and obtain residuals $\hat{\epsilon}_i$.

  2. Auxiliary Regression: Regress the absolute residuals on the independent variables:

    $$|\hat{\epsilon}_i| = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \dots + \alpha_k x_{ki} + u_i$$

  3. Decision Rule:

  • Test the significance of the coefficients ($\alpha_1, \alpha_2, \dots$) using t-tests.
  • Reject $H_0$ if any coefficient is statistically significant, indicating heteroskedasticity (see the sketch below).
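
The corresponding sketch on the running example:

# Glejser test: absolute residuals on the regressors
glejser <- lm(abs(residuals(fit)) ~ x1 + x2)
summary(glejser)$coefficients             # significant slopes signal heteroskedasticity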

Advantages and Limitations

  • Advantage: Flexible; can detect various forms of heteroskedasticity.
  • Limitation: Sensitive to outliers since it relies on absolute residuals.

14.3.6 Summary of Heteroskedasticity Tests

| Test | Type | Assumptions | Key Statistic | When to Use |
|------|------|-------------|---------------|-------------|
| Breusch–Pagan | Parametric | Linear relationship with predictors | $\chi^2$ | General-purpose test |
| White | General (nonparametric) | No functional form assumption | $\chi^2$ | Detects both linear and nonlinear forms |
| Goldfeld–Quandt | Group comparison | Assumes known ordering of variance | $F$-distribution | When heteroskedasticity varies by groups |
| Park | Parametric (log-linear) | Assumes log-linear variance | $t$-test | When variance depends on predictors |
| Glejser | Parametric | Based on absolute residuals | $t$-test | Simple test for variance dependence |

Detecting heteroskedasticity is critical for ensuring the reliability of regression models. While each test has strengths and limitations, combining multiple tests can provide robust insights. Once heteroskedasticity is detected, consider using robust standard errors or alternative estimation techniques such as Generalized Least Squares or Weighted Least Squares; both remedies are sketched below, followed by a full demonstration of the five tests.
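
A hedged sketch of the two remedies on the running example (the HC3 covariance estimator and the $1/x_1^2$ weights are illustrative choices; other HC types and weight specifications are equally valid):

# Heteroskedasticity-robust (sandwich) standard errors for the same fit
lmtest::coeftest(fit, vcov = sandwich::vcovHC(fit, type = "HC3"))

# Weighted Least Squares, assuming Var(eps_i) is proportional to x1^2
fit_wls <- lm(y ~ x1 + x2, weights = 1 / x1^2)
summary(fit_wls)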

# Install and load necessary libraries
# install.packages("lmtest")      # For Breusch–Pagan Test
# install.packages("car")         # For additional regression diagnostics
# install.packages("sandwich")    # For robust covariance estimation


library(lmtest)
library(car)
library(sandwich)

# Simulated dataset
set.seed(123)
n <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
epsilon <- rnorm(n, sd = x1 * 0.1)  # Heteroskedastic errors increasing with x1
y <- 5 + 0.4 * x1 - 0.3 * x2 + epsilon

# Original regression model
model <- lm(y ~ x1 + x2)

# ----------------------------------------------------------------------
# 1. Breusch–Pagan Test
# ----------------------------------------------------------------------
# Null Hypothesis: Homoskedasticity
bp_test <- bptest(model)
print(bp_test)
#> 
#>  studentized Breusch-Pagan test
#> 
#> data:  model
#> BP = 7.8141, df = 2, p-value = 0.0201

# ----------------------------------------------------------------------
# 2. White Test (using Breusch–Pagan framework with squares & interactions)
# ----------------------------------------------------------------------
# Create squared and interaction terms
model_white <- lm(residuals(model)^2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
white_statistic <- summary(model_white)$r.squared * n  # White test statistic
df_white <- length(coef(model_white)) - 1              # Degrees of freedom
p_value_white <- 1 - pchisq(white_statistic, df_white)

# Display White Test result
cat("White Test Statistic:", white_statistic, "\n")
#> White Test Statistic: 11.85132
cat("Degrees of Freedom:", df_white, "\n")
#> Degrees of Freedom: 5
cat("P-value:", p_value_white, "\n")
#> P-value: 0.0368828

# ----------------------------------------------------------------------
# 3. Goldfeld–Quandt Test
# ----------------------------------------------------------------------
# Null Hypothesis: Homoskedasticity
# Sort data by x1 (suspected source of heteroskedasticity)
gq_test <- gqtest(model, order.by = ~ x1, fraction = 0.2)  # Omit middle 20% of data
print(gq_test)
#> 
#>  Goldfeld-Quandt test
#> 
#> data:  model
#> GQ = 1.8352, df1 = 37, df2 = 37, p-value = 0.03434
#> alternative hypothesis: variance increases from segment 1 to 2

# ----------------------------------------------------------------------
# 4. Park Test
# ----------------------------------------------------------------------
# Step 1: Get residuals and square them
residuals_squared <- residuals(model) ^ 2

# Step 2: Log-transform squared residuals
log_residuals_squared <- log(residuals_squared)

# Step 3: Regress log(residuals^2) on log(x1) (assuming variance depends on x1)
park_test <- lm(log_residuals_squared ~ log(x1))
summary(park_test)
#> 
#> Call:
#> lm(formula = log_residuals_squared ~ log(x1))
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -9.3633 -1.3424  0.4218  1.6089  3.0697 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -1.6319     4.5982  -0.355    0.723
#> log(x1)       0.8903     1.1737   0.759    0.450
#> 
#> Residual standard error: 2.171 on 98 degrees of freedom
#> Multiple R-squared:  0.005837,   Adjusted R-squared:  -0.004308 
#> F-statistic: 0.5754 on 1 and 98 DF,  p-value: 0.4499

# ----------------------------------------------------------------------
# 5. Glejser Test
# ----------------------------------------------------------------------
# Step 1: Absolute value of residuals
abs_residuals <- abs(residuals(model))

# Step 2: Regress absolute residuals on independent variables
glejser_test <- lm(abs_residuals ~ x1 + x2)
summary(glejser_test)
#> 
#> Call:
#> lm(formula = abs_residuals ~ x1 + x2)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.3096 -2.2680 -0.4564  1.9554  8.3921 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  0.755846   2.554842   0.296   0.7680  
#> x1           0.064896   0.032852   1.975   0.0511 .
#> x2          -0.008495   0.062023  -0.137   0.8913  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.98 on 97 degrees of freedom
#> Multiple R-squared:  0.0392, Adjusted R-squared:  0.01939 
#> F-statistic: 1.979 on 2 and 97 DF,  p-value: 0.1438

Interpretation of the Results

  1. Breusch–Pagan Test

    • Null Hypothesis ($H_0$): Homoskedasticity (constant error variance).

    • Alternative Hypothesis ($H_1$): Heteroskedasticity exists (error variance depends on predictors).

    • Decision Rule:

      • Reject $H_0$ if p-value $< 0.05$ → evidence of heteroskedasticity.
      • Fail to reject $H_0$ if p-value $\geq 0.05$ → no strong evidence of heteroskedasticity.
  2. White Test

    • Null Hypothesis ($H_0$): Homoskedasticity.

    • Alternative Hypothesis ($H_1$): Heteroskedasticity (of any form, linear or nonlinear).

    • Decision Rule:

      • Reject $H_0$ if p-value $< 0.05$ → presence of heteroskedasticity.
      • Fail to reject $H_0$ if p-value $\geq 0.05$ → homoskedasticity likely holds.
  3. Goldfeld–Quandt Test

    • Null Hypothesis ($H_0$): Homoskedasticity (equal variances across groups).

    • Alternative Hypothesis ($H_1$): Heteroskedasticity (unequal variances between groups).

    • Decision Rule:

      • Reject $H_0$ if p-value $< 0.05$ → variances differ between groups, indicating heteroskedasticity.
      • Fail to reject $H_0$ if p-value $\geq 0.05$ → no significant evidence of heteroskedasticity.
  4. Park Test

    • Null Hypothesis ($H_0$): No relationship between the variance of errors and predictor(s) (homoskedasticity).

    • Alternative Hypothesis ($H_1$): Variance of errors depends on predictor(s).

    • Decision Rule:

      • Reject $H_0$ if the coefficient of $\ln(x_1)$ is statistically significant (p-value $< 0.05$).
      • Fail to reject $H_0$ if p-value $\geq 0.05$.
  5. Glejser Test

    • Null Hypothesis ($H_0$): Homoskedasticity (no relationship between absolute residuals and predictors).

    • Alternative Hypothesis ($H_1$): Heteroskedasticity exists.

    • Decision Rule:

      • Reject $H_0$ if any predictor is statistically significant (p-value $< 0.05$).
      • Fail to reject $H_0$ if p-value $\geq 0.05$.

References

Breusch, Trevor S., and Adrian R. Pagan. 1979. "A Simple Test for Heteroscedasticity and Random Coefficient Variation." Econometrica: Journal of the Econometric Society, 1287–94.
Glejser, Herbert. 1969. "A New Test for Heteroskedasticity." Journal of the American Statistical Association 64 (325): 316–23.
Goldfeld, Stephen M., and Richard E. Quandt. 1965. "Some Tests for Homoscedasticity." Journal of the American Statistical Association 60 (310): 539–47.
Park, Rolla E. 1966. "Estimation with Heteroscedastic Error Terms." Econometrica 34 (4).
White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica: Journal of the Econometric Society, 817–38.