14.3 Heteroskedasticity Tests

Heteroskedasticity occurs when the variance of the error terms (\(\epsilon_i\)) in a regression model is not constant across observations. This violates the classical OLS assumption of homoskedasticity (Assumption A4 of the Gauss–Markov theorem), which states:

\[ \text{Var}(\epsilon_i) = \sigma^2 \quad \forall \, i \]

When heteroskedasticity is present:

  • Ordinary Least Squares estimators remain unbiased but become inefficient (i.e., no longer Best Linear Unbiased Estimators—BLUE).

  • The usual OLS standard errors are biased, making hypothesis tests (e.g., \(t\)-tests and \(F\)-tests) and confidence intervals unreliable.

Detecting heteroskedasticity is therefore crucial for ensuring the validity of regression results. This section covers the key tests used to identify it: the Breusch–Pagan, White, Goldfeld–Quandt, Park, and Glejser tests.


14.3.1 Breusch–Pagan Test

The Breusch–Pagan (BP) Test is one of the most widely used tests for detecting heteroskedasticity (Breusch and Pagan 1979). It examines whether the variance of the residuals depends on the independent variables.

Hypotheses

  • Null Hypothesis (\(H_0\)): Homoskedasticity (\(\text{Var}(\epsilon_i) = \sigma^2\) is constant).
  • Alternative Hypothesis (\(H_1\)): Heteroskedasticity exists; the variance of \(\epsilon_i\) depends on the independent variables.

Procedure

  1. Estimate the original regression model:

\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \epsilon_i \]

Obtain the residuals \(\hat{\epsilon}_i\) from this regression.

  2. Compute the squared residuals:

\[ \hat{\epsilon}_i^2 \]

  3. Auxiliary Regression: Regress the squared residuals on the independent variables:

\[ \hat{\epsilon}_i^2 = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \dots + \alpha_k x_{ki} + u_i \]

  4. Calculate the Test Statistic:

The BP test statistic is:

\[ \text{BP} = n \cdot R^2_{\text{aux}} \]

Where:

  • \(n\) is the sample size,

  • \(R^2_{\text{aux}}\) is the \(R^2\) from the auxiliary regression.

  5. Decision Rule:
  • Under \(H_0\), the BP statistic follows a chi-squared distribution with \(k\) degrees of freedom (where \(k\) is the number of independent variables):

\[ \text{BP} \sim \chi^2_k \]

  • Reject \(H_0\) if the BP statistic exceeds the critical value from the chi-squared distribution.
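In R, the computation above can be sketched directly (a minimal sketch, assuming a fitted lm object named fit — a hypothetical name — with two regressors x1 and x2 available in the workspace; lmtest::bptest(), used in the code at the end of this section, automates the calculation and reports the studentized version of the statistic):

# Minimal sketch of the manual BP computation (hypothetical object `fit`)
e2  <- residuals(fit)^2                        # squared residuals
aux <- lm(e2 ~ x1 + x2)                        # auxiliary regression on the regressors
bp  <- length(e2) * summary(aux)$r.squared     # BP = n * R^2_aux
pchisq(bp, df = 2, lower.tail = FALSE)         # p-value with k = 2 regressors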

Advantages and Limitations

  • Advantage: Simple to implement; directly tests the relationship between residual variance and regressors.
  • Limitation: The original form of the test is sensitive to non-normal errors, and the test has little power when heteroskedasticity is not (approximately) a linear function of the independent variables.

14.3.2 White Test

The White Test is a more general heteroskedasticity test that does not require specifying the form of heteroskedasticity (White 1980). It can detect both linear and nonlinear forms.

Hypotheses

  • Null Hypothesis (\(H_0\)): Homoskedasticity.
  • Alternative Hypothesis (\(H_1\)): Heteroskedasticity (of any form).

Procedure

  1. Estimate the original regression model and obtain residuals \(\hat{\epsilon}_i\).

  2. Auxiliary Regression: Regress the squared residuals on:

    • The original independent variables (\(x_{1i}, x_{2i}, \dots, x_{ki}\)),
    • Their squares (\(x_{1i}^2, x_{2i}^2, \dots, x_{ki}^2\)),
    • Their cross-products (e.g., \(x_{1i} x_{2i}\)).

    The auxiliary regression is:

    \[ \hat{\epsilon}_i^2 = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \dots + \alpha_k x_{ki} + \alpha_{k+1} x_{1i}^2 + \dots + \alpha_{2k} x_{ki}^2 + \alpha_{2k+1} (x_{1i} x_{2i}) + \dots + u_i \]

  3. Calculate the Test Statistic:

    \[ \text{White} = n \cdot R^2_{\text{aux}} \]

  4. Decision Rule:

  • Under \(H_0\), the statistic follows a chi-squared distribution with degrees of freedom equal to the number of regressors in the auxiliary regression (excluding the intercept):

    \[ \text{White} \sim \chi^2_{\text{df}} \]

  • Reject \(H_0\) if the statistic exceeds the critical chi-squared value.
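In practice, the White test is often run by passing the expanded regressor set to lmtest::bptest() as the auxiliary formula (a minimal sketch, assuming a fitted model fit — a hypothetical name — with regressors x1 and x2; the default studentized statistic is the \(n \cdot R^2_{\text{aux}}\) form above). A fully manual version appears in the code at the end of this section.

# Minimal sketch of the White test via bptest() (hypothetical object `fit`)
library(lmtest)
bptest(fit, ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))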


Advantages and Limitations

  • Advantage: Can detect a wide range of heteroskedasticity patterns.
  • Limitation: May suffer from overfitting in small samples due to many auxiliary regressors.

14.3.3 Goldfeld–Quandt Test

The Goldfeld–Quandt Test is a simple test that detects heteroskedasticity by comparing the variance of residuals in two different subsets of the data (Goldfeld and Quandt 1965).

Hypotheses

  • Null Hypothesis (\(H_0\)): Homoskedasticity.
  • Alternative Hypothesis (\(H_1\)): Heteroskedasticity; variances differ between groups.

Procedure

  1. Sort the data based on an independent variable suspected to cause heteroskedasticity.

  2. Split the data into three groups:

    • Group 1: Lower values,
    • Group 2: Middle values (often omitted),
    • Group 3: Higher values.
  3. Estimate the regression model separately for Groups 1 and 3, and obtain the residual sums of squares, \(SSR_1\) (Group 1) and \(SSR_2\) (Group 3).

  4. Calculate the Test Statistic:

    \[ F = \frac{SSR_2 / (n_2 - k)}{SSR_1 / (n_1 - k)} \]

    Where:

    • \(n_1\) and \(n_2\) are the number of observations in Groups 1 and 3, respectively,
    • \(k\) is the number of estimated parameters.
  5. Decision Rule:

  • Under \(H_0\), the test statistic follows an \(F\)-distribution with \((n_2 - k, n_1 - k)\) degrees of freedom:

    \[ F \sim F_{(n_2 - k, n_1 - k)} \]

  • Reject \(H_0\) if \(F\) exceeds the critical value.
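These steps can be carried out by hand as follows (a minimal sketch, assuming a data frame dat — a hypothetical name — containing y and the sorting variable x1, with the middle 20% of observations dropped; lmtest::gqtest(), used in the code at the end of this section, automates this):

# Minimal sketch of the manual Goldfeld-Quandt computation (hypothetical data frame `dat`)
dat   <- dat[order(dat$x1), ]                       # sort by the suspect regressor
n_obs <- nrow(dat)
m     <- floor(0.4 * n_obs)                         # size of each outer group (middle 20% dropped)
low   <- dat[1:m, ]
high  <- dat[(n_obs - m + 1):n_obs, ]
ssr1  <- sum(residuals(lm(y ~ x1, data = low))^2)   # SSR_1, lower group
ssr2  <- sum(residuals(lm(y ~ x1, data = high))^2)  # SSR_2, upper group
k     <- 2                                          # estimated parameters: intercept + slope
F_gq  <- (ssr2 / (m - k)) / (ssr1 / (m - k))
pf(F_gq, m - k, m - k, lower.tail = FALSE)          # one-sided p-value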


Advantages and Limitations

  • Advantage: Simple to apply when heteroskedasticity is suspected to vary systematically with an independent variable.
  • Limitation: Requires arbitrary splitting of data and assumes the error variance changes abruptly between groups.

14.3.4 Park Test

The Park Test identifies heteroskedasticity by modeling the error variance as a function of an independent variable (R. E. Park 1966).

Hypotheses

  • Null Hypothesis (\(H_0\)): Homoskedasticity.
  • Alternative Hypothesis (\(H_1\)): Heteroskedasticity; variance depends on an independent variable.

Procedure

  1. Estimate the original regression and obtain residuals \(\hat{\epsilon}_i\).

  2. Transform the residuals: Take the natural logarithm of the squared residuals:

    \[ \ln(\hat{\epsilon}_i^2) \]

  3. Auxiliary Regression: Regress \(\ln(\hat{\epsilon}_i^2)\) on the independent variable(s):

    \[ \ln(\hat{\epsilon}_i^2) = \alpha_0 + \alpha_1 \ln(x_i) + u_i \]

  4. Decision Rule:

  • Test whether \(\alpha_1 = 0\) using a \(t\)-test.
  • Reject \(H_0\) if \(\alpha_1\) is statistically significant, indicating heteroskedasticity.
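The log–log form of the auxiliary regression comes from the variance model assumed in Park (1966), in which the error variance is a power function of the regressor, with \(\hat{\epsilon}_i^2\) serving as a proxy for the unobserved \(\sigma_i^2\):

\[ \text{Var}(\epsilon_i) = \sigma^2 x_i^{\alpha_1} e^{u_i} \quad \Longrightarrow \quad \ln(\epsilon_i^2) \approx \ln(\sigma^2) + \alpha_1 \ln(x_i) + u_i \]

A significant \(\hat{\alpha}_1\) therefore indicates that the error variance changes systematically with \(x_i\).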

Advantages and Limitations

  • Advantage: Simple to implement; works well when the variance follows a log-linear relationship.
  • Limitation: Assumes a specific functional form for the variance, which may not hold in practice.

14.3.5 Glejser Test

The Glejser Test detects heteroskedasticity by regressing the absolute value of residuals on the independent variables (Glejser 1969).

Hypotheses

  • Null Hypothesis (\(H_0\)): Homoskedasticity.
  • Alternative Hypothesis (\(H_1\)): Heteroskedasticity exists.

Procedure

  1. Estimate the original regression and obtain residuals \(\hat{\epsilon}_i\).

  2. Auxiliary Regression: Regress the absolute residuals on the independent variables:

    \[ |\hat{\epsilon}_i| = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \dots + \alpha_k x_{ki} + u_i \]

  3. Decision Rule:

  • Test the significance of the coefficients (\(\alpha_1, \alpha_2, \dots\)) using \(t\)-tests.
  • Reject \(H_0\) if any coefficient is statistically significant, indicating heteroskedasticity.

Advantages and Limitations

  • Advantage: Flexible; can detect various forms of heteroskedasticity.
  • Limitation: Sensitive to outliers since it relies on absolute residuals.

14.3.6 Summary of Heteroskedasticity Tests

| Test | Type | Assumptions | Key Statistic | When to Use |
|---|---|---|---|---|
| Breusch–Pagan | Parametric | Linear relationship with predictors | \(\chi^2\) | General-purpose test |
| White | General (non-parametric) | No functional form assumption | \(\chi^2\) | Detects both linear & nonlinear forms |
| Goldfeld–Quandt | Group comparison | Assumes known ordering of variance | \(F\)-distribution | When heteroskedasticity varies by groups |
| Park | Parametric (log-linear) | Assumes log-linear variance | \(t\)-test | When variance depends on predictors |
| Glejser | Parametric | Based on absolute residuals | \(t\)-test | Simple test for variance dependence |

Detecting heteroskedasticity is critical for ensuring the reliability of regression models. While each test has strengths and limitations, combining multiple tests can provide robust insights. Once heteroskedasticity is detected, consider using robust standard errors or alternative estimation techniques (e.g., Generalized Least Squares or Weighted Least Squares) to address the issue.

# Install and load necessary libraries
# install.packages("lmtest")      # For Breusch–Pagan Test
# install.packages("car")         # For additional regression diagnostics
# install.packages("sandwich")    # For robust covariance estimation


library(lmtest)
library(car)
library(sandwich)

# Simulated dataset
set.seed(123)
n <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
epsilon <- rnorm(n, sd = x1 * 0.1)  # Heteroskedastic errors increasing with x1
y <- 5 + 0.4 * x1 - 0.3 * x2 + epsilon

# Original regression model
model <- lm(y ~ x1 + x2)

# ----------------------------------------------------------------------
# 1. Breusch–Pagan Test
# ----------------------------------------------------------------------
# Null Hypothesis: Homoskedasticity
bp_test <- bptest(model)
print(bp_test)
#> 
#>  studentized Breusch-Pagan test
#> 
#> data:  model
#> BP = 7.8141, df = 2, p-value = 0.0201

# ----------------------------------------------------------------------
# 2. White Test (using Breusch–Pagan framework with squares & interactions)
# ----------------------------------------------------------------------
# Create squared and interaction terms
model_white <- lm(residuals(model)^2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
white_statistic <- summary(model_white)$r.squared * n   # White test statistic: n * R^2_aux
df_white <- length(coef(model_white)) - 1                # Degrees of freedom (auxiliary regressors)
p_value_white <- 1 - pchisq(white_statistic, df_white)

# Display White Test result
cat("White Test Statistic:", white_statistic, "\n")
#> White Test Statistic: 11.85132
cat("Degrees of Freedom:", df_white, "\n")
#> Degrees of Freedom: 5
cat("P-value:", p_value_white, "\n")
#> P-value: 0.0368828

# ----------------------------------------------------------------------
# 3. Goldfeld–Quandt Test
# ----------------------------------------------------------------------
# Null Hypothesis: Homoskedasticity
# Sort data by x1 (suspected source of heteroskedasticity)
gq_test <- gqtest(model, order.by = ~ x1, fraction = 0.2)  # Omit middle 20% of data
print(gq_test)
#> 
#>  Goldfeld-Quandt test
#> 
#> data:  model
#> GQ = 1.8352, df1 = 37, df2 = 37, p-value = 0.03434
#> alternative hypothesis: variance increases from segment 1 to 2

# ----------------------------------------------------------------------
# 4. Park Test
# ----------------------------------------------------------------------
# Step 1: Get residuals and square them
residuals_squared <- residuals(model) ^ 2

# Step 2: Log-transform squared residuals
log_residuals_squared <- log(residuals_squared)

# Step 3: Regress log(residuals^2) on log(x1) (assuming variance depends on x1)
park_test <- lm(log_residuals_squared ~ log(x1))
summary(park_test)
#> 
#> Call:
#> lm(formula = log_residuals_squared ~ log(x1))
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -9.3633 -1.3424  0.4218  1.6089  3.0697 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -1.6319     4.5982  -0.355    0.723
#> log(x1)       0.8903     1.1737   0.759    0.450
#> 
#> Residual standard error: 2.171 on 98 degrees of freedom
#> Multiple R-squared:  0.005837,   Adjusted R-squared:  -0.004308 
#> F-statistic: 0.5754 on 1 and 98 DF,  p-value: 0.4499

# ----------------------------------------------------------------------
# 5. Glejser Test
# ----------------------------------------------------------------------
# Step 1: Absolute value of residuals
abs_residuals <- abs(residuals(model))

# Step 2: Regress absolute residuals on independent variables
glejser_test <- lm(abs_residuals ~ x1 + x2)
summary(glejser_test)
#> 
#> Call:
#> lm(formula = abs_residuals ~ x1 + x2)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.3096 -2.2680 -0.4564  1.9554  8.3921 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  0.755846   2.554842   0.296   0.7680  
#> x1           0.064896   0.032852   1.975   0.0511 .
#> x2          -0.008495   0.062023  -0.137   0.8913  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.98 on 97 degrees of freedom
#> Multiple R-squared:  0.0392, Adjusted R-squared:  0.01939 
#> F-statistic: 1.979 on 2 and 97 DF,  p-value: 0.1438

Interpretation of the Results

  1. Breusch–Pagan Test

    • Null Hypothesis (\(H_0\)): Homoskedasticity (constant error variance).

    • Alternative Hypothesis (\(H_1\)): Heteroskedasticity exists (error variance depends on predictors).

    • Decision Rule:

      • Reject \(H_0\) if p-value \(< 0.05\) → Evidence of heteroskedasticity.
      • Fail to reject \(H_0\) if p-value \(\ge 0.05\) → No strong evidence of heteroskedasticity.
  2. White Test

    • Null Hypothesis (\(H_0\)): Homoskedasticity.

    • Alternative Hypothesis (\(H_1\)): Heteroskedasticity (of any form, linear or nonlinear).

    • Decision Rule:

      • Reject \(H_0\) if p-value \(< 0.05\) → Presence of heteroskedasticity.
      • Fail to reject \(H_0\) if p-value \(\ge 0.05\) → Homoskedasticity likely holds.
  3. Goldfeld–Quandt Test

    • Null Hypothesis (\(H_0\)): Homoskedasticity (equal variances across groups).

    • Alternative Hypothesis (\(H_1\)): Heteroskedasticity (unequal variances between groups).

    • Decision Rule:

      • Reject \(H_0\) if p-value \(< 0.05\) → Variances differ between groups, indicating heteroskedasticity.
      • Fail to reject \(H_0\) if p-value \(\ge 0.05\) → No significant evidence of heteroskedasticity.
  4. Park Test

    • Null Hypothesis (\(H_0\)): No relationship between the variance of errors and predictor(s) (homoskedasticity).

    • Alternative Hypothesis (\(H_1\)): Variance of errors depends on predictor(s).

    • Decision Rule:

      • Reject \(H_0\) if the coefficient of \(\log(x_1)\) is statistically significant (p-value \(< 0.05\)).
      • Fail to reject \(H_0\) if p-value \(\ge 0.05\).
  5. Glejser Test

    • Null Hypothesis (\(H_0\)): Homoskedasticity (no relationship between absolute residuals and predictors).

    • Alternative Hypothesis (\(H_1\)): Heteroskedasticity exists.

    • Decision Rule:

      • Reject \(H_0\) if any predictor is statistically significant (p-value \(< 0.05\)).
      • Fail to reject \(H_0\) if p-value \(\ge 0.05\).
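If, as in this example, the tests point to heteroskedasticity, the remedies mentioned in the summary can be applied directly. A minimal sketch using the packages already loaded: heteroskedasticity-robust (HC1) standard errors via sandwich/lmtest, and, since the simulated error standard deviation is proportional to \(x_1\), a weighted least squares fit with weights \(1/x_1^2\) (the weight choice is an assumption based on the simulation design):

# Heteroskedasticity-robust (HC1) standard errors for the original model
coeftest(model, vcov = vcovHC(model, type = "HC1"))

# Weighted least squares, assuming Var(epsilon_i) proportional to x1^2
wls_model <- lm(y ~ x1 + x2, weights = 1 / x1^2)
summary(wls_model)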

References

Breusch, Trevor S., and Adrian R. Pagan. 1979. “A Simple Test for Heteroscedasticity and Random Coefficient Variation.” Econometrica: Journal of the Econometric Society, 1287–94.
Glejser, Herbert. 1969. “A New Test for Heteroskedasticity.” Journal of the American Statistical Association 64 (325): 316–23.
Goldfeld, Stephen M., and Richard E. Quandt. 1965. “Some Tests for Homoscedasticity.” Journal of the American Statistical Association 60 (310): 539–47.
Park, Rolla E. 1966. “Estimation with Heteroscedastic Error Terms.” Econometrica 34 (4).
White, Halbert. 1980. “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica: Journal of the Econometric Society, 817–38.