10.6 Confidence Intervals in Nonparametric Regression

Constructing confidence intervals (or bands) for nonparametric regression estimators such as kernel smoothers, local polynomials, and smoothing splines is more complex than in parametric models. The key challenges arise from the flexibility of the models and from the dependence of bias and variance on the local data density and the smoothing parameters.

10.6.1 Asymptotic Normality

Under regularity conditions, and provided the bandwidth shrinks fast enough that the bias is asymptotically negligible, nonparametric estimators are asymptotically normal. For a given point $x$, we have:

$$\sqrt{nh}\,\{\hat{m}(x) - m(x)\} \xrightarrow{D} N\bigl(0, \sigma^2 \nu(x)\bigr),$$

where:

  • $n$ is the sample size,
  • $h$ is the bandwidth (for kernel or local polynomial estimators) or a function of $\lambda$ (for smoothing splines),
  • $\sigma^2$ is the variance of the errors,
  • $\nu(x)$ is a function that depends on the estimator, kernel, and local data density.

An approximate $(1 - \alpha)$ pointwise confidence interval for $m(x)$ is given by:

$$\hat{m}(x) \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}[\hat{m}(x)]},$$

where:

  • $z_{\alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard normal distribution,
  • $\widehat{\mathrm{Var}}[\hat{m}(x)]$ is an estimate of the variance, which can be obtained using plug-in methods, sandwich estimators, or resampling techniques.
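
As a rough illustration of the plug-in route, the sketch below computes a normal-approximation interval for a Nadaraya-Watson estimator with a Gaussian kernel. Several choices here are assumptions made for the example and are not taken from the text: the function names `nw_weights` and `pointwise_ci`, the simulated data, the bandwidth, homoscedastic errors, and the variance estimate RSS / (n - tr(S)). Because the estimator is linear in the responses, its variance at $x$ is $\sigma^2 \sum_i w_i(x)^2$. Bias is ignored, so coverage may fall short where the curve bends sharply.

```python
from statistics import NormalDist

import numpy as np


def nw_weights(x_eval, x, h):
    """Row-normalised Gaussian kernel weights: row j holds the weights for x_eval[j]."""
    u = (x_eval[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u**2)
    return k / k.sum(axis=1, keepdims=True)


def pointwise_ci(x, y, x_eval, h, alpha=0.05):
    """Plug-in normal-approximation interval for a Nadaraya-Watson fit (bias ignored)."""
    S = nw_weights(x, x, h)                      # smoother matrix at the design points
    resid = y - S @ y
    # Assumption: homoscedastic errors, variance estimated with residual df n - tr(S).
    sigma2_hat = resid @ resid / (len(y) - np.trace(S))
    W = nw_weights(x_eval, x, h)
    fit = W @ y
    # Linear smoother: Var[m_hat(x)] = sigma^2 * sum_i w_i(x)^2.
    se = np.sqrt(sigma2_hat * (W**2).sum(axis=1))
    z = NormalDist().inv_cdf(1 - alpha / 2)      # (1 - alpha/2) standard normal quantile
    return fit, fit - z * se, fit + z * se


# Simulated data, purely for illustration.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=200)
grid = np.linspace(0.05, 0.95, 19)
fit, lower, upper = pointwise_ci(x, y, grid, h=0.05)
```

The returned arrays give the fitted curve and the interval endpoints on `grid`; each interval is pointwise, not simultaneous.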

10.6.2 Bootstrap Methods

The bootstrap provides a powerful alternative for constructing confidence intervals and bands, particularly when asymptotic approximations are unreliable (e.g., small sample sizes or near boundaries).

10.6.2.1 Bootstrap Approaches

  • Residual Bootstrap:
    1. Fit the nonparametric model to obtain residuals $\hat{\varepsilon}_i = y_i - \hat{m}(x_i)$.
    2. Generate bootstrap samples $y_i^* = \hat{m}(x_i) + \varepsilon_i^*$, where the $\varepsilon_i^*$ are resampled (with replacement) from $\{\hat{\varepsilon}_i\}$.
    3. Refit the model to each bootstrap sample to obtain $\hat{m}^*(x)$.
    4. Repeat many times to build an empirical distribution of $\hat{m}^*(x)$.
  • Wild Bootstrap:
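    Particularly useful for heteroscedastic data. Instead of resampling residuals across observations, each residual is multiplied by an independent random variable with mean zero and unit variance (for example, a Rademacher variable taking the values -1 and +1 with equal probability), so that each bootstrap error keeps the variance scale of its own observation.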
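
The two schemes above can be sketched in a few lines. This is a minimal illustration, not a reference implementation. The names `nw_fit` and `bootstrap_curves`, the Gaussian-kernel Nadaraya-Watson smoother, the Rademacher multipliers, the centring of residuals before resampling, and the simulated heteroscedastic data are all assumptions made for the example.

```python
import numpy as np


def nw_fit(x_eval, x, y, h):
    """Nadaraya-Watson estimate with a Gaussian kernel."""
    u = (x_eval[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u**2)
    return (k @ y) / k.sum(axis=1)


def bootstrap_curves(x, y, x_eval, h, n_boot=500, wild=False, seed=0):
    """Return an (n_boot, len(x_eval)) array of refitted bootstrap curves."""
    rng = np.random.default_rng(seed)
    fitted = nw_fit(x, x, y, h)
    resid = y - fitted                            # step 1: residuals
    curves = np.empty((n_boot, len(x_eval)))
    for b in range(n_boot):
        if wild:
            # Wild bootstrap: flip each residual's sign independently
            # (Rademacher multipliers have mean 0 and variance 1).
            e_star = resid * rng.choice([-1.0, 1.0], size=len(y))
        else:
            # Residual bootstrap: resample centred residuals with replacement
            # (centring is a common refinement, assumed here).
            centred = resid - resid.mean()
            e_star = rng.choice(centred, size=len(y), replace=True)
        y_star = fitted + e_star                  # step 2: bootstrap responses
        curves[b] = nw_fit(x_eval, x, y_star, h)  # step 3: refit
    return curves                                 # step 4: empirical distribution


# Simulated heteroscedastic data, purely for illustration.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1 + 0.4 * x)
grid = np.linspace(0.05, 0.95, 19)
curves = bootstrap_curves(x, y, grid, h=0.06, n_boot=300, wild=True)
lower, upper = np.quantile(curves, [0.025, 0.975], axis=0)
```

Here `lower` and `upper` are simple percentile limits read off the bootstrap curves at each grid point; other interval constructions (for example, ones based on the deviations of the bootstrap curves from the original fit) are equally possible.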

10.6.2.2 Bootstrap Confidence Bands

While pointwise confidence intervals cover the true function at a specific $x$ with probability $(1 - \alpha)$, simultaneous confidence bands cover the entire function over an interval with the desired confidence level. Bootstrap methods can be adapted to estimate these bands by capturing the distribution of the maximum deviation between $\hat{m}(x)$ and $m(x)$ over the range of $x$.
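
One simple way to turn the maximum-deviation idea into code is sketched below, under stated assumptions. In the bootstrap world the original fit plays the role of the true function, so the deviation is approximated by $\max_x |\hat{m}^*(x) - \hat{m}(x)|$ over a finite grid, and its $(1 - \alpha)$ quantile gives a constant half-width. The names `nw_fit` and `sup_band`, the residual bootstrap, the unstudentized (constant-width) band, and the simulated data are choices made for this illustration.

```python
import numpy as np


def nw_fit(x_eval, x, y, h):
    """Nadaraya-Watson estimate with a Gaussian kernel."""
    u = (x_eval[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u**2)
    return (k @ y) / k.sum(axis=1)


def sup_band(x, y, x_eval, h, alpha=0.05, n_boot=500, seed=0):
    """Constant-width simultaneous band from the bootstrap maximum deviation."""
    rng = np.random.default_rng(seed)
    fitted = nw_fit(x, x, y, h)
    resid = y - fitted
    resid = resid - resid.mean()                 # centred residuals (assumed refinement)
    centre = nw_fit(x_eval, x, y, h)             # original fit on the grid
    abs_dev = np.empty((n_boot, len(x_eval)))
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        abs_dev[b] = np.abs(nw_fit(x_eval, x, y_star, h) - centre)
    sup_dev = abs_dev.max(axis=1)                # maximum deviation per replicate
    half_width = np.quantile(sup_dev, 1 - alpha)            # simultaneous
    pointwise = np.quantile(abs_dev, 1 - alpha, axis=0)     # pointwise, for comparison
    return centre, half_width, pointwise


# Simulated data, purely for illustration.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=200)
grid = np.linspace(0.05, 0.95, 40)
centre, half_width, pointwise = sup_band(x, y, grid, h=0.05, n_boot=300)
band_lower, band_upper = centre - half_width, centre + half_width
```

The simultaneous half-width is never smaller than any of the pointwise half-widths, which reflects the price paid for covering the whole curve at once. A studentized version, dividing each deviation by a local standard error, would let the band width vary with $x$.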

10.6.3 Practical Considerations

  • Bias Correction:
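    Nonparametric estimators often have non-negligible bias, especially near boundaries. Because intervals built from the variance alone are centered on a biased estimate, explicit bias correction or undersmoothing (choosing a bandwidth smaller than the one that minimizes mean squared error, so that bias shrinks relative to the standard error) is sometimes used to improve interval coverage.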

  • Effective Degrees of Freedom:
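    For smoothing splines, the effective degrees of freedom (related to $\mathrm{tr}(S_\lambda)$, where $S_\lambda$ is the smoother matrix) summarize model complexity and enter confidence interval construction, for example through the residual degrees of freedom used when estimating the error variance.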
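
To make the trace concrete, the sketch below uses a discrete analogue of a smoothing spline (a Whittaker-type smoother on an equally spaced design), for which $S_\lambda = (I + \lambda D^\top D)^{-1}$ with $D$ the second-difference matrix. This stand-in is an assumption chosen because its smoother matrix is easy to write down; it is not the cubic smoothing spline itself. The name `smoother_matrix`, the simulated data, and the variance estimate RSS / (n - tr(S)) are likewise illustrative.

```python
import numpy as np


def smoother_matrix(n, lam):
    """S = (I + lam * D'D)^{-1} with D the (n-2) x n second-difference matrix."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), np.eye(n))


# Simulated data on an equally spaced grid, purely for illustration.
rng = np.random.default_rng(3)
n = 50
t = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, size=n)

S = smoother_matrix(n, lam=10.0)
fit = S @ y
edf = np.trace(S)                          # effective degrees of freedom
resid = y - fit
sigma2_hat = resid @ resid / (n - edf)     # error variance with residual df n - tr(S)
se = np.sqrt(sigma2_hat * (S**2).sum(axis=1))   # pointwise standard errors of the fit
```

With no penalty the trace equals $n$ (the fit interpolates the data), and as $\lambda$ grows it falls toward 2, the dimension of the straight-line fit that the second-difference penalty leaves unpenalized.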