6.4 Prediction and confidence intervals

The prediction of the conditional response \(m(\mathbf{x})=\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\) with the local polynomial estimator reduces to evaluating \(\hat{m}(\mathbf{x};p,\mathbf{h}).\) The fitted values are, therefore, \(\hat{Y}_i:=\hat{m}(\mathbf{X}_i;p,\mathbf{h}),\) \(i=1,\ldots,n.\) The np package provides methods to perform these operations via the predict and fitted functions.
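
For example, assuming fitOECD is the npreg fit to the oecdpanel data obtained in the previous sections, the fitted values \(\hat{Y}_i\) can be retrieved as follows (predictions at new observations with predict are illustrated later in this section):

# Fitted values hat(Y)_i at the sample predictors
head(fitted(fitOECD))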

More interesting is the discussion of the uncertainty of \(\hat{m}(\mathbf{x};p,\mathbf{h})\) and, as a consequence, of the predictions. Unlike in parametric models, in nonparametric regression there is no parametric distribution of the response that can be used to carry out the inference and, consequently, to quantify the uncertainty of the estimation. Because of this, it is necessary to resort either to somewhat convoluted asymptotic expressions¹ (computed by predict if se.fit = TRUE) or to bootstrap resampling procedures. The default bootstrap resampling procedure in np is the so-called wild bootstrap (Liu 1988), which is particularly well suited for regression problems with potential heteroskedasticity.

Due to their increasing complexity, we only cover these methods superficially, through the following example.

# Asymptotic confidence bands for the marginal effects of each predictor on the
# response
par(mfrow = c(2, 3))
plot(fitOECD, plot.errors.method = "asymptotic", common.scale = FALSE,
     plot.par.mfrow = FALSE)


# Bootstrap confidence bands
# They take more time to compute because a resampling + refitting takes place
par(mfrow = c(2, 3))
plot(fitOECD, plot.errors.method = "bootstrap", plot.par.mfrow = FALSE)
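
To give an idea of the resampling underlying these bands, the following is a minimal sketch of a single wild bootstrap iteration with Rademacher perturbations; the perturbation distribution and further details used internally by np may differ.

# Wild bootstrap responses: Y*_i = hat(Y)_i + hat(eps)_i * V_i, with V_i iid,
# E[V_i] = 0, Var[V_i] = 1, so that conditional heteroskedasticity is preserved
eps_hat <- residuals(fitOECD)  # equivalently, the response minus fitted(fitOECD)
V <- sample(x = c(-1, 1), size = length(eps_hat), replace = TRUE)  # Rademacher
Y_star <- fitted(fitOECD) + eps_hat * V

# Refitting the estimator with the same bandwidths on (X_i, Y*_i), and repeating
# this B times, approximates the sampling variability of hat(m)(x; p, h)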


# The asymptotic standard errors associated with the regression evaluated at the
# evaluation points are in $merr
head(fitOECD$merr)
## [1] 0.004534787 0.006105144 0.001558307 0.002400982 0.006213965 0.005274139

# Recall that $mean contains the regression evaluated at the evaluation points,
# by default the sample of the predictors, so in this case it coincides with the
# fitted values
head(fitOECD$mean)
## [1] 0.02877743 0.02113513 0.03592755 0.04027579 0.01099637 0.03888485
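
# A quick (illustrative) check of the above: the evaluations in $mean should
# coincide, up to numerical error, with the fitted values
max(abs(fitOECD$mean - fitted(fitOECD)))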

# Prediction for the first 3 points + standard errors
pred <- predict(fitOECD, newdata = oecdpanel[1:3, ], se.fit = TRUE)

# Approximate (based on assuming asymptotic normality) 100(1 - alpha)% CI for
# the conditional mean of the first 3 points
alpha <- 0.05
pred$fit + (qnorm(1 - alpha / 2) * pred$se.fit) %o% c(-1, 1)
##             [,1]       [,2]
## [1,] 0.019889408 0.03766545
## [2,] 0.009169271 0.03310100
## [3,] 0.032873327 0.03898178
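
That is, the intervals computed above are of the form \(\big(\hat{m}(\mathbf{x};p,\mathbf{h}) \pm z_{1-\alpha/2}\,\hat{\mathrm{se}}(\hat{m}(\mathbf{x};p,\mathbf{h}))\big),\) where \(z_{1-\alpha/2}\) is the \((1-\alpha/2)\)-quantile of the standard normal and \(\hat{\mathrm{se}}\) is the asymptotic standard error returned in se.fit.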

Extrapolation with kernel regression estimators is particularly dangerous. Keep in mind that kernel estimators smooth the data in order to estimate \(m,\) the trend. If there is no data close to the point \(\mathbf{x}\) at which \(m(\mathbf{x})\) is estimated, then the estimate will heavily depend on the data point closest to \(\mathbf{x}.\) If compactly-supported kernels are employed (so that they can take the value \(0\) exactly), the estimate may not even be properly defined.
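
The following self-contained sketch illustrates the issue with simulated data (the functions K and nw below are ad hoc, not part of np), using a Nadaraya–Watson estimate based on the compactly-supported Epanechnikov kernel.

# Nadaraya-Watson estimate with the Epanechnikov kernel, which vanishes
# outside [-1, 1]
set.seed(42)
x <- rnorm(100)
y <- sin(x) + rnorm(100, sd = 0.25)
K <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)  # Epanechnikov kernel
nw <- function(x0, h) sum(K((x0 - x) / h) * y) / sum(K((x0 - x) / h))
nw(x0 = 0, h = 0.5)   # well defined: plenty of data within the bandwidth
nw(x0 = 10, h = 0.5)  # NaN (0 / 0): no data within the bandwidth of x0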

References

Liu, R. Y. 1988. “Bootstrap Procedures Under Some Non-i.i.d. Models.” The Annals of Statistics 16 (4): 1696–1708. https://doi.org/10.1214/aos/1176351062.

  1. Precisely, those associated with the asymptotic normality of \(\hat{m}(\mathbf{x};p,\mathbf{h})\) and based on the results of Theorem 6.1.↩︎