10.2 Basic Concepts in Nonparametric Estimation
10.2.1 Bias-Variance Trade-Off
For a given method of estimating $m(x)$, we denote the estimator by $\hat{m}(x)$. The mean squared error (MSE) at a point $x$ is defined as
$$
\mathrm{MSE}(x) = E\left[\{\hat{m}(x) - m(x)\}^2\right].
$$
This MSE decomposes into two key components, bias and variance:
$$
\mathrm{MSE}(x) = \left[E\{\hat{m}(x)\} - m(x)\right]^2 + \mathrm{Var}\{\hat{m}(x)\},
$$
where:
- Bias: measures the systematic error of the estimator, $\mathrm{Bias}^2\{\hat{m}(x)\} = \left[E\{\hat{m}(x)\} - m(x)\right]^2$.
- Variance: measures the variability of the estimator around its expected value, $\mathrm{Var}\{\hat{m}(x)\} = E\left[\{\hat{m}(x) - E[\hat{m}(x)]\}^2\right]$.
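The decomposition follows from adding and subtracting $E\{\hat{m}(x)\}$ inside the square; the cross term vanishes because $E\left[\hat{m}(x) - E\{\hat{m}(x)\}\right] = 0$:
$$
\begin{aligned}
\mathrm{MSE}(x) &= E\left[\{\hat{m}(x) - E[\hat{m}(x)] + E[\hat{m}(x)] - m(x)\}^2\right] \\
&= E\left[\{\hat{m}(x) - E[\hat{m}(x)]\}^2\right] + \left[E\{\hat{m}(x)\} - m(x)\right]^2 \\
&= \mathrm{Var}\{\hat{m}(x)\} + \mathrm{Bias}^2\{\hat{m}(x)\}.
\end{aligned}
$$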
Nonparametric methods often have low bias because they can adapt to a wide range of functions. However, this flexibility can lead to high variance, especially when the model captures noise rather than the underlying signal.
The bandwidth or smoothing parameter in nonparametric methods typically controls this trade-off:
- Large bandwidth ⇒ smoother function ⇒ higher bias, lower variance.
- Small bandwidth ⇒ more wiggly function ⇒ lower bias, higher variance.
Selecting an optimal bandwidth is critical, as it determines the balance between underfitting (high bias) and overfitting (high variance).
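As a rough numerical illustration of this trade-off (a sketch only: the sine regression function, the noise level 0.3, the Gaussian kernel, the evaluation point, and the two bandwidths below are all assumptions, not part of the text), the following Python snippet simulates repeated samples and estimates the squared bias and variance of a kernel-weighted local average at a single point for a small and a large bandwidth:

```python
import numpy as np

rng = np.random.default_rng(0)

def nw_estimate(x0, x, y, h):
    """Kernel-weighted local average of y at x0 (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # unnormalized kernel weights
    w /= w.sum()                            # normalize so the weights sum to 1
    return np.sum(w * y)

m = lambda t: np.sin(2 * np.pi * t)         # assumed true regression function
x0, n, n_rep = 0.25, 100, 500               # evaluation point, sample size, replications

for h in (0.02, 0.3):                       # small vs. large bandwidth
    est = np.empty(n_rep)
    for r in range(n_rep):
        x = rng.uniform(0, 1, n)
        y = m(x) + rng.normal(scale=0.3, size=n)
        est[r] = nw_estimate(x0, x, y, h)
    bias2 = (est.mean() - m(x0)) ** 2       # squared bias at x0
    var = est.var()                         # variance of the estimator at x0
    print(f"h={h}: bias^2={bias2:.5f}, var={var:.5f}, MSE={bias2 + var:.5f}")
```

In this setup the small bandwidth should produce a near-zero squared bias but a larger variance, while the large bandwidth should show the reverse, mirroring the bullet points above.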
10.2.2 Kernel Smoothing and Local Averages
Many nonparametric regression estimators can be viewed as weighted local averages of the observed responses $\{Y_i\}$. In the univariate case, if $x_i$ are observations of the predictor and $y_i$ the corresponding responses, the nonparametric estimator at a point $x$ often takes the form
$$
\hat{m}(x) = \sum_{i=1}^{n} w_i(x)\, y_i,
$$
where the weights $w_i(x)$ depend on the distance between $x_i$ and $x$ and satisfy
$$
\sum_{i=1}^{n} w_i(x) = 1.
$$
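To make this structure concrete, here is a minimal sketch that builds such weights from a hypothetical triangular "closeness" function (the kernel-based weights used in practice are introduced below), normalizes them to sum to one, and forms the weighted average:

```python
import numpy as np

def local_average(x0, x, y, closeness):
    """Weighted local average at x0: weights proportional to closeness(|x_i - x0|)."""
    k = closeness(np.abs(x - x0))  # nonnegative, larger for points near x0
    w = k / k.sum()                # the weights w_i(x0) now sum to 1 by construction
    return np.sum(w * y), w

# Toy data and a hypothetical triangular closeness function of half-width 0.2
x = np.array([0.1, 0.2, 0.3, 0.5, 0.9])
y = np.array([1.0, 1.2, 0.9, 0.4, 0.1])
mhat, w = local_average(0.25, x, y, lambda d: np.maximum(1 - d / 0.2, 0))
print(mhat, w.sum())               # w.sum() equals 1 up to floating-point error
```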
We will see how this arises more concretely in kernel regression below.