10.2 Basic Concepts in Nonparametric Estimation
10.2.1 Bias-Variance Trade-Off
For a given method of estimating $m(x)$, we denote the estimator by $\hat{m}(x)$. The mean squared error (MSE) at a point $x$ is defined as
$$
\mathrm{MSE}(x) = E\left[\{\hat{m}(x) - m(x)\}^2\right].
$$
This MSE decomposes into two key components, bias and variance:
$$
\mathrm{MSE}(x) = \left[E\{\hat{m}(x)\} - m(x)\right]^2 + \mathrm{Var}\{\hat{m}(x)\},
$$
where:
- Bias: measures the systematic error of the estimator, $\mathrm{Bias}^2\{\hat{m}(x)\} = \left[E\{\hat{m}(x)\} - m(x)\right]^2$.
- Variance: measures the variability of the estimator around its expected value, $\mathrm{Var}\{\hat{m}(x)\} = E\left[\{\hat{m}(x) - E[\hat{m}(x)]\}^2\right]$.
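The decomposition follows from adding and subtracting $E\{\hat{m}(x)\}$ inside the square; the cross term vanishes because $E\left[\hat{m}(x) - E\{\hat{m}(x)\}\right] = 0$:
$$
\begin{aligned}
\mathrm{MSE}(x) &= E\left[\{\hat{m}(x) - E[\hat{m}(x)] + E[\hat{m}(x)] - m(x)\}^2\right] \\
&= E\left[\{\hat{m}(x) - E[\hat{m}(x)]\}^2\right] + \left[E\{\hat{m}(x)\} - m(x)\right]^2 \\
&= \mathrm{Var}\{\hat{m}(x)\} + \mathrm{Bias}^2\{\hat{m}(x)\}.
\end{aligned}
$$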
Nonparametric methods often have low bias because they can adapt to a wide range of functions. However, this flexibility can lead to high variance, especially when the model captures noise rather than the underlying signal.
The bandwidth or smoothing parameter in nonparametric methods typically controls this trade-off:
- Large bandwidth ⇒ smoother function ⇒ higher bias, lower variance.
- Small bandwidth ⇒ more wiggly function ⇒ lower bias, higher variance.
Selecting an optimal bandwidth is critical, as it determines the balance between underfitting (high bias) and overfitting (high variance).
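As a rough numerical illustration of this trade-off (a sketch only: the sine regression function, the noise level 0.3, the Gaussian kernel, the evaluation point, and the two bandwidths below are all assumptions, not part of the text), the following Python snippet simulates repeated samples and estimates the squared bias and variance of a kernel-weighted local average at a single point for a small and a large bandwidth:

```python
import numpy as np

rng = np.random.default_rng(0)

def nw_estimate(x0, x, y, h):
    """Kernel-weighted local average of y at x0 (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # unnormalized kernel weights
    w /= w.sum()                            # normalize so the weights sum to 1
    return np.sum(w * y)

m = lambda t: np.sin(2 * np.pi * t)         # assumed true regression function
x0, n, n_rep = 0.25, 100, 500               # evaluation point, sample size, replications

for h in (0.02, 0.3):                       # small vs. large bandwidth
    est = np.empty(n_rep)
    for r in range(n_rep):
        x = rng.uniform(0, 1, n)
        y = m(x) + rng.normal(scale=0.3, size=n)
        est[r] = nw_estimate(x0, x, y, h)
    bias2 = (est.mean() - m(x0)) ** 2       # squared bias at x0
    var = est.var()                         # variance of the estimator at x0
    print(f"h={h}: bias^2={bias2:.5f}, var={var:.5f}, MSE={bias2 + var:.5f}")
```

In this setup the small bandwidth should produce a near-zero squared bias but a larger variance, while the large bandwidth should show the reverse, mirroring the bullet points above.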
10.2.2 Kernel Smoothing and Local Averages
Many nonparametric regression estimators can be viewed as weighted local averages of the observed responses $\{Y_i\}$. In the univariate case, if $x_i$ are observations of the predictor and $y_i$ the corresponding responses, the nonparametric estimator at a point $x$ often takes the form
$$
\hat{m}(x) = \sum_{i=1}^{n} w_i(x)\, y_i,
$$
where the weights $w_i(x)$ depend on the distance between $x_i$ and $x$ and satisfy
$$
\sum_{i=1}^{n} w_i(x) = 1.
$$
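To make this structure concrete, here is a minimal sketch that builds such weights from a hypothetical triangular "closeness" function (the kernel-based weights used in practice are introduced below), normalizes them to sum to one, and forms the weighted average:

```python
import numpy as np

def local_average(x0, x, y, closeness):
    """Weighted local average at x0: weights proportional to closeness(|x_i - x0|)."""
    k = closeness(np.abs(x - x0))  # nonnegative, larger for points near x0
    w = k / k.sum()                # the weights w_i(x0) now sum to 1 by construction
    return np.sum(w * y), w

# Toy data and a hypothetical triangular closeness function of half-width 0.2
x = np.array([0.1, 0.2, 0.3, 0.5, 0.9])
y = np.array([1.0, 1.2, 0.9, 0.4, 0.1])
mhat, w = local_average(0.25, x, y, lambda d: np.maximum(1 - d / 0.2, 0))
print(mhat, w.sum())               # w.sum() equals 1 up to floating-point error
```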
We will see how this arises more concretely in kernel regression below.