3.3 Asymptotic properties

The purpose of this section is to provide some key asymptotic results for the bias, variance, and asymptotic normality of the local linear and local constant estimators. These results give useful insights into the effect of $p,$ $m,$ $f,$ and $\sigma^2$ on the performance of the estimators. Proofs and detailed analyses are skipped; we refer the interested reader to Ruppert and Wand (1994), Section 5.3 of Wand and Jones (1995), and Section 3.2 of Fan and Gijbels (1996).

Throughout this section we make the following assumptions:

A1. $m$ is twice continuously differentiable.
A2. $\sigma^2$ is continuous and positive.
A3. $f$ is continuously differentiable and positive.
A4. The kernel $K$ is a symmetric and bounded pdf with finite second moment and is square integrable.
A5. $h=h_n$ is a deterministic sequence of bandwidths such that, when $n\to\infty,$ $h\to0$ and $nh\to\infty.$
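
As a quick check of what A5 demands, consider the hypothetical bandwidth sequence $h_n=cn^{-1/5}$ with a constant $c>0$ (an illustrative choice, not one prescribed by the assumptions). It satisfies both requirements:

$\begin{align*} h_n=cn^{-1/5}\to0,\quad nh_n=cn^{4/5}\to\infty. \end{align*}$

In contrast, $h_n=cn^{-1}$ violates A5, since $nh_n=c$ stays bounded and the number of observations effectively used around $x$ does not grow with $n.$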

The bias and variance are expanded in their conditional versions, given the predictor’s sample $X_1,\ldots,X_n.$ The reason for analyzing the conditional instead of the unconditional versions is to avoid the technical difficulties that integration with respect to the predictor’s density may pose.

Theorem 3.2 Under A1–A5, the conditional bias and variance of the local constant ($p=0$) and local linear ($p=1$) estimators are

$\begin{align} \mathrm{Bias}[\hat m(x;p,h)\vert X_1,\ldots,X_n]&=B_p(x)h^2+o_\mathbb{P}(h^2),\tag{3.20}\\ \mathbb{V}\mathrm{ar}[\hat m(x;p,h)\vert X_1,\ldots,X_n]&=\frac{R(K)}{nhf(x)}\sigma^2(x)+o_\mathbb{P}((nh)^{-1}),\tag{3.21} \end{align}$

where

$\begin{align*} B_p(x)=\frac{\mu_2(K)}{2}\left\{m''(x)+2\frac{m'(x)f'(x)}{f(x)}1_{\{p=0\}}\right\}. \end{align*}$
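
To get a feeling for the magnitudes involved, the leading terms of (3.20) and (3.21) can be evaluated numerically. The following is a minimal sketch under stated assumptions: a Gaussian kernel, for which $R(K)=\int K(u)^2\,\mathrm{d}u=1/(2\sqrt{\pi})$ and $\mu_2(K)=\int u^2K(u)\,\mathrm{d}u=1,$ and a hypothetical setting with $m(x)=\sin(x),$ $X\sim\mathcal{N}(0,1),$ and $\sigma^2(x)=0.25.$ The function names are illustrative only.

```python
import math

# Constants of the standard Gaussian kernel (an assumption made for illustration).
R_K = 1.0 / (2.0 * math.sqrt(math.pi))  # R(K) = int K(u)^2 du
MU2_K = 1.0                              # mu_2(K) = int u^2 K(u) du


def leading_bias(p, h, d1m, d2m, f, d1f):
    """Leading term B_p(x) h^2 of the conditional bias in (3.20).

    d1m, d2m: m'(x) and m''(x); f, d1f: f(x) and f'(x).
    """
    if p not in (0, 1):
        raise ValueError("Theorem 3.2 covers p = 0 and p = 1 only")
    # The design term 2 m'(x) f'(x) / f(x) is only present for p = 0.
    design_term = 2.0 * d1m * d1f / f if p == 0 else 0.0
    return 0.5 * MU2_K * (d2m + design_term) * h ** 2


def leading_variance(n, h, sigma2, f):
    """Leading term R(K) sigma^2(x) / (n h f(x)) of the conditional variance in (3.21)."""
    return R_K * sigma2 / (n * h * f)


# Hypothetical setting: m(x) = sin(x), X ~ N(0, 1), sigma^2(x) = 0.25, evaluated at x = 1.
x, n, h = 1.0, 500, 0.3
f = math.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)  # N(0, 1) density at x
d1f = -x * f                                            # its derivative at x
d1m, d2m = math.cos(x), -math.sin(x)

bias_lc = leading_bias(0, h, d1m, d2m, f, d1f)  # local constant
bias_ll = leading_bias(1, h, d1m, d2m, f, d1f)  # local linear
var_both = leading_variance(n, h, 0.25, f)      # shared by p = 0 and p = 1

print(f"bias p=0: {bias_lc:.4f}, bias p=1: {bias_ll:.4f}, variance: {var_both:.5f}")
```

In this particular setting the design term has the same sign as $m''(x),$ so the leading bias of the local constant fit is larger in absolute value than that of the local linear fit, while both share the same leading variance.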

The bias and variance expressions (3.20) and (3.21) yield interesting insights:

- The bias is of order $h^2$ for $p=0,1,$ so it decreases quadratically as $h\to0.$ The bias at $x$ is directly proportional to $m''(x)$ if $p=1$ and is affected by $m''(x)$ if $p=0.$ This has the same interpretation as in the density setting:
    - The contribution of $m''(x)$ to the bias is negative in concave regions, i.e. $\{x\in\mathbb{R}:m''(x)<0\}.$ These regions correspond to peaks and modes of $m.$
    - Conversely, this contribution is positive in convex regions, i.e. $\{x\in\mathbb{R}:m''(x)>0\}.$ These regions correspond to valleys of $m.$
    - The wilder the curvature $m'',$ the harder it is to estimate $m.$
- The bias for $p=0$ at $x$ is also affected by $m'(x),$ $f'(x),$ and $f(x).$ Precisely, the lower the density $f(x),$ the larger the bias in absolute value, and the faster $m$ and $f$ change at $x,$ the larger it becomes as well. Thus the bias of the local constant estimator is much more sensitive to the local behavior of $m$ and $f$ than that of the local linear estimator (which is only sensitive to $m''(x)$). In particular, the dependence on $f'(x)$ and $f(x)$ is referred to as the design bias, since it is caused merely by the predictor’s distribution.
- The variance depends directly on $\frac{\sigma^2(x)}{f(x)}$ for $p=0,1.$ As a consequence, the lower the density and the larger the conditional variance, the more variable $\hat m(\cdot;p,h)$ is. The variance decreases at the rate $(nh)^{-1},$ since $nh$ acts as the effective sample size.
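
The design bias can be isolated numerically by removing both the noise and the curvature. The sketch below is an illustration under assumptions that are not part of the text: a Gaussian kernel, noise-free responses from $m(x)=x$ (so $m''\equiv0$), and a deterministic skewed design $X_i=u_i^2$ with $u_i$ on a regular grid of $(0,1),$ which mimics the density $f(x)=1/(2\sqrt{x}).$ The estimators are written in their closed forms for $p=0,1$; the function names are hypothetical.

```python
import math


def gauss_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_fit(x, X, Y, h, p):
    """Local constant (p=0) or local linear (p=1) estimate of m(x)."""
    if p not in (0, 1):
        raise ValueError("only p = 0 and p = 1 are covered")
    w = [gauss_kernel((xi - x) / h) for xi in X]
    s0 = sum(w)
    t0 = sum(wi * yi for wi, yi in zip(w, Y))
    if p == 0:
        return t0 / s0  # kernel-weighted mean of the responses
    s1 = sum(wi * (xi - x) for wi, xi in zip(w, X))
    s2 = sum(wi * (xi - x) ** 2 for wi, xi in zip(w, X))
    t1 = sum(wi * (xi - x) * yi for wi, xi, yi in zip(w, X, Y))
    # Intercept of the kernel-weighted least squares line centered at x.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)


# Assumed design: X_i = u_i^2, u_i on a regular grid, i.e. f(x) = 1 / (2 sqrt(x)),
# for which f'(x) / f(x) = -1 / (2 x).
n = 200
X = [((i + 0.5) / n) ** 2 for i in range(n)]
Y = list(X)  # noise-free responses from m(x) = x: m'(x) = 1, m''(x) = 0

x0, h = 0.5, 0.1
err_lc = local_fit(x0, X, Y, h, p=0) - x0
err_ll = local_fit(x0, X, Y, h, p=1) - x0

# Leading bias B_0(x0) h^2 from (3.20) with mu_2(K) = 1, m'' = 0, m' = 1.
design_bias = 0.5 * 2.0 * 1.0 * (-1.0 / (2.0 * x0)) * h ** 2

print(f"p=0 error: {err_lc:.5f} (leading term {design_bias:.5f}), p=1 error: {err_ll:.1e}")
```

Since $m$ is linear here, the local linear fit reproduces it up to rounding error, whereas the local constant fit is pulled towards the side where the design places more points, by an amount close to the leading term $B_0(x)h^2.$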

An extended version of Theorem 3.2, given in Theorem 3.1 of Fan and Gijbels (1996), shows that odd-order polynomial fits are preferable to even-order polynomial fits. The reason is that an odd order introduces an extra coefficient in the polynomial fit that reduces the bias while keeping the variance unchanged. In summary, the conclusions of the above analysis of $p=0$ vs. $p=1,$ namely that $p=1$ has smaller bias than $p=0$ (but of the same order) and the same variance as $p=0,$ extend to the case $p=2\nu$ vs. $p=2\nu+1,$ $\nu\in\mathbb{N}.$ This allows one to claim that local polynomial fitting is “an odd world” (Fan and Gijbels 1996).

Finally, we have the asymptotic pointwise normality of the estimator.

Theorem 3.3 Assume that $\mathbb{E}[(Y-m(x))^{2+\delta}\vert X=x]<\infty$ for some $\delta>0.$ Then, under A1–A5,

$\begin{align} &\sqrt{nh}(\hat m(x;p,h)-\mathbb{E}[\hat m(x;p,h)])\stackrel{d}{\longrightarrow}\mathcal{N}\left(0,\frac{R(K)\sigma^2(x)}{f(x)}\right),\tag{3.22}\\ &\sqrt{nh}\left(\hat m(x;p,h)-m(x)-B_p(x)h^2\right)\stackrel{d}{\longrightarrow}\mathcal{N}\left(0,\frac{R(K)\sigma^2(x)}{f(x)}\right).\tag{3.23} \end{align}$
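
One use of (3.22) and (3.23) is to form approximate pointwise confidence intervals. The following is a minimal sketch, assuming a Gaussian kernel and treating $\sigma^2(x),$ $f(x),$ and (optionally) the bias term $B_p(x)h^2$ as known inputs; in practice these are unknown and would have to be estimated. The function `asymptotic_ci` is a hypothetical helper, not part of any library.

```python
import math
from statistics import NormalDist

R_K_GAUSS = 1.0 / (2.0 * math.sqrt(math.pi))  # R(K) for the Gaussian kernel (assumed)


def asymptotic_ci(m_hat, n, h, sigma2, f, bias=0.0, alpha=0.05, r_k=R_K_GAUSS):
    """Approximate (1 - alpha) interval based on the normal limit in (3.23).

    m_hat: value of the estimator at x; sigma2, f: sigma^2(x) and f(x);
    bias: B_p(x) h^2. With bias = 0 the interval targets E[m_hat] as in (3.22),
    not m(x) itself.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half_width = z * math.sqrt(r_k * sigma2 / (n * h * f))
    center = m_hat - bias
    return center - half_width, center + half_width


# Hypothetical inputs: estimate 0.80 at x, n = 500, h = 0.3, sigma^2(x) = 0.25, f(x) = 0.242.
lower, upper = asymptotic_ci(0.80, n=500, h=0.3, sigma2=0.25, f=0.242)
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

The half-width shrinks at the rate $(nh)^{-1/2}$ and widens where $f(x)$ is low or $\sigma^2(x)$ is high, in agreement with (3.21).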

References

Fan, J., and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. Vol. 66. Monographs on Statistics and Applied Probability. London: Chapman & Hall.

Ruppert, D., and M. P. Wand. 1994. “Multivariate Locally Weighted Least Squares Regression.” The Annals of Statistics 22 (3): 1346–70. https://doi.org/10.1214/aos/1176325632.

Wand, M. P., and M. C. Jones. 1995. Kernel Smoothing. Vol. 60. Monographs on Statistics and Applied Probability. London: Chapman & Hall. https://doi.org/10.1007/978-1-4899-4493-1.