5.2 Bandwidth selection
Cross-validatory bandwidth selection, as studied in Section 4.3, extends neatly to the mixed multivariate case. For the fully continuous case, the least-squares cross-validation selector is defined as
\begin{align} \mathrm{CV}(\mathbf{h}):=\frac{1}{n}\sum_{i=1}^n\left(Y_i-\hat{m}_{-i}(\mathbf{X}_i;q,\mathbf{h})\right)^2,\quad \hat{\mathbf{h}}_{\mathrm{CV}}:=\arg\min_{h_1,\ldots,h_p>0}\mathrm{CV}(\mathbf{h}).\tag{5.12} \end{align}
The cross-validation objective function becomes more challenging to minimize as $p$ grows. This is why employing several starting values for its optimization (as `np` does) is advisable.
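For example, the number of random starting values used by `np` can be controlled through the `nmulti` argument of `np::npregbw`. A minimal sketch with simulated toy data (the model and seed are illustrative only):

```r
# Multistart CV bandwidth search with np; toy data for illustration
library(np)
set.seed(42)
n <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
Y <- X1^2 + sin(X2) + rnorm(n, sd = 0.5)

# nmulti = number of restarts of the CV optimizer from random initial values
bw <- npregbw(Y ~ X1 + X2, regtype = "lc", nmulti = 5)
bw$bw  # selected bandwidth vector (h1, h2)
```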
The mixed case is defined in a completely analogous manner, just replacing the continuous kernels $K_h(\cdot)$ with categorical kernels $l_u(\cdot,\cdot;\lambda)$ or ordered discrete kernels $l_o(\cdot,\cdot;\eta)$.
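In practice, `np::npregbw` picks the kernel for each predictor from its class in the data: continuous kernels for `numeric` variables, unordered discrete kernels for `factor` variables, and ordered discrete kernels for `ordered` variables. A toy sketch with simulated mixed data (the model is illustrative only):

```r
# CV bandwidth selection with mixed predictors; toy data for illustration
library(np)
set.seed(42)
n <- 200
X1 <- rnorm(n)                                   # continuous
X2 <- factor(sample(c("a", "b", "c"), n, TRUE))  # unordered categorical
X3 <- ordered(sample(1:3, n, TRUE))              # ordered discrete
Y <- X1^2 + (X2 == "a") + as.numeric(X3) + rnorm(n)

# One smoothing parameter per predictor (h, lambda, eta in the notation above)
bw_mixed <- npregbw(Y ~ X1 + X2 + X3, regtype = "ll")
bw_mixed$bw
```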
Importantly, the trick described in Proposition 4.1 holds with obvious modifications, and it extends to the mixed case and to the Nadaraya–Watson estimator.
Proposition 5.1 For $q=0,1$, the weights of the leave-one-out estimator $\hat{m}_{-i}(\mathbf{x};q,\mathbf{h})=\sum_{\substack{j=1\\j\neq i}}^nW_{-i,j}^q(\mathbf{x})Y_j$ can be obtained from $\hat{m}(\mathbf{x};q,\mathbf{h})=\sum_{i=1}^nW_{i}^q(\mathbf{x})Y_i$:
\begin{align} W_{-i,j}^q(\mathbf{x})=\frac{W^q_j(\mathbf{x})}{\sum_{\substack{k=1\\k\neq i}}^nW_k^q(\mathbf{x})}=\frac{W^q_j(\mathbf{x})}{1-W_i^q(\mathbf{x})}.\tag{5.13} \end{align}
This implies that
\begin{align} \mathrm{CV}(\mathbf{h})=\frac{1}{n}\sum_{i=1}^n\left(\frac{Y_i-\hat{m}(\mathbf{X}_i;q,\mathbf{h})}{1-W_i^q(\mathbf{X}_i)}\right)^2.\tag{5.14} \end{align}
Remark. As in the univariate case, computing (5.14) requires evaluating the local polynomial estimator at the sample $\{\mathbf{X}_i\}_{i=1}^n$ and obtaining $\{W_i^q(\mathbf{X}_i)\}_{i=1}^n$ (which are needed to evaluate $\hat{m}(\mathbf{X}_i;q,\mathbf{h})$). Both tasks can be achieved simultaneously from the $n\times n$ matrix $\big(W_{i}^q(\mathbf{X}_j)\big)_{ij}$. Evaluating $\hat{m}_{-i}(\mathbf{x};q,\mathbf{h})$, because of (5.13), can be done with the weights $\{W_i^q(\mathbf{x})\}_{i=1}^n$.
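A quick numerical check of (5.13) and (5.14) for the Nadaraya–Watson estimator ($q=0$), sketched with a single continuous predictor and the normal kernel on simulated data:

```r
# Check that the CV shortcut (5.14) matches explicit leave-one-out fits
set.seed(1)
n <- 50
X <- rnorm(n)
Y <- X^2 + rnorm(n, sd = 0.25)
h <- 0.5

# Kernel matrix; normalizing constants cancel after row standardization
K <- dnorm(outer(X, X, "-") / h)
W <- K / rowSums(K)  # W[i, j] = W_j^0(X_i)

# CV via the shortcut (5.14)
cv_short <- mean(((Y - W %*% Y) / (1 - diag(W)))^2)

# CV via explicit leave-one-out estimators
cv_loo <- mean(sapply(1:n, function(i)
  (Y[i] - sum(K[i, -i] * Y[-i]) / sum(K[i, -i]))^2))

all.equal(cv_short, cv_loo)  # TRUE
```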
Exercise 5.6 Implement an R function to compute (5.14) for the local constant estimator with a multivariate (continuous) predictor. The function must receive as arguments the sample $(\mathbf{X}_1,Y_1),\ldots,(\mathbf{X}_n,Y_n)$ and the bandwidth vector $\mathbf{h}$. Use the normal kernel. Test your implementation by:
- Simulating a random sample from a regression model with two predictors.
- Computing its cross-validation bandwidths via `np::npregbw`.
- Plotting a contour of the function $(h_1,h_2)\mapsto \mathrm{CV}(h_1,h_2)$ and checking that the minimizers and minimum of this surface coincide with the solution given by `np::npregbw`.
Consider several regression models for testing the implementation.
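One possible skeleton for the comparison is sketched next; `cv_nw` is a hypothetical name for the requested implementation (here filled in for two predictors with the normal kernel), and the model, grid, and seed are illustrative only:

```r
# Contour of the CV surface versus the bandwidths selected by np::npregbw
library(np)

# Hypothetical implementation of (5.14) for the local constant estimator
cv_nw <- function(h, X, Y) {
  K <- dnorm(outer(X[, 1], X[, 1], "-") / h[1]) *
       dnorm(outer(X[, 2], X[, 2], "-") / h[2])
  W <- K / rowSums(K)
  mean(((Y - W %*% Y) / (1 - diag(W)))^2)
}

set.seed(2)
n <- 200
X <- matrix(rnorm(2 * n), nrow = n)
Y <- X[, 1] * X[, 2] + rnorm(n, sd = 0.3)

bw <- npregbw(xdat = data.frame(X1 = X[, 1], X2 = X[, 2]), ydat = Y,
              regtype = "lc")
h_grid <- seq(0.05, 1, l = 31)
cv_surf <- outer(h_grid, h_grid,
                 Vectorize(function(h1, h2) cv_nw(c(h1, h2), X, Y)))
contour(h_grid, h_grid, cv_surf, nlevels = 30, xlab = "h1", ylab = "h2")
points(bw$bw[1], bw$bw[2], pch = 19)  # CV bandwidths from np::npregbw
```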
Exercise 5.7 Perform a simulation study similar to that of Exercise 4.19 to illustrate the erratic behavior of the local constant and local linear estimators near “holes” in the support of two predictors $(X_1,X_2)$.
1. Design a distribution pattern for $(X_1,X_2)$ that features an internal “hole”. An example of such a distribution is the “oval” density simulated in Section 3.5.4 (a possible design is sketched after this exercise).
2. Define a regression function $m$ that is neither constant nor linear in both predictors, and that behaves differently on different sides of the hole, or at the hole itself.
3. Simulate $n$ observations from the regression model $Y=m(X_1,X_2)+\varepsilon$ for a sample size of your choice.
4. Compute the CV bandwidths and the associated local constant and local linear fits.
5. Plot the fits as surfaces. Trick: adjust the transparency of each surface for better visualization.
6. Repeat Steps 2–4 $M=50$ times.
Comment on the obtained results.
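A possible starting point for Steps 1–3, taking a uniform distribution on an annulus as a simple stand-in for the “oval” density (all names and constants below are illustrative):

```r
# Simulate (X1, X2) uniformly on an annulus: a design with an internal "hole"
r_hole <- function(n, r_in = 0.5, r_out = 1.5) {
  theta <- runif(n, 0, 2 * pi)
  r <- sqrt(runif(n, r_in^2, r_out^2))  # sqrt gives a uniform density on the ring
  cbind(X1 = r * cos(theta), X2 = r * sin(theta))
}

# A regression function that is neither constant nor linear and that behaves
# differently on each side of the hole
m <- function(x1, x2) ifelse(x1 > 0, x1^2 + x2, -2 * x1 + sin(pi * x2))

set.seed(3)
X <- r_hole(n = 200)
Y <- m(X[, 1], X[, 2]) + rnorm(200, sd = 0.25)
plot(X, col = ifelse(X[, 1] > 0, 2, 4))  # visualize the hole in the design
```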