## 5.2 Bandwidth selection

Cross-validatory bandwidth selection, as studied in Section 4.3, extends neatly to the mixed multivariate case. For the fully continuous case, the least squares cross-validation selector is defined as

\begin{align*} \mathrm{CV}(\mathbf{h})&:=\frac{1}{n}\sum_{i=1}^n(Y_i-\hat{m}_{-i}(\mathbf{X}_i;q,\mathbf{h}))^2,\\ \hat{\mathbf{h}}_\mathrm{CV}&:=\arg\min_{h_1,\ldots,h_p>0}\mathrm{CV}(\mathbf{h}). \end{align*}

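The cross-validation objective function becomes harder to minimize as $$p$$ grows: it is in general non-convex, so a numerical optimizer may stop at a local minimum that depends on where it started. For this reason, it is advisable to run the optimization from several starting values and keep the best solution, as np does (see the nmulti argument of np::npregbw).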
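The multi-start idea can be sketched in base R for the local constant estimator ($$q=0$$) with a product normal kernel. This is only an illustration under stated assumptions: the function names `nw` and `cv_loo`, the simulated model, and the range of the random starting values are hypothetical choices, the leave-one-out fits are recomputed by brute force directly from the definition of $$\mathrm{CV}(\mathbf{h})$$, and the optimizer is a plain `optim` call rather than a reproduction of what np does internally.

```r
# Nadaraya-Watson estimate at the point x (vector of length p) from the
# sample X (n x p matrix), Y (vector of length n) and bandwidths h
nw <- function(x, X, Y, h) {
  # Product normal kernel; normalizing constants cancel in the ratio
  K <- exp(-0.5 * colSums(((t(X) - x) / h)^2))
  sum(K * Y) / sum(K)
}

# Least squares cross-validation by explicit leave-one-out refitting
cv_loo <- function(h, X, Y) {
  res <- sapply(seq_len(nrow(X)), function(i) {
    Y[i] - nw(X[i, ], X[-i, , drop = FALSE], Y[-i], h)
  })
  mean(res^2)
}

# Simulated sample (illustrative model, not taken from the text)
set.seed(1)
n <- 100
p <- 2
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- sin(X[, 1]) + X[, 2]^2 + rnorm(n, sd = 0.25)

# Objective on the log-scale, so that the bandwidths remain positive;
# non-finite values (e.g., all kernel weights underflowing) are penalized
obj <- function(log_h) {
  val <- cv_loo(exp(log_h), X, Y)
  if (is.finite(val)) val else .Machine$double.xmax
}

# Several random starting values (the range 0.1-1 is an arbitrary choice)
n_starts <- 4
starts <- matrix(log(runif(n_starts * p, min = 0.1, max = 1)), ncol = p)

# One Nelder-Mead run per starting value; keep the run with the lowest CV
fits <- lapply(seq_len(n_starts), function(k) optim(starts[k, ], obj))
values <- sapply(fits, function(f) f$value)
best <- fits[[which.min(values)]]
h_cv <- exp(best$par)
```

Comparing the entries of `values` gives a rough idea of how sensitive the optimizer is to its starting point: if they differ noticeably, some runs ended in local minima.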

The mixed case is defined in a completely analogous manner by just replacing continuous kernels $$K_h(\cdot)$$ with categorical $$l_u(\cdot,\cdot;\lambda)$$ or ordered discrete $$l_o(\cdot,\cdot;\eta)$$ kernels.
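As a rough illustration of this replacement, the following R sketch computes a Nadaraya–Watson estimate with one continuous, one unordered categorical, and one ordered discrete predictor. The specific kernel forms are assumptions made for the example (an Aitchison–Aitken-type kernel for $$l_u$$ and a geometric kernel $$\eta^{|x-X_i|}$$ for $$l_o$$); the definitions given earlier in the document take precedence if they differ. The names `l_u`, `l_o`, and `nw_mixed` and the simulated data are hypothetical.

```r
# Assumed unordered kernel with n_lev levels:
# 1 - lambda if the categories coincide, lambda / (n_lev - 1) otherwise
l_u <- function(x, Xi, lambda, n_lev) {
  ifelse(Xi == x, 1 - lambda, lambda / (n_lev - 1))
}

# Assumed ordered discrete kernel: geometric decay in the distance
l_o <- function(x, Xi, eta) eta^abs(x - Xi)

# Nadaraya-Watson estimate at (x_c, x_u, x_o) using the product of a
# normal kernel (continuous), l_u (categorical) and l_o (ordered)
nw_mixed <- function(x_c, x_u, x_o, X_c, X_u, X_o, Y, h, lambda, eta) {
  w <- dnorm((x_c - X_c) / h) / h *
    l_u(x_u, X_u, lambda, nlevels(X_u)) *
    l_o(x_o, X_o, eta)
  sum(w * Y) / sum(w)
}

# Simulated mixed sample (illustrative model, not taken from the text)
set.seed(1)
n <- 200
X_c <- rnorm(n)
X_u <- factor(sample(c("a", "b", "c"), size = n, replace = TRUE))
X_o <- sample(0:2, size = n, replace = TRUE)
Y <- X_c + as.integer(X_u) + X_o + rnorm(n, sd = 0.1)

# Smoothing across categories (lambda, eta > 0) versus no smoothing
# (lambda = eta = 0), where only matching observations receive weight
est_smooth <- nw_mixed(0, "a", 1, X_c, X_u, X_o, Y,
                       h = 0.5, lambda = 0.1, eta = 0.1)
est_split <- nw_mixed(0, "a", 1, X_c, X_u, X_o, Y,
                      h = 0.5, lambda = 0, eta = 0)
```

With $$\lambda=\eta=0$$, the estimate reduces to a continuous-kernel fit on the subsample that matches the categorical and ordered values, which is a convenient sanity check. Cross-validation in the mixed case then minimizes the same objective over $$(h,\lambda,\eta)$$.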

Importantly, the computational shortcut described in Proposition 4.1, which avoids refitting the estimator $$n$$ times, carries over to the multivariate case with obvious modifications. It also holds for the Nadaraya–Watson estimator in the mixed case.

Proposition 5.1 For $$q=0,1$$, the weights of the leave-one-out estimator $$\hat{m}_{-i}(\mathbf{x};q,\mathbf{h})=\sum_{\substack{j=1\\j\neq i}}^nW_{-i,j}^q(\mathbf{x})Y_j$$ can be obtained from $$\hat{m}(\mathbf{x};q,\mathbf{h})=\sum_{i=1}^nW_{i}^q(\mathbf{x})Y_i$$:

\begin{align} W_{-i,j}^q(\mathbf{x})=\frac{W^q_j(\mathbf{x})}{\sum_{\substack{k=1\\k\neq i}}^nW_k^q(\mathbf{x})}=\frac{W^q_j(\mathbf{x})}{1-W_i^q(\mathbf{x})}.\tag{5.9} \end{align}

This implies that

\begin{align} \mathrm{CV}(\mathbf{h})=\frac{1}{n}\sum_{i=1}^n\left(\frac{Y_i-\hat{m}(\mathbf{X}_i;q,\mathbf{h})}{1-W_i^q(\mathbf{X}_i)}\right)^2.\tag{5.10} \end{align}
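The step from (5.9) to (5.10) can be spelled out by evaluating (5.9) at $$\mathbf{x}=\mathbf{X}_i$$ and using that $$\sum_{j\neq i}W_j^q(\mathbf{X}_i)Y_j=\hat{m}(\mathbf{X}_i;q,\mathbf{h})-W_i^q(\mathbf{X}_i)Y_i$$:

\begin{align*} Y_i-\hat{m}_{-i}(\mathbf{X}_i;q,\mathbf{h})&=Y_i-\frac{\sum_{\substack{j=1\\j\neq i}}^nW_j^q(\mathbf{X}_i)Y_j}{1-W_i^q(\mathbf{X}_i)}\\ &=\frac{Y_i\big(1-W_i^q(\mathbf{X}_i)\big)-\big(\hat{m}(\mathbf{X}_i;q,\mathbf{h})-W_i^q(\mathbf{X}_i)Y_i\big)}{1-W_i^q(\mathbf{X}_i)}\\ &=\frac{Y_i-\hat{m}(\mathbf{X}_i;q,\mathbf{h})}{1-W_i^q(\mathbf{X}_i)}. \end{align*}

Squaring and averaging over $$i=1,\ldots,n$$ gives the expression for $$\mathrm{CV}(\mathbf{h})$$ in terms of the full-sample fit only.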

Remark. As in the univariate case, computing (5.10) requires evaluating the local polynomial estimator at the sample $$\{\mathbf{X}_i\}_{i=1}^n$$ and obtaining the weights $$\{W_i^q(\mathbf{X}_i)\}_{i=1}^n$$ (which are needed to evaluate $$\hat{m}(\mathbf{X}_i;q,\mathbf{h})$$). Both tasks can be achieved simultaneously from the $$n\times n$$ matrix $$\big(W_{i}^q(\mathbf{X}_j)\big)_{ij}$$: its diagonal contains the required weights, and the fitted value $$\hat{m}(\mathbf{X}_j;q,\mathbf{h})$$ is the sum of the entries of its $$j$$-th column weighted by $$Y_1,\ldots,Y_n$$. Because of (5.9), evaluating $$\hat{m}_{-i}(\mathbf{x};q,\mathbf{h})$$ at an arbitrary $$\mathbf{x}$$ only requires the weights $$\{W_i^q(\mathbf{x})\}_{i=1}^n$$.

Exercise 5.5 Implement an R function to compute (5.10) for the local constant estimator with a multivariate (continuous) predictor. The function must receive as arguments the sample $$(\mathbf{X}_1,Y_1),\ldots,(\mathbf{X}_n,Y_n)$$ and the bandwidth vector $$\mathbf{h}$$. Use the normal kernel. Test your implementation by:

1. Simulating a random sample from a regression model with two predictors.
2. Computing its cross-validation bandwidths via np::npregbw.
3. Plotting a contour of the function $$(h_1,h_2)\mapsto \mathrm{CV}(h_1,h_2)$$ and checking that the minimum of this surface and the solution given by np::npregbw coincide.