5.2 Bandwidth selection

Cross-validatory bandwidth selection, as studied in Section 4.3, extends neatly to the mixed multivariate case. In the fully continuous case, the least-squares cross-validation selector is defined as

\[\begin{align*} \mathrm{CV}(\mathbf{h})&:=\frac{1}{n}\sum_{i=1}^n(Y_i-\hat{m}_{-i}(\mathbf{X}_i;q,\mathbf{h}))^2,\\ \hat{\mathbf{h}}_\mathrm{CV}&:=\arg\min_{h_1,\ldots,h_p>0}\mathrm{CV}(\mathbf{h}). \end{align*}\]

The cross-validation objective function becomes more challenging to minimize as \(p\) grows, since the search runs over a \(p\)-dimensional space in which local minima may appear. For this reason, it is advisable to employ several starting values when optimizing it, as np does.
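The multistart strategy can be sketched as follows. This is an illustration only, not np's actual routine: the objective `f` below is a deliberately multimodal stand-in for \(\mathrm{CV}(h_1,h_2)\), chosen so that its global minimum is known to be at \((1,1)\) with value \(-2\).

```r
# Stand-in objective for CV(h1, h2) (NOT the actual CV loss), deliberately
# multimodal, with known global minimum at (1, 1) and minimum value -2
f <- function(h) sum((h - 1)^2 - cos(4 * (h - 1)))

# Run a local optimizer from each row of "starts" and keep the best solution
multistart_min <- function(f, starts, lower, upper) {
  fits <- lapply(seq_len(nrow(starts)), function(i)
    optim(par = starts[i, ], fn = f, method = "L-BFGS-B",
          lower = lower, upper = upper))
  fits[[which.min(sapply(fits, function(o) o$value))]]
}

# Deterministic grid of starting values on (0, 5]^2
starts <- as.matrix(expand.grid(h1 = c(0.5, 2.5, 4.5), h2 = c(0.5, 2.5, 4.5)))
best <- multistart_min(f, starts, lower = rep(0.01, 2), upper = rep(5, 2))
# A single unlucky start (e.g., (2.5, 2.5)) may end in a local minimum;
# taking the best solution over the grid of starts recovers the global one
```

The same wrapper pattern applies verbatim when `f` is replaced by an actual cross-validation loss in \(\mathbf{h}\).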

The mixed case is defined in a completely analogous manner, just replacing the continuous kernels \(K_h(\cdot)\) with categorical kernels \(l_u(\cdot,\cdot;\lambda)\) or ordered discrete kernels \(l_o(\cdot,\cdot;\eta)\), and minimizing the cross-validation loss also over the bandwidths \(\lambda\) and \(\eta\).
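For concreteness, the following sketch implements two standard choices of discrete kernels: the Aitchison–Aitken kernel for unordered categories and a geometric Li–Racine-type kernel for ordered levels. That these coincide with the exact forms of \(l_u\) and \(l_o\) defined earlier is an assumption here; check them against the chapter's definitions.

```r
# Unordered kernel l_u(x, y; lambda) for c categories (Aitchison--Aitken
# form, assumed here): weight 1 - lambda on the matching category and
# lambda / (c - 1) spread over the remaining ones; lambda in [0, (c - 1) / c]
l_u <- function(x, y, lambda, c) {
  ifelse(x == y, 1 - lambda, lambda / (c - 1))
}

# Ordered kernel l_o(x, y; eta) for integer-coded levels (geometric
# Li--Racine form, assumed here): weight decays geometrically with the
# number of levels separating x and y; eta in [0, 1)
l_o <- function(x, y, eta) {
  eta^abs(x - y)
}

# lambda = 0 or eta = 0 recover indicator kernels (no smoothing across levels)
l_u(1, 1, lambda = 0, c = 3)  # 1
l_o(2, 4, eta = 0.5)          # 0.25
```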

Importantly, the trick described in Proposition 4.1 holds with obvious modifications, and it does so both in the mixed case and for the Nadaraya–Watson estimator.

Proposition 5.1 For \(q=0,1\), the weights of the leave-one-out estimator \(\hat{m}_{-i}(\mathbf{x};q,\mathbf{h})=\sum_{\substack{j=1\\j\neq i}}^nW_{-i,j}^q(\mathbf{x})Y_j\) can be obtained from \(\hat{m}(\mathbf{x};q,\mathbf{h})=\sum_{i=1}^nW_{i}^q(\mathbf{x})Y_i\):

\[\begin{align} W_{-i,j}^q(\mathbf{x})=\frac{W^q_j(\mathbf{x})}{\sum_{\substack{k=1\\k\neq i}}^nW_k^q(\mathbf{x})}=\frac{W^q_j(\mathbf{x})}{1-W_i^q(\mathbf{x})}.\tag{5.9} \end{align}\]

This implies that

\[\begin{align} \mathrm{CV}(\mathbf{h})=\frac{1}{n}\sum_{i=1}^n\left(\frac{Y_i-\hat{m}(\mathbf{X}_i;q,\mathbf{h})}{1-W_i^q(\mathbf{X}_i)}\right)^2.\tag{5.10} \end{align}\]

Remark. As in the univariate case, computing (5.10) requires evaluating the local polynomial estimator at the sample \(\{\mathbf{X}_i\}_{i=1}^n\) and obtaining \(\{W_i^q(\mathbf{X}_i)\}_{i=1}^n\) (which are needed to evaluate \(\hat{m}(\mathbf{X}_i;q,\mathbf{h})\)). Both tasks can be achieved simultaneously from the \(n\times n\) matrix \(\big(W_{i}^q(\mathbf{X}_j)\big)_{ij}\). Evaluating \(\hat{m}_{-i}(\mathbf{x};q,\mathbf{h})\), because of (5.9), can be done with the weights \(\{W_i^q(\mathbf{x})\}_{i=1}^n\).
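The remark can be illustrated numerically for the Nadaraya–Watson estimator (\(q=0\)) with a product normal kernel. The sketch below (variable names and settings are illustrative) builds the \(n\times n\) weight matrix, obtains (5.10) from its diagonal, and checks the result against brute-force leave-one-out fits:

```r
set.seed(1)
n <- 50; p <- 2
X <- matrix(rnorm(n * p), nrow = n)
Y <- X[, 1]^2 + sin(X[, 2]) + rnorm(n, sd = 0.25)
h <- c(0.4, 0.6)  # an arbitrary bandwidth vector, for illustration

# K[i, j] = prod_d K_{h_d}(X[i, d] - X[j, d]); row-normalizing gives
# W[i, j] = W_j^0(X_i), the weight of Y_j in mhat(X_i; 0, h)
K <- matrix(1, n, n)
for (d in 1:p) K <- K * outer(X[, d], X[, d],
                              function(a, b) dnorm((a - b) / h[d]) / h[d])
W <- K / rowSums(K)
m_hat <- drop(W %*% Y)  # mhat(X_i; 0, h), i = 1, ..., n

# CV via (5.10): only the diagonal weights W_i^0(X_i) are needed
cv_510 <- mean(((Y - m_hat) / (1 - diag(W)))^2)

# Brute-force CV: refit excluding each observation in turn
cv_loo <- mean(sapply(1:n, function(i) {
  w <- K[i, -i] / sum(K[i, -i])
  (Y[i] - sum(w * Y[-i]))^2
}))
# cv_510 and cv_loo coincide up to floating-point error, as (5.9) predicts
```

Note that the \(1/h_d\) scaling inside `outer` cancels in the row normalization, so it could be dropped without changing the weights.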

Exercise 5.5 Implement an R function to compute (5.10) for the local constant estimator with multivariate (continuous) predictor. The function must receive as arguments the sample \((\mathbf{X}_1,Y_1),\ldots,(\mathbf{X}_n,Y_n)\) and the bandwidth vector \(\mathbf{h}\). Use the normal kernel. Test your implementation by:

  1. Simulating a random sample from a regression model with two predictors.
  2. Computing its cross-validation bandwidths via np::npregbw.
  3. Plotting a contour of the function \((h_1,h_2)\mapsto \mathrm{CV}(h_1,h_2)\) and checking that the minimum of this surface and the solution given by np::npregbw coincide.