2.7 Exercises

This is the list of evaluable exercises for Chapter 2. The number of stars represents an estimate of their difficulty: easy (\(\star\)), medium (\(\star\star\)), and hard (\(\star\star\star\)).

Exercise 2.4 (theoretical, \(\star\)) Prove that the histogram (2.1) is a proper density estimate (a nonnegative function that integrates to one). Obtain its associated distribution function. How does it differ from the ecdf (1.1)?

Exercise 2.5 (theoretical, \(\star\), adapted from Exercise 2.1 in Wand and Jones (1995)) Derive the result (2.7). Then obtain the exact MSE and MISE using (2.6) and (2.7).

Exercise 2.6 (theoretical, \(\star\star\)) Conditionally on the sample \(X_1,\ldots,X_n\), compute the expectation and variance of the kde (2.4) and compare them with the sample mean and variance. What is the effect of \(h\) on them?

Exercise 2.7 (theoretical, \(\star\star\), Exercise 3.3 in Wand and Jones (1995)) Show that

\[\begin{align*} \mathbb{E}[\mathrm{LSCV}(h)]=\mathrm{MISE}[\hat f(\cdot;h)]-R(f). \end{align*}\]

Exercise 2.8 (theoretical, \(\star\star\star\)) Show that:

  • \(\hat f(x;h)=f(x)+O(h^2)+O_\mathbb{P}\left((nh)^{-1/2}\right)=f(x)(1+o_\mathbb{P}(1))\).
  • \(\frac{1}{n}\sum_{i=1}^n(X_i-x)K_h(x-X_i)=\mu_2(K)f'(x)h^2+o(h^2)+O_\mathbb{P}\left(n^{-1/2}h^{\frac{1}{2}}\right)\).
Hint: use Chebyshev's inequality.

Exercise 2.9 (theoretical, \(\star\star\star\), Exercise 2.23 in Wand and Jones (1995)) Show that the bias and variance for the transformation kde (2.28) are

\[\begin{align*} \mathrm{Bias}[\hat f_T(x;h,t)]&=\frac{1}{2}\mu_2(K)g''(t(x))t'(x)h^2+o(h^2),\\ \mathbb{V}\mathrm{ar}[\hat f_T(x;h,t)]&=\frac{R(K)}{nh}g(t(x))t'(x)^2+o((nh)^{-1}), \end{align*}\] where \(g\) is the density of \(t(X)\). Using these results, prove that

\[\begin{align*} \mathrm{AMISE}[\hat f_T(\cdot;h,t)]=\frac{1}{4}\mu_2^2(K)h^4\int t'(t^{-1}(x))g''(x)^2\,\mathrm{d}x+\frac{R(K)}{nh}\mathbb{E}[t'(X)]. \end{align*}\]

Exercise 2.10 (practical, \(\star\)) The kde can be used to smoothly resample a dataset. To that end, first compute the kde of the dataset and then employ the algorithm of Section 2.6. Implement this resampling as a function that takes as arguments the dataset, the bandwidth \(h\), and the number of points \(M\) to sample. Use the normal kernel for simplicity. Test the implementation with the faithful dataset and different bandwidths.
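
As a starting point, the resampling idea can be sketched as follows. This is a minimal illustration in Python rather than R, and it assumes that the algorithm of Section 2.6 is the usual mixture sampler: choose one data point uniformly at random and perturb it with noise drawn from the kernel scaled by \(h\). The function name `smooth_resample`, its `seed` argument, and the toy data are hypothetical and only serve the example.

```python
import random


def smooth_resample(data, h, M, seed=None):
    """Draw M points from the normal-kernel kde of `data` with bandwidth h.

    Assumption: sampling from the kde is done by picking a data point
    uniformly at random and adding N(0, h^2) noise to it.
    """
    rng = random.Random(seed)
    return [rng.choice(data) + h * rng.gauss(0.0, 1.0) for _ in range(M)]


# Toy stand-in data (hypothetical values, not the faithful dataset)
toy = [1.8, 2.0, 3.5, 4.3, 4.5]
for h in (0.1, 0.5):
    print(h, smooth_resample(toy, h, M=5, seed=1))
```

With \(h=0\) the sketch reduces to the ordinary bootstrap resample, which is a convenient sanity check before trying larger bandwidths.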
Exercise 2.11 (practical, \(\star\star\), Exercise 6.5 in Wasserman (2006)) Data on the salaries of the chief executive officers of 60 companies are available at http://lib.stat.cmu.edu/DASL/Datafiles/ceodat.html (alternative link). Investigate the distribution of salaries using a kde. Use \(\hat h_\mathrm{LSCV}\) to choose the amount of smoothing. Also consider \(\hat h_\mathrm{RT}\). There appear to be a few bumps in the density. Are they real? Use confidence bands to address this question. Finally, comment on the resulting estimates.
Exercise 2.12 (practical, \(\star\star\)) Implement your own version of the transformation kde (2.28) for the three transformations given in Section 2.6. You can tweak the output of the density function in R and add an extra argument for selecting the kind of transformation. Or you can implement it directly from scratch.

Exercise 2.13 (practical, \(\star\star\star\)) A bandwidth selector is a random variable. Visualizing its density can help in understanding its behavior, especially when it is compared with the asymptotically optimal bandwidth \(h_\mathrm{AMISE}\). Create a script that does the following steps:

  1. For \(j=1,\ldots,M=10000\):

    • Simulates a sample from a normal mixture model in nor1mix.
    • Computes the bandwidth selectors \(\hat h_{\mathrm{RT}}\), \(\hat h_{\mathrm{BCV}}\), \(\hat h_{\mathrm{UCV}}\), and \(\hat h_{\mathrm{DPI}}\), and stores them.
  2. Estimates the density of each bandwidth selector from its corresponding sample of size \(M\). Use the RT selector for estimating the density.

  3. Plots the estimated densities together.

  4. Draws a vertical line for representing the \(h_\mathrm{AMISE}\) bandwidth.

Describe the results for the “Claw” and “Bimodal” densities in nor1mix, for sample sizes \(n=100,500\).
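
Step 1 of the script can be prototyped for a single selector along the following lines. This is a rough sketch in Python rather than R, under several assumptions that should be stated loudly: the helper names (`sample_mixture`, `h_rt`, `simulate_selector`) are hypothetical, the mixture parameters are illustrative and are not the "Claw" or "Bimodal" models of nor1mix, \(M=200\) is used instead of \(M=10000\) to keep the run short, and `h_rt` is the plain normal-scale rule \((4/3)^{1/5}\hat\sigma n^{-1/5}\) based on the sample standard deviation (R's `bw.nrd` uses the minimum of the standard deviation and IQR/1.34 instead). The remaining selectors, the density estimation, and the plots are left out.

```python
import random
import statistics


def sample_mixture(n, weights, means, sds, rng):
    """Simulate n points from a finite normal mixture."""
    comps = rng.choices(list(range(len(weights))), weights=weights, k=n)
    return [rng.gauss(means[k], sds[k]) for k in comps]


def h_rt(x):
    """Normal-scale rule of thumb: (4/3)^(1/5) * sd * n^(-1/5)."""
    return (4 / 3) ** 0.2 * statistics.stdev(x) * len(x) ** (-0.2)


def simulate_selector(selector, n, M, seed=None):
    """Return M realizations of `selector` on independent samples of size n."""
    rng = random.Random(seed)
    # Hypothetical two-component mixture, NOT a nor1mix model
    w, mu, sd = [0.5, 0.5], [-1.5, 1.5], [0.5, 0.5]
    return [selector(sample_mixture(n, w, mu, sd, rng)) for _ in range(M)]


h_100 = simulate_selector(h_rt, n=100, M=200, seed=42)
h_500 = simulate_selector(h_rt, n=500, M=200, seed=42)
print(statistics.mean(h_100), statistics.stdev(h_100))
print(statistics.mean(h_500), statistics.stdev(h_500))
```

Passing the selector as a function argument keeps the simulation loop unchanged when further selectors are plugged in.
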
Exercise 2.14 (practical, \(\star\star\star\)) Use (2.22) and the family of densities of Marron and Wand (1992) in nor1mix to compare the MISE and AMISE criteria. To that end, code (2.22) and the AMISE expression for the normal kernel and compare the two error curves and the two minimizers. Explore three models of your choice from nor1mix for sample sizes \(n=50, 100, 200\). Describe the results and the major takeaways in detail.

Exercise 2.15 (practical, \(\star\star\star\)) The kde can be extended to the multivariate setting by using product kernels. For a sample \(\mathbf{X}_1,\ldots,\mathbf{X}_n\) in \(\mathbb{R}^p\), the multivariate kde employing product kernels is

\[\begin{align*} \hat f(\mathbf{x};\mathbf{h})=\frac{1}{n}\sum_{i=1}^nK_{h_1}(x_1-X_{i,1})\times\stackrel{p}{\cdots}\times K_{h_p}(x_p-X_{i,p}), \end{align*}\] where \(\mathbf{x}=(x_{1},\ldots,x_{p})\), \(\mathbf{X}_i=(X_{i,1},\ldots,X_{i,p})\), and \(\mathbf{h}=(h_1,\ldots,h_p)\) is a vector of bandwidths.


  • Implement a function that computes the bivariate kde using normal kernels.
  • Create a sample by simulating \(500\) points from a \(\mathcal{N}((-1, -1), \mathrm{diag}(1, 2))\) and \(500\) from a \(\mathcal{N}((1, 1), \mathrm{diag}(2, 1))\).
  • Estimate the unknown density from the resulting sample of size \(n=1000\).
  • Check the implementation graphically by comparing the kde with the true density (use ?contour).
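
To make the product-kernel formula concrete, the following minimal sketch evaluates it at a single point with normal kernels. It is written in Python rather than R and is only an illustration under stated assumptions: the names `dnorm` and `kde_product` and the toy sample are hypothetical, and the code evaluates one point at a time with plain loops instead of a vectorized grid.

```python
import math


def dnorm(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde_product(x, data, h):
    """Product-kernel kde at the point x, for a bandwidth vector h.

    x and h have length p; data is a list of length-p observations.
    """
    total = 0.0
    for xi in data:
        prod = 1.0
        for xj, xij, hj in zip(x, xi, h):
            prod *= dnorm((xj - xij) / hj) / hj  # K_{h_j}(x_j - X_{i,j})
        total += prod
    return total / len(data)


# Toy bivariate sample (hypothetical values) and bandwidths
toy = [(0.0, 0.0), (1.0, 2.0), (-1.0, 0.5)]
print(kde_product((0.0, 0.0), toy, h=(0.5, 0.8)))
```

A quick check of such an implementation is that a Riemann sum of the estimate over a fine grid covering the sample is close to one.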


Marron, J. S., and M. P. Wand. 1992. “Exact Mean Integrated Squared Error.” Ann. Statist. 20 (2): 712–36.

Wand, M. P., and M. C. Jones. 1995. Kernel Smoothing. Vol. 60. Monographs on Statistics and Applied Probability. London: Chapman & Hall, Ltd. https://doi.org/10.1007/978-1-4899-4493-1.

Wasserman, L. 2006. All of Nonparametric Statistics. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/0-387-30623-4.