3.6 Exercises

This is the list of evaluable exercises for Chapter 3. The number of stars represents an estimate of their difficulty: easy (\(\star\)), medium (\(\star\star\)), and hard (\(\star\star\star\)).

Exercise 3.6 (theoretical, \(\star\)) Show that the local polynomial estimator reduces to the Nadaraya–Watson estimator when \(p=0\). Use (3.18) to obtain (3.12).
Exercise 3.7 (theoretical, \(\star\star\)) Obtain the optimization problems for the local Poisson regression and for the local binomial regression, both of first degree.
Exercise 3.8 (theoretical, \(\star\star\)) Show that the Nadaraya–Watson estimator is unbiased (in conditional expectation with respect to \(X_1,\ldots,X_n\)) when the regression function is constant, \(m(x)=c\), \(c\in\mathbb{R}\). Show the same for the local linear estimator when the regression function is linear, \(m(x)=a+bx\), \(a,b\in\mathbb{R}\). Hint: use (3.19).
Exercise 3.9 (theoretical, \(\star\star\star\)) Obtain the weight expressions (3.19) of the local linear estimator. Hint: use the matrix inversion formula for \(2\times2\) matrices.
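
For reference, the inversion formula mentioned in the hint is the standard one for a nonsingular \(2\times2\) matrix. The matrix of the weighted least squares problem behind the local linear fit is symmetric, so \(b=c\) in that case:

\[\begin{align*} \begin{pmatrix} a & b\\ c & d \end{pmatrix}^{-1}=\frac{1}{ad-bc}\begin{pmatrix} d & -b\\ -c & a \end{pmatrix},\quad ad-bc\neq0. \end{align*}\]
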
Exercise 3.10 (theoretical, \(\star\star\star\)) Prove the two implications of Proposition 3.1 for the Nadaraya–Watson estimator (\(p=0\)).
Exercise 3.11 (practical, \(\star\star\), Example 4.6 in Wasserman (2006)) The dataset at http://www.stat.cmu.edu/~larry/all-of-nonpar/=data/bpd.dat contains information about the presence of bronchopulmonary dysplasia (binary response) and the birth weight in grams (predictor) of 223 babies. Use the function locfit of the locfit package with the argument family = "binomial" and plot the resulting fit. Explore and comment on the resulting estimates, providing insights about the data.
Exercise 3.12 (practical, \(\star\star\)) The ChickWeight dataset in R contains 578 observations of the weight (in grams) and the Time (days since birth) of chicks. Fit a local binomial or a local Poisson regression of weight on Time. Use the function locfit of the locfit package with the argument family = "binomial" or family = "poisson" and explore the effect of the bandwidth. Comment on the resulting estimates. What is the estimated expected time of a chick that weighs 200 grams?
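
As a base-R cross-check of the locfit fits in Exercise 3.12, a first-degree local Poisson regression at a point x can be computed by maximizing the kernel-weighted Poisson log-likelihood, which is what glm() does when the kernel values are passed as prior weights. This is a sketch under stated assumptions, not the locfit algorithm: the helper name loc_pois, the Gaussian kernel dnorm, the bandwidth h = 2 and the evaluation grid are hypothetical choices, and locfit uses its own defaults, so the values will not match its output exactly.

```r
# First-degree local Poisson regression at a single point x, obtained by
# maximizing the kernel-weighted Poisson log-likelihood with glm().
# Hypothetical choices: Gaussian kernel K = dnorm and a hand-picked bandwidth h.
loc_pois <- function(x, X, Y, h, K = dnorm) {
  w <- K((X - x) / h) / h        # kernel weights K_h(x - X_i) (K symmetric)
  Xc <- X - x                    # centering: the intercept is the log-mean at x
  fit <- glm(Y ~ Xc, family = poisson(), weights = w)
  unname(exp(coef(fit)[1]))      # estimated E[Y | X = x]
}

data(ChickWeight)
grid <- seq(0, 21, by = 3)       # hypothetical evaluation points (days)
# Anonymous wrapper avoids clashing with sapply()'s own argument named X
est <- sapply(grid, function(x0)
  loc_pois(x0, X = ChickWeight$Time, Y = ChickWeight$weight, h = 2))
round(est, 1)
```

Changing h in the call above gives a quick, package-free view of the bandwidth effect that the exercise asks to explore with locfit.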
Exercise 3.13 (practical, \(\star\star\star\)) Implement your own version of the local linear estimator. The function must take a sample X, a sample Y, the points x at which the estimate is to be obtained, the bandwidth h, and the kernel K. Test that it behaves correctly by applying it to an artificial dataset that follows a linear model.
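
A possible starting point for Exercise 3.13 is a sketch of the Nadaraya–Watson estimator (the case \(p=0\)) with the requested interface; replacing the local constant by a local line is the actual exercise. The function name nw, the default kernel dnorm, the simulated model \(Y=1+2X+\varepsilon\) and the bandwidth h = 0.3 are assumptions made for illustration only.

```r
# Nadaraya-Watson estimator (local polynomial fit with p = 0), written with the
# interface asked for in Exercise 3.13: evaluation points x, samples X and Y,
# bandwidth h and kernel K (any density, e.g. dnorm).
nw <- function(x, X, Y, h, K = dnorm) {
  # Kernel evaluations K_h(x_j - X_i): one row per evaluation point x_j
  Kx <- outer(x, X, function(a, b) K((a - b) / h) / h)
  W <- Kx / rowSums(Kx)          # the weights of each row sum to one
  drop(W %*% Y)                  # weighted averages of the responses
}

# Hypothetical artificial data from the linear model Y = 1 + 2 X + eps
set.seed(42)
n <- 200
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n, sd = 0.5)
x_grid <- seq(-2, 2, length.out = 41)
fit_nw <- nw(x = x_grid, X = X, Y = Y, h = 0.3)
```

Because the weights of each row sum to one, a constant response is reproduced exactly, in line with Exercise 3.8. On the linear data, fit_nw is expected to deviate from the true line, especially near the ends of the grid; a correct local linear implementation should largely remove that deviation.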

Exercise 3.14 (practical, \(\star\star\star\)) Implement your own version of the first-degree local likelihood estimator for an exponential response. The function must take a sample X, a sample Y, the points x at which the estimate is to be obtained, the bandwidth h, and the kernel K. Test that it behaves correctly by applying it to an artificial dataset that follows a generalized linear model with exponential response, that is,

\[\begin{align*} Y|X=x \sim \mathrm{Exp}(\lambda(x)),\quad \lambda(x)=e^{\beta_0+\beta_1x}, \end{align*}\]

using a cross-validated bandwidth. Hint: use optim or nlm for optimizing a function in R.
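Testing the estimator of Exercise 3.14 first requires data from the model above. The sketch below simulates such data and, as a sanity check of the likelihood, fits the global (non-local) model with optim. It assumes that \(\mathrm{Exp}(\lambda)\) is parametrized by its rate, so that \(\mathbb{E}[Y|X=x]=1/\lambda(x)\) as in rexp(); the coefficients \(\beta_0=0.5\), \(\beta_1=-1\), the design \(X\sim\mathcal{U}(-1,1)\), the sample size and the name nll are hypothetical.

```r
# Simulate from Y | X = x ~ Exp(lambda(x)), lambda(x) = exp(beta0 + beta1 * x).
# Assumption: Exp(lambda) uses the rate parametrization of rexp(), so that
# E[Y | X = x] = 1 / lambda(x).
set.seed(123)
n <- 2000
beta <- c(0.5, -1)                 # hypothetical true (beta0, beta1)
X <- runif(n, -1, 1)
lambda <- exp(beta[1] + beta[2] * X)
Y <- rexp(n, rate = lambda)

# Global negative log-likelihood: -sum(log(lambda_i) - lambda_i * Y_i)
nll <- function(b, X, Y) {
  eta <- b[1] + b[2] * X           # eta_i = log(lambda_i)
  -sum(eta - exp(eta) * Y)
}
# Extra named arguments X and Y are forwarded by optim() to nll()
fit <- optim(par = c(0, 0), fn = nll, X = X, Y = Y, method = "BFGS")
fit$par                            # should be close to beta
```

Recovering beta here checks the likelihood code before it is localized; the exercise then requires weighting each term by the kernel and centering the linear predictor at the evaluation point.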

References

Wasserman, L. 2006. All of Nonparametric Statistics. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/0-387-30623-4.