Preface

Welcome

This course is intended to provide an introduction to nonparametric estimation of the density and regression functions, mostly from the perspective of kernel smoothing. The emphasis is placed on building intuition behind the methods, gaining insight into their asymptotic properties, and showing their application through statistical software.

The animations in these notes will not be displayed the first time the notes are browsed. To see them, click on the link “Application available here” in the caption and allow an exception in your browser. After this is done once, all the animations within the notes will display correctly.

Course objectives and logistics

The software employed in the course is the statistical language R and its most common IDE (Integrated Development Environment) nowadays, RStudio. Basic prior knowledge of both is assumed¹. The appendix presents basic introductions to RStudio and R for students lacking basic expertise in them.

The notes contain a substantial number of fully self-contained code snippets. Students are encouraged to bring their own laptops to the lessons to practice with the code.

The required packages for the course are:

# Install packages
install.packages(c("ks", "nor1mix", "KernSmooth", "manipulate", "locfit"))

The code in the notes may assume that the packages have been loaded, so it is better to load them now:

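# Load packages
library(ks)          # Kernel smoothing, including multivariate density estimation
library(nor1mix)     # Normal (Gaussian) mixture models
library(KernSmooth)  # Kernel smoothing functions supporting Wand and Jones (1995)
library(locfit)      # Local regression, likelihood, and density estimation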

The Shiny interactive apps in the notes can be downloaded and run locally, which in particular allows examining their code. Check this GitHub repository for the sources.

Each topic of these notes contains a mix of theoretical and practical exercises for grading. Groups of two students must choose three exercises in total (at least one theoretical and one practical) from Sections 2.7 and 3.6 and turn them in to be graded. The group grade is weighted according to the difficulty of the exercises, which is given by the number of stars: easy (\(\star\)), medium (\(\star\star\)), and hard (\(\star\star\star\)). The final grade (\(0-5\)) is

\[\begin{align*} \frac{1}{3}\sum_{i=1}^3\frac{\mathrm{Score}_i}{5}(2+\star_i), \end{align*}\]

where \(\mathrm{Score}_i\) is the score (\(0-5\)) for the \(i\)-th exercise and \(\star_i\) represents its number of stars (\(1-3\)).
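As an illustration of the grading formula, the following R sketch computes the final grade from the scores and star counts of the three exercises. The helper `final_grade()` is hypothetical and not part of the course materials; it assumes scores in \(0-5\) and stars in \(1-3\), as stated above.

```r
# Hypothetical helper (not part of the course materials): computes the final
# grade from the scores (0-5) and the numbers of stars (1-3) of three exercises
final_grade <- function(scores, stars) {
  # Check the assumed input ranges
  stopifnot(length(scores) == 3, length(stars) == 3,
            all(scores >= 0 & scores <= 5), all(stars %in% 1:3))
  # (1/3) * sum_i Score_i / 5 * (2 + stars_i), written as an average
  mean(scores / 5 * (2 + stars))
}

final_grade(scores = c(5, 5, 5), stars = c(3, 3, 3))  # 5
final_grade(scores = c(5, 4, 3), stars = c(1, 2, 3))  # 3.066667
```

Under this formula, perfect scores on three easy exercises give a grade of \(3\), whereas perfect scores on three hard exercises give the maximum grade of \(5\).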

Main references and credits

Several great reference books have been used in preparing these notes. The following list details the sections in which each of them has been consulted:

In addition, these notes were made possible by these incredible pieces of software: Xie (2016), Xie (2015), Allaire et al. (2017), and R Core Team (2017).

The icons used in the notes were designed by madebyoliver, freepik, and roundicons from Flaticon.

License

All the material in these notes is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License (CC BY-NC-ND 4.0). You may not use this material except in compliance with this license. The human-readable summary of the license states that:

  • You are free to:
    • Share – Copy and redistribute the material in any medium or format.
  • Under the following terms:
    • Attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    • NonCommercial – You may not use the material for commercial purposes.
    • NoDerivatives – If you remix, transform, or build upon the material, you may not distribute the modified material.

References

Fan, J., and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. Vol. 66. Monographs on Statistics and Applied Probability. London: Chapman & Hall.

DasGupta, A. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. New York: Springer. doi:10.1007/978-0-387-75971-5.

Loader, C. 1999. Local Regression and Likelihood. Statistics and Computing. New York: Springer-Verlag.

Scott, D. W. 2015. Multivariate Density Estimation. 2nd ed. Wiley Series in Probability and Statistics. Hoboken: John Wiley & Sons, Inc.

Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. London: Chapman & Hall.

van der Vaart, A. W. 1998. Asymptotic Statistics. Vol. 3. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802256.

Wand, M. P., and M. C. Jones. 1995. Kernel Smoothing. Vol. 60. Monographs on Statistics and Applied Probability. London: Chapman & Hall. doi:10.1007/978-1-4899-4493-1.

Wasserman, L. 2004. All of Statistics. Springer Texts in Statistics. New York: Springer-Verlag. doi:10.1007/978-0-387-21736-9.

Wasserman, L. 2006. All of Nonparametric Statistics. Springer Texts in Statistics. New York: Springer. doi:10.1007/0-387-30623-4.

Xie, Y. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. The R Series. Boca Raton: Chapman & Hall/CRC. https://bookdown.org/yihui/bookdown/.

Xie, Y. 2015. Dynamic Documents with R and Knitr. The R Series. Boca Raton: Chapman & Hall/CRC.

Allaire, J. J., J. Cheng, Y. Xie, J. McPherson, W. Chang, J. Allen, H. Wickham, A. Atkins, R. Hyndman, and R. Arslan. 2017. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.


  1. Among others: basic programming in R, the ability to work with objects and data structures, the ability to produce graphics, knowledge of the main statistical functions, and the ability to run scripts in RStudio.