# Notes for Predictive Modeling

*MSc in Big Data Analytics at Carlos III University of Madrid*

*2019-02-05, v5.1*

# Preface

### Welcome

Welcome to the notes for *Predictive Modeling* for the course 2018/2019. The subject is part of the MSc in Big Data Analytics from Carlos III University of Madrid.

The course is designed to have, roughly, **one lesson per main topic** in the syllabus. The schedule is tight due to time constraints, which will inevitably make the treatment of certain methods a little superficial compared with what would be *optimal*. Nevertheless, the course will hopefully give you a respectable panoramic view of the different statistical methods available for predictive modeling. A broad view of the syllabus and its planning is:

- Introduction (first lesson)
- Linear models I (first/second lesson)
- Linear models II (second/third lesson)
- Linear models III (third/fourth lesson)
- Generalized linear models (fifth/sixth lesson)
- Nonparametric regression (sixth/seventh lesson)

Some logistics for the development of the course follow:

- The **office hours** are Thursdays from 19:15 to 20:15, in the classroom in which the session took place. Make use of them, especially instead of sending me lengthy emails with questions!
- **Questions and comments** during lectures are most welcome. So just go ahead and fire! Particularly if these are clarifications, comments, or alternative perspectives that may help the rest of the class.
- Detailed **course evaluation** guidelines can be found here.

### Main references and credits

Several great reference books have been used for preparing these notes. The following list presents the books that have been consulted:

- Chacón and Duong (2018) (Section 6.1.4)
- DasGupta (2008) (Section 3.5.2)
- Durbán (2017) (Section 5.2.2)
- Fan and Gijbels (1996) (Sections 6.2, 6.2.3, and 6.2.4)
- Hastie, Tibshirani, and Friedman (2009) (Section 4.1)
- James et al. (2013) (Sections 2.2 – 2.7, 3.1, 3.5, 3.6.3, and 4.1)
- Kuhn and Johnson (2013) (Section 1.2)
- Li and Racine (2007) (Section 6.3)
- Loader (1999) (Section 6.5)
- McCullagh and Nelder (1983) (Sections 5.2 – 5.6)
- Peña (2002) (Sections 2.2 – 2.7, 3.5, and 5.2.1)
- Wand and Jones (1995) (Sections 6.1.2, 6.1.3, and 6.2.4)
- Wasserman (2004) (Section 6.5)
- Wasserman (2006) (Section 6.2.4)
- Wood (2006) (Sections 5.2.2 and 5.7)

These notes are possible due to the existence of the incredible pieces of software by Xie (2016a), Xie (2016b), Allaire et al. (2017), Xie and Allaire (2018), and R Core Team (2018). Also, certain hacks to improve the design layout have been possible due to the wonderful work of Úcar (2018). The icons used in the notes were designed by madebyoliver, freepik, and roundicons from Flaticon.

Last but not least, the notes have benefited from contributions from the following people.

**List of contributors**:

- Katherine Botz (performed a thorough proofreading of the course materials)
- José Ángel Fernández (provided fixes for several typos)
- Gulnur Demir (indicated a couple of typos)
- Manuel Rodríguez Ramírez (provided fixes for a couple of typos)
- Leonardo Stincone (provided fixes for several typos)

### Contributions

Contributions, reporting of typos, and feedback on the notes are very welcome. Either send an email to edgarcia@est-econ.uc3m.es or, if you are willing to provide several contributions, ask for access to the GitHub repository, so you can open a pull request and submit your modifications for approval. Give me a reason for writing your name in the list of contributors!

### License

All the material in these notes is licensed under the **Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License** (CC BY-NC-ND 4.0). You may not use this material except in compliance with this license. The human-readable summary of the license states that:

**You are free to**:

- *Share* – Copy and redistribute the material in any medium or format.

**Under the following terms**:

- *Attribution* – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- *NonCommercial* – You may not use the material for commercial purposes.
- *NoDerivatives* – If you remix, transform, or build upon the material, you may not distribute the modified material.

### References

Chacón, J. E., and T. Duong. 2018. *Multivariate Kernel Smoothing and Its Applications*. Vol. 160. Monographs on Statistics and Applied Probability. Boca Raton, FL: CRC Press. doi:10.1201/9780429485572.

DasGupta, A. 2008. *Asymptotic Theory of Statistics and Probability*. Springer Texts in Statistics. New York: Springer. doi:10.1007/978-0-387-75971-5.

Durbán, M. 2017. *Modelización Estadística*. Lecture notes.

Fan, J., and I. Gijbels. 1996. *Local Polynomial Modelling and Its Applications*. Vol. 66. Monographs on Statistics and Applied Probability. London: Chapman & Hall. doi:10.2307/2670134.

Hastie, T., R. Tibshirani, and J. Friedman. 2009. *The Elements of Statistical Learning*. 2nd ed. Springer Series in Statistics. New York: Springer. doi:10.1007/978-0-387-84858-7.

James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013. *An Introduction to Statistical Learning*. Vol. 103. Springer Texts in Statistics. New York: Springer. doi:10.1007/978-1-4614-7138-7.

Kuhn, M., and K. Johnson. 2013. *Applied Predictive Modeling*. New York: Springer. doi:10.1007/978-1-4614-6849-3.

Li, Q., and J. S. Racine. 2007. *Nonparametric Econometrics*. Princeton, NJ: Princeton University Press.

Loader, C. 1999. *Local Regression and Likelihood*. Statistics and Computing. New York: Springer-Verlag. doi:10.2307/1270956.

McCullagh, P., and J. A. Nelder. 1983. *Generalized Linear Models*. Monographs on Statistics and Applied Probability. London: Chapman & Hall. doi:10.1007/978-1-4899-3244-0.

Peña, D. 2002. *Regresión y Diseño de Experimentos*. Madrid: Alianza Editorial.

Wand, M. P., and M. C. Jones. 1995. *Kernel Smoothing*. Vol. 60. Monographs on Statistics and Applied Probability. London: Chapman & Hall, Ltd. doi:10.1007/978-1-4899-4493-1.

Wasserman, L. 2004. *All of Statistics*. Springer Texts in Statistics. New York: Springer-Verlag. doi:10.1007/978-0-387-21736-9.

Wasserman, L. 2006. *All of Nonparametric Statistics*. Springer Texts in Statistics. New York: Springer-Verlag. doi:10.1007/0-387-30623-4.

Wood, S. N. 2006. *Generalized Additive Models*. Texts in Statistical Science Series. Boca Raton: Chapman & Hall/CRC. doi:10.1201/9781420010404.

Xie, Y. 2016a. *Bookdown: Authoring Books and Technical Documents with R Markdown*. The R Series. Boca Raton: Chapman & Hall/CRC. https://bookdown.org/yihui/bookdown/.

Xie, Y. 2016b. *knitr: A General-Purpose Package for Dynamic Report Generation in R*. https://CRAN.R-project.org/package=knitr.

Allaire, J. J., J. Cheng, Y. Xie, J. McPherson, W. Chang, J. Allen, H. Wickham, A. Atkins, R. Hyndman, and R. Arslan. 2017. *rmarkdown: Dynamic Documents for R*. https://CRAN.R-project.org/package=rmarkdown.

Xie, Y., and J. J. Allaire. 2018. *tufte: Tufte’s Styles for R Markdown Documents*. https://CRAN.R-project.org/package=tufte.

R Core Team. 2018. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Úcar, I. 2018. “Energy Efficiency in Wireless Communications for Mobile User Devices.” PhD thesis, Universidad Carlos III de Madrid. https://enchufa2.github.io/thesis/.