An Introduction to Statistical Learning with the tidyverse and tidymodels
2022-09-19
Who, what, and why?
I am a data scientist and statistician who is (mostly) self-taught from textbooks and generous people sharing their work online. Inspired by projects like Solomon Kurz’s recoding of Statistical Rethinking and Emil Hvitfeldt’s ISLR tidymodels labs, I decided to publicly document my notes and code as I work through An Introduction to Statistical Learning, 2nd edition by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
I prefer to work with the tidyverse collection of R packages, and so I will be using those to wrangle and visualize the data. Along the way, I’ll be teaching myself the tidymodels framework for machine learning.
In general, my plan for each chapter/concept is to start with the original modeling package, then move towards the tidymodels approach in the labs and exercises. For example, I’ll first perform logistic regression with glm(), then use parsnip::logistic_reg() by the end of the chapter. I think this will help me better appreciate the unified interface provided with tidymodels, and maybe help me better understand what is going on under the hood.
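As a concrete illustration of that glm()-then-parsnip progression, here is a minimal sketch. It uses the built-in mtcars data (an assumption for illustration, not one of the book's data sets) and models transmission type am as a function of weight wt. Because parsnip's "glm" engine calls stats::glm() with a binomial family, both fits should give the same coefficients.

```r
library(parsnip)

# Illustrative data (not from the book): parsnip's logistic_reg() requires
# a factor outcome, so convert the 0/1 transmission indicator `am` first
cars <- mtcars
cars$am <- factor(cars$am)

# 1. Base R: logistic regression via glm() with a binomial family
fit_glm <- glm(am ~ wt, data = cars, family = binomial)

# 2. tidymodels: specify the model, choose the engine, then fit
fit_parsnip <- logistic_reg() |>
  set_engine("glm") |>
  fit(am ~ wt, data = cars)

# Both approaches estimate the same coefficients; extract_fit_engine()
# returns the underlying glm object stored inside the parsnip fit
coef(fit_glm)
coef(extract_fit_engine(fit_parsnip))
```

For a single model the parsnip version is more verbose, but the same specify, set engine, fit pattern carries over to other model types, which is where the unified interface pays off.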
I won’t be doing every exercise or section. My main goal for this project is to improve my statistical programming, so I will focus on the applied exercises rather than the conceptual ones.
The source code can be found here. Feel free to leave an issue if you find a mistake or have a suggestion. I can also be reached on Twitter.
As of 2022-06-13, I’ve completed Chapters 1 through 7.