An Introduction to Statistical Learning with the tidyverse and tidymodels
Who, what, and why?
I am a data scientist and statistician who is (mostly) self-taught from textbooks and generous people sharing their work online. Inspired by projects like Solomon Kurz’s recoding of Statistical Rethinking, I decided to publicly document my notes and code as I work through An Introduction to Statistical Learning, 2nd edition by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
I prefer to work with the tidyverse collection of R packages, and so will be using those to wrangle and visualize the data. Along the way, I’ll be teaching myself the tidymodels framework for machine learning.
In general, my plan for each chapter/concept is to start with the original modeling package, then move towards the tidymodels approach in the labs and exercises. For example, I’ll first perform logistic regression with glm(), then use parsnip::logistic_reg() by the end of the chapter. I think this will help me better appreciate the unified interface provided with tidymodels, and maybe help me better understand what is going on under the hood.
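As a rough sketch of that progression, here is the same logistic regression fit both ways. The data set (the built-in mtcars, with transmission type as the outcome) and the object names are my own stand-ins for illustration, not examples from the book, and the native pipe assumes R 4.1 or later.

```r
library(parsnip)

# Illustrative data (an assumption, not from ISLR): built-in mtcars,
# with transmission type recoded as a two-level factor outcome
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# 1. The original modeling function: stats::glm() with a binomial family
fit_glm <- glm(am ~ wt, data = cars, family = binomial)

# 2. The tidymodels approach: declare the model type, pick an engine, then fit
fit_parsnip <- logistic_reg() |>
  set_engine("glm") |>
  fit(am ~ wt, data = cars)

# The parsnip object wraps an ordinary glm fit, so the coefficients should agree
coef(fit_glm)
coef(extract_fit_engine(fit_parsnip))
```

With the "glm" engine, parsnip hands the formula and data to glm() itself, which is why pulling out the underlying engine fit is a handy way to check what the unified interface is doing under the hood.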
I won’t be doing every exercise or section. My main goal for this project is to improve my statistical programming, so I will focus on the applied exercises rather than the conceptual ones.
As of 2022-06-13, I’ve completed Chapters 1 through 7.