dir.create("R")
download.file("http://xcelab.net/rmpubs/sr2/code.txt", "R/code.R")
Personal Notes: Statistical Rethinking (2nd ed)
Preface
Content and Goals of this Book
Text Passages
This book collects my personal notes from reading Statistical Rethinking by Richard McElreath. I am using the second edition, published in 2020 by CRC Press, an imprint of Routledge of the Taylor & Francis Group. Additionally, I am using Statistical Rethinking 2023, the most recent set of free YouTube video lectures.
You can find links to other material on McElreath’s website about the book. Of special interest to me are the brms+tidyverse and the Stan+tidyverse conversions of his code. As I am not very experienced with R and completely new to Bayesian statistics and its tools, this additional material is also very challenging for me. I plan to read all of them simultaneously (section by section) and will dedicate parallel sections to their approaches. This has the advantage that the section numbers of the files conform to the section numbers of the second edition of the printed book.
The section headers indicate the source:
- “Original” refers to the original book
- “Tidyverse” refers to the {tidyverse} / {brms} conversion
- “Stan” refers to the {rstan} conversion
- “Reconsideration” refers to sections with my personal comments.
My text and code consist mostly of quotes from
- the second edition of the book (2020),
- the text of Richard McElreath’s video lectures (2023),
- Solomon Kurz’s tidyverse / brms version, or
- Vincent Arel-Bundock’s converted Stan code version.
Often I made minor edits (e.g., shortening the text) or put the content into my own words. But almost all of the text of this Quarto book is not mine; it comes from the resources mentioned above. Therefore I have often not marked these quotes. If you follow the book or the other resources you will notice the similarities and know which paragraph or section I am referring to. You will also recognize where a text passage reflects my own thoughts. In any case I alone am responsible for this text, especially if I have used code from the resources wrongly or misunderstood a quoted passage.
I wrote this book as a text for others to read because that forces me to be explicit and to explain all my learning outcomes more carefully. Please keep in mind that this text is written not by an expert but by a learner. Even though it replicates most of the content, it may contain many mistakes. All these misapprehensions and errors are my responsibility.
Code Chunks
The packages {rethinking} and {brms} serve similar tasks and therefore share many identical function names. Kurz unloaded the {rethinking} package whenever he explained {brms} functions, to prevent name conflicts. But this approach is not efficient for the structure of my documents, where I constantly switch between these two packages. So I attached with base::library() only the {tidyverse} meta package with its nine attached packages: {dplyr}, {forcats}, {ggplot2}, {lubridate}, {purrr}, {readr}, {stringr}, {tibble}, and {tidyr}.
Whenever I used another package I called the function with the package name in front, using the syntax <package name>::<function name>().
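The effect of the <package name>::<function name>() syntax can be sketched with packages that ship with R. This is a minimal illustration, not code from the book: the local function sd() is a made-up stand-in for a name conflict such as those between {rethinking} and {brms}.

```r
# Hypothetical conflict: a local definition masks the name `sd`
sd <- function(x) "masked version"

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# An unqualified call finds the local definition first
masked_result <- sd(x)

# A qualified call always uses the function exported by {stats}
qualified_result <- stats::sd(x)

print(masked_result)
print(qualified_result)
```

The qualified call keeps working no matter which packages or local objects are attached, which is the reason for spelling out the package name.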
To prevent conflicts in chunk names, objects and variables I added the following suffix to the end of the name:
- suffix a for the original book version in the main text
- suffix b for the {tidyverse} / {brms} version in the main text
- suffix c for the {rstan} version in the main text (not used)
- suffix r for the {rethinking} version in the synopsis
- suffix s for the Stan {rstan} version in the synopsis
- suffix t for the {tidyverse} / {brms} version in the synopsis
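As a small sketch of the suffix convention: the object names and chunk names below are hypothetical, and a vectorised base R expression stands in for the {tidyverse} / {brms} variant so that the example runs without extra packages. The numbers 6 and 3 are the counts of W and L in the globe-tossing sample used later in this preface.

```r
# Chunk "globe-ways-a": suffix a marks the original book version
p_a <- c(0, 0.25, 0.5, 0.75, 1)
ways_a <- sapply(p_a, function(q) (q * 4)^6 * ((1 - q) * 4)^3)

# Chunk "globe-ways-b": suffix b marks the {tidyverse} / {brms} version.
# Assumption: a vectorised base R expression stands in for it here.
p_b <- c(0, 0.25, 0.5, 0.75, 1)
ways_b <- (p_b * 4)^6 * ((1 - p_b) * 4)^3

# Both variants coexist in one session without overwriting each other
print(all.equal(ways_a, ways_b))
```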
I am not using the exact code snippets for my replications, because I replicate the code not only to see how it works but also to change parameter values and observe their influence. Especially when it comes to plotting, I try to use ggplot2 instead of the base plotting system, with which I have no experience at all.
As I already have some experience with the {tidyverse} approach, I do not include all code snippets from Kurz’s version. I am concentrating on learning Bayesian statistics, and where a passage offers nothing conceptually new for me I do not include it.
This is my first book using Quarto instead of bookdown, so I am also using these notes to learn Quarto. As a result you will sometimes find remarks or call-out blocks about my Quarto experiences.
Synopsis
At the end of each chapter I summarize the rationale for the techniques used. It is more than a summary because it goes into the details of code chunks. But I will leave out all supporting or illustrating passages and focus on a holistic point of view. The code chunks are somewhat cleaned up so that the code snippets most important for understanding the technique take center stage.
This synopsis gives me another occasion to digest the main points and try out the most important code chunks of the text. I will use the same names for code chunks, objects and variables as in the main text but with different suffixes (r, s, t instead of a, b, c).
Get Code Examples
Go to the book website and download the R code examples for the book.
There are big differences between the code snippets of the 2nd edition collected in code.txt and the new version being prepared for the 3rd edition of the book. These new code snippets can be found in the slides and/or in the videos. I will always refer to the place where they can be found.
Additionally, you will find all the scripts supporting the animations in the lectures at the new 2023 GitHub repo.
The code snippets do not follow the tidyverse style. For instance, the equal sign = is not surrounded by spaces, whereas in a comma-separated list of variables there is a space both before and after each comma.
I have converted the original code style to tidyverse style with the RStudio addin of the {styler} package. Assuming that the default style transformer is styler::tidyverse_style(), I selected the code snippet I wanted to convert and called the addin, which ran styler:::style_selection(). As an example, transforming the original code snippet resulted in the code below:
sample <- c("W", "L", "W", "W", "W", "L", "W", "L", "W")
W <- sum(sample == "W") # number of W observed
L <- sum(sample == "L") # number of L observed
p <- c(0, 0.25, 0.5, 0.75, 1) # proportions W
ways <- sapply(p, function(q) (q * 4)^W * ((1 - q) * 4)^L)
prob <- ways / sum(ways)
cbind(p, ways, prob)
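Outside RStudio, the same kind of conversion can be sketched with styler::style_text(), which applies the tidyverse style by default. The unstyled line below is made up for illustration and is not quoted from code.txt; the example skips the styling step if {styler} is not installed.

```r
# Hypothetical unstyled input in the spirit of the original code style
unstyled <- "ways<-sapply( p , function(q) (q*4)^W )"

if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() returns the restyled code as a character vector
  styled <- styler::style_text(unstyled)
  print(styled)
} else {
  message("{styler} is not installed; skipping the styling demo.")
}
```

The string is only parsed and reformatted, not evaluated, so the objects p and W do not need to exist.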
As copy & paste from the slides does not work, I downloaded the PDF of the Speaker Deck slides. But even that did not always work. In those cases I used TextSniper and formatted the code manually. These copy & paste problems only arise with the new code prepared for the 3rd edition. With the book (2nd ed.) I have no problems copying the code snippets via calibre from the ePUB eBook version.
Setup Chunks
At first I tried to collect all the packages I needed to load with library() in the setup chunk (see Quarto equivalent to RMarkdown setup chunk). The idea was to be able to run individual chunks calling functions from different packages for test purposes, without having to run all code chunks of the very long files. Additionally, I tried to prevent conflicts of function names with the conflicts_prefer() function of the {conflicted} package in each setup file.
But it turned out that this was not a feasible approach: I noticed too many conflicts between packages with a similar purpose (in my case between {rethinking} and {brms}). Furthermore, many of these conflicts are hidden because they come from imports from other packages.
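One way to get a rough picture of such conflicts is to compare the exported names of two packages. The helper shared_exports() below is a hypothetical function written for this sketch, not part of any package. It only sees direct exports, so conflicts hidden behind imports from other packages will not show up.

```r
# Hypothetical helper: names exported by both packages
shared_exports <- function(pkg_a, pkg_b) {
  stopifnot(
    requireNamespace(pkg_a, quietly = TRUE),
    requireNamespace(pkg_b, quietly = TRUE)
  )
  sort(intersect(getNamespaceExports(pkg_a), getNamespaceExports(pkg_b)))
}

# Demo with two packages that ship with R
print(shared_exports("stats", "utils"))

# The comparison of interest, run only if both packages are installed
if (requireNamespace("rethinking", quietly = TRUE) &&
    requireNamespace("brms", quietly = TRUE)) {
  print(shared_exports("rethinking", "brms"))
}
```

requireNamespace() loads a package without attaching it, so running the check does not itself create new masking on the search path.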
I will therefore just load the meta package {tidyverse} in the setup chunk of every file. To prevent conflicts between function names I will call functions of other packages with the syntax <package name>::<function name>(). This has the additional advantage that I learn which package each function comes from. And my aim of being able to run code chunks separately is still met.
I will differentiate between “base R” referring to all packages loaded automatically after starting R and the package {base} itself, that is part of the collection of “base R” packages.
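The distinction can be made concrete with a short sketch. It relies on the defaultPackages option, which lists the packages R attaches at startup in addition to {base}; the exact list depends on the local R configuration.

```r
# {base} is always attached; the rest comes from the startup option
startup_pkgs <- c("base", getOption("defaultPackages"))
print(startup_pkgs)

# Both functions belong to "base R", but only paste() lives in {base}
paste_home <- environmentName(environment(paste))
sd_home <- environmentName(environment(stats::sd))

print(paste_home)
print(sd_home)
```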
If you find errors please do not hesitate to open issues or PRs on my GitHub site. I really appreciate learning from more experienced R users! It shortens the learning paths of self-directed learners.
Package Installation
In contrast to the sparse and partly outdated remarks in the book, use the installation section of the rethinking package at GitHub.
Step 1
Of the three steps I had already successfully completed the first one (rstan and the C++ toolchain), so I had no need to follow the detailed instructions of the rstan installation at https://mc-stan.org/users/interfaces/rstan.html.
Step 2
To install the cmdstanr package I visited https://mc-stan.org/cmdstanr/. This is an addition to my previous installation with the older version (2nd ed., 2022). As I was installing the latest beta version of cmdstanr for the first time, I also needed to compile the libraries with cmdstanr::install_cmdstan().
To check the result of my installation I ran check_cmdstan_toolchain().
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
cmdstanr::install_cmdstan()
cmdstanr::check_cmdstan_toolchain()
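After the installation, a defensive check along the following lines can confirm that everything is in place. This is only a sketch: it assumes that cmdstanr::cmdstan_version() accepts error_on_NA = FALSE and returns NULL when no CmdStan installation is found, and it does nothing if {cmdstanr} itself is missing.

```r
cmdstanr_available <- requireNamespace("cmdstanr", quietly = TRUE)

if (cmdstanr_available) {
  # check_cmdstan_toolchain() signals an error if the C++ toolchain is broken
  toolchain_ok <- tryCatch(
    {
      cmdstanr::check_cmdstan_toolchain(fix = FALSE, quiet = TRUE)
      TRUE
    },
    error = function(e) FALSE
  )
  message("Toolchain OK: ", toolchain_ok)

  # Assumed to return NULL (instead of failing) if CmdStan is not installed
  version <- cmdstanr::cmdstan_version(error_on_NA = FALSE)
  message("CmdStan version: ", if (is.null(version)) "not found" else version)
} else {
  message("{cmdstanr} is not installed.")
}
```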
The command for downloading cmdstanr did not install the vignettes, which take a long time to build, but they are always available online at https://mc-stan.org/cmdstanr/articles/.
The vignette Getting started with CmdStanR also recommends loading the bayesplot and posterior packages, which are used later in the CmdStanR examples. But I believe these two packages are not necessary if you just plan to stick with the book.
Step 3
Once the infrastructure is installed one can install the packages used by the book. With the exception of rethinking, the companion package of the book, they can all be downloaded from CRAN.
I already had devtools installed, therefore I deleted it from the list of packages to install.
install.packages(c("coda","mvtnorm", "loo","dagitty","shape"))
devtools::install_github("rmcelreath/rethinking")
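Dropping an already installed package from the list by hand can also be automated. The helper missing_packages() below is a hypothetical function written for this sketch. The install call is left commented out so that running the example does not change the local library.

```r
# Hypothetical helper: which of the requested packages are not installed?
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!is_installed])
}

book_pkgs <- c("coda", "mvtnorm", "devtools", "loo", "dagitty", "shape")
to_install <- missing_packages(book_pkgs)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```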
Course Schedule
The following table matches the lectures (videos 2023 and slides 2023) with the book chapters of the second edition (2020). It was generated from a screenshot of Statistical Rethinking 2023 - 01 - The Golem of Prague (50:09), but can also be found as a slide in Statistical Rethinking 2023 - Lecture 01.
The following HTML table, taken from the README.md file for the 2023 lectures, provides a better overview with links to videos and slides.
Week ## | Meeting date | Reading | Lectures |
---|---|---|---|
Week 01 | 06 January | Chapters 1, 2 and 3 | [1] <Golem of Prague> <Slides> [2] <Garden of Forking Data> <Slides> |
Week 02 | 13 January | Chapter 4 | [3] <Geocentric Models> <Slides> [4] <Categories and Curves> <Slides> |
Week 03 | 20 January | Chapters 5 and 6 | [5] <Elemental Confounds> <Slides> [6] <Good and Bad Controls> <Slides> |
Week 04 | 27 January | Chapters 7, 8 and 9 | [7] <Overfitting> <Slides> [8] <MCMC> <Slides> |
Week 05 | 03 February | Chapters 10 and 11 | [9] <Modeling Events> <Slides> [10] <Counts and Confounds> <Slides> |
Week 06 | 10 February | Chapters 11 and 12 | [11] <Ordered Categories> <Slides> [12] <Multilevel Models> <Slides> |
Week 07 | 17 February | Chapter 13 | [13] <Multilevel Adventures> <Slides> [14] <Correlated Features> <Slides> |
Week 08 | 24 February | Chapter 14 | [15] <Social Networks> <Slides> [16] <Gaussian Processes> <Slides> |
Week 09 | 03 March | Chapter 15 | [17] <Measurement> <Slides> [18] <Missing Data> <Slides> |
Week 10 | 10 March | Chapters 16 and 17 | [19] <Generalized Linear Madness> <Slides> [20] <Horoscopes> <Slides> |
Important Links
- You can purchase Statistical Rethinking: A Bayesian Course in R and Stan from CRC Press.
- The rethinking package: Statistical Rethinking course and book package: https://github.com/rmcelreath/rethinking. I am using version 2.31.
- Statistical rethinking 2023: Course material for January - March 2023. https://github.com/rmcelreath/stat_rethinking_2023. It contains a link to the new video playlist 2023 and to the slide deck collection. Furthermore it displays a table with the readings per week, including the links to the appropriate video and slides. The repo also includes PDFs for the homework and the scripts for the lecture animations. I could not find the new R scripts associated with the (new) book text. They need to be collected from the slide lectures.
- Statistical rethinking with brms, ggplot2, and the tidyverse: brms/tidyverse conversion of Statistical Rethinking using bookdown by A Solomon Kurz (2023-01-26)
- Functions for Learning Bayesian Inference: Maybe I should also check this resource. It is an R package for learning Bayesian inference, with vignettes as short guides.
Solutions:
There are two different bookdown websites with book solutions:
- Solutions for the 1st edition by Brynjólfur Gauti Jónsson and
- Solutions for the 2nd edition by Jake Thompson.
I have also found two GitHub repos with solutions:
- GitHub solutions by Taras Svirskyi (jffist)
- GitHub solutions by William Wolf (cavaunpeu)
These solutions are written by members of the #RStats community and are not authorized by Richard McElreath, the author of Statistical Rethinking.