Personal Notes: Statistical Rethinking (2nd ed)

Author

Peter Baumgartner

Published

2023-12-25 23:08

Preface

Note

This is work in progress: At the moment I am working on the practice section of chapter 4, i.e., I have finished \(\approx 20\%\) of the book content.

Content and Goals of this Book

This book collects my personal notes from reading Statistical Rethinking by Richard McElreath. I am using the second edition, published in 2020 by CRC Press, an imprint of Routledge of the Taylor & Francis Group. Additionally, I am using Statistical Rethinking 2023, the most recent set of free YouTube video lectures.

You can find links to other material on McElreath’s website about the book. Of special interest to me are the brms+tidyverse and the Stan+tidyverse conversions of his code. As I am not very experienced with R and completely new to Bayesian statistics and its tools, this additional material is also very challenging for me. I am planning to read them side by side (section by section) and will dedicate parallel sections to their approaches. This has the advantage that the section numbers of the files conform to the section numbers of the second edition of the printed book.

WATCH OUT: This is my personal learning material and is therefore not an authoritative textbook!

I wrote this book as a text for others to read because that forces me to become explicit and to explain all my learning outcomes more carefully. Please keep in mind that this text is not written by an expert but by a learner. Although it replicates most of the book's content, it may contain many mistakes. All these misapprehensions and errors are my responsibility.

Text Passages

My text consists mostly of quotes from the second edition of McElreath’s book. I converted my Kindle book into a PDF file, which I copied via the annotation system in Zotero into my Quarto files.

Example 1 : Quote

“Bayesian inference is really just counting and comparing of possibilities.” (McElreath, 2020, p. 20) (pdf)

Example 1 has links to my PDF and also to my annotation of the PDF. These links are a practical way for me to get the context of the quote. But as the linked PDF is saved locally on my hard disk, these links do not work for you! (Zotero groups offer an option to share files, but the PDF is not free to use, so I can’t offer this possibility.)

Often I made minor edits (e.g., shortening the text) or put the content into my own words. In these cases I couldn’t quote the text, as it does not represent a specific annotation in my Zotero file. Instead I ended the paraphrase with (McElreath ibid.).

In any case, most of the text in this Quarto book is not mine but comes from different resources (McElreath’s book or video lectures, Kurz’ website, R help files, package vignettes, …). Most of the time I have put my own personal notes into a notes box as shown in Example 2.

Example 2 : Personal note

Note 1 : This is a personal note

In this kind of box I will write my personal thoughts and reflections. Usually this box will appear stand-alone (without the wrapping example box).

In any case, I alone am responsible for this text, especially if I have used code from the resources wrongly or misunderstood a quoted text passage.

Sections with the

  • header “Original” refer to the original book
  • header “Tidyverse” refer to the {tidyverse} / {brms} conversion
  • header “Stan” refer to the {rstan} conversion
  • header “Reconsideration” refer to sections with my personal comments.

Code Chunks

The packages {rethinking} and {brms} perform similar tasks and therefore share many identical function names. To prevent name conflicts, Kurz unloaded the {rethinking} package whenever he explained {brms} functions. But this approach is not efficient for the structure of my documents, where I constantly switch between these two packages. So I have followed the advice “Qualifying namespaces” from Google’s R Style Guide.

Whenever I used a function, I called it with the package name in front, using the syntax <package name>::<function name>(). Besides preventing conflicts between functions of identical names from different packages, this helps to learn (or remember) which function belongs to which package. I think this justifies the small overhead and helps to make R code chunks self-sufficient. (No previous package loading or library calls in the setup chunk are needed.) To foster learning the relation between function and package, I enclose the package name in curly braces and format it in bold.
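The effect of qualifying the namespace can be illustrated with a minimal sketch that uses only packages shipped with R. The shadowing function below is a made-up stand-in for a name conflict between two packages; it is not taken from {rethinking} or {brms}.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Hypothetical conflict: a user-defined sd() now masks stats::sd()
sd <- function(x) "not the standard deviation"

# The unqualified call picks up the masking function ...
masked <- sd(x)

# ... while the qualified call always reaches the function in {stats}
qualified <- stats::sd(x)

print(masked)
print(qualified)

# Remove the masking function again
rm(sd)
```

The qualified call works without any library() call and does not depend on the order in which packages were attached.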

To prevent conflicts in the names of chunks, objects and variables, I added one of the following suffixes to the end of the name:

  • suffix a for the original book version
  • suffix b for the {tidyverse} / {brms} version

To distinguish the models I used

  • m<chapter>.<running number>a for the original book version
  • m<chapter>.<running number>b for the {tidyverse} / {brms} version

Example 3 : Name of models

  • The model name m4.3a refers to the third {rethinking} model in the fourth chapter.
  • The model name m2.1b refers to the first {tidyverse}/{brms} model in the second chapter.
  • To refer to graphics, code snippets etc. I have replaced the dot with a dash. For instance, #| label: chap04-precis2-m4-1a is the chunk label in the fourth chapter using the second version of the precis summary for model m4.1a.
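The naming scheme above can be sketched with two small helper functions. Both model_name() and chunk_label() are hypothetical names invented for this illustration; they are not part of any package and I do not use them in the book itself.

```r
# Hypothetical helper: build a model name such as "m4.3a"
model_name <- function(chapter, running_number, suffix) {
  sprintf("m%d.%d%s", chapter, running_number, suffix)
}

# Hypothetical helper: build a chunk label, replacing the dot in the
# model name with a dash
chunk_label <- function(chapter, topic, model) {
  model_dashed <- gsub(".", "-", model, fixed = TRUE)
  sprintf("chap%02d-%s-%s", chapter, topic, model_dashed)
}

m <- model_name(chapter = 4, running_number = 1, suffix = "a")
label <- chunk_label(chapter = 4, topic = "precis2", model = m)

print(m)
print(label)
```

With these inputs the label has the same form as the chunk label quoted in Example 3.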

I am not using the exact code snippets for my replications because I am not only replicating the code to see how it works but also to change the values of parameters to observe their influences.

My focus is on learning Bayesian statistics. Therefore I have not replicated those code snippets from Kurz’ version that have no relation to Bayesian statistics but merely produce graphics explaining general procedures.

This is my first book using Quarto instead of bookdown, so I am also using these notes to learn Quarto. As a result you will sometimes find remarks or call-out blocks about my Quarto experiences.

Get Code Examples

Go to the book website and download the R code examples for the book.


Listing 1: Download R code for book examples (not evaluated here)

dir.create("R")
download.file("http://xcelab.net/rmpubs/sr2/code.txt", "R/code.R")
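
A slightly more defensive variant of Listing 1 could look like the following sketch. The function name download_book_code() is my own invention for illustration. It assumes only base R and the URL from the listing above, creates the target folder without a warning if it already exists, and skips the download when the file is already present.

```r
# Hypothetical helper, not part of any package
download_book_code <- function(url = "http://xcelab.net/rmpubs/sr2/code.txt",
                               dest = file.path("R", "code.R")) {
  # Create the target directory if necessary (no warning if it exists)
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)

  # Download only if the file is not already there
  if (!file.exists(dest)) {
    utils::download.file(url, destfile = dest, mode = "wb")
  }

  # Return the path invisibly so it can be reused
  invisible(dest)
}

# Usage (needs an internet connection, therefore not run here):
# download_book_code()
```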

The style of the code snippets is not the tidyverse style. For instance, the equal sign = is not surrounded by spaces, and in a list of arguments separated by commas there is a space both before and after each comma.

I have converted the original code style to tidyverse style with the RStudio addin of the {styler} package. Assuming that the default style transformer is styler::tidyverse_style(), I selected the code snippet I wanted to convert and called the addin, which ran styler:::style_selection(). See Example 4.
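
Outside of RStudio, the same kind of conversion can be sketched with styler::style_text(), which restyles code given as a character string. This is only an illustration and not the addin workflow I actually used; it assumes that the {styler} package is installed and that its default tidyverse style is applied.

```r
# One line of snippet 2.7 in the original (rethinking) code style
old_style <- "curve( dbeta( x , W+1 , L+1 ) , from=0 , to=1 )"

# Restyle only if {styler} is available (assumption: default tidyverse style)
if (requireNamespace("styler", quietly = TRUE)) {
  new_style <- as.character(styler::style_text(old_style))
  print(new_style)
}
```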

To facilitate the comparison of {rethinking} and {tidyverse}/{brms} code I have used tabs. This has the disadvantage that one cannot jump directly to links under the tabs. In this case I have linked to the wrapping example and indicated the specific tab where the R code can be found. With graphics it is easier, because if you hover over the links you see the original graphic in a smaller overlay. This is very convenient for comparing two different graphics (for instance the same graphic with {rethinking} versus {tidyverse} coding). Try it out and hover over Graph 2.

Example 4 : Comparison of code snippets in {rethinking} and {tidyverse} style

R Code 1 a: Code snippet 2.7 (rethinking style)

## R code 2.7 ###############
# analytical calculation
W <- 6
L <- 3
curve( dbeta( x , W+1 , L+1 ) , from=0 , to=1 )
# quadratic approximation
curve( dnorm( x , 0.67 , 0.16 ) , lty=2 , add=TRUE )

Graph 1: Globe tossing data with n = 9 tosses and w = 6 waters. (Produced with rethinking code style)

R Code 2 b: Code snippet 2.7 (tidyverse style)

## R code 2.7 ###############
# analytical calculation
W <- 6
L <- 3
curve(dbeta(x, W + 1, L + 1), from = 0, to = 1)
# quadratic approximation
curve(dnorm(x, 0.67, 0.16), lty = 2, add = TRUE)

Graph 2: Globe tossing data with n = 9 tosses and w = 6 waters. (Produced with tidyverse code style)
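
The values 0.67 and 0.16 in snippet 2.7 are the mean and standard deviation of the quadratic approximation. The following base R sketch shows where such numbers can come from. It is not the book's quap() function: it assumes a flat prior (which is what dbeta(x, W + 1, L + 1) implies), finds the posterior mode numerically and takes the standard deviation from the curvature of the log posterior at the mode.

```r
W <- 6
L <- 3

# Log posterior (up to a constant): binomial likelihood plus flat prior
log_post <- function(p) {
  stats::dbinom(W, size = W + L, prob = p, log = TRUE) +
    stats::dunif(p, min = 0, max = 1, log = TRUE)
}

# Posterior mode via one-dimensional optimisation
fit <- stats::optimize(log_post, interval = c(0, 1), maximum = TRUE)
post_mode <- fit$maximum

# Second derivative of the log posterior at the mode (analytical)
curvature <- -(W / post_mode^2 + L / (1 - post_mode)^2)

# Standard deviation of the approximating Gaussian
post_sd <- sqrt(-1 / curvature)

print(round(c(mean = post_mode, sd = post_sd), 2))
```

Rounded to two digits, this should reproduce the 0.67 and 0.16 used in the dnorm() call of the snippet.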

To give a better orientation inside RStudio, I have segmented the R code snippets as in the example above (“## R code 2.7 ##################”). In RStudio one can easily detect these lines, as they are displayed as bold headers. This is very helpful for navigating inside the Quarto file.
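
These header lines are also handy for pulling a single snippet out of a longer code file. The following sketch assumes that every snippet starts with a line beginning with "## R code" followed by its number, as in the example above. The function extract_snippet() is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: return the lines of one snippet, identified by its number
extract_snippet <- function(lines, id) {
  # Positions of all snippet headers
  starts <- grep("^## R code ", lines)
  # Snippet numbers found in those headers, e.g. "2.7"
  ids <- sub("^## R code ([0-9.]+).*$", "\\1", lines[starts])

  i <- match(id, ids)
  if (is.na(i)) stop("snippet not found: ", id)

  # A snippet ends right before the next header or at the end of the file
  end <- if (i < length(starts)) starts[i + 1] - 1 else length(lines)
  lines[starts[i]:end]
}

# Small made-up example instead of the downloaded file
demo <- c(
  "## R code 2.6 ###############",
  "x <- 1",
  "## R code 2.7 ###############",
  "W <- 6",
  "L <- 3"
)
snippet <- extract_snippet(demo, "2.7")
print(snippet)
```

For the downloaded file one would pass readLines("R/code.R") instead of the demo vector, provided the file uses the same header format.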

As copy & paste from the slides does not work, I downloaded the PDF of the Speaker Deck slides. But even that did not always work. In those cases I used TextSniper and formatted the code manually. These copy & paste problems only arise with new code prepared for the 3rd edition. With the book (2nd ed.) I have no problems copying the code snippets from the ePUB eBook version via calibre.

Package Installation

Instead of the sparse and partly outdated remarks in the book, use the installation section of the rethinking package on GitHub.

Step 1

Of the three steps, I had already successfully completed the first one (rstan and the C++ toolchain), so I had no need to follow the detailed instructions for the rstan installation at https://mc-stan.org/users/interfaces/rstan.html.

Step 2

To install the cmdstanr package I visited https://mc-stan.org/cmdstanr/. This is an addition to my previous installation with the older version (2nd ed., 2022). As I installed the latest beta version of cmdstanr for the first time, I also needed to compile the libraries with cmdstanr::install_cmdstan().

To check the result of my installation I ran cmdstanr::check_cmdstan_toolchain().


Listing 2: Install the cmdstanr package (not evaluated here)

install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
cmdstanr::install_cmdstan()
cmdstanr::check_cmdstan_toolchain()

The command for downloading cmdstanr did not install the vignettes, which take a long time to build, but they are always available online at https://mc-stan.org/cmdstanr/articles/.

The vignette Getting started with CmdStanR also recommends loading the bayesplot and posterior packages, which are used later in the CmdStanR examples. But I believe these two packages are not necessary if you just plan to stick with the book.

Step 3

Once the infrastructure is installed, one can install the packages used by the book. With the exception of rethinking, the companion package of the book, they can all be downloaded from CRAN.

I already had devtools installed, therefore I deleted it from the list of packages to install.


Listing 3: Install packages (not evaluated here)

install.packages(c("coda", "mvtnorm", "loo", "dagitty", "shape"))
devtools::install_github("rmcelreath/rethinking")
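
To see whether step 3 succeeded, one can check which of the packages are available. This is a minimal base R sketch; the package list is simply the one from Listing 3 plus rethinking, and nothing is installed by this code.

```r
# Packages from Listing 3 plus the companion package of the book
pkgs <- c("coda", "mvtnorm", "loo", "dagitty", "shape", "rethinking")

# TRUE for every package whose namespace can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Names of the packages that still need to be installed
missing_pkgs <- names(installed)[!installed]
print(missing_pkgs)
```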

Solutions

There are several websites with book solutions. They differ in quality and are not always exhaustive. For the purpose of comparison I have mostly consulted the following two collections of solutions:

I have also found two GitHub repos with solutions. The results of these solutions are not accessible online. One has to fork these repos and compile them to see the results.

WATCH OUT! Solutions are not authorized by the book author

These solutions are written by members of the #RStats community and are not authorized by Richard McElreath, the author of Statistical Rethinking.

Help appreciated!

If you find errors in this Quarto book or want to add a comment, please do not hesitate to open issues or PRs on my GitHub site. I really appreciate learning from more experienced R users! It shortens the learning paths of self-directed learners.

Course Schedule

The following table matches the lectures (videos 2023 and slides 2023) with the book chapters of the second edition (2020). It was generated from a screenshot of Statistical Rethinking 2023 - 01 - The Golem of Prague (50:09), but can also be found as a slide in Statistical Rethinking 2023 - Lecture 01.

The following HTML table, taken from the README.md file for the 2023 lectures, provides a better overview with links to videos and slides.

Links to videos and slides

| Week | Meeting date | Reading | Lectures |
|---|---|---|---|
| Week 01 | 06 January | Chapters 1, 2 and 3 | [1] <Golem of Prague> <Slides> |
| | | | [2] <Garden of Forking Data> <Slides> |
| Week 02 | 13 January | Chapter 4 | [3] <Geocentric Models> <Slides> |
| | | | [4] <Categories and Curves> <Slides> |
| Week 03 | 20 January | Chapters 5 and 6 | [5] <Elemental Confounds> <Slides> |
| | | | [6] <Good and Bad Controls> <Slides> |
| Week 04 | 27 January | Chapters 7, 8 and 9 | [7] <Overfitting> <Slides> |
| | | | [8] <MCMC> <Slides> |
| Week 05 | 03 February | Chapters 10 and 11 | [9] <Modeling Events> <Slides> |
| | | | [10] <Counts and Confounds> <Slides> |
| Week 06 | 10 February | Chapters 11 and 12 | [11] <Ordered Categories> <Slides> |
| | | | [12] <Multilevel Models> <Slides> |
| Week 07 | 17 February | Chapter 13 | [13] <Multilevel Adventures> <Slides> |
| | | | [14] <Correlated Features> <Slides> |
| Week 08 | 24 February | Chapter 14 | [15] <Social Networks> <Slides> |
| | | | [16] <Gaussian Processes> <Slides> |
| Week 09 | 03 March | Chapter 15 | [17] <Measurement> <Slides> |
| | | | [18] <Missing Data> <Slides> |
| Week 10 | 10 March | Chapters 16 and 17 | [19] <Generalized Linear Madness> <Slides> |
| | | | [20] <Horoscopes> <Slides> |