Statistical Rethinking with brms, ggplot2, and the tidyverse
A Solomon Kurz
This is a love letter
I love McElreath’s Statistical Rethinking text. It’s the entry-level textbook for applied researchers I spent a couple years looking for. McElreath’s freely-available lectures on the book are really great, too.
However, I’ve come to prefer using Bürkner’s brms package when doing Bayeisn regression in R. It’s just spectacular. I also prefer plotting with Wickham’s ggplot2, and recently converted to using tidyverse-style syntax (which you might learn about here or here).
So, this project is an attempt to reexpress the code in McElreath’s textbook. His models are re-fit in brms, plots are reproduced or reimagined with ggplot2, and the general data wrangling code now predominantly follows the tidyverse style.
I’m not a statistician and I have no formal background in computer science. Though I benefited from a sweet of statistics courses in grad school, a large portion of my training has been outside of the classroom, working with messy real-world data, and searching online for help. One of the great resources I happened on was idre, the UCLA Institute for Digital Education, which offers an online portfolio of richly annotated textbook examples. Their online tutorials are among the earliest inspirations for this project. We need more resources like them.
With that in mind, one of the strengths of McElreath’s text is its thorough integration with the rethinking package. The rethinking package is a part of the R ecosystem, which is great because R is free and open source. And McElreath has made the source code for rethinking publically available, too. Since he completed his text, many other packages have been developed to help users of the R ecosystem interface with Stan. Of those alternative packages, I think Bürkner’s brms package is the best for general-purpose Bayesian data analysis. It’s flexible, uses reasonably-approachable syntax, has sensible defaults, and offers a vast array of post-processing convenience functions. And brms has only gotten better over time. To my knowledge, there are no textbooks on the market that highlight the brms package, which seems like an evil worth correcting.
In addition, McElreath’s data wrangling code is based in the base R style and he made most of his figures with base R plots. Though there are benefits to sticking close to base R functions (e.g., less dependencies leading to a lower likelihood that your code will break in the future), there are downsides. For beginners, base R functions can be difficult both to learn and to read. Happily, in recent years Hadley Wickham and others have been developing a group of packages collectively called the tidyverse. The tidyverse packages (e.g., dplyr, tidyr, purrr) were developed according to an underlying philosophy and they are designed to work together coherently and seamlessly. Though not all within the R community share this opinion, I am among those who think the tydyverse style of coding is generally easier to learn and sufficiently powerful that these packages can accommodate the bulk of your data needs. I also find tydyverse-style syntax easier to read. And of course, the widely-used ggplot2 package is part of the tidyverse, too.
To be clear, students can get a great education in both Bayesian statistics and programming in R with McElreath’s text just the way it is. Just go slow, work through all the examples, and read the text closely. It’s a pedagogical boon. I could not have done better or even closely so. But what I can offer is a parallel introduction on how to fit the statistical models with the ever-improving and already-quite-impressive brms package. I can throw in examples of how to perform other operations according to the ethic of the tidyverse. And I can also offer glimpses of some of the other great packages in the R + Stan ecosystem (e.g., loo, bayesplot, and tidybayes).
My assumptions about you
If you’re looking at this project, I’m guessing you’re either a graduate student, a post-graduate academic, or a researcher of some sort. So I’m presuming you have at least a 101-level foundation in statistics. If you’re rusty, consider checking out Legler and Roback’s free bookdown text, Broadening Your Statistical Horizons before diving into Statistical Rethinking. I’m also assuming you understand the rudiments of R and have at least a vague idea about what the tidyverse is. If you’re totally new to R, consider starting with Peng’s R Programming for Data Science. And the best introduction to the tidyvese-style of data analysis I’ve found is Grolemund and Wickham’s R for Data Science, which I extensively appeal to throughout this project.
That said, you do not need to be totally fluent in statistics or R. Otherwise why would you need this project, anyway? IMO, the most important things are curiosity, a willingness to try, and persistent tinkering. I love this stuff. Hopefully you will, too.
How to use and understand this project
This project is not meant to stand alone. It’s a supplement to McElreath’s Statistical Rethinking text. I follow the structure of his text, chapter by chapter, translating his analyses into brms and tidyverse code. However, some of the sections in the text are composed entirely of equations and prose, leaving us nothing to translate. When we run into those sections, the corresponding sections in this project will sometimes be blank or omitted, though I do highlight some of the important points in quotes and prose of my own. So I imagine students might reference this project as they progress through McElreath’s text. I also imagine working data analysts might use this project in conjunction with the text as they flip to the specific sections that seem relevant to solving their data challenges.
I reproduce the bulk of the figures in the text, too. The plots in the first few chapters are the closest to those in the text. However, I’m passionate about data visualization and like to play around with color palettes, formatting templates, and other conventions quite a bit. As a result, the plots in each chapter have their own look and feel. For more on some of these topics, check out chapters 3, 7, and 28 in R4DS or Healy’s Data Visualization: A practical introduction.
- R code blocks and their output appear in a gray background. E.g.,
2 + 2 == 5
##  FALSE
- Functions are in a typewriter font and followed by parentheses, all atop a gray background (e.g.,
- When I want to make explicit what packages a given function came from, I insert the double-colon operator
::between the package name and the function (e.g.,
- R objects, such as data or function arguments, are in typewriter font atop gray backgrounds (e.g.,
.width = .5).
- You can detect hyperlinks by their typical blue-colored font.
- In the text, McElreath indexed his models with names like
m4.1(i.e., the first model of Chapter 4). I primarily followed that convention, but replaced the
bto stand for the brms package.
You can do this, too
This project is powered by Yihui Xie’s bookdown package, which makes it easy to turn R markdown files into HTML, PDF, and EPUB. Go here to learn more about bookdown. While you’re at it, also check out Xie, Allaire, and Grolemund’s R Markdown: The Definitive Guide. And if you’re unacquainted with GitHub, check out Jenny Bryan’s Happy Git and GitHub for the useR.
The source code of the project is available here.