Doing Bayesian Data Analysis in brms and the tidyverse
What and why
Kruschke began his text with “This book explains how to actually do Bayesian data analysis, by real people (like you), for realistic data (like yours).” In the same way, this project is designed to help those real people do Bayesian data analysis. My contribution is converting Kruschke’s JAGS and Stan code for use in Bürkner’s brms package (Bürkner, 2017, 2018, 2020a), which makes it easier to fit Bayesian regression models in R (R Core Team, 2020) using Hamiltonian Monte Carlo. I also prefer plotting and data wrangling with the packages from the tidyverse (Wickham, 2019b; Wickham, Averick, et al., 2019). So we’ll be using those methods, too.
This project is not meant to stand alone. It’s a supplement to the second edition of Kruschke’s (2015) Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Please give the source material some love.
We have updates
For a brief rundown of the version history, we have:
I released version 0.1.0 of this project on February 17, 2020. It was the first [fairly] complete draft including material from all the chapters in Kruschke’s text. The supermajority of Kruschke’s JAGS and Stan models were fit with brms 2.11.5. The results were saved in the fits folder on GitHub and most of the results are quite comparable to those in the original text. We also reproduced most of the data-related figures and tables, along with the little subpoints and examples sprinkled throughout Kruschke’s prose.
The 0.2.0 update came on May 19, 2020. Noteworthy changes included:
- reproducing the simulation necessary for Figure 7.3 (see GitHub issue #14) with help from Cardy Moten III (@cmoten);
- improving the in-text citations and reference sections using BibTeX (BibTeX, 2020), Better BibTeX (Better BibTeX for Zotero, 2020), and Zotero (Zotero | Your personal research assistant, 2020), with guidance from Bjørn Peare Bartholdy (@bbartholdy), Mladen Jovanović (@mladenjovanovic), Cory Whitney (@CWWhitney), and Brenton M. Wiernik (@bwiernik);
- the plot resolution increased with fig.retina = 2.5; and
- small code, hyperlink, and typo corrections.
Welcome to version 0.3.0! Noteworthy changes include:
- adding the Kruschke-style model diagrams throughout the text (e.g., Figure 8.5);
- adding chapter-specific plotting schemes with help from the cowplot package (C. O. Wilke, 2020a), Wilke’s (2019a) Fundamentals of data visualization, and many other great color-scheme packages;
- an overhaul to the plotting workflow in Section 6.4.1; and
- updating all model fits to the current version of brms (2.13.5).
We’re not done yet and I could use your help.
There are some minor improvements I’d like to add in future versions. Most importantly, I’d like to patch up the content holes. A few simulations, figures, and models are beyond my current skill set. I’ve opened separate GitHub issues for the most important ones and they are as follows:
- the effective-sample-size simulations in Section 7.5.2 and the corresponding plots in Figures 7.13 and 7.14 (issue #15),
- several of the simulations in Sections 11.1.4, 11.3.1, and 11.3.2 and their corresponding figures (issues #16, #17, #18, and #19),
- the stopping-rule simulations in Section 13.3.2 and their corresponding figures (issue #20),
- the data necessary to properly reproduce the HMC proposal schematic presented in Section 14.1 and Figures 14.1 through 14.3 (issue #21), and
- the conditional logistic models of Section 22.3.3.2 (issue #22).
If you know how to conquer any of these unresolved challenges, I’d love to hear all about it. In addition, please feel free to open a new issue if you find any flaws in the other sections of the project.
Thank-you’s are in order
Before we enter the primary text, I’d like to thank the following for their helpful contributions:
- Bjørn Peare Bartholdy (@bbartholdy),
- Paul-Christian Bürkner (@paul-buerkner),
- Andrew Gelman (@andrewgelman),
- Mladen Jovanović (@mladenjovanovic),
- Matthew Kay (@mjskay),
- TJ Mahr (@tjmahr),
- Cardy Moten III (@cmoten),
- Lukas Neugebauer (@LukasNeugebauer),
- Demetri Pananos (@Dpananos),
- Aki Vehtari (@avehtari),
- Matti Vuorre (@mvuorre),
- Cory Whitney (@CWWhitney), and
- Brenton M. Wiernik (@bwiernik).
Better BibTeX for Zotero. (2020). https://retorque.re/zotero-better-bibtex/
BibTeX. (2020). http://www.bibtex.org/
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/RJ-2018-017
Bürkner, P.-C. (2020a). brms: Bayesian regression models using ’Stan’. https://CRAN.R-project.org/package=brms
Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press. https://sites.google.com/site/doingbayesiandataanalysis/
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H. (2019b). tidyverse: Easily install and load the ’tidyverse’. https://CRAN.R-project.org/package=tidyverse
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wilke, C. O. (2019a). Fundamentals of data visualization. https://clauswilke.com/dataviz/
Wilke, C. O. (2020a). cowplot: Streamlined plot theme and plot annotations for ’ggplot2’ [Manual]. https://wilkelab.org/cowplot
Zotero | Your personal research assistant. (2020). https://www.zotero.org/