Andrew Hayes’s Introduction to Mediation, Moderation, and Conditional Process Analysis text, the second edition of which just came out, has become a staple in social science graduate education. Both editions of his text are written from a frequentist OLS perspective. This project is an effort to connect his work with the Bayesian paradigm. Herein I refit his models with my favorite R package for Bayesian regression, Bürkner’s brms. I also prefer to wrangle and plot data with the tidyverse, which you might learn about here or here. But to be clear, this project is not meant to stand alone. It’s a supplement to the textbook.

The source code of the project is available here. This project is powered by the great Yihui Xie’s bookdown package, which makes it easy to turn R Markdown files into HTML, PDF, and EPUB. You can learn more here. While you’re at it, also check out this great guide to R Markdown.

Why this?

I’m not a statistician, and I have no formal background in computer science. But I met a great statistics mentor in grad school who was enthusiastic, knowledgeable, and very generous with his time. In one of his stats electives, we used the first edition of Hayes’s text, and I learned a lot that semester.

Yet a large portion of my training has been out of the classroom, working with messy real-world data and looking online for help. One of the great resources I happened on was idre, the UCLA Institute for Digital Research and Education. They offer a variety of services, but what I benefited from most was their portfolio of richly annotated textbook examples. Their online tutorials are among the earliest inspirations for this project. More so than my old statistics seminar lecture notes, high-quality and freely available resources like this are where most of my day-to-day data analysis skills come from. We need more resources like this.

Hayes’s work has become influential in many corners of the academy, including my own: psychology. His PROCESS macro has been aimed at SPSS and SAS users, which is understandable given their popularity in the social sciences. Over the past few years, I’ve moved away from proprietary programs like SPSS and toward R. R is free and open source, and I find it a more flexible and useful tool for data analysis. In fairness, Hayes expanded his second edition to include R code, which is a great move forward. But his work is done from a frequentist OLS perspective, and there have been a lot of exciting developments in the world of applied Bayesian statistics. If you’re an R user and want to learn about Bayesian data analysis, I think Bürkner’s brms is the best package around. It’s flexible, uses reasonably approachable syntax, has sensible defaults, and offers a wide array of post-processing convenience functions. In addition, the R code in Hayes’s second edition does not leverage the power of the tidyverse. The purpose of this project is to connect Hayes’s insights into regression with the Bayesian paradigm. We’ll do so within the free and open-source R ecosystem, highlighting the Bayesian brms package and using functions from the tidyverse and friends (e.g., tidybayes) to streamline our code.

My assumptions about you

If you’re looking at this project, I’m guessing you’re either a graduate student or a post-graduate academic or researcher of some sort, which comes with the presumption that you have at least a 101-level foundation in statistics. If you’re rusty, check out Legler and Roback’s free bookdown text, Broadening Your Statistical Horizons, or Navarro’s free Learning Statistics with R. I’m also presuming you are at least vaguely familiar with Bayesian statistics. For those totally new to Bayesian regression and HMC estimation, look at this paper, watch a few of these engaging lectures, or even start with my other project based on this excellent text. I’m also presuming a basic working fluency in R and a vague idea about what the tidyverse is. If you’re totally new to R, consider starting with Peng’s R Programming for Data Science. And the best introduction to the tidyverse style of data analysis I’ve found is Grolemund and Wickham’s R for Data Science.

That said, you do not need to be totally fluent in statistics or R. Otherwise why would you need this project, anyway? IMO, the most important things are curiosity, a willingness to try, and persistent tinkering. I love this stuff. Hopefully you will, too.

How to use and understand this project

This project is not meant to stand alone. It’s a supplement to the second edition of Hayes’s text. I follow the structure of his text, chapter by chapter, translating his analyses into brms and tidyverse code. However, many of the sections in the text are composed entirely of equations and prose, leaving us nothing to translate. When we run into those sections, the corresponding sections in this project will be sparse or blank. I imagine students might reference this project as they progress through Hayes’s text. I also imagine working data analysts might use this project in conjunction with Hayes’s text as they flip to the specific sections that seem relevant to solving their data challenges.

I reproduce the bulk of the figures in the text, too. The plots in the first few chapters are the closest to those in the text. However, I’m passionate about data visualization and like to play around with color palettes, formatting templates, and other conventions quite a bit. As a result, the plots in each chapter have their own look and feel. I also like to highlight some of the unique strengths Bayesian analyses bring to the table, such as the ease with which you can depict uncertainty with density plots, interval ribbons around regression lines, and spaghetti plots. For more on some of these topics, see chapters 3, 7, and 28 in R4DS, Healy’s Data Visualization: A practical introduction, or Wickham’s ggplot2: Elegant Graphics for Data Analysis.

In this project, I use a handful of formatting conventions gleaned from R4DS and R Markdown: The Definitive Guide.

  • I put R and R packages (e.g., tidybayes) in boldface.
  • R code blocks and their output appear on a gray background. E.g.,
## [1] 4
  • Functions are in a typewriter font and followed by parentheses, all atop a gray background (e.g., brm()).
  • When I want to make explicit which package a given function comes from, I insert the double-colon operator :: between the package name and the function (e.g., tidyr::gather()).
  • R objects, such as data or function arguments, are in typewriter font atop a gray background (e.g., d or size = 2).
  • Hyperlinks are denoted by their typical blue-colored font.

Times change and so do we

I released the 1.0.0 version of this book in July of 2018. This 1.1.0 version includes a lot of changes. Most notably,

  • all models have been refit with brms 2.10.3,
  • the introduction to Bayesian information criteria now reflects the updates introduced in brms 2.10.0,
  • I have updated the multivariate model code with the mvbind() and mvbf() functions,
  • I have made greater use of the handy tidybayes package for post-processing models,
  • I have updated and simplified a lot of the tidyverse code,
  • all fit objects have sequential names and are now saved in .rda files in the fits folder on GitHub,
  • I have substantially increased the number of Hayes quotes,
  • I added more statistical formulas throughout the text,
  • I added a version of Chapter 14, which includes warnings about relying on cross-sectional mediation, and
  • the various R objects enjoy simpler and more standardized names.

It’s unclear what future versions of this book may hold. Candidates include

  • more talk about model priors,
  • examples of multilevel mediation,
  • examples of longitudinal mediation, and
  • sterner warnings against cross-sectional mediation.

We’ll see. In the meantime, feel free to make suggestions or report typos and coding goofs here.