Welcome

Andrew Hayes’s (2018) text, Introduction to mediation, moderation, and conditional process analysis: A regression-based approach, has become a staple in social science graduate education. Hayes’s work has been from a frequentist OLS perspective. This book is an effort to connect his work with the Bayesian paradigm. Herein I refit his models with my favorite R (R Core Team, 2022) package for Bayesian regression, Paul Bürkner’s brms (Bürkner, 2017, 2018, 2022f). I also prefer data wrangling and plotting using the tidyverse (Wickham et al., 2019; Wickham, 2022). But to be clear, this ebook is not meant to stand alone. It’s a supplement to the original textbook.

The source code of the ebook is available here. This ebook is powered by the great Yihui Xie’s bookdown package (Xie, 2016, 2022), which makes it easy to turn R markdown files into HTML, PDF, and EPUB. You might also check out Xie, Allaire, and Grolemund’s (2020) R markdown: The definitive guide.

Why this?

I’m not a statistician and I have no formal background in computer science. But I met a great statistics mentor in grad school who was enthusiastic, knowledgeable, and very generous with his time. In one of his stats electives, we used Hayes’s first edition text and I learned a lot in that semester.

Yet a large portion of my training has been out of the classroom, working with messy real-world data, and looking online for help. One of the great resources I happened on was idre, the UCLA Institute for Digital Research and Education. They perform a variety of services, but what I benefited from the most was their portfolio of richly annotated textbook examples. Their online tutorials are among the earliest inspirations for this ebook. More so than my old statistics seminar lecture notes, high-quality and freely-available resources like this are where most of my day-to-day data analysis skills come from. We need more resources like this.

Hayes’s work has become influential in many corners of the academy, including my own–psychology. His PROCESS macro (2018) has been aimed at SPSS (IBM Corporation, 2020) and SAS (SAS/IML Software, 2020) users, which is understandable given their popularity in the social sciences. Over the past few years, I’ve moved away from proprietary programs like SPSS to R. R is free and open source, and I find it a more flexible and useful tool for data analysis. In fairness, Hayes expanded his second edition to include R code, which is a great move forward. But his work is done from a frequentist OLS perspective and there have been a lot of exciting developments in the world of applied Bayesian statistics. If you’re an R user and want to learn about Bayesian data analysis, I think Bürkner’s brms is the best package around. It’s flexible, uses reasonably approachable syntax, has sensible defaults, and offers a wide array of post-processing convenience functions. In addition, the R code in Hayes’s second edition does not leverage the power of the tidyverse. The purpose of this ebook is to connect Hayes’s insights into regression with the Bayesian paradigm. We’ll do so within the free and open-source R ecosystem, highlighting the Bayesian brms package, and using functions from the tidyverse and friends (e.g., tidybayes, Kay, 2022) to streamline our code.

My assumptions about you

If you’re looking at this ebook, I’m guessing you’re either a graduate student or a post-graduate academic or researcher of some sort, which comes with the presumption you have at least a 101-level foundation in statistics. If you’re rusty, check out Roback and Legler’s free (2021) text, Beyond multiple linear regression: Applied generalized linear models and multilevel models in R or Navarro’s free (2019) text, Learning statistics with R. I’m also presuming you are at least vaguely familiar with Bayesian statistics. For those totally new to Bayesian regression and HMC estimation, look at Bürkner’s (2017) paper, watch a few of these engaging lectures, or even start with either of my (2023b, 2023c) books based on the two (2020, 2015) editions of McElreath’s Statistical rethinking. I’m also presuming a basic working fluency in R and a vague idea about what the tidyverse is. If you’re totally new to R, consider starting with Peng’s (2022) R programming for data science. And the best introduction to the tidyverse style of data analysis I’ve found is Grolemund and Wickham’s (2017) R for data science.

That said, you do not need to be totally fluent in statistics or R. Otherwise why would you need this ebook, anyway? IMO, the most important things are curiosity, a willingness to try, and persistent tinkering. I love this stuff. Hopefully you will, too.

How to use and understand this ebook

This ebook is not meant to stand alone. It’s a supplement to the second edition of Hayes’s text. I follow the structure of his text, chapter by chapter, translating his analyses into brms and tidyverse code. However, many of the sections in the text are composed entirely of equations and prose, leaving us nothing to translate. When we run into those sections, the corresponding sections in this ebook will be sparse or blank. I imagine students might reference this ebook as they progress through Hayes’s text. I also imagine working data analysts might use this ebook in conjunction with Hayes’s text as they flip to the specific sections that seem relevant to solving their data challenges.

I reproduce the bulk of the figures in the text, too. The plots in the first few chapters are the closest to those in the text. However, I’m passionate about data visualization and like to play around with color palettes, formatting templates, and other conventions quite a bit. As a result, the plots in each chapter have their own look and feel. I also like to highlight some of the unique strengths Bayesian analyses bring to the table, such as the ease with which you can depict uncertainty with density plots, interval ribbons around regression lines, and spaghetti plots. For more on some of these topics, check out chapters 3, 7, and 28 in R4DS, Healy’s (2018) Data visualization: A practical introduction, or Wickham’s (2016) ggplot2: Elegant graphics for data analysis.

In this ebook, I use a handful of formatting conventions gleaned from R4DS and R markdown: The definitive guide.

I put R and R packages (e.g., tidybayes) in boldface.
R code blocks and their output appear in a gray background. E.g.,

2 + 2

## [1] 4

Functions are in a typewriter font and followed by parentheses, all atop a gray background (e.g., brm()).
When I want to make explicit what packages a given function comes from, I insert the double-colon operator :: between the package name and the function (e.g., brms::fitted()).
R objects, such as data or function arguments, are in typewriter font atop a gray background (e.g., d or size = 2).
Hyperlinks are denoted by their typical blue-colored font.
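
As a small illustration of the double-colon convention, here is a minimal sketch using only a package that ships with R. The object name `x` is made up for the example. Calling a function with `::` works even when its package has not been attached with `library()`.

```r
# A hypothetical numeric vector, used only for illustration.
x <- c(1, 3, 10)

# Both calls run the same function; the second names its package explicitly.
median(x)
stats::median(x)
```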

R setup

To get the full benefit from this ebook, you’ll need some software. Happily, everything will be free (provided you have access to a decent personal computer and a good internet connection).

First, you’ll need to install R, which you can learn about at https://cran.r-project.org/.

Though not necessary, your R experience might be more enjoyable if done through the free RStudio interface, which you can learn about at https://rstudio.com/products/rstudio/.

Once you have installed R, execute the following to install the bulk of the add-on packages. This will probably take a few minutes to finish. Go make yourself a coffee.

packages <- c("tidyverse", "rstan", "brms", "bayesplot", "broom", "cowplot", "directlabels", "GGally", "ggdark", "ggdist", "ggthemes", "HDInterval", "MASS", "palettetown", "psych")

install.packages(packages, dependencies = TRUE)
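
If you already have some of these packages, you may prefer to check what is missing before installing everything again. The following is a minimal sketch and not part of the original setup: `missing_packages()` is a hypothetical helper name, `some_packages` is a shortened illustrative vector, and the code only reports what is missing without installing anything.

```r
# Hypothetical helper: return the names in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# A shortened version of the package vector, for illustration only.
some_packages <- c("tidyverse", "rstan", "brms")

to_install <- missing_packages(some_packages)
to_install

# Then, if desired: install.packages(to_install, dependencies = TRUE)
```

An empty result means every package in the vector is already available on your machine.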

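A couple of the other packages are not officially available via the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/). If you don’t already have the devtools package, install it first with install.packages("devtools"). Then you can download the remaining packages directly from GitHub by executing the following.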

devtools::install_github("EdwinTh/dutchmasters")
devtools::install_github("ropenscilabs/ochRe")

It’s possible you’ll have problems installing some of these packages. Here are the likely suspects and where you can find help:

for difficulties installing brms, go to https://github.com/paul-buerkner/brms#how-do-i-install-brms or search around in the brms section of the Stan forums; and
for difficulties installing rstan, go to https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started.

Times change and so do we

For a brief rundown of the version history, we have:

1.0.0.

I released the 1.0.0 version of this book in July of 2018. It was a complete first draft.

1.1.0.

On December 21, 2019, came the 1.1.0 update. It included a lot of changes, the most notable of which were that,

all models were refit with brms 2.10.3,
the introduction to Bayesian information criteria reflected the updates introduced in brms 2.10.0,
I updated the multivariate model code with the mvbind() and mvbf() functions,
I made greater use of the handy tidybayes package for post-processing models,
I updated and simplified a lot of the tidyverse code,
all fit objects were assigned sequential names and were saved in .rda files in the fits folder on GitHub,
I substantially increased the number of Hayes quotes,
I added more statistical formulas throughout the text,
I added a version of Chapter 14, which includes warnings about relying on cross-sectional mediation, and
the various R objects were assigned simpler and more standardized names.

1.2.0.

On August 8, 2021, came the 1.2.0 update, with the following noteworthy changes:

improving the in-text citations and reference sections with BibTeX (BibTeX, 2020), Better BibTeX (Heyns, 2020), and Zotero (Roy Rosenzweig Center for History and New Media, 2020);
refitting all models with brms 2.15.0; and
fixing code breaks due to various updates in the tidyverse packages.

1.3.0.

Welcome to version 1.3.0! Noteworthy changes include:

replacing the deprecated posterior_samples() function with the new posterior::as_draws_df()-based workflow;
replacing my incorrect use of tidyr::expand() with a more appropriate tidyr::expand_grid() workflow, thanks to insights from Desislava Petkova;
adopting the new linewidth argument for several ggplot2 geoms (see here);
refitting all models with brms 2.18.0; and
cleaning up references, hyperlinks, and typos along the way.
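
To give a sense of the tidyr::expand_grid() workflow mentioned in the list above, here is a minimal sketch, assuming the tidyr package is installed. The predictor names `w` and `x` and their values are hypothetical. The idea is to build every combination of predictor values, which could then be passed to a post-processing function such as fitted() through its newdata argument.

```r
library(tidyr)

# Every combination of three hypothetical moderator values (w)
# and two hypothetical focal-predictor values (x).
nd <- expand_grid(w = c(-1, 0, 1),
                  x = 0:1)

# 3 * 2 = 6 rows, one per combination.
nd
```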

What’s next?

It’s unclear what future versions of this book may hold. Candidates include

more talk about model priors,
examples of multilevel mediation,
examples of longitudinal mediation, and
sterner warnings against cross-sectional mediation.

We’ll see. In the meantime, feel free to make suggestions or report typos and coding goofs at https://github.com/ASKurz/recoding-Hayes-2018/issues.

Thank-you’s are in order

Before we move on, I’d like to thank the following for their helpful contributions:

Eric Green (@ericpgreen)
Desislava Petkova (@dipetkov)
Jose Storopoli (@storopoli)

For information on how you might help improve this ebook, go to https://github.com/ASKurz/recoding-Hayes-2018/blob/master/CONTRIBUTING.md.

License and citation

This book is licensed under the Creative Commons Zero v1.0 Universal license. You can learn the details, here. In short, you can use my work. Just make sure you give me the appropriate credit the same way you would for any other scholarly resource. Here’s the citation information:

@book{kurzRecodingIntroduction2023,
  title = {Recoding {{Introduction to Mediation, Moderation, and Conditional Process Analysis}}},
  author = {Kurz, A. Solomon},
  year = {2023},
  month = {jan},
  edition = {version 1.3.0},
  url = {https://bookdown.org/content/b472c7b3-ede5-40f0-9677-75c3704c7e5c}
}

References

BibTeX. (2020). http://www.bibtex.org/

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01

Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/RJ-2018-017

Bürkner, P.-C. (2022f). brms: Bayesian regression models using ’Stan’. https://CRAN.R-project.org/package=brms

Grolemund, G., & Wickham, H. (2017). R for data science. O’Reilly. https://r4ds.had.co.nz

Hayes, A. F. (2018). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (Second edition). The Guilford Press. https://www.guilford.com/books/Introduction-to-Mediation-Moderation-and-Conditional-Process-Analysis/Andrew-Hayes/9781462534654

Healy, K. (2018). Data visualization: A practical introduction. Princeton University Press. https://socviz.co/

Heyns, E. (2020). Better BibTeX for zotero. https://retorque.re/zotero-better-bibtex/

IBM Corporation. (2020). IBM SPSS Statistics for Windows (Version 27.0) [Computer software]. https://www.ibm.com/products/spss-statistics

Kay, M. (2022). tidybayes: Tidy data and ’geoms’ for Bayesian models. https://CRAN.R-project.org/package=tidybayes

Kurz, A. S. (2023b). Statistical Rethinking with brms, ggplot2, and the tidyverse (version 1.3.0). https://bookdown.org/content/3890/

Kurz, A. S. (2023c). Statistical Rethinking with brms, ggplot2, and the tidyverse: Second Edition (version 0.4.0). https://bookdown.org/content/4857/

McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (Second Edition). CRC Press. https://xcelab.net/rm/statistical-rethinking/

McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press. https://xcelab.net/rm/statistical-rethinking/

Navarro, D. (2019). Learning statistics with R. https://learningstatisticswithr.com

Peng, R. D. (2022). R programming for data science. https://bookdown.org/rdpeng/rprogdatascience/

R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Roback, P., & Legler, J. (2021). Beyond multiple linear regression: Applied generalized linear models and multilevel models in R. CRC Press. https://bookdown.org/roback/bookdown-BeyondMLR/

Roy Rosenzweig Center for History and New Media. (2020). Zotero. https://www.zotero.org/

SAS/IML software. (2020). [Computer software]. https://www.sas.com/

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. https://ggplot2-book.org/

Wickham, H. (2022). tidyverse: Easily install and load the ’tidyverse’. https://CRAN.R-project.org/package=tidyverse

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Xie, Y. (2016). bookdown: Authoring books and technical documents with R markdown. Chapman and Hall/CRC. https://bookdown.org/yihui/bookdown/

Xie, Y. (2022). bookdown: Authoring books and technical documents with R Markdown. https://CRAN.R-project.org/package=bookdown

Xie, Y., Allaire, J. J., & Grolemund, G. (2020). R markdown: The definitive guide. Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown/

Recoding Introduction to Mediation, Moderation, and Conditional Process Analysis