2.1 Example applications
Now you have learned the very basic concepts of R Markdown. The idea should be simple enough: interweave narratives with code in a document, knit the document to dynamically generate results from the code, and you will get a report. This idea was not invented by R Markdown, but came from an early programming paradigm called “Literate Programming” (Knuth 1984).
Due to the simplicity of Markdown and the powerful R language for data analysis, R Markdown has been widely used in many areas. Before we dive into the technical details, we want to show some examples to give you an idea of its possible applications.
2.1.1 Airbnb’s knowledge repository
Airbnb uses R Markdown to document all their analyses in R, so they can combine code and data visualizations in a single report (Bion, Chang, and Goodman 2018). Eventually all reports are carefully peer-reviewed and published to a company knowledge repository, so that anyone in the company can easily find analyses relevant to their team. Data scientists are also able to learn as much as they want from previous work or reuse the code written by previous authors, because the full R Markdown source is available in the repository.
2.1.2 Homework assignments on RPubs
A huge number of homework assignments have been published to the website https://RPubs.com (a free publishing platform provided by RStudio), which shows that R Markdown is easy and convenient enough for students to use for their homework (see Figure 2.3). When I was still a student, I did most of my homework assignments using Sweave, which was a much earlier implementation of literate programming based on the S language (later R) and LaTeX. I was aware of the importance of reproducible research but did not enjoy LaTeX, and few of my classmates wanted to use Sweave. Right after I graduated, R Markdown was born, and it has been great to see so many students do their homework in a reproducible manner.
In a 2016 JSM (Joint Statistical Meetings) talk, I proposed that course instructors could occasionally insert a few wrong values into the source data on purpose before giving it to students to analyze in their homework, then correct these values later and ask the students to redo the analysis. This way, students should come to realize the problems with the traditional cut-and-paste approach to data analysis (i.e., running the analysis separately and copying the results manually), and the advantage of using R Markdown to generate the report automatically.
2.1.3 Personalized mail
One thing you should remember about R Markdown is that you can programmatically generate reports, although most of the time you may be just clicking the Knit button in RStudio to generate a single report from a single source document. Being able to program reports is a superpower of R Markdown.
Mine Çetinkaya-Rundel once wanted to create personalized handouts for her workshop participants. She used a template R Markdown file, and knitted it in a for-loop to generate 20 PDF files for the 20 participants. Each PDF contained both personalized information and common information. You may read the article https://rmarkdown.rstudio.com/articles_mail_merge.html for the technical details.
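The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked article: the participant names, the template contents, the `name` parameter, and the output file names are all hypothetical. It assumes the rmarkdown package and Pandoc are installed, and it renders HTML rather than PDF so that no LaTeX installation is needed.

```r
library(rmarkdown)

# Hypothetical participant names (the workshop described above had 20).
participants <- c("Alice", "Bob", "Carol")

# Build the chunk fence programmatically so this listing does not contain
# a literal fence inside a string.
fence <- strrep("`", 3)

# A small parameterized template, written to a temporary file so that the
# sketch is self-contained. `name` is a hypothetical parameter.
template <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Workshop handout\"",
  "params:",
  "  name: \"Participant\"",
  "---",
  "",
  "Dear `r params$name`,",
  "",
  "The summary below is common to every handout.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), template)

out_dir <- tempfile("handouts")
dir.create(out_dir)

# Render the same template once per participant, passing the personalized
# value through `params` and writing each result to its own file.
outputs <- character(0)
for (p in participants) {
  outputs[p] <- render(
    template,
    output_format = "html_document",
    output_file = paste0("handout-", p, ".html"),
    output_dir = out_dir,
    params = list(name = p),
    envir = new.env(),  # a fresh environment for each report
    quiet = TRUE
  )
}
print(basename(outputs))
```

Switching `output_format` to `"pdf_document"` (and the file extension to `.pdf`) should produce PDF handouts instead, provided a LaTeX distribution is available.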
2.1.4 2017 Employer Health Benefits Survey
The 2017 Employer Health Benefits Survey was designed and analyzed by the Kaiser Family Foundation, NORC at the University of Chicago, and Health Research & Educational Trust. The full PDF report was written in R Markdown (with the bookdown package). It has a unique appearance, which was made possible by heavy customizations in the LaTeX template. This example shows you that if you really care about typesetting, you are free to apply your knowledge about LaTeX to create highly sophisticated reports from R Markdown.
2.1.5 Journal articles
Chris Hartgerink explained how and why he used R Markdown to write dynamic research documents in the post at https://elifesciences.org/labs/cad57bcf/composing-reproducible-manuscripts-using-r-markdown. He published a paper titled “Too Good to be False: Nonsignificant Results Revisited” with two co-authors (Hartgerink, Wicherts, and Assen 2017). The manuscript was written in R Markdown, and results were dynamically generated from the code in R Markdown.
When checking the accuracy of P-values in the psychology literature, he and his colleagues found that P-values could be mistyped or miscalculated, which could lead to inaccurate or even wrong conclusions. If the P-values were dynamically generated and inserted instead of being manually copied from statistical programs, such errors would be much less likely.
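To make the idea of a dynamically generated P-value concrete, here is a small sketch in R. It is not taken from Hartgerink's manuscript: the built-in `sleep` data set, the choice of a t-test, and the variable names are assumptions made purely for illustration.

```r
# Run a test on a built-in data set (chosen only for illustration).
fit <- t.test(extra ~ group, data = sleep)

# Format the P-value in code, so that no number is ever typed by hand.
p_text <- format.pval(fit$p.value, digits = 2, eps = 0.001)

# Assemble the sentence that would appear in the report.
sentence <- sprintf(
  "The two groups were compared with a t-test (t = %.2f, P = %s).",
  fit$statistic, p_text
)
print(sentence)
```

In an R Markdown document, the same value would typically be inserted into the narrative with an inline expression such as `r p_text`, so the text is updated automatically whenever the data or the analysis changes.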
Lowndes et al. (2017) also show that using R Markdown (and version control) not only enhances reproducibility, but also produces better scientific research in less time.
2.1.6 Dashboards at eelloo
R Markdown is used at eelloo (https://eelloo.nl) to design and generate research reports. Here is one of their examples (in Dutch): https://eelloo.nl/groepsrapportages-met-infographics/, where you can find gauges, bar charts, pie charts, wordclouds, and other types of graphs dynamically generated and embedded in dashboards.
2.1.7 Books
We will introduce the R Markdown extension bookdown in Chapter 12. It is an R package that allows you to write books and long-form reports with multiple Rmd files. Since this package was published, a large number of books have emerged. You can find a subset of them at https://bookdown.org. Some of these books have been printed, and some only have free online versions.
There have also been students who wrote their dissertations/theses with bookdown, such as Ed Berry: https://eddjberry.netlify.com/post/writing-your-thesis-with-bookdown/. Chester Ismay has even provided an R package thesisdown (https://github.com/ismayc/thesisdown) that can render a thesis in various formats. Several other people have customized this package for their own institutions, such as Zhian N. Kamvar’s beaverdown (https://github.com/zkamvar/beaverdown) and Ben Marwick’s huskydown (https://github.com/benmarwick/huskydown).
2.1.8 Websites
The blogdown package, to be introduced in Chapter 10, can be used to build general-purpose websites (including blogs and personal websites) based on R Markdown. You may find tons of examples at https://github.com/rbind or by searching on Twitter: https://twitter.com/search?q=blogdown. Here are a few impressive websites that come to mind:
Rob J Hyndman’s personal website: https://robjhyndman.com (a very comprehensive academic website).
Amber Thomas’s personal website: https://amber.rbind.io (a rich project portfolio).
Emi Tanaka’s personal website: https://emitanaka.github.io (in particular, check out the beautiful showcase page).
“Live Free or Dichotomize” by Nick Strayer and Lucy D’Agostino McGowan: http://livefreeordichotomize.com (the layout is elegant, and the posts are useful and practical).
Knuth, Donald E. 1984. “Literate Programming.” The Computer Journal 27 (2). British Computer Society: 97–111.
Bion, Ricardo, Robert Chang, and Jason Goodman. 2018. “How R Helps Airbnb Make the Most of Its Data.” The American Statistician 72 (1). Taylor & Francis: 46–52. https://doi.org/10.1080/00031305.2017.1392362.
Hartgerink, Chris HJ, Jelte M Wicherts, and Marcel ALM van Assen. 2017. “Too Good to Be False: Nonsignificant Results Revisited.” Collabra: Psychology 3 (1). The Regents of the University of California.
Lowndes, Julia S Stewart, Benjamin D Best, Courtney Scarborough, Jamie C Afflerbach, Melanie R Frazier, Casey C O’Hara, Ning Jiang, and Benjamin S Halpern. 2017. “Our Path to Better Science in Less Time Using Open Data Science Tools.” Nature Ecology & Evolution 1 (6). Nature Publishing Group.