F.3 Essential R Markdown

The range of options available in R Markdown is paradoxical: For beginners, the wealth of commands in the RStudio Cheatsheet on R Markdown may seem overwhelming. At the same time, more experienced users often complain about the lack of options in R Markdown, especially when comparing it to type-setting systems like LaTeX. R Markdown is a serious attempt at hitting the sweet spot between simplicity and complexity: It aims to provide a limited set of features that are good enough for most purposes, but leaves more exotic tasks to more specialized systems.

Fortunately, the range of commands needed to benefit from R Markdown is very limited. Any R Markdown document consists of three parts:

  1. A header for setting global document options,

  2. Text that may contain headings, paragraphs, and itemized lists, and

  3. Code chunks that contain and evaluate R code.

The small set of commands introduced in the following sections will allow you to create fancy data science reports that will impress your instructors and employers long before you have explored everything that R Markdown has to offer.

F.3.1 Header

The header of a new .Rmd document can contain many technical things, but should primarily contain the following four elements:

  1. The title of your report,

  2. the name of its author(s),

  3. the current date,

  4. the output format of the document.

---
title: "My fancy report"
author: "Author name"
date: "2022 July 15"
output: html_document
---

Different options for output formats exist, but .html tends to be the most convenient and portable one.
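The date in the header does not have to be typed by hand. As a minimal sketch, assuming that the date field may hold an inline R expression (which is not shown in the header above), the following code illustrates what such an expression would return. The object name report_date is hypothetical and only serves the illustration:

```r
# Assumption: the header's date field can contain inline R code, e.g.
#   date: "`r format(Sys.Date(), '%Y %B %d')`"
# so that the date is filled in whenever the document is knitted.

# A fixed date is used here to keep the example reproducible:
report_date <- as.Date("2022-07-15")

# Year, full month name, and day (the month name depends on the session's locale):
format(report_date, "%Y %B %d")
```

Replacing report_date with Sys.Date() yields the current date in the same format.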

F.3.2 Text

The body of an .Rmd document is essentially a plain text document that accepts R Markdown commands to structure and format text elements. R Markdown is similar to many other markup languages (e.g., HTML, XML, and TeX) in using special symbols to signal formatting commands. While using HTML or TeX can require very complicated commands and tags, the basic idea of Markdown is to provide a simple and limited set of commands that satisfy the most common formatting needs. Good examples of simple solutions that are intuitive and work well are how R Markdown structures headings, signals emphasis, and allows creating itemized lists.

Headings

Headings indicate the structure of your document by separating it into multiple sections. An unfortunate coincidence that may confuse you for a minute is that R Markdown uses the symbol # (i.e., the hash or pound symbol) as a prefix to section titles. Obviously, this differs from the use of the symbol # in R code, where it is used as a prefix for commenting out lines.

A main (or 1st level) heading entitled Literature Review would be indicated by typing # Literature Review (and capitalized in APA format). Lower-level subsections (and sub-sub-sections, etc.) can then be created by using two (three, or more) # symbols:

# 1st Level Heading

## 2nd level heading

### 3rd level heading

It is important to remain consistent about the number of levels within a document. Introducing a lower level of subsections typically only makes sense when there are at least two headings on that level.

Formatting text

Text formatting — like typing words in italics or bold — is easy as well.

Text formatting --- like typing words in _italics_ or **bold** --- is easy as well. 

Lists

A neat feature of R Markdown is that it makes creating itemized (bullet-point or enumerated) lists very easy.

Bullet-point lists

To create a bullet-point list, simply prefix lines of text with a dash - (or an asterisk *). For instance, typing:

- First point
- Second point
    - Subpoint A 
    - Subpoint B
- Third point 

will show up in the output file as:

  • First point
  • Second point
    • Subpoint A
    • Subpoint B
  • Third point

Provided that items on the 2nd level are preceded by at least 4 spaces, they are automatically distinguished from the items on the 1st level.

Enumerated lists

To create an enumerated list, simply prefix lines of text with n. (with n being a number). For instance, typing:

1. First point
1. Second point
    1. Subpoint A 
    1. Subpoint B
1. Third point 

will show up in the output file as:

  1. First point
  2. Second point
    1. Subpoint A
    2. Subpoint B
  3. Third point

Again, sub-items are distinguished from top-level items by indenting them (by at least 4 spaces). Note that we were lazy and used 1. on every line, but R Markdown was smart enough to count our items and sub-items.

Preventing and enforcing line and paragraph breaks

A subtle but important feature of R Markdown is its use of spaces (i.e., " ") and blank lines (i.e., lines that only contain spaces and/or a line break, typically typed by hitting Enter). Four concepts to distinguish here are non-breaking spaces, line breaks, paragraph breaks, and manual line breaks[1]:

  1. A non-breaking space is a space between two parts of an expression or name that should never be split across separate lines of text. Although many readers and writers are unaware of their existence, these spaces make sense quite often. For instance, we usually would not want to separate numbers from their units (e.g., in “10 km” or “$ 100”), break up proper names (e.g., “R Markdown”), references to enumerated figures and tables (e.g., “Figure 1” and “Table 2”), or names with titles (like “Dr. Seuss”). To insert a non-breaking space in R Markdown, we can either escape a space character (by preceding it with a backslash, as in Dr.\ Seuss) or use the HTML command &nbsp; (as in Figure&nbsp;1).

  2. A regular line break in the .Rmd input file appears as a space in the output document. Thus, R Markdown (like many other markup languages) interprets line breaks as spaces, rather than as the beginning of a new paragraph.[2]

  3. A paragraph break separates paragraphs, rather than just lines of text within the same paragraph. In HTML outputs, paragraph breaks are often shown as empty lines, whereas most printed articles and books only signal a new paragraph by starting a new line and indenting the first words of the new paragraph. To indicate a paragraph break in R Markdown, we need to insert one or more empty lines between strings of text. Once you get used to this, inserting empty lines between different parts (e.g., between headings, lines of text, and code chunks) is a convenient and useful way to structure a document.[3]

  4. Enforcing manual line breaks: An occasionally confusing feature of R Markdown is that ending a line with two or more spaces forces a manual line break. Thus, typing three words on three lines, like:

ene 
mene 
miste 

could either be rendered as:

ene mene miste

or as:

ene
mene
miste

in the output file, depending on the number of (invisible) spaces behind each line. (Specifically, the three words are rendered in the same line of text if there is zero or one space after each word, and into separate lines of text when there are two or more spaces after each word.)

It is annoying that — due to the invisible nature of spaces — this difference is not visible in the source document. Alas, even R Markdown is not perfect. Fortunately, human beings can adapt to various circumstances and constraints. Thus, just remember that (a) an escaped space prevents a line break, (b) different paragraphs should be separated by one or more blank lines, and (c) two or more spaces enforce a line break.
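Because trailing spaces are invisible, it can help to let R look for them. The following is a minimal sketch, not a feature of R Markdown itself: the function name find_hard_breaks and the example vector src are hypothetical, and the regular expression simply matches two or more spaces at the end of a line:

```r
# Hypothetical helper: return the indices of lines that end in
# two or more spaces (i.e., lines that force a manual line break).
find_hard_breaks <- function(lines) {
  which(grepl(" {2,}$", lines))
}

# Three lines with two, one, and zero trailing spaces:
src <- c("ene  ", "mene ", "miste")

find_hard_breaks(src)  # 1 (only the first line ends in two or more spaces)
```

To check an actual document, the lines could be read with readLines() from a file (e.g., a hypothetical report.Rmd) and passed to the same function.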

F.3.3 Code chunks

All commands mentioned in this section so far were Markdown commands that provide a nifty notepad, but have not yet required any R code. The real fun starts when mixing text with code — which is where R Markdown comes into play.

Creating chunks

To signal the switch from text to code, insert a code chunk by typing the chunk delimiting symbols ```{r} (to start a chunk) and ``` (to end it). In RStudio, using the keyboard shortcut Cmd + Alt + I immediately yields a new empty chunk that accepts R code:

```{r}  
     
```  

Everything that is contained within this chunk works exactly like an R script. This means that, within a chunk or a sequence of chunks, you can define objects, provide comments (now using # as the comment symbol again), and evaluate code line by line, just as in any ordinary R script. When knitting the document, all code chunks are evaluated and the results are displayed (unless you select chunk options that prevent this).

Importantly, any code chunk needs both a beginning (```{r}) and an end (```). RStudio typically shows any text on a white background and code chunks on a grey background. If this changes anywhere in your document, chances are that you opened but forgot to close a chunk. To avoid this error, get into the habit of always writing both parts when creating a new chunk.

Chunk options

Chunks can be named and their default behavior can be changed by setting many options. To get started, it makes sense to provide a unique name for each chunk (e.g., to facilitate navigation in large documents and obtain more informative error messages when making a mistake) and to restrict one’s use of chunk options to:

  • echo: Show the code chunk in the output?
    The default setting is echo = TRUE, but we can use echo = FALSE to hide the code in the output document.

  • eval: Evaluate the code chunk when creating the output?
    The default setting is eval = TRUE, but we can use eval = FALSE to prevent R from evaluating the chunk (e.g., when this would yield an error). Whenever a .Rmd file fails to compile, setting eval = FALSE for the corresponding chunks is a popular debugging option. However, keep in mind that anything that is not evaluated cannot be used later. (Specifically, any objects assigned or variables created in code that is no longer evaluated cannot be used later, even if they may exist in your local environment.)

For instance, creating a new chunk:

```{r plot_cars, echo = TRUE, eval = FALSE}   
 plot(cars,   
      xlab = "Speed (mph)",    
      ylab = "Stopping distance (ft)")   
```   

will show the chunk plot_cars in the output document (due to the option echo = TRUE), but not create or show the corresponding plot (due to eval = FALSE).

  • message and warning:

Some chunks (e.g., an initial one loading required packages like those of the tidyverse) may create messages or warnings that you may wish to hide in your output document. Setting the chunk options message and warning to FALSE lets you accomplish this:

```{r load_pkg, message = FALSE, warning = FALSE}
library(tidyverse)
``` 

The full list of chunk options is long (e.g., see http://yihui.name/knitr/options/), but most people (including many users of R Markdown) live quite happily without ever using them.
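When the same options are wanted for most chunks, they can also be set once as defaults rather than repeated in every chunk header. The following sketch assumes that the knitr package is installed and that this code is placed in an early chunk of the document; the particular choice of defaults is only an illustration:

```r
# Set default options for all subsequent chunks (requires the knitr package).
# Individual chunks can still override these defaults in their own header.
knitr::opts_chunk$set(
  echo    = TRUE,   # show code in the output by default
  message = FALSE,  # hide messages (e.g., from loading packages)
  warning = FALSE   # hide warnings
)

# Query a current default:
knitr::opts_chunk$get("message")  # FALSE
```

A chunk that should deviate from these defaults (e.g., by hiding its code) would simply state echo = FALSE in its own header.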

Inline chunks

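A second way of evaluating R code in R Markdown is to embed it directly into the text by enclosing it in `r ` (i.e., a backtick followed by the letter r and a space, then the code, and a closing backtick).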

For instance,

  • `r v <- 1:3; sum(v)` evaluates to 6 in the output document, and

  • `r nrow(cars)` determines that the cars dataset contains 50 rows.

Inline chunks typically contain very brief R commands, but can be immensely useful for characterizing datasets or mentioning the values of results (e.g., means, SD, or \(p\) values computed in an analysis).
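One way to keep inline code short is to compute and round the values in a regular chunk first and only refer to the resulting objects in the text. The following sketch uses the built-in cars dataset; the object names n_cars, mean_speed, and sd_speed are hypothetical:

```r
# Compute values in a regular chunk (cars is a built-in dataset):
n_cars     <- nrow(cars)                  # number of observations
mean_speed <- round(mean(cars$speed), 1)  # mean speed (in mph), rounded
sd_speed   <- round(sd(cars$speed), 1)    # standard deviation of speed, rounded

n_cars  # 50
```

In the text, these objects could then be reported as `r n_cars`, `r mean_speed`, and `r sd_speed`, so that the reported numbers change automatically whenever the data or analysis changes.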

F.3.4 Advanced features

R Markdown supports a range of more sophisticated features. But as many of those are not needed at first, we only mention some common ones:

  1. Images can be included by ![Image caption](path/file.ext).
    For instance, the expression ![Data science for psychologists](./images/logo.png) results in:

[Image: Data science for psychologists]

  2. Mathematical formulas can be enclosed in $$ and are written in LaTeX syntax.
    For instance, the expression $$\bar{x} = \frac{1}{n} \cdot \sum_{i=1}^{n} x_{i}$$ yields:

\[\bar{x} = \frac{1}{n} \cdot \sum_{i=1}^{n} x_{i}\]

  3. External links are entered as [Link text](URL).
    For instance, the expression [ds4psy](https://bookdown.org/hneth/ds4psy/) yields ds4psy.

  4. Footnotes can be included by ^[Footnote text.] and will be numbered automatically.[4]

  5. Please distinguish between hyphens and various types of longer dashes:

    • a hyphen is typed as - and displayed as “-”;
    • an en-dash is typed as -- and displayed as “–”;
    • an em-dash is typed as --- and displayed as “—”;
    • a minus sign should have the same width as a plus sign. They are typed as $+$/$-$ and displayed as “\(+\)/\(-\).”

  6. Non-breaking spaces prevent line breaks at positions that are shown as spaces, but should not be broken into separate lines. For instance, names containing spaces (like R Markdown or Dr. Seuss) or references to enumerated objects (like Figure X and Appendix Z) should never be broken apart when appearing near line ends. To type a non-breaking space in R Markdown, escape the space character (by preceding it with a backslash \, as in Dr.\ Seuss) or use the HTML command &nbsp; (as in Figure&nbsp;X).

  7. Other special characters of R Markdown can also be printed verbatim by escaping them (i.e., preceding them with a backslash \). For instance, enclosing a word in underscores typically emphasizes it by changing the font to italics, but if you wanted to actually show a word enclosed in _underscores_, you can type it as \_underscores\_. Similarly, the symbol # typically marks a new section heading, but can be shown as # by typing \#.

F.3.5 Common errors

Common errors of both novice and experienced R Markdown users include the following:

  • Repeating chunk names: Every named chunk must have a unique name.

  • Erroneous R code or missing objects: If your code contains errors or refers to missing objects, the document will fail to knit. Not evaluating the corresponding chunk (by setting its option eval = FALSE) can help, as long as this chunk’s results are not needed elsewhere in the code.

  • Including R interface commands: Commands that call the R help system (like ?mtcars or ?c) or show a table in a tab (like View(mtcars)) will typically fail to knit and yield an error. Comment them out in your code prior to knitting (or set eval = FALSE for the corresponding chunk, if its results are not needed elsewhere).

  • Calling local directories or files may occasionally cause problems (e.g., when starting R from a different location). Using RStudio projects and only specifying local (or relative) file paths typically solves these problems (see Section 6.1.2 for details).

An error will typically prevent your .Rmd file from knitting. When an error occurs, try to understand its error message and correct the problem in your .Rmd file. If this fails, and setting eval = FALSE is not an option (i.e., the chunk is needed elsewhere), enter the error message into a search engine and hope that others have encountered and solved the same problem.
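As an alternative to commenting out interface commands, they can be wrapped in a condition that only runs them in an interactive session. The following is a sketch under the assumption that the document is knitted in a non-interactive R session (as is typically the case when using the Knit button in RStudio); the object name first_rows is hypothetical:

```r
# Run interface commands only when R is used interactively,
# so that they are skipped in a non-interactive knitting session:
if (interactive()) {
  View(mtcars)    # opens the data in a viewer tab
  help("mtcars")  # opens the help page (same as ?mtcars)
}

# Code that should appear in the output document works as usual:
first_rows <- head(mtcars, 3)  # the first three rows of the dataset
first_rows
```

Note that calling the rendering function from an interactive console would still count as an interactive session, in which case the guarded commands would run.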

F.3.6 RStudio nuggets

Once you are comfortable with the basics of writing text and code in the same document and knitting it to create an output file, it may be worthwhile to spend a few minutes exploring the interface options provided by the RStudio IDE. For instance, any newly created code chunk immediately provides three small buttons in its top right corner:

  1. The symbol for the first button looks like a gear wheel and lets you name the chunk, as well as set the most common chunk options. Applying a few of the pre-defined settings and observing their effects on the definition of the chunk (in curly braces {} on the left) is a convenient way to learn more about chunk options.

  2. Clicking on the downward-triangle evaluates all code chunks above the current one. This is useful when current R objects depend on those in previous chunks and may have changed — or you just took a break and are starting a new session at an advanced position in a file.

  3. Clicking on the right-facing triangle lets you run all code in the corresponding chunk. This is useful for evaluating multiple code steps that are built up throughout larger chunks. The same effect can also be achieved by using the Cmd + Shift + Enter keyboard shortcut from anywhere within the chunk.

Using R Markdown in RStudio also takes the notion of foldable sections to a new level. Not only can you open and close code chunks (by clicking on the small triangle to the left of each chunk or entering Cmd + Alt + L and Cmd + Alt + Shift + L), but the concept of folds also extends to the text sections structured by different levels of headings. Thus, getting into the habit of using the Cmd + Alt + (Shift) + O and Cmd + Alt + (Shift) + L keyboard shortcuts will make it much easier to navigate large and complex documents.

Finally, using keyboard shortcuts for your 7±2 most frequent commands is likely to save you hours on this course, let alone the time-saving benefits for the rest of your life. Help on these shortcuts is available in RStudio via Alt + Shift + K or by selecting Help > Keyboard Shortcuts Help.

F.3.7 Mixing markup languages

Another powerful feature of R Markdown is that we can include and mix commands from other markup languages. For instance, I often use HTML to insert comments, images, or special symbols (e.g., &plusmn; to show the ‘±’ symbol in the previous section or &nbsp; to enter the non-breaking spaces that are essential for any serious typesetting effort) or LaTeX commands to enter mathematical symbols or formulas (e.g., \(\sum_{i=1}^{n} x_{i}\)):

<!-- A comment. -->

<!-- An HTML image with a link: -->    
<a href="https://bookdown.org/hneth/ds4psy/">
<img src = "./images/logo.png" alt = "logo" style = "width: 100px; float: right;"/>
</a>

<!-- A LaTeX formula: --> 
$\sum_{i=1}^{n} x_{i}$

  1. These distinctions apply to any type-setting system, not just R Markdown. In most systems, the difference between line breaks and paragraph breaks is somewhat blurred, leading to many typographical errors, not to mention the wide-spread ignorance regarding different dashes and non-breaking spaces.

  2. This may seem confusing at first, but actually helps to structure arguments while writing them.

  3. Especially when working on laptops, students initially try to fit as many symbols as possible into a small amount of screen space. To avoid strange errors, get used to inserting at least one empty line between different parts of your .Rmd document (i.e., between headings, text, and chunks).

  4. This is the footnote.