This book is written in R Markdown. This means that, behind the scenes, a plethora of R packages are solving tricky formatting and positioning problems to eventually create a document that is both functional and aesthetically pleasing. Fortunately, I do not need to worry about all this, as much more experienced programmers have solved highly specialized tasks and shared the results of their efforts in the form of R packages.[72] In fact, by letting me not only combine text with code but also separate form from content, R Markdown lets me focus on content while I am writing this sentence, and worry about formatting later.[73]
If this sounds complicated, the good news is that you already have all of these tools on your machine, provided that you have installed a working version of R and RStudio.[74] Thus, the tools you are using to analyze data also allow you to write and publish beautiful documents. Compared to other software programs, this flexibility is pretty amazing.
Using R Markdown allows you to interleave text with R code — including evaluating code and showing its output. This requires learning a few distinctions and formatting commands, but will soon make your life a lot easier (as far as composing and submitting data science reports is concerned). As you get more experienced in R and acquire additional skills in scientific writing and research methodology, you will not only impress your instructors but also embrace reproducible research practices to document your analyses and structure your reports in transparent ways.
F.1.1 Mixing text and code
R Markdown essentially lets us mix text and code.[75] But text and code are very different types of data: Text is typically rendered as character strings, displayed in typeset fonts, and interpreted in the reader’s mind. By contrast, code is also typed and written (like text), but is primarily intended to be evaluated according to the rules of a programming language. Thus, when using a software tool that allows switching between text and code, we need to signal which type of data we are currently dealing with. In R Markdown, the default mode of writing is text, and the key concept for signaling a switch from text to code is a code chunk.
Actually, we have encountered a large number of code chunks in this book: They are printed as the grey boxes that surround snippets of R code. Whenever we also saw some outcome of evaluating code (e.g., the results of a calculation, or a visualization), the process of mixing text and code succeeded and resulted in a document that merged text, code, and the results from evaluating code.
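To make the idea of a code chunk concrete, here is a minimal sketch in R. It assumes that the knitr package is installed, and the small R Markdown source (the text and the numbers in `rmd_source`) is made up for illustration. A chunk is opened by a line of three backticks followed by `{r}` and closed by a line of three backticks; everything outside the chunk is treated as text.

```r
# Assumption: the knitr package is installed; everything else is base R.
library(knitr)

# A tiny R Markdown source, one element per line.
# The content is hypothetical and only serves as an illustration.
rmd_source <- c(
  "Some plain text, which is passed through unchanged.",
  "",
  "```{r}",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "```",
  "",
  "Inline code also works: the mean is `r mean(x)`."
)

# knit() evaluates the chunk and the inline expression and returns
# Markdown text in which prose, code, and results are merged.
md_output <- knit(text = rmd_source, quiet = TRUE)

# Show the merged Markdown document.
cat(md_output)
```

Printing `md_output` should show the original sentence, the code of the chunk, the result of evaluating it, and the inline value inserted into the last sentence. In everyday use, the source would live in an .Rmd file rather than in a character vector.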
When first encountering R Markdown, it may be confusing why some things you type do not show up in exactly the same way in your output document, and especially why it makes sense that many things you type later show up differently or not at all. To understand why the distinction between input and output documents makes sense, we need to explain the basic idea behind a markup language.
F.1.2 R Markdown as markup
When using R Markdown, any text can simply be typed into an .Rmd file, but will not be rendered exactly as you type it.
R Markdown is a special instance of a large family of so-called markup languages (see Wikipedia for definition and examples).
The key feature of any markup language is that the document contains annotations that are syntactically distinguishable from the text.
Different forms of markup languages exist. In R Markdown, the formatting instructions for text are explicitly added and visible, but not executed immediately. This violates the common WYSIWYG principle (i.e., ‘what you see is what you get’) of software programs like MS Word and may initially seem like a bug, but turns out to be a feature in the long run. Postponing any formatting until later has two main advantages:
Focus on content, rather than form: Decoupling content from form encourages authors to structure their material conceptually, rather than visually. Authors are forced to focus on the content and explicate the structure of their arguments, rather than being concerned with appearances (e.g., formatting and looks).
Flexibility: Explicitly distinguishing form from content provides powerful advantages in a world in which multiple output formats exist. Put simply, the same content can be represented in many different forms (e.g., articles vs. books, which may also require different file formats).
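The flexibility argument can be sketched in a few lines of R. The example assumes that the rmarkdown package and pandoc are installed (RStudio bundles pandoc); the file name `example.Rmd` and its content are hypothetical. The same source, with its visible markup for a heading and for emphasis, is rendered into two different output formats.

```r
# Assumptions: the rmarkdown package and pandoc are installed.
# The file name and the document content are hypothetical.
library(rmarkdown)

# Write a small .Rmd file to a temporary directory.
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",
  "title: \"One source, several outputs\"",
  "---",
  "",
  "# A heading",
  "",
  "Text with *emphasis* and **strong emphasis**.",
  "",
  "```{r}",
  "summary(c(1, 5, 9))",
  "```"
), rmd_file)

# Render the same source into two formats. Only the output_format
# argument changes; the content itself stays untouched.
if (pandoc_available()) {
  html_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
  docx_file <- render(rmd_file, output_format = "word_document", quiet = TRUE)
  print(basename(c(html_file, docx_file)))
}
```

The markup (`#`, `*`, `**`) is typed as ordinary characters and only turned into actual formatting when the document is rendered. A PDF version could be requested in the same way with `"pdf_document"`, but this typically requires an additional LaTeX installation.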
Now it’s time to create our first Markdown document and start mixing some text and code chunks.
[72] Key R packages used in creating this book include rmarkdown (Allaire et al., 2020), knitr (Xie, 2020b), and bookdown (Xie, 2020a). The knitr package (Xie, 2015) provides much of the magic in weaving together text and code in the background. However, the actual pipeline of tools in the background is .Rmd > knitr > .md > pandoc > .html or .pdf and uses many different programs and technologies. Fortunately, we do not need to worry about the details of this process, but we should be grateful and provide credit to the people who devoted years of their lives to solving these tasks.
[73] The distinction between content and form is particularly helpful if you are dealing with an opaque and potentially changing set of formatting rules (e.g., APA style). I am a keen advocate of LaTeX for the same reason, but as R Markdown is simpler and more general (e.g., by including many LaTeX tools), I find myself using more Markdown these days.
[74] Otherwise, you may have to install some additional R packages (like knitr and rmarkdown). Creating output documents in PDF format may require an additional LaTeX installation, but nowadays HTML gets you pretty far.
[75] For those familiar with Microsoft Office™ and statistics software like SPSS™ or SAS™: R Markdown lets you combine the functionality of Word, Excel, and your statistics software in a single framework.