This book is written in R Markdown. This means that, behind the scenes, a plethora of R packages solve complicated problems to eventually create an output document that is both functional and aesthetically pleasing. Fortunately, I do not need to worry about all this, as far more experienced programmers have solved highly specialized tasks and shared the results of their efforts in the form of R packages.[60] In fact, as R Markdown not only lets me combine text with code but also separates form from content, I can focus on content while writing this sentence and worry about formatting later.[61]
If this sounds complicated, the good news is that you probably already have all of these tools on your machine and can also use them for solving simpler data science tasks. Provided that you have a working version of R and RStudio, you can immediately start creating documents in R Markdown.[62] And as these documents allow you to interleave text with R code (including evaluating the code and showing its output), they may actually make your life easier, at least as far as submitting data science projects is concerned. As you gain experience in R and acquire additional skills in scientific writing and research methodology, you can not only impress your instructors, but also practice reproducible research by structuring your analyses and reports in transparent ways.
F.1.1 Mixing text and code
R Markdown essentially lets us mix text and code.[63] But text and code are very different types of data: text is typically represented as character strings, displayed in typeset fonts, and interpreted in the reader's mind. By contrast, code is also typed and written (like text), but is primarily intended to be evaluated by a computer according to the rules of some programming language. Thus, when using a software tool that allows switching between text and code, we need to signal which type of data we are currently dealing with. In R Markdown, the default mode of writing is text, and the key concept for signaling a switch from text to code is a code chunk.
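To make this concrete, here is a minimal sketch of what a small .Rmd file might look like. The title, the vector `x`, and its values are made up for illustration. Text is written as usual. Each code chunk opens with three backticks followed by `{r}` and closes with three backticks. A short expression wrapped in `` `r ...` `` is evaluated inline. The chunk option `echo = FALSE` hides a chunk's code but keeps its output.

````markdown
---
title: "My first R Markdown document"
output: html_document
---

This sentence is plain text.
The following code chunk is evaluated when the document is rendered:

```{r}
x <- c(2, 4, 6)  # a hypothetical vector
mean(x)
```

Inline code is evaluated too: the mean of x is `r mean(x)`.

```{r, echo = FALSE}
# echo = FALSE: this code is hidden, but its output (the plot) still appears
plot(x)
```
````

When this file is rendered, the first chunk appears as a grey box followed by its result (`[1] 4`). The inline expression is replaced by the number 4, and only the plot of the second chunk shows up.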
In fact, we have already encountered many code chunks in this book: they are printed as the grey boxes that surround snippets of R code. Whenever we also saw the results of evaluating some code (e.g., the result of a calculation or a visualization), the process of mixing text and code succeeded and produced a document that merges text, code, and the results of evaluating the code.
When first encountering R Markdown, it may be confusing that the things you type do not show up in your output document exactly as you typed them. It may be even more confusing that it makes sense for many of them to appear differently or not at all. To understand why the distinction between input and output documents makes sense, we need to explain the basic idea behind a markup language.
F.1.2 R Markdown as markup
When using R Markdown, any text can simply be typed into an .Rmd file, but it will not be rendered exactly as you type it. R Markdown is one member of a large family of so-called markup languages. The key feature of any markup language is that the document contains annotations (the markup) that are syntactically distinguishable from the text itself.
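For instance, in the hypothetical .Rmd source below, the hash symbol, asterisks, brackets, and hyphens are markup. They are not meant to appear in the output. Instead, they tell the renderer to produce a heading, emphasized text, a link, and a bulleted list. The HTML comment at the end is not displayed in the rendered output at all.

````markdown
# A level-1 heading

Text with *italic* and **bold** words, and a [link](https://www.r-project.org).

- a first bullet point
- a second bullet point

<!-- This comment is visible in the source, but not displayed in the output -->
````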
Markup languages come in different forms. In R Markdown, formatting instructions for text are added explicitly and remain visible in the source, but they are not applied immediately. This violates the WYSIWYG principle (i.e., 'what you see is what you get') of common software programs like MS Word and may initially seem like a bug, but it turns out to be a feature in the long run. Postponing any formatting until later has two main advantages:
Decoupling content from form encourages authors to structure their material conceptually, rather than visually: they are forced to focus on the structure of their content, rather than on its looks.
Explicitly distinguishing form from content is a powerful advantage in a world in which multiple output formats exist. Put simply, the same content can be represented in many different forms (e.g., as an article or as a book, which may also require different file formats, such as HTML or PDF).
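Behind the scenes, R Markdown passes a document through the pipeline .Rmd > knitr > .md > pandoc > output. The following minimal sketch assumes that the knitr package is installed and runs the first step by hand. `knitr::knit()` evaluates the code chunk of a small, made-up file (the name `example.Rmd` is hypothetical) and writes a plain Markdown file that contains both the code and its output. The second step is optional. It uses `rmarkdown::render()`, which calls knitr and pandoc in turn, to turn the same content into two different output formats, but only if rmarkdown and pandoc are available.

```r
# A minimal sketch of the pipeline .Rmd > knitr > .md > pandoc > output,
# assuming knitr is installed (rmarkdown and pandoc are optional here).

# Write a tiny, hypothetical .Rmd file to a temporary directory
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "Some **bold** text, followed by a code chunk:",
  "",
  "```{r}",
  "1 + 1",
  "```"
), rmd_file)

# Step 1: knitr evaluates the chunk and writes plain Markdown (.md),
# which now contains the code and its output (prefixed by "##")
md_file <- knitr::knit(rmd_file,
                       output = file.path(tempdir(), "example.md"),
                       quiet = TRUE)
md_lines <- readLines(md_file)
cat(md_lines, sep = "\n")

# Step 2 (optional): rmarkdown runs knitr and pandoc in one go,
# rendering the same content into two different output formats
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  for (fmt in c("html_document", "word_document")) {
    rmarkdown::render(rmd_file, output_format = fmt,
                      output_dir = tempdir(), quiet = TRUE)
  }
}
```

Calling `knitr::knit()` on its own is mainly useful for seeing what knitr adds to the Markdown. In everyday use, the Knit button in RStudio calls `rmarkdown::render()` for you.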
Now it's time to create your first R Markdown document and start mixing some text and code chunks.
[60] Key R packages used in creating this book include rmarkdown (Allaire et al., 2020), knitr (Xie, 2020b), and bookdown (Xie, 2020a). The knitr package (Xie, 2015) provides much of the magic of weaving together text and code in the background. More precisely, the actual pipeline of tools in the background is .Rmd > knitr > .md > pandoc > .html or .pdf output. Fortunately, we do not need to worry about the details of this process, but we should be grateful to, and give credit to, the people who devoted years of their lives to making all this possible.
[61] The distinction between content and form is particularly helpful if you are dealing with an opaque and potentially changing set of formatting rules (e.g., APA style). I am a keen advocate of LaTeX for the same reason, but as R Markdown is simpler and more general (e.g., by including many LaTeX tools), I find myself using Markdown more often these days.
[62] Otherwise, you may have to install some additional packages (like rmarkdown and knitr). Creating output documents in PDF format may require an additional LaTeX installation, but nowadays HTML gets you pretty far.
[63] For those familiar with Microsoft Office™ and statistics software like SPSS™ or SAS™: R Markdown lets you combine the functionality of Word, Excel, and your statistics software in a single framework.