Chapter 2 Week 2
2.1 Introduction to R Markdown
Welcome to the second lab of dapR 1.
This week, we will introduce R Markdown (Rmd), an extremely versatile and powerful tool for writing reproducible documents of all sorts and formats. We promise you that once you get the hang of it, you will never again want to use your ordinary word processor/text editor for writing essays, coursework, papers, or even presentations or your CV. In fact, all the teaching materials in this course, from the lecture slides to these lab sheets, have been written in Rmd.
By the end of this lab you will:
- find out about markup languages
- learn basic text formatting using R Markdown
- discover the powerful integration of R and text editing that R Markdown offers
- learn how to write HTML, Word, and OpenOffice files purely in R Studio
2.2 What is R Markdown?
Well, it is a language, or a system, for telling computers how to process and format text. Unlike the more familiar WYSIWYG word processors/text editors, R Markdown (like other markup languages) uses plain text symbols for all formatting. In other words, there is no highlighting text and then clicking on a B icon to make it bold. What you do instead is the topic of today’s lab. At this point you might be thinking “why on Earth should I be learning this when I can just use the text editor on my computer?!”. That is a good question, and the answer is that the integration of Rmd with R Studio makes it an incredibly useful tool for writing documents that include the results of a statistical analysis or data visualisations.
2.3 Getting things ready
Before you do anything else, open R Studio and install the rmarkdown package if you have not yet done so.
This package will enable you to convert files written in Rmd to output of your choice.
Task 1: Type exactly the following command into the console and press ↵ Enter:
install.packages("rmarkdown")
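If you are not sure whether the package is already on your computer, a quick check along these lines can tell you. This is only a sketch: the helper name is_installed is made up for this example, whereas requireNamespace() is a standard R function.

```r
# Hypothetical helper (not part of any package):
# TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Report whether rmarkdown still needs installing
if (is_installed("rmarkdown")) {
  message("rmarkdown is already installed")
} else {
  message('rmarkdown is missing, run install.packages("rmarkdown")')
}
```

Installing a package again does no harm, so the check is a convenience rather than a requirement.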
The best way to learn how R Markdown (and, really, any programming language) works is to compare the source (your code) with the output and notice the relationships between the two. So let’s do that and take a look at what the Rmd version of this lab sheet looks like.
Task 2: Click on the Code ▾ button in the top-right corner of this document and select “Download Rmd”. Save the file in your “Week_02”[1] folder.
R Markdown files have the .Rmd extension after their names, though it is possible that your computer is set up to hide file extensions (a setting we would encourage you to change). This file type should be automatically associated with R Studio on your computer. If your computer offers you a selection of programs to open the file in, just choose R Studio and tick the “always use this app/program to open files of this type” box, if there is one.
Now, unless you have a two-screen setup where you can put windows side-by-side, you will have to frequently switch between your web browser and R Studio. To save time, here is a handy shortcut: Press Alt + ↹ Tab (Windows) or ⌘ Command + ⇥ Tab (Mac OS)[2] to switch between the two most recently viewed windows. If you hold down the first of the two keys, you can cycle through all currently open windows by pressing Tab multiple times.
Task 3: Open the Week02_Rmd_intro.Rmd file (in R Studio) and try out switching between your browser and R Studio using the keyboard shortcut.
2.4 R Markdown basics
Before we delve into the nitty-gritty of Rmd, let’s have a look at how to do some basic formatting you know from your text editor by comparing this section of the lab sheet with its corresponding source file. Note that you want to be working in your Rmd file; what you see below is the output you will get.
We can use # to identify headings and subheadings in our document.
2.5 Headings
First, let’s look at how you would make headings using #:
# Section 1
## Subsection 1
### Sub subsection 1
2.6 Text
You can also vary the format of your text:
*italics*
returns italics
**bold**
returns bold
~~strikethrough~~
returns strikethrough
superscript^2^
returns superscript²
subscript~2~
returns subscript₂
2.7 Tables
We can create tables that summarise useful information.
Operation | R code | Example Input | Example Output |
---|---|---|---|
Square root | sqrt( ) | sqrt(100) | 10 |
Absolute value | abs( ) | abs(-100) | 100 |
Round | round(x, digits = ) | round(12.345, 2) | 12.35 |
Min | min(...) | min(2.21, 2.22) | 2.21 |
Max | max(...) | max(2.21, 2.22) | 2.22 |
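Typing a table by hand is fine for a handful of rows, but a table can also be built from R itself. The sketch below is one possible way of doing that, not the way this lab sheet was made: it stores the example inputs as text, lets R compute the outputs, and hands the result to knitr::kable(), which formats a data frame as a Markdown table. The object name ops is made up for this example.

```r
# Each row pairs an operation with a snippet of R code to evaluate
ops <- data.frame(
  Operation = c("Square root", "Absolute value", "Round", "Min", "Max"),
  Input = c("sqrt(100)", "abs(-100)", "round(12.345, 2)",
            "min(2.21, 2.22)", "max(2.21, 2.22)"),
  stringsAsFactors = FALSE
)

# Evaluate every snippet so the output column is computed, not typed by hand
ops$Output <- vapply(
  ops$Input,
  function(code) eval(parse(text = code)),
  numeric(1),
  USE.NAMES = FALSE
)

# kable() turns the data frame into a Markdown table;
# fall back to a plain print if knitr is not installed
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(ops))
} else {
  print(ops)
}
```

Because the outputs are computed, the table cannot drift out of step with the code it describes.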
2.8 Lists
2.8.1 Unordered
- Item
  - Sub-item
- Item
- Item
2.8.2 Ordered
1. Item 1
2. Item 2
   - Sub-item 2.1
3. Item 3
And most importantly for the Rmd-R integration, we can include chunks of code which will produce the required output when we compile our document.
Find the ‘Insert’ button in the top right corner of your Rmd editor and choose ‘R’.
You will see a chunk where you can put a comment and an example of an operation (say, multiplication). Press the green button on the right of the code chunk to run it.
# This is an R code chunk
# Here you can write code and R will run it when you generate your document
# and display the output below
6 * 7
## [1] 42
Any R code can also be evaluated in-line like this: 2 + 3 = 5.
Feel free to take a moment to make sure you understand the relationship between the R Markdown notation and the resulting output. For a quick reference guide to Rmd, see this cool cheat sheet.
2.9 Rmd documents
OK, now that you know the very basics, let’s look at the .Rmd file step-by-step.
The first thing to realise is that an .Rmd file is just a plain text file (such as .txt). You could open it in Notepad, MS Word, or OpenOffice[3] and would basically see the same thing as in R Studio. The only reason for the special .Rmd extension is for R Studio to know to put all the nice colours in to aid readability and offer you options associated with R Markdown, such as the option to actually generate a document from the file. So don’t go away thinking there’s some magic going on here: These are just text files.
With that out of the way, keep reading on the document in your browser but let’s scroll all the way up in the .Rmd file. There, you can see this header:
---
title: "Introducing R Markdown"
author: "dapR 1 -- Lab 2"
output:
  html_notebook:
    theme: flatly
    code_folding: show
---
For reasons you don’t need to worry about, this header is written in a different language called YAML (originally short for “Yet Another Markup Language”, no kiddin’, though nowadays it officially stands for “YAML Ain’t Markup Language”). Here, you provide the title of the document, the output format, and many other general options.
In our document, we set the title and author and define the output to be an R notebook. An R notebook is an HTML[4] file just like most websites, which is why we can easily put it online like this. The neat feature of R notebooks is their ability to show/hide and evaluate code chunks and the fact that you can easily download and edit them in R Studio. That is why we will be using them in our dapR labs.
The theme parameter indented under html_notebook specifies what the document looks like.
While you can customise the aesthetics of your documents to your heart’s delight, some nice and smart people have provided us with several basic themes that, in our view, look pretty neat.
Finally, the code_folding parameter governs whether the code chunks should be shown or hidden by default.
While there’s a host of options you can play around with, it is a good idea to always include at least the title and output.
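Indentation matters in YAML: it is what places theme and code_folding under html_notebook, and html_notebook under output. One way to see this nesting, sketched below, is to let R read a header back as a list using rmarkdown::yaml_front_matter(). The temporary file and the object names rmd_file and header exist only for this example.

```r
# Write the header (plus one line of body text) to a temporary .Rmd file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Introducing R Markdown"',
  'author: "dapR 1 -- Lab 2"',
  "output:",
  "  html_notebook:",
  "    theme: flatly",
  "    code_folding: show",
  "---",
  "",
  "Some body text."
), rmd_file)

# Parse the YAML header into a nested R list
header <- rmarkdown::yaml_front_matter(rmd_file)

header$title                        # the document title
names(header$output)                # the output format
header$output$html_notebook$theme   # an option nested under html_notebook
```

If the indentation is removed, the options are no longer read as belonging to html_notebook, which is a common source of confusing errors.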
OK, next, there are two code chunks.
The first one gets generated automatically by R Studio when you create a new .Rmd file (more on that later) and is there to set a very basic default “code chunk option”, echo=TRUE.
This option tells R Studio to create the output file with the code chunks visible.
Changing it to echo=FALSE will create a document with code not displayed.
You can specify other default options if you wish but that’s a bit of an advanced topic.
Notice two further things about the chunk:
- It is named (setup) – This doesn’t really do anything but it can be helpful when diagnosing code errors and it’s kind of tidy.
- There are further chunk options; in this case include=FALSE. This particular option makes the code chunk get evaluated but shows neither the code nor its output in the final document. In other words, it executes the code quietly in the background.
Taken together, the last two paragraphs mean that there are two ways of setting code chunk options:
- Globally – Just like the code inside of the first chunk does. Once set like this, the options will apply to all subsequent code chunks.
- Locally – Inside the {r, ...} bit at the top of each chunk. These options will apply only to the given code chunk.
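As a rough illustration of the global route, the line below is what R Studio typically places inside the setup chunk of a new .Rmd file; the exact contents of the chunk in your file may differ. Both knitr::opts_chunk$set() and knitr::opts_chunk$get() are standard knitr functions.

```r
# Global: set a default that applies to all subsequent chunks
knitr::opts_chunk$set(echo = TRUE)

# Check what the current global default is
knitr::opts_chunk$get("echo")

# Local: written in the chunk header instead, for example {r, echo=FALSE},
# which overrides the global default for that one chunk only
```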
There are, again, lots of useful options you can set and, using local options, you can change the behaviour of each individual chunk regardless of what the default (global) setting is. A comprehensive list, which you by no means need to memorise, can be found in this R Markdown reference guide.
The second code chunk illustrates this rather nicely. Despite setting echo to TRUE in global options in the first chunk, the second one sets it to FALSE.
This means that, for this chunk only, the code will get executed and its output displayed but the code chunk itself will not show up in the final document.
However, as it happens, the code in this chunk doesn’t have any output so, in this case, echo=FALSE is indistinguishable from include=FALSE.
To see the difference, have a look at this chunk:
## Here, output gets included in the document but the code does not!
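If you would like to see all three behaviours side by side, the sketch below knits the same tiny chunk three times from the console and prints what ends up in the document each time. It relies on knitr::knit() accepting the source as a character vector through its text argument; the helper knit_chunk is made up for this example.

```r
library(knitr)

# Hypothetical helper: knit a one-chunk document with the given chunk
# options and return the resulting Markdown as a single string
knit_chunk <- function(options) {
  fence <- strrep("`", 3)  # the three backticks that open and close a chunk
  src <- c(
    paste0(fence, "{r, ", options, "}"),
    "6 * 7",
    fence
  )
  knit(text = src, quiet = TRUE)
}

echo_true     <- knit_chunk("echo=TRUE")      # code and output
echo_false    <- knit_chunk("echo=FALSE")     # output only
include_false <- knit_chunk("include=FALSE")  # neither code nor output

# Print the three results, separated by a line of dashes
cat(echo_true, echo_false, include_false, sep = "\n-----\n")
```

The last result is essentially empty: the code still ran, but nothing from the chunk made it into the document.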
With respect to the actual contents of the second chunk, don’t worry about it too much. You are not supposed to understand it at this stage. If you’re really curious though, the code creates a function that puts the “Task X:” before the actual wording of the tasks so that we don’t have to type it all out and worry about which number this particular task is. We’re lazy like that, you see…
The rest of the .Rmd file should be fairly readable, especially with the benefit of knowing the markdown syntax for text formatting we talked about above. Remember that, by comparing the .Rmd with the lab sheet, you can always figure out how to do things you haven’t explicitly been taught (e.g., writing in superscript or in subscript).
Perhaps the only slightly puzzling looking bits are the links to other websites. It is not immediately important for you to know how to include these links (AKA hyperlinks, or URLs) so feel free to skip the section on adding links below.
2.10 Code chunks
Let’s talk a little more about code chunks (and in-line code), since they are the main reason why Rmd is so useful when it comes to reports of statistical analysis. For one, they are great for creating tables and figures. As a basic demonstration, we can create a simple histogram. Again, at this point, you don’t have to worry about understanding the code itself. The important bit is that, once you know how to create fancy plots and tables, you can create them directly in your .Rmd file to put them in your paper/report/presentation:
library(ggplot2) # load the ggplot2 package
qplot(rnorm(1000), xlab = "Value", ylab = "Frequency") # basic quick histogram
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
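A side note for the curious: qplot() has been deprecated in recent releases of ggplot2 (from version 3.4.0), so it may produce a warning on your computer. The sketch below shows one way of drawing an equivalent histogram with ggplot(); the object names sim and p are made up for this example, and setting bins = 30 explicitly also silences the message above.

```r
library(ggplot2)

# Put the simulated values in a data frame, as ggplot() expects
sim <- data.frame(value = rnorm(1000))

# Histogram with an explicit number of bins and the same axis labels
p <- ggplot(sim, aes(x = value)) +
  geom_histogram(bins = 30) +
  labs(x = "Value", y = "Frequency")

p  # printing the object draws the plot
```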
That’s pretty cool, isn’t it? What’s arguably even cooler is the fact that you can incorporate code in the actual body text. Let’s say we have a chunk of code that runs some analysis, for example, takes the mean age of our sample.
# create a made up sequence of numbers and pretend they are the ages of our participants
age <- c(34, 22, 26, 25, 43, 19, 19, 20, 33, 27, 27, 26, 54)
# calculate their mean, rounded to 2 decimal places
mean_age <- round(mean(age), digits = 2)
With Rmd, we don’t really even have to know what the value of the mean is when writing the results. We can simply use in-line code to have R Studio generate a document that says that the mean age was 28.85.
For the time being, don’t worry about how this is actually done. We will cover that later in depth. For now, simply rejoice in the fact that it can be done ;).
This feature has a very useful consequence: You can write a document in such a way that, if something about your data or analysis changes, you can simply edit the code in the appropriate chunks, re-generate the output file and all the values will get updated. Imagine having to redo a table of 40, 50, 100 numbers – that’s an awfully tedious task and it’s prone to human error. With a proper use of R Markdown you will never have to do it! Imagine how many hours of work that will save you (trust us, it’s a lot). How amazing is that?
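The idea can be tried out in plain R, without any R Markdown syntax. In the sketch below, the results sentence is built from the data, so changing the data changes the sentence. The function report_mean_age is made up for this example and is not how the lab sheet itself does it.

```r
# Hypothetical helper: build the results sentence from whatever data it is given
report_mean_age <- function(age) {
  paste0("The mean age was ", round(mean(age), digits = 2), ".")
}

age <- c(34, 22, 26, 25, 43, 19, 19, 20, 33, 27, 27, 26, 54)
report_mean_age(age)
# "The mean age was 28.85."

# Change one value (the last participant) and the sentence follows
age[13] <- 45
report_mean_age(age)
# "The mean age was 28.15."
```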
2.11 Common Code Chunks Options
- name - This allows you to name your code chunks, but is not necessary
- echo - Whether to display the code chunk or just show the results. echo=FALSE will embed the code in the document, but the reader won’t be able to see it
- eval - Whether to run the code in the code chunk. eval=FALSE will display the code but not run it
- warning - Whether to display warning messages in the document
- message - Whether to display code messages in the document
- results - Whether and how to display the computation of the results
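To check what these options are currently set to, you can ask knitr from the console, as in the short sketch below. The chunk name is left out because it is a label written first in the chunk header rather than an option with a default value; the object name defaults is made up for this example.

```r
# Look up knitr's current defaults for the options listed above
defaults <- knitr::opts_chunk$get(
  c("echo", "eval", "warning", "message", "results")
)

# Show each option together with its current value
str(defaults)
```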
2.12 Including images or links
2.12.1 Adding links
You can add links to your text quite easily: put the link text in square brackets and follow it with the web address in round brackets, e.g. [here] (LINK). In practice, just remove the space.
See our book here
2.12.2 Adding figures & pictures
Include a picture from the web or from your working directory (more on the latter later):
knitr::include_graphics("https://imgs.xkcd.com/comics/correlation.png")
2.13 Generating documents
Now that you have an understanding of the basics of Rmd along with some nifty tricks and can read the source file, let’s talk about how to generate output from the .Rmd’s.
The simplest way of turning the source into output is using the pre-defined shortcuts.
Task 4: Press Ctrl + ⇧ Shift + K (Windows/Linux) or ⌘ Command + ⇧ Shift + K (Mac OS) to generate an HTML version of this document.
Hopefully, nothing happened and maybe you spotted R giving you an error statement of some sort written all in red!
The reason for this is that, before we generate the file, we need to “run” all the code chunks so that R Studio has access to their output.
There are several ways of doing this but the easiest is, once again, with a shortcut.
Task 5: Press Ctrl + Alt + R (Windows/Linux) or ⌘ Command + Alt + R (Mac OS) to run all chunks in this .Rmd file.
Task 6: Wait a few seconds for R to execute your command and then try creating the HTML document again.
The first time you generate a document like this, it can take a while for R to install and run all the tools necessary to produce your output.
After a moment, the result should pop out in R Studio’s internal viewer.
Take a minute to marvel at your creation!
…
OK, that’s plenty now! Close the viewer window and check your “Week_02” folder. Therein, you should find a file called “Week02_Rmd_intro.nb.html” (the .nb bit indicates it’s an R notebook file). This is your actual output. If you open it, it should appear in your default web browser because HTML files are the stuff websites are made from.
Next, let’s test the editability feature we have so lauded above! Check the value of the mean of the age variable. In the original file, it should be 28.85.
Task 7: Try changing some numbers in the age variable in the corresponding code chunk, re-run all chunks, and re-generate the file to convince yourself that the mean age will get updated automatically.
Lo and behold, the value is still 28.85… (seriously, change it to something else!)
Now, let’s imagine you don’t want an HTML file but a .docx (Word document).
In order to get that, you need to change the YAML header so that it reads exactly output: word_document.
Task 8: Generate a Word document from your .Rmd file.
If you don’t have MS Office installed on your computer but are using OpenOffice, change the header to output: odt_document.
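Editing the header is not the only route. As a sketch of an alternative, rmarkdown::render() can be called from the console, with its output_format argument choosing the format regardless of what the header says. The tiny stand-in document and the object names below are made up for this example; with your own file, you would pass its name instead of rmd_file.

```r
library(rmarkdown)

# A tiny stand-in document (hypothetical content) written to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Render demo"',
  "output: html_document",
  "---",
  "",
  "This document was rendered from the console."
), rmd_file)

# render() needs pandoc, which ships with R Studio; skip if it is absent
if (pandoc_available()) {
  # output_format overrides whatever the YAML header asks for
  docx_file <- render(rmd_file, output_format = "word_document", quiet = TRUE)
  odt_file  <- render(rmd_file, output_format = "odt_document", quiet = TRUE)
  print(basename(c(docx_file, odt_file)))
}
```

The output files are saved next to the source file, just as they are when you use the keyboard shortcut.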
Task 9: For your final task, get your notes from last week’s tutorial and turn them into a nice document written using R Markdown and render it as PDF, R Notebook, or Word (OpenOffice) document.
Well done!
That is all we have in store for you for this lab. We suggest you go over what you learnt today to help your newly acquired knowledge settle.
See you next week!
[1] If you have not created a neat folder structure for this course (and all others too!) yet, now is the time to do it. We suggest you create a “Uni” folder wherever you find convenient (e.g., in Documents but please not on your desktop). This folder will store all your files related to your degree. Within it, create a “Year_1” folder, inside of it a “Sem_1” folder, then “dapR_1”, and inside that “Week_01” and “Week_02” folders.
[2] As for Linux, the shortcut depends on your system configuration. Though, if you are using Linux, you probably know how it works.
[3] Other text editors are available but now that you know about Rmd, you won’t be needing any of them.
[4] HyperText Markup Language – there we go again…