3.4 Documentation

The objectives of this section are:

  • Create R function documentation using roxygen2
  • Create vignettes using knitr and R Markdown

There are two main types of documentation you may want to include with packages:

  • Longer documents that give tutorials or overviews for the whole package
  • Shorter, function-specific help files for each function or group of related functions

You can create the first type of document using package vignettes, README files, or both. For the function-specific help files, the easiest way to create these is with the roxygen2 package.

In this section, we’ll cover why and how to create this documentation. In addition, vignette / README documentation can be done using knitr to create R Markdown documents that mix R code and text, so we’ll include more details on that process.

3.4.1 Vignettes and README files

You will likely want to create a document that walks users through the basics of how to use your package. You can do this through two formats:

  • Vignette: This document is bundled with your R package, so it becomes locally available to a user once they install your package from CRAN. They will also have it available if they install the package from GitHub, as long as they use the build_vignettes = TRUE option when running install_github.
  • README file: If you have your package on GitHub, this document will show up on the main page of the repository.

A package likely only needs a README file if you are posting the package to GitHub. For any GitHub repository, if there is a README.md file in the top directory of the repository, it will be rendered on the main GitHub repository page below the listed repository content. For an example, visit https://github.com/geanders/countytimezones and scroll down. You’ll see a list of all the files and subdirectories included in the package repository and below that is the content in the package’s README.md file, which gives a tutorial on using the package.

If the README file does not need to include R code, you can write it directly as an .md file, using Markdown syntax, which is explained in more detail in the next section. If you want to include R code, you should start with a README.Rmd file, which you can then render to Markdown using knitr. You can use the devtools package to add either a README.md or README.Rmd file to a package directory using use_readme_md or use_readme_rmd, respectively. These functions will add the appropriate file to the top level of the package directory and will also add the file name to “.Rbuildignore”, since having one of these files in the top level of the package directory could otherwise cause some problems when building the package.
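As a rough sketch of the rendering step, the code below writes a tiny README.Rmd to a temporary directory and knits it to README.md. The directory name, file contents, and package name are illustrative assumptions, and the knitting step is skipped if knitr is not installed.

```r
# Hypothetical location for a demo README (not a real package directory)
readme_dir <- file.path(tempdir(), "readme_demo")
dir.create(readme_dir, showWarnings = FALSE)
readme_rmd <- file.path(readme_dir, "README.Rmd")
readme_md <- file.path(readme_dir, "README.md")

fence <- strrep("`", 3)  # three backticks, used to mark a code chunk

# A minimal README.Rmd: some Markdown text plus one R code chunk
writeLines(c(
  "# examplepackage",
  "",
  "A short description of the package.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
), readme_rmd)

# Knit the R Markdown source to plain Markdown (only if knitr is available)
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit(readme_rmd, output = readme_md, quiet = TRUE)
}
```

The resulting README.md contains both the code and its printed result, which is the file GitHub would display.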

The README file is a useful way to give GitHub users information about your package, but it will not be included in builds of the package or be available through CRAN for packages that are posted there. Instead, if you want to create tutorials or overview documents that are included in a package build, you should do that by adding one or more package vignettes. Vignettes are stored in a vignettes subdirectory within the package directory.

To add a vignette file to this subdirectory (which will be created if it does not already exist), use the use_vignette function from devtools. This function takes as arguments the file name of the vignette you’d like to create and the package for which you’d like to create it (the default is the package in the current working directory). For example, if you are currently working in your package’s top-level directory and you would like to add a vignette called “model_details”, you can do that with the code:

use_vignette("model_details")

You can have more than one vignette per package, which can be useful if you want to include one vignette that gives a more general overview of the package as well as a few vignettes that go into greater detail about particular aspects or applications.

Once you create a vignette with use_vignette, be sure to update the Vignette Index Entry in the vignette’s YAML (the code at the top of an R Markdown document). Replace “Vignette Title” there with the actual title you use for the vignette.

3.4.2 Knitr / Markdown

Both vignettes and README files can be written as R Markdown files, which will allow you to include R code examples and results from your package. One of the most exciting tools in R is the knitr system for combining code and text to create a reproducible document. In terms of the power you get for time invested in learning a tool, knitr probably can’t be beat. Everything you need to know to create and “knit” a reproducible document can be learned in about 20 minutes, and while there is a lot more you can do to customize this process if you want to, probably 80% of what you’ll ever want to do with knitr you’ll learn in those first 20 minutes.

R Markdown files are mostly written using Markdown. To write R Markdown files, you need to understand what markup languages like Markdown are and how they work. In Word and other word processing programs you have used, you can add formatting using buttons and keyboard shortcuts (e.g., “Ctrl-B” for bold). The file saves the words you type. It also saves the formatting, but you see the final output, rather than the formatting markup, when you edit the file (WYSIWYG – what you see is what you get). In markup languages, on the other hand, you mark up the document directly to show what formatting the final version should have (e.g., you type **bold** in the file to end up with a document with bold). Examples of markup languages include:

  • HTML (HyperText Markup Language)
  • LaTeX
  • Markdown (a “lightweight” markup language)

3.4.3 Common Markdown formatting elements

To write a file in Markdown, you’ll need to learn the conventions for creating formatting. This table shows what you would need to write in a flat file for some common formatting choices:

Code                             Rendering   Explanation
**text**                         text        boldface
*text*                           text        italicized
[text](https://www.google.com)   text        hyperlink
# text                           text        first-level header
## text                          text        second-level header

Some other simple things you can do in Markdown include:

  • Lists (ordered or bulleted)
  • Equations
  • Tables
  • Figures from files
  • Block quotes
  • Superscripts
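A few of these elements are sketched below as Markdown source. To keep the example in R, the Markdown is built as a character vector and written to a temporary file. The file name and contents are illustrative, and the equation and superscript lines assume the file will be rendered with pandoc (the renderer R Markdown uses by default).

```r
# Markdown source illustrating several common elements
md_lines <- c(
  "# Example document",
  "",
  "Some **bold** and *italicized* text.",
  "",
  "1. First ordered item",
  "2. Second ordered item",
  "",
  "- A bulleted item",
  "- Another bulleted item",
  "",
  "> A block quote",
  "",
  "An inline equation: $E = mc^2$",
  "",
  "A superscript: x^2^"
)

# Save as a plain-text .md file (hypothetical file name)
md_file <- file.path(tempdir(), "example.md")
writeLines(md_lines, md_file)
```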

The start of an R Markdown file gives some metadata for the file (authors, title, format) in a language called YAML. For example, the YAML section of a package vignette might look like this:

---
title: "Model Details for example_package"
author: "Jane Doe"
date: "2017-09-21"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model Details for example_package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

When creating R Markdown documents using the RStudio toolbar, much of this YAML will be automatically generated based on your specifications when opening the initial file. However, this is not the case with package vignettes, for which you’ll need to go into the YAML and add the authors and title yourself. Leave the vignette engine, vignette encoding, output, and date as their default values.
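Because the title appears twice in a vignette's YAML (once as title and once in the Vignette Index Entry), it is easy to update one and forget the other. The check below is a small base R sketch of comparing the two. The helper name check_index_entry and the sample header are hypothetical, and the pattern matching is deliberately simple rather than a full YAML parser.

```r
# Sample vignette header, mirroring the YAML shown above
header <- c(
  "---",
  "title: \"Model Details for example_package\"",
  "author: \"Jane Doe\"",
  "output: rmarkdown::html_vignette",
  "vignette: >",
  "  %\\VignetteIndexEntry{Model Details for example_package}",
  "  %\\VignetteEngine{knitr::rmarkdown}",
  "  %\\VignetteEncoding{UTF-8}",
  "---"
)

# Hypothetical helper: TRUE if the title and the index entry are the same text
check_index_entry <- function(lines) {
  title_line <- grep("^title:", lines, value = TRUE)[1]
  entry_line <- grep("VignetteIndexEntry", lines, value = TRUE, fixed = TRUE)[1]
  # Strip the "title:" key and any surrounding double quotes
  title <- gsub("^title:\\s*\"?|\"?\\s*$", "", title_line)
  # Keep only the text between the braces
  entry <- sub(".*VignetteIndexEntry\\{(.*)\\}.*", "\\1", entry_line)
  identical(title, entry)
}

check_index_entry(header)
```

A header that still contains the placeholder “Vignette Title” in the index entry would make this check return FALSE.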

For more Markdown conventions, see RStudio’s R Markdown Reference Guide (link also available through “Help” in RStudio).

R Markdown files work a lot like Markdown files, but add the ability to include R code that will be run before rendering the final document. This functionality is based on literate programming, an idea developed by Donald Knuth, to mix executable code with regular text. The files you create can then be “knitted”, to run any embedded code. The final output will have results from your code and the regular text.

The basic steps of opening and rendering an R Markdown file in RStudio are:

  • To open a new R Markdown file, go to “File” -> “New File” -> “R Markdown…”. To start, choose a “Document” in “HTML” format.
  • This will open a new R Markdown file in RStudio. The file extension for R Markdown files is “.Rmd”.
  • The new file comes with some example code and text. You can run the file as-is to try out the example. You will ultimately delete this example code and text and replace it with your own.
  • Once you “knit” the R Markdown file, R will render an HTML file with the output. This is automatically saved in the same directory where you saved your .Rmd file.
  • Write everything besides R code using Markdown syntax.

The knit function from the knitr package works by taking a document in R Markdown format (among a few possible formats), reading through it for any markers of the start of R code, running any of the code between that “start” marker and a marker showing a return to regular Markdown, writing any of the relevant results from R code into the Markdown file in Markdown format, and then passing the entire document to software that can render from Markdown to the desired output format (for example, compile a pdf, Word, or HTML document).
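To make the “start” and “end” markers concrete, here is a deliberately simplified base R sketch that locates code chunks in R Markdown source and runs the code between the markers. It is not how knitr is implemented. The function name run_chunks and the sample document are hypothetical, and the sketch ignores chunk options, inline code, and the formatting of output.

```r
fence <- strrep("`", 3)  # three backticks, the chunk delimiter

# A tiny R Markdown document held as a character vector
rmd_lines <- c(
  "Some regular Markdown text.",
  "",
  paste0(fence, "{r}"),
  "my_vec <- 1:10",
  "sum(my_vec)",
  fence,
  "",
  "More text."
)

# Hypothetical helper: evaluate each chunk and return the last value of each
run_chunks <- function(lines) {
  starts <- grep(paste0("^", fence, "\\{r"), lines)  # chunk "start" markers
  ends <- grep(paste0("^", fence, "\\s*$"), lines)   # return to Markdown
  env <- new.env()  # one shared environment, so later chunks see earlier objects
  results <- list()
  for (start in starts) {
    end <- min(ends[ends > start])
    code <- lines[start + seq_len(end - start - 1)]  # lines between the markers
    results[[length(results) + 1]] <- eval(parse(text = code), envir = env)
  }
  results
}

run_chunks(rmd_lines)
```

The real knit function does much more (it captures printed output, figures, messages, and warnings and writes them back as Markdown), but the marker-finding idea is the same.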

This means that all a user needs to do to include R code within a document is to properly separate it from other parts of the document through the appropriate markers. To indicate R code in an R Markdown document, you need to separate off the code chunk using the following syntax:

```{r}
my_vec <- 1:10
```

This syntax tells R how to find the start and end of pieces of R code (code chunks) when the file is rendered. R will walk through, find each piece of R code, run it and create output (printed output or figures, for example), and then pass the file along to another program to complete rendering (e.g., TeX for pdf files).

You can specify a name for each chunk, if you’d like, by including it after “r” when you begin your chunk. For example, to give the name load_mtcars to a code chunk that loads the mtcars dataset, specify that name in the start of the code chunk:

```{r load_mtcars}
data(mtcars)
```

Here are a couple of tips for naming code chunks:

  • Chunk names must be unique across a document.
  • Any chunks you don’t name are given ordered numbers by knitr.

You do not have to name each chunk. However, there are some advantages:

  • It will be easier to find any errors.
  • You can use the chunk labels in referencing for figure labels.
  • You can reference chunks later by name.

3.4.4 Common knitr chunk options

You can also add options when you start a chunk. Many of these options can be set as TRUE / FALSE and include:

Option     Action
echo       Print out the R code?
eval       Run the R code?
message    Print out messages?
warning    Print out warnings?
include    If FALSE, run code, but don’t print code or results

Other chunk options take values other than TRUE / FALSE. Some you might want to include are:

Option       Action
results      How to print results (e.g., hide runs the code, but doesn’t print the results)
fig.width    Width to print your figure, in inches (e.g., fig.width = 4)
fig.height   Height to print your figure

To include any of these options, add the option and value in the opening brackets and separate multiple options with commas:

```{r message = FALSE, echo = FALSE}
mtcars[1, 1:3]
```

You can set “global” options at the beginning of the document. This will create new defaults for all of the chunks in the document. For example, if you want echo, warning, and message to be FALSE by default in all code chunks, you can run:

```{r global_options}
knitr::opts_chunk$set(echo = FALSE, message = FALSE,
  warning = FALSE)
```

If you set both global and local chunk options, the options you set specifically for a chunk will take precedence over the global options. For example, running a document with:

```{r global_options}
knitr::opts_chunk$set(echo = FALSE, message = FALSE,
  warning = FALSE)
```


```{r check_mtcars, echo = TRUE}
head(mtcars, 1)
```

would print the code for the check_mtcars chunk, because the option specified for that specific chunk (echo = TRUE) would override the global option (echo = FALSE).
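The precedence rule can be pictured as merging two lists, with chunk-level values overwriting global ones. The base R sketch below is only an analogy using modifyList, not knitr's internal code, and the variable names are illustrative.

```r
# Global defaults, as they would be set with knitr::opts_chunk$set()
global_opts <- list(echo = FALSE, message = FALSE, warning = FALSE)

# Options given in the header of one specific chunk
chunk_opts <- list(echo = TRUE)

# Chunk-level values replace global values with the same name
effective <- utils::modifyList(global_opts, chunk_opts)

effective$echo     # TRUE: the chunk option wins
effective$message  # FALSE: inherited from the global options
```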

You can also include R output directly in your text (“inline”) using backticks:

“There are `r nrow(mtcars)` observations in the mtcars data set. The average miles per gallon is `r mean(mtcars$mpg, na.rm = TRUE)`.”

Once the file is rendered, this gives:

“There are 32 observations in the mtcars data set. The average miles per gallon is 20.090625.”
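Inline results are often easier to read if you round them before printing. As a small sketch, you can check the values in the console first and then use the rounded expression inline (the choice of one decimal place here is just an example):

```r
n_obs <- nrow(mtcars)                        # 32
mean_mpg <- mean(mtcars$mpg, na.rm = TRUE)   # 20.090625
round(mean_mpg, 1)                           # 20.1
```

Placing round(mean(mtcars$mpg, na.rm = TRUE), 1) in the inline code would then render the average as 20.1.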

Here are some tips that will help you diagnose some problems rendering R Markdown files:

  • Be sure to save your R Markdown file before you run it.
  • All the code in the file will run “from scratch”— as if you just opened a new R session.
  • The code will run using, as a working directory, the directory where you saved the R Markdown file.
  • To use the latest version of functions in a package you are developing in an R Markdown document, rebuild the package before knitting the document. You can build a package using the “Build” tab in one of the RStudio panes.

You’ll want to try out pieces of your code as you write an R Markdown document. There are a few ways you can do that:

  • You can run code in chunks just like you can run code from a script (Ctrl-Return or the “Run” button).
  • You can run all the code in a chunk (or all the code in all chunks) using the different options under the “Run” button in RStudio.
  • All the “Run” options have keyboard shortcuts, so you can use those.

Two excellent books for learning more about creating reproducible documents with R are Dynamic Documents with R and knitr by Yihui Xie (the creator of knitr) and Reproducible Research with R and RStudio by Christopher Gandrud. The first goes into the technical details of how knitr and related code works, which gives you the tools to extensively customize a document. The second provides an extensive view of how to use tools from R and other open source software to conduct, write up, and present research in a reproducible and efficient way. RStudio’s R Markdown Cheatsheet is another very useful reference.

3.4.5 Help files and roxygen2

In addition to writing tutorials that give an overview of your whole package, you should also write specific documentation showing users how to use and interpret any functions you expect users to directly call.

These help files will ultimately go in a folder called /man of your package, in an R documentation format (.Rd file extensions) that is fairly similar to LaTeX. You used to have to write all of these files as separate files. However, the roxygen2 package lets you put all of the help information directly in the code where you define each function. Further, roxygen2 documentation allows you to include tags (@export, @importFrom) that will automate writing the package NAMESPACE file, so you don’t need to edit that file by hand.

With roxygen2, you add the help file information directly above the code where you define each function, in the R scripts saved in the R subdirectory of the package directory. You start each line of the roxygen2 documentation with #' (the second character is an apostrophe, not a backtick). The first line of the documentation should give a short title for the function, and the next block of documentation should be a longer description. After that, you will use tags that start with @ to define each element you’re including. You should leave an empty line between each section of documentation, and you can use indentation for second and later lines of elements to make the code easier to read.

Here is a basic example of how this roxygen2 documentation would look for a simple “Hello world” function:

#' Print "Hello world" 
#'
#' This is a simple function that, by default, prints "Hello world". You can 
#' customize the text to print (using the \code{to_print} argument) and add
#' an exclamation point (\code{excited = TRUE}).
#'
#' @param to_print A character string giving the text the function will print
#' @param excited Logical value specifying whether to include an exclamation
#'    point after the text
#' 
#' @return This function returns a phrase to print, with or without an 
#'    exclamation point added. As a side effect, this function also prints out
#'    the phrase. 
#'
#' @examples
#' hello_world()
#' hello_world(excited = TRUE)
#' hello_world(to_print = "Hi world")
#'
#' @export
hello_world <- function(to_print = "Hello world", excited = FALSE){
    if(excited) to_print <- paste0(to_print, "!")
    print(to_print)
}
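The @return entry above describes both a return value and a printing side effect. As a quick sketch of the difference (the function is repeated here, without its roxygen2 comments, so the example runs on its own):

```r
hello_world <- function(to_print = "Hello world", excited = FALSE){
    if(excited) to_print <- paste0(to_print, "!")
    print(to_print)
}

# Side effect: the phrase is printed when the function is called
out <- hello_world(excited = TRUE)

# Return value: the same phrase is also stored in `out`
out

# The printed text can be captured separately from the returned value
printed <- utils::capture.output(hello_world("Hi world"))
printed
```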

You can run the document function from the devtools package at any time to render the latest version of these roxygen2 comments for each of your functions. This will create function-specific help files in the package’s “man” subdirectory as well as update the package’s NAMESPACE file.

3.4.6 Common roxygen2 tags

Here are some of the common roxygen2 tags to use in creating this documentation:

Tag              Meaning
@return          A description of the object returned by the function
@param           Explanation of a function parameter
@inheritParams   Name of a function from which to get parameter definitions
@examples        Example code showing how to use the function
@details         Add more details on how the function works (for example, specifics of the algorithm being used)
@note            Add notes on the function or its use
@source          Add any details on the source of the code or ideas for the function
@references      Add any references relevant to the function
@importFrom      Import a function from another package to use in this function (this is especially useful for infix functions like %>% and %within%)
@export          Export the function, so users will have direct access to it when they load the package

Here are a few things to keep in mind when writing help files using roxygen2:

  • The tags @example and @examples do different things. You should always use the @examples (plural) tag for example code, or you will get errors when you build the documentation.
  • The @inheritParams tag can save you a lot of time, because if you are using the same parameters in multiple functions in your package, you can write and edit those parameter descriptions just in one place. However, keep in mind that you must point @inheritParams to the function where you originally define the parameters using @param, not another function where you use the parameters but define them using an @inheritParams pointer.
  • If you want users to be able to directly use the function, you must include @export in your roxygen2 documentation. If you have written a function but then find it isn’t being found when you try to compile a README file or vignette, a common culprit is that you have forgotten to export the function.
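As a sketch of how @inheritParams works in practice, the two functions below share the excited parameter. Both functions (greet and greet_all) are hypothetical examples. The parameter is described once, with @param in greet, and greet_all points back to that definition.

```r
#' Greet one person
#'
#' @param name A character string giving the name to greet
#' @param excited Logical value specifying whether to add an exclamation
#'    point after the greeting
#'
#' @return A character string with the greeting
#'
#' @examples
#' greet("Jane")
#'
#' @export
greet <- function(name, excited = FALSE) {
  paste0("Hello ", name, if (excited) "!" else "")
}

#' Greet several people
#'
#' @param names A character vector of names to greet
#' @inheritParams greet
#'
#' @return A character vector with one greeting per name
#'
#' @examples
#' greet_all(c("Jane", "Raj"), excited = TRUE)
#'
#' @export
greet_all <- function(names, excited = FALSE) {
  vapply(names, greet, character(1), excited = excited, USE.NAMES = FALSE)
}
```

Only excited is inherited here, because it is the one parameter of greet_all that has no @param entry of its own.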

3.4.7 Common roxygen2 formatting tags

You can include formatting (lists, etc.) and equations in the roxygen2 documentation. Here are some of the common formatting tags you might want to use:

Tag          Meaning
\code{}      Format in a typeface to look like code
\dontrun{}   Use with examples, to avoid running the example code during package builds and testing
\link{}      Link to another R function
\eqn{}{}     Include an inline equation
\deqn{}{}    Include a display equation (i.e., shown on its own line)
\itemize{}   Create an itemized list
\url{}       Include a web link
\href{}{}    Include a web link

Some tips on using the R documentation format:

  • Usually, you’ll want to use the \link tag only in combination with the \code tag, since you’re linking to another R function. Make sure you use these with \code wrapping \link, not the other way around (\code{\link{other_function}}), or you’ll get an error.
  • Some of the equation formatting, including superscripts and subscripts, won’t parse in Markdown-based documentation (but will for pdf-based documentation). With the \eqn and \deqn tags, you can include two versions of an equation, one with full formatting, which will be fully compiled by pdf-based documentation, and one with a reduced form that looks better in Markdown-based documentation (for example, \deqn{ \frac{X^2}{Y} }{ X2 / Y }).
  • For any examples in help files that take a while to run, you’ll want to wrap the example code in the \dontrun tag.
  • The tags \url and \href both include a web link. The difference between the two is that \url prints the web address itself in the help documentation, while \href lets you use text other than the web address as the anchor text of the link. For example: "For more information, see \url{www.google.com}."; "For more information, \href{www.google.com}{Google it}.".
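The sketch below pulls several of these formatting tags into one roxygen2 block. The function squared_ratio is a hypothetical example, and the link targets are only placeholders chosen for illustration.

```r
#' Squared ratio of two numbers
#'
#' Computes \eqn{X^2 / Y}{X2 / Y}. For square roots, see
#' \code{\link{sqrt}}. The full calculation is:
#' \deqn{ \frac{X^2}{Y} }{ X2 / Y }
#'
#' Things to keep in mind:
#' \itemize{
#'   \item \code{y} should not be zero
#'   \item Both arguments can be vectors
#' }
#'
#' For more information, see \url{https://www.r-project.org} or
#' \href{https://www.r-project.org}{the R project site}.
#'
#' @param x Numeric vector giving the value to square
#' @param y Numeric vector giving the value to divide by
#'
#' @return A numeric vector with \code{x^2 / y}
#'
#' @examples
#' squared_ratio(4, 2)
#'
#' \dontrun{
#' # Pretend this example takes a long time to run
#' squared_ratio(rnorm(1e8), 2)
#' }
#'
#' @export
squared_ratio <- function(x, y) {
  x^2 / y
}
```

Only the first example would be run during package checks, since the second is wrapped in \dontrun.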

In addition to documenting functions, you should also document any data that comes with your package. To do that, create a file in the /R folder of the package called “data.R” to document all of the package’s datasets. You can use roxygen2 to document each dataset, and end each with the name of the dataset in quotation marks. There are more details on documenting package data using roxygen2 in the next section.

As you prepare a package for sharing with others, you may want to create a pdf manual, which provides a more user-friendly format for proofreading all the package help files. You can create one with the R CMD Rd2pdf shell command. To use this, open a shell and navigate to the parent directory of your R package directory (an easy way to do this is to open a shell using the “Shell” option for the gear button in the Git pane in RStudio and then run cd .. to move up one directory). Then, from the shell, run R CMD Rd2pdf followed by your package’s name (e.g., for a package named “examplepackage”, run R CMD Rd2pdf examplepackage). This command builds your package and creates and opens a pdf with the text of all help files for exported functions. Check out this StackOverflow thread for more.

3.4.8 Summary

You should include documentation to help others use your package, both longer-form documentation through vignettes or README files and function-specific help files. Longer-form documentation can be written using R Markdown files, which can include executable R code examples, while function-specific help files can be written using roxygen2 comments within the R scripts where each function is defined.