10 R Documentation

This chapter discusses how to navigate R documentation so that we can better understand the functions and syntax of R.

10.1 Add-on packages

In R, add-on packages are the packages that are not included in the default installation of R ¹⁶, but can be installed separately to extend the functionality of the language. These packages are typically created by third-party developers, and are hosted on the Comprehensive R Archive Network (CRAN) or other repositories.

For instance, tidyquant is a tidyverse-style package that helps us collect and analyze financial data in R. You’ll also find people using the R package quantmod for the same array of tasks. Neither tidyquant nor quantmod comes with base R. They are add-on packages that R users have developed to accomplish very specific tasks.
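The distinction between built-in and add-on packages can be checked from the R console. The following is a minimal sketch that assumes packages with priority "base" or "recommended" are the ones shipped with a default installation; `is_add_on` is a hypothetical helper written for this illustration, not a function from any package.

```r
# Packages that ship with R have priority "base" or "recommended";
# priority = "high" selects both groups.
shipped_pkgs <- rownames(installed.packages(priority = "high"))

# Hypothetical helper: TRUE when a package does not ship with R
is_add_on <- function(pkg) !(pkg %in% shipped_pkgs)

is_add_on("stats")      # stats is part of base R
is_add_on("tidyquant")  # tidyquant has to be installed separately
```

An add-on package is usually installed once with `install.packages("tidyquant")` and then loaded in each session with `library(tidyquant)`.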

But how are the two packages tidyquant and quantmod different from each other? Which one should we use? How would we know whether a package is “good” or not? What is tidyverse? How do we navigate the seemingly daunting documentation of a selected package? Below are some tips for utilizing R documentation to find and use add-on packages.

10.2 Navigating R documentation

On a package’s CRAN landing page, we’ll find several labels that can be quite useful. Under the section Documentation, we’ll find Reference manual and Vignettes. A reference manual is a definitive guide where each function of a package is documented, written by the developers.

A vignette is framed around a target problem that the package is designed to solve. For instance, the package rtweet provides three vignettes: Authentication with rtweet, Intro to rtweet, and Live streaming tweets, each dealing with a specific task.
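The same two kinds of documentation can also be reached from inside R. The sketch below uses stats and grid only because they ship with R and therefore work without installing anything; they are stand-ins for whichever add-on package is of interest.

```r
# Locate the help page for one function; this is the same text that
# appears in the reference manual. Printing `page` opens the help page.
page <- help("median", package = "stats")

# List the vignettes installed with a package
v <- vignette(package = "grid")
vignette_titles <- v$results[, c("Item", "Title"), drop = FALSE]
vignette_titles
```

In an interactive session, `browseVignettes()` opens a similar listing in the browser.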

Under the section Downloads, we’ll find Package source and Old sources. Old sources is an archive from which we can download older versions of the package.

10.3 Finding and using add-on packages

1. Which package should I use?

Make sure the package is being actively maintained. One indicator is the last updated date. We can find this date on the first page of its reference manual, or check its GitHub repository.
Keep an eye on what your community members use. You may also find out how popular a package is, in general, on the website RDocumentation.

For instance, there are more than three R packages that can be used to collect data from Twitter, including rtweet, twitteR and streamR. The last updated dates for the three packages are in 2024, 2022 and 2018, respectively. Only the first one is being maintained.

Whether a package is actively maintained matters a lot, especially because web services such as Twitter have been actively updating their policies for data collection via their APIs. Therefore, the R wrappers for Twitter APIs need to keep up with Twitter’s pace.
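For a package that is already installed, the date can be read without leaving R. This is a rough sketch: `package_age_days` is a hypothetical helper, and stats is used only because it is always available. Note that the result describes the installed copy, which may be older than the latest release on CRAN.

```r
# Date recorded for the installed copy of a package
utils::packageDate("stats")

# Hypothetical helper: days elapsed since that date
package_age_days <- function(pkg) {
  as.numeric(Sys.Date() - utils::packageDate(pkg))
}

age <- package_age_days("stats")
age
```

A large value is a prompt to look at the CRAN page or the GitHub repository, not a verdict on its own.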

2. Which function should I use?

Make sure you read the documentation of the function, pay attention to its technical details, and understand what it does, at least intuitively.

For instance, several packages specialize in text analysis, and a common technique, the polarity score, is available in more than one of them. Which package and which function should we rely on? Before we can make a decision, we need to think about the algorithms and dictionaries that the polarity score functions in these packages use. What algorithms did they use? What dictionaries are they built upon? Can they solve the problem at hand?

For instance, for quanteda, functions that generate the polarity scores are documented in its help files.
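Beyond the help page, R lets us inspect a function directly. The sketch below uses `stats::sd` purely as a stand-in, since it is available in every installation; the same calls work on functions from an add-on package once it is installed.

```r
# Which arguments does the function accept?
sd_args <- names(formals(stats::sd))
sd_args

# Printing a function without parentheses shows its source code,
# which reveals what it actually computes
stats::sd
```

Reading the argument list and the source alongside the help page is often enough to get an intuitive sense of what a function does.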

3. Package specific object classes

When calling functions from an add-on package, we often get returned objects specific to that package. For instance, quantmod returns xts or zoo objects when retrieving Yahoo Finance data, but tidyquant returns data in “tidy” forms, such as tbl_df and tbl. However, underneath these peculiar names, these objects are ultimately R data structures. We can access and manipulate them with the methods that we learnt earlier in the section “data structures”.
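To see this without installing anything, the sketch below uses the built-in `AirPassengers` dataset, a "ts" object, as a stand-in for package-specific classes such as xts or tbl_df. The same inspection steps apply to those classes.

```r
# AirPassengers is a built-in dataset stored as a "ts" (time series) object
passengers <- datasets::AirPassengers
class(passengers)

# Removing the class shows the plain numeric vector (plus attributes) underneath
plain <- unclass(passengers)
is.numeric(plain)

# Ordinary vector tools still apply
first_year <- passengers[1:12]
mean(first_year)
```

`str()` is another quick way to look underneath an unfamiliar object.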

4. What is tidyverse?

If you use R, you will probably bump into tidyverse. Even if you don’t use it, or dislike it for some reason, you may have collaborators who use tidyverse. Therefore, it is important to at least get familiar with it.

If you know the Marvel universe, you can probably guess what tidyverse is. tidyverse is a collection of R packages that share an underlying design philosophy, grammar, and data structures.

tidyverse packages operate on tidy data. Tidy data has a specific structure: each variable is a column; each observation is a row; and each type of observational unit is a table.
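The tidy structure is easiest to see next to a layout that is not tidy. The sketch below uses base R only, and the tickers and prices are made up for illustration.

```r
# Made-up prices in a "wide" layout: one column per year,
# so the variable "year" is hidden in the column names
wide <- data.frame(
  ticker = c("AAA", "BBB"),
  y2023  = c(10, 20),
  y2024  = c(11, 22)
)

# Tidy layout: one row per ticker-year observation,
# one column per variable (ticker, year, price)
tidy <- data.frame(
  ticker = rep(wide$ticker, times = 2),
  year   = rep(c(2023, 2024), each = nrow(wide)),
  price  = c(wide$y2023, wide$y2024)
)
tidy
```

Within tidyverse, `tidyr::pivot_longer()` is the usual tool for this kind of reshaping.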

Alternatively, you can think of tidyverse as a dialect of R, and it is certainly not the only dialect in R.

¹⁶ For a complete list of base functions, use library(help = "base").