10 R Documentation
This chapter discusses how to navigate R documentation to better understand R's functions and syntax.
10.1 Add-on packages
In R, add-on packages are packages that are not included in the default installation of R [16] but can be installed separately to extend the functionality of the language. These packages are typically created by third-party developers and are hosted on the Comprehensive R Archive Network (CRAN) or other repositories.
For instance, tidyquant is a tidyverse package that helps us collect and analyze financial data in R. You’ll also find people using the R package quantmod for the same array of tasks performed by tidyquant. Neither tidyquant nor quantmod comes with base R. They are add-on packages that R users have developed to accomplish very specific tasks.
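As a rough check of whether a package is an add-on, we can compare its name with the packages that ship with R. The sketch below is only an illustration: the helper is_addon() is a hypothetical name, not part of any package, and it assumes that packages with priority "base" or "recommended" make up the default installation.

```r
# Names of the packages that ship with a standard R installation.
# installed.packages() lives in utils, which is attached by default.
shipped <- rownames(installed.packages(priority = c("base", "recommended")))

# Hypothetical helper: TRUE when `pkg` is not one of the shipped packages.
is_addon <- function(pkg) !(pkg %in% shipped)

is_addon("stats")      # FALSE: stats comes with R
is_addon("tidyquant")  # TRUE: tidyquant has to be installed separately

# Installing and loading an add-on package (needs an internet connection):
# install.packages("tidyquant")
# library(tidyquant)
```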
But how are the two packages tidyquant and quantmod different from each other? Which one should we use? How would we know if it is a “good” package or not? What is tidyverse? How do we navigate the seemingly daunting documentation of a selected package? Below are some tips for utilizing R documentation to find and use add-on packages.
10.3 Finding and using add-on packages
1. Which package should I use?
Make sure the package is being actively maintained. One indicator is the last updated date. We can find this date on the first page of its reference manual, or check its GitHub repository.
Keep an eye on what your community members use. You can also check how popular a package is in general on the RDocumentation website.
For instance, there are more than three R packages that can be used to collect data from Twitter, including rtweet, twitteR and streamR. The last updated dates for the three packages are in 2024, 2022 and 2018, respectively. Only the first one is being maintained.
Whether a package is actively maintained matters a lot, especially because web services such as Twitter have been actively updating their policies for data collection via their APIs. The R wrappers for Twitter APIs therefore need to keep up with Twitter’s pace.
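For a package that is already installed, the same maintenance signals can be read from within R. The following is a minimal sketch, assuming the hypothetical helper name package_freshness(); it uses the stats package as a stand-in only because stats is always installed. Note that it reports the date of the installed copy, which may be older than the latest release on CRAN or GitHub.

```r
# Hypothetical helper: summarise how recently an installed package was updated.
package_freshness <- function(pkg) {
  # Date read from the package's DESCRIPTION file (Date, Packaged, ... fields)
  last_update <- utils::packageDate(pkg)
  list(
    package     = pkg,
    version     = as.character(utils::packageVersion(pkg)),
    maintainer  = utils::maintainer(pkg),
    last_update = last_update,
    days_since  = as.numeric(Sys.Date() - last_update)  # age in days
  )
}

# "stats" is a stand-in; replace it with an installed add-on such as "rtweet".
info <- package_freshness("stats")
str(info)
```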
2. Which function should I use?
Make sure you read the documentation of the function, pay attention to its technical details, and understand what it does, at least intuitively.
For instance, several packages specialize in text analysis, and a common technique, the polarity score, is available in more than one of them. Which package and which function should we rely on? Before we can make a decision, we need to think about the algorithms and dictionaries that the polarity score functions in these packages use. What algorithms do they use? What dictionaries are they built upon? Can they solve the problem at hand?
For instance, for quanteda, functions that generate the polarity scores are documented in its help files.
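The usual ways of reaching a function's documentation from the console can be sketched as follows. The example uses stats::sd purely as a stand-in because it is always available; the same calls apply to a function from an add-on package once that package is installed.

```r
# Open help pages (only useful in an interactive session).
if (interactive()) {
  help("sd", package = "stats")   # same page as ?sd
  help(package = "stats")         # index of every documented topic in a package
}

# Inspect a function's arguments without leaving the console.
arg_names <- names(formals(stats::sd))
arg_names   # the two arguments of sd(): "x" and "na.rm"

# Search installed help pages by keyword when the function name is unknown:
# help.search("standard deviation")   # same as ??"standard deviation"

# Run the examples that accompany a help page:
# example("sd", package = "stats")
```

Reading the Details and References sections of the help page is usually where the algorithm or dictionary behind a function is described.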
3. Package-specific object classes
When calling functions from an add-on package, we often get back objects whose classes are specific to that package. For instance, quantmod returns xts or zoo objects when retrieving Yahoo Finance data, but tidyquant returns data in “tidy” forms, such as tbl_df and tbl. However, underneath these peculiar names, these objects are ultimately R data structures. We can access and manipulate them with the methods that we learnt earlier in the section “data structures”.
4. What is tidyverse?
If you use R, you will probably bump into tidyverse. Even if you don’t use it, or dislike it for some reason, you may have collaborators who use tidyverse. Therefore, it is important to at least get familiar with it.
If you know the Marvel universe, you can probably see what tidyverse is. tidyverse is a collection of R packages that share an underlying design philosophy, grammar, and data structures.
tidyverse packages operate on tidy data. Tidy data has a specific structure: each variable is a column; each observation is a row; and each type of observational unit is a table.
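The difference between an untidy and a tidy layout can be shown with a small base R sketch. The tickers AAA and BBB and all prices are hypothetical values invented for the example. In the wide table, the variable "ticker" is hidden in the column names; in the tidy table, date, ticker and close each get their own column, and each row is one observation.

```r
# Wide layout: one column per ticker, so "ticker" is not a column of its own.
wide <- data.frame(
  date = c("2024-01-02", "2024-01-03"),
  AAA  = c(10.0, 10.5),
  BBB  = c(20.0, 19.5)
)

# Tidy layout: one row per (date, ticker) observation, one column per variable.
tidy <- data.frame(
  date   = rep(wide$date, times = 2),
  ticker = rep(c("AAA", "BBB"), each = nrow(wide)),
  close  = c(wide$AAA, wide$BBB)
)

tidy   # 4 rows, 3 columns: date, ticker, close
```

In the tidyverse, a reshaping step like this is typically done with a function such as tidyr::pivot_longer(); the base R version above is used here only so that the sketch runs without any add-on packages.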
Alternatively, you can think of tidyverse as a dialect of R, and it is certainly not the only dialect in R.
[16] For a complete list of base functions, use library(help = "base").