B.3.1 Data in base R
Because R includes the datasets package, every R installation comes with a collection of datasets. To learn which datasets exist and obtain basic information about them, call data().
To obtain information about any particular dataset, open its help page (for example, ?Nile or help("Nile")).
Throughout this book, we use quite a few of these datasets in examples and exercises.
As the datasets are included to illustrate particular types of data or problems, they vary widely in size and shape.
For instance, the Nile dataset contains a single time series of annual flow measurements of the river Nile for the years 1871 to 1970.
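The following lines are a minimal sketch of how one might explore the built-in datasets from the console. The object name base_data is only an illustrative choice; the functions used (data(), str(), start(), end(), length()) are part of a standard R installation.

```r
# List the datasets that ship with the "datasets" package.
# data() returns an object whose $results matrix has one row per dataset.
base_data <- data(package = "datasets")$results
head(base_data[, c("Item", "Title")])

# Open the help page of one dataset (in an interactive session):
# ?Nile   # or: help("Nile")

# Inspect the Nile time series
str(Nile)      # compact description of the object
start(Nile)    # first time point of the series
end(Nile)      # last time point of the series
length(Nile)   # number of observations
```

Calling data() without arguments in an interactive session shows the same listing for all currently attached packages.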
B.3.2 Data in R packages
Many R packages also include datasets. For instance, we use the following datasets from various tidyverse packages:
Other packages that provide many datasets, both small and large, include:
- DAAG: Data for Data Analysis and Graphics Using R (Maindonald & Braun, 2003, 2007, 2010)
- FFTrees: Data for binary classification tasks:
- fpp2: Data for Forecasting: Principles and practice (Hyndman & Athanasopoulos, 2018)
- ISLR: Data for An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie, & Tibshirani, 2013)
- psych: Data for the Personality-project.org (Revelle et al., 2018)
auction, etc. (Phillips, 2018)
This list is incidental and woefully incomplete. See Rdatasets for a more systematic collection of over 1300 datasets distributed through R and its packages.
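To see which datasets an installed package provides, something like the following sketch may help. The helper list_package_data() is a hypothetical convenience function written for this illustration, not part of R or of any package named above; it relies only on requireNamespace() and data(), which exist in base R.

```r
# Hypothetical helper: return the names of all datasets in a package,
# or NULL (with a message) if the package is not installed.
list_package_data <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed.")
    return(NULL)
  }
  data(package = pkg)$results[, "Item"]
}

# Works for any installed package, e.g. the "datasets" package:
head(list_package_data("datasets"))

# A package from the list above (returns NULL if it is not installed):
list_package_data("ISLR")

# Load a single dataset from a package without attaching the package.
# "Nile" and "datasets" stand in for any dataset/package pair.
env <- new.env()
data("Nile", package = "datasets", envir = env)
ls(env)
```

Loading into a separate environment keeps the global workspace clean; omitting the envir argument loads the dataset into the calling environment instead.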
B.3.3 Online sources
The web is full of data, of course, but most of it needs sound data science and a sound dose of scepticism to be of any use. Here are some good starting points for finding free data.
Collections of datasets
- PanTHERIA: A species-level database of life history, ecology, and geography of extant and recently extinct mammals