## B.3 Resources

### B.3.1 Data in base R

Every installation of R includes the datasets package and thus comes with a collection of ready-to-use datasets. To see which datasets exist and obtain basic information about them, call:

library(help = "datasets") 

To obtain information about any particular dataset x, call ?x. Throughout this book, we use quite a few of the datasets in examples and exercises.

# Info on datasets:
?anscombe
?cars
?sleep
?Titanic

# Check dimensions:
dim(ChickWeight)
dim(iris)
dim(sleep)     # Student's Sleep Data
dim(Titanic)   # see also dim(FFTrees::titanic)
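
The overview of available datasets can also be obtained as an R object, which makes it searchable. The following is a minimal sketch using base R's `data()` function; the variable names `ds_info`, `ds_table`, and `sleep_hits` are illustrative and not part of the original text:

```r
# Retrieve the list of datasets in the datasets package as a data frame:
ds_info  <- data(package = "datasets")   # object of class "packageIQR"
ds_table <- as.data.frame(ds_info$results, stringsAsFactors = FALSE)

names(ds_table)   # columns: Package, LibPath, Item, Title
nrow(ds_table)    # number of entries (varies with the R version)

# Search the dataset titles for a keyword:
sleep_hits <- ds_table[grepl("Sleep", ds_table$Title), c("Item", "Title")]
sleep_hits
```

As the contents of the datasets package change slightly between R versions, the number of rows returned may differ across installations.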

Because the datasets were included to illustrate particular types of data or problems, they vary widely in size and shape. For instance, the Nile dataset contains a single time series: 100 measurements of the annual flow of the river Nile, one for each year from 1871 to 1970.

# ?Nile
length(Nile)
#> [1] 100
typeof(Nile)
#> [1] "double"

plot(Nile, col = unikn::Seeblau, lwd = 3)
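
Note that `typeof()` only reports how the values are stored. `Nile` is more than a plain numeric vector: it is a time series object that also records its time range. A brief sketch of how this can be inspected with base R functions (the name `nile_1900_09` is illustrative):

```r
class(Nile)       # "ts": a time series object
start(Nile)       # first time point: 1871 1
end(Nile)         # last time point:  1970 1
frequency(Nile)   # 1 (one observation per year)

# Select a sub-range by time rather than by index:
nile_1900_09 <- window(Nile, start = 1900, end = 1909)
length(nile_1900_09)   # 10 values (for the years 1900 to 1909)
```

Selecting by time with `window()` avoids having to translate years into index positions by hand.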

### B.3.2 Data in R packages

Many R packages also include datasets. For instance, we use the following datasets from various tidyverse packages:

• dplyr: starwars, band_members, band_instruments, storms, etc. (nasa was moved to the cubelyr package as of dplyr 1.0.0)
• ggplot2: diamonds, economics, mpg, msleep, etc.
• tidyr: table1, etc.
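
To find out which datasets an installed package provides, the `data()` function can be pointed at that package. The following is a sketch only: `list_pkg_data()` is a hypothetical helper defined here for illustration (it is not part of any package), and the ggplot2 calls assume that ggplot2 is installed.

```r
# Hypothetical helper: return the names of all datasets in a package
list_pkg_data <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed.")
    return(character(0))
  }
  data(package = pkg)$results[, "Item"]
}

head(list_pkg_data("datasets"))   # first few datasets of base R
list_pkg_data("ggplot2")          # requires ggplot2 to be installed

# Access a dataset without attaching its package:
if (requireNamespace("ggplot2", quietly = TRUE)) {
  dim(ggplot2::mpg)
}
```

Using the `pkg::dataset` notation makes the origin of a dataset explicit and avoids name clashes between packages.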

Other packages that include many datasets of various sizes are:

• babynames
• DAAG: Data for Data Analysis and Graphics Using R (Maindonald & Braun, 2003, 2007, 2010)
• dslabs
• eurostat
• FFTrees: Data for binary classification tasks: breastcancer, car, heartdisease, mushrooms, titanic, wine
• fpp2: Data for Forecasting: Principles and practice (Hyndman & Athanasopoulos, 2018)
• nycflights13
• ISLR: Data for An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie, & Tibshirani, 2013)
• MASS
• psych: Data for the Personality-project.org (Revelle et al., 2018)
• yarrr: pirates, movies, auction, etc. (Phillips, 2018)

This list is incidental and woefully incomplete. See Rdatasets for a more systematic collection of over 1300 datasets distributed through R and its packages.

### B.3.3 Online sources

The web is full of data, of course, but most of it requires sound data science and a healthy dose of scepticism to be of any use. Here are some good starting points for finding free data.

#### Economic datasets

• FRED provides mostly time series data on economic trends
• IPUMS provides census and survey data on various issues from around the world
• Pew Research Center provides survey data (using them requires a free account)
• UC DATA provides data in the areas of political, social, and health sciences

#### Specific datasets

• PanTHERIA: A species-level database of life history, ecology, and geography of extant and recently extinct mammals