## B.3 Resources

### B.3.1 Data in base R

Because base R includes the datasets package, every R installation comes with a collection of datasets. To learn which datasets exist and obtain basic information about them, call

library(help = "datasets") 

To obtain information about a particular dataset x, call ?x. Throughout this book, we use several of these datasets in examples and exercises.

# Info on datasets:
?anscombe
?cars
?sleep
?Titanic

# Check dimensions:
dim(ChickWeight)
dim(iris)
dim(sleep)     # Student's Sleep Data
dim(Titanic)   # see also dim(FFTrees::titanic)
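The help page opened by library(help = "datasets") is meant for reading. As a minimal sketch of how the same listing could be obtained as a data frame for searching and filtering, the following uses base R's data() function; the object name ds and the search term are chosen for illustration only.

```r
# data() returns an object whose $results component is a character matrix
# with the columns "Package", "LibPath", "Item", and "Title".
ds <- as.data.frame(data(package = "datasets")$results,
                    stringsAsFactors = FALSE)

nrow(ds)                        # number of listed datasets (varies by R version)
head(ds[, c("Item", "Title")])  # dataset names and short descriptions

# Search the descriptions for a keyword (here: "Titanic"):
ds[grepl("Titanic", ds$Title), c("Item", "Title")]
```

Note that some entries in the Item column name an object together with the file it is stored in, so the number of rows need not match the number of help pages.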

As the datasets are included to illustrate particular types of data or problems, they vary widely in size and shape. For instance, the Nile dataset contains a single time series of 100 annual flow measurements of the river Nile from 1871 to 1970.

# ?Nile
length(Nile)
#> [1] 100
typeof(Nile)
#> [1] "double"

plot(Nile, col = unikn::Seeblau, lwd = 3)
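Since typeof() only reports how the values are stored, it does not reveal that Nile is a time series. The following sketch inspects the time-related attributes with base R functions; the sub-series name nile_early and the chosen years are arbitrary and for illustration only.

```r
class(Nile)      # the object's class (a time series)
start(Nile)      # first time point
end(Nile)        # last time point
frequency(Nile)  # observations per time unit (1 = annual)

# Extract a sub-series by year with window():
nile_early <- window(Nile, start = 1871, end = 1880)
length(nile_early)
#> [1] 10
```

Because the object carries its own time index, plot(Nile) labels the x-axis with years without any further arguments.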

### B.3.2 Data in R packages

Many R packages contain datasets for demonstration purposes. For instance, this book uses the datasets included in the ds4psy package and the following datasets from various tidyverse packages:

• ggplot2: diamonds, economics, mpg, msleep, etc.
• dplyr: starwars, band_members, band_instruments, nasa, storms, etc.
• tidyr: table1 to table5, etc.
• stringr: words, sentences, etc.
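Datasets in packages can be listed and accessed without attaching the package. The following is a minimal sketch that assumes the ggplot2 package is installed (the guard skips the example otherwise); the object names pkg_data and mpg_data are chosen for illustration.

```r
# Assumption: ggplot2 is installed. requireNamespace() checks this
# without attaching the package.
if (requireNamespace("ggplot2", quietly = TRUE)) {

  # Names of all datasets that ship with ggplot2:
  pkg_data <- data(package = "ggplot2")$results[, "Item"]
  print(pkg_data)

  # Access a single dataset via the :: operator:
  mpg_data <- ggplot2::mpg
  print(dim(mpg_data))
}
```

Using the pkg::dataset notation makes the origin of a dataset explicit, which helps when several packages provide objects of the same name (e.g., Titanic in datasets vs. titanic in FFTrees).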

Other packages that contain many small and large datasets include:

• babynames: Data on the number of children of each sex given each name (for names given to at least 5 children), as provided by the U.S. Social Security Administration (Wickham, 2019).

• DAAG: Data for Data Analysis and Graphics Using R (Maindonald & Braun, 2003, 2007, 2010).

• dslabs: Datasets used for teaching in HarvardX’s Data Science Professional Certificate (Irizarry & Gill, 2019).

• FFTrees: Data for binary classification tasks: breastcancer, car, heartdisease, mushrooms, titanic, wine (Phillips et al., 2017).

• fpp2: Data for Forecasting: Principles and practice (Hyndman & Athanasopoulos, 2018).

• nycflights13: Data for all flights departing from NYC in 2013 (Wickham, 2019).

• ISLR: Data for An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie, & Tibshirani, 2013).

• MASS: Support functions and datasets for Venables and Ripley’s MASS (Ripley et al., 2019).

• psych: Data for the Personality-project.org (Revelle et al., 2018).

• yarrr: Data on pirates, movies, auction, etc. (Phillips, 2018).

This list is incidental and guaranteed to be incomplete. See Rdatasets for a more systematic collection of over 1300 datasets distributed through R and its packages.

### B.3.3 Online sources

The web is full of data, but most of it requires sound data science and a healthy dose of scepticism to be of any use. Here are some good starting points for finding free data.

#### Economic datasets

• FRED provides mostly time series data on economic trends.

• IPUMS provides census and survey data on various issues from around the world.

• The Pew Research Center provides survey data (requires a free account).

• UC DATA provides data in the areas of political, social, and health sciences.

#### Specific datasets

• PanTHERIA: A species-level database of life history, ecology, and geography of extant and recently extinct mammals