Chapter 6 Data Sources and How to Access Them

6.1 Introduction

What is data science without data? Here are a few indexes and compendia of data sources, including R packages that either bundle those data conveniently or provide mechanisms for accessing data from remote sources.

  • Kim, A. Y., Ismay, C., & Chunn, J. (2018). The fivethirtyeight R Package: “Tame Data” Principles for Introductory Statistics and Data Science Courses. Technology Innovations in Statistics Education, 11(1). Retrieved from

6.2 Sources

6.2.1 Listings

University of Alberta Libraries, Economics: List of databases

Simon Fraser University Library: Gender, Sexuality & Women’s Studies Information Resources: Facts & Data

6.2.2 Open data sources: socio-economic

SDG Tracker – “a free, open-access resource where users can track and explore global and country-level progress towards each of the 17 Sustainable Development Goals” established by the United Nations.

  • The database is compiled by Our World in Data (“Research and data to make progress against the world’s largest problems”). Data are available through curated links.

United Nations World Population Prospects - detailed population data by country

OECD world data, by country

Gapminder - all indicators displayed in Gapminder World

6.2.3 Science

NASA Goddard Institute for Space Studies – a wide range of data, including climate simulations and impacts, Earth observations, and data on other planets.

6.2.4 Other

FiveThirtyEight: Our Data – “the data and code behind some of our articles and graphics”.

6.3 R packages

6.3.1 {bcdata}

bcdata – An R package 📦 for searching & retrieving data from the B.C. Data Catalogue.

6.3.3 {cansim}

Dmitry Shkolnik (2018-08-01). The CANSIM package, Canadian tourism, and slopegraphs.

6.3.5 {fivethirtyeight}


6.3.6 {gapminder}

gapminder: Data from Gapminder – an excerpt of the data available on the Gapminder website. For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.
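A minimal sketch of loading these data in R, assuming the gapminder package has been installed from CRAN. It checks the coverage described above and then computes mean life expectancy per year; that aggregation step is our own illustration, not a function provided by the package.

```r
# Assumes install.packages("gapminder") has already been run
library(gapminder)

# The package exports a data frame (tibble) called `gapminder`,
# with one row per country-year
dim(gapminder)                     # 1704 rows, 6 columns
length(unique(gapminder$country))  # 142
range(gapminder$year)              # 1952 2007

# Illustration: mean life expectancy across all countries, by year
aggregate(lifeExp ~ year, data = gapminder, FUN = mean)
```

Because every country appears once in each of the 12 survey years, the 1704 rows are exactly 142 × 12.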

6.3.7 {Lahman}

Lahman: Sean ‘Lahman’ Baseball Database – provides the tables from the ‘Sean Lahman Baseball Database’ as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2018, as recorded in the 2019 version of the database.
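A minimal sketch, assuming the Lahman package is installed from CRAN. It loads the `Batting` table (one row per player, season and stint with a team) and sorts it by home runs; the sorting is our own illustration, and the last season available depends on which version of the package is installed.

```r
# Assumes install.packages("Lahman") has already been run
library(Lahman)
data("Batting", package = "Lahman")

# Seasons covered by the Batting table: the first is 1871,
# the last depends on the installed package version
range(Batting$yearID)

# Illustration: the ten highest home-run totals for a single player-stint
top_hr <- Batting[order(-Batting$HR), c("playerID", "yearID", "teamID", "HR")]
head(top_hr, 10)
```

Because a player traded mid-season has one row per stint, a full-season total would require summing `HR` over `playerID` and `yearID` first.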