Chapter 7 Data Sources and How to Access Them

7.1 Introduction

What is data science without data? Here are a few indexes and compendia of data sources, including R packages that either bundle data conveniently or provide mechanisms for accessing data from remote sources.

  • Kim, A. Y., Ismay, C., & Chunn, J. (2018). The fivethirtyeight R Package: “Tame Data” Principles for Introductory Statistics and Data Science Courses. Technology Innovations in Statistics Education, 11(1). Retrieved from https://escholarship.org/uc/item/0rx1231m

7.2 Sources

7.2.1 Listings

University of Alberta Libraries, Economics: List of databases

Simon Fraser University Library: Gender, Sexuality & Women’s Studies Information Resources: Facts & Data

7.2.2 Open data sources

7.2.2.1 Socio-economic

SDG Tracker – “a free, open-access resource where users can track and explore global and country-level progress towards each of the 17 Sustainable Development Goals”, which were established by the United Nations.

  • The database is compiled by Our World in Data (“Research and data to make progress against the world’s largest problems”). Data are available through curated links.

United Nations World Population Prospects - detailed country population data

OECD world data, by country

Gapminder - all indicators displayed in Gapminder World

Data sources for Canadian economists, compiled by Stephen Gordon

7.2.2.2 Science

NASA Goddard Institute for Space Studies – a wide range of data, including climate simulations and impacts, Earth observations, and data on other planets.

7.2.2.3 Other

FiveThirtyEight: Our Data – “the data and code behind some of our articles and graphics”.

7.3 R packages

7.3.1 {bcdata}

bcdata – An R package 📦 for searching & retrieving data from the B.C. Data Catalogue.

7.3.3 {cansim}

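cansim – An R package for retrieving data tables from Statistics Canada (the tables formerly published through CANSIM).

Links: package, GitHub

Articles:

  • Dmitry Shkolnik (2018-08-01). The CANSIM package, Canadian tourism, and slopegraphs.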

7.3.5 {fivethirtyeight}

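fivethirtyeight – An R package that bundles the data behind FiveThirtyEight articles; see Kim, Ismay & Chunn (2018) in the introduction for the “tame data” principles behind it.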
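A minimal sketch of discovering what the package contains, assuming {fivethirtyeight} is installed from CRAN. It uses only base R's `data()` catalogue, so it makes no assumptions about the names of individual data sets; the variable name `items` is illustrative.

```r
library(fivethirtyeight)

# data(package = ...) returns a catalogue; its "results" matrix has one row
# per bundled data set, with the data set's name in the "Item" column
items <- data(package = "fivethirtyeight")$results[, "Item"]

length(items)  # how many data sets this installed version ships
head(items)    # the first few data set names
```

Once the package is attached, any listed data set can be used directly by name.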

7.3.6 {gapminder}

gapminder: Data from Gapminder – An excerpt of the data available at Gapminder.org. For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.
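A short sketch of the data's shape, assuming {gapminder} is installed from CRAN. The checks mirror the description above (142 countries, one observation every five years from 1952 to 2007), and the summary by continent is only an illustration; `recent` is a hypothetical variable name.

```r
library(gapminder)

# One row per country per observation year
dim(gapminder)                      # 1704 rows, 6 columns
length(unique(gapminder$country))   # 142 countries
sort(unique(gapminder$year))        # 1952, 1957, ..., 2007

# Example: mean life expectancy by continent in the most recent year
recent <- subset(gapminder, year == 2007)
aggregate(lifeExp ~ continent, data = recent, FUN = mean)
```

Because `gapminder` is an ordinary (tibble) data frame, base R tools such as `subset()` and `aggregate()` work on it as well as tidyverse verbs do.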

7.3.7 {Lahman}

Lahman: Sean ‘Lahman’ Baseball Database – Provides the tables from the ‘Sean Lahman Baseball Database’ as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2018, as recorded in the 2019 version of the database.
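A minimal sketch of working with one of the performance tables, assuming {Lahman} is installed from CRAN. It uses the `Batting` table (hitting statistics); the helper variable `hr` is hypothetical. The last year available depends on which package version is installed.

```r
library(Lahman)

# Batting is a plain data.frame with one row per player, season and stint
head(Batting[, c("playerID", "yearID", "teamID", "AB", "H", "HR")])

# A player traded mid-season has several stints, so sum home runs per
# player and season before ranking
hr <- aggregate(HR ~ playerID + yearID, data = Batting, FUN = sum)
head(hr[order(-hr$HR), ], 5)

# First and last seasons covered by the installed version
range(Batting$yearID)
```

The `Pitching` and `Fielding` tables share the same `playerID` and `yearID` keys, so they can be joined to `Batting` with `merge()`.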

-30-