Chapter 7 Data Sources and How to Access Them
7.1 Introduction
What is data science without data? Here are a few indexes and compendiums of data sources, including R packages that either bundle the data or provide mechanisms to access data from remote sources.
- Kim, A. Y., Ismay, C., & Chunn, J. (2018). The fivethirtyeight R Package: “Tame Data” Principles for Introductory Statistics and Data Science Courses. Technology Innovations in Statistics Education, 11(1). Retrieved from https://escholarship.org/uc/item/0rx1231m
7.2 Sources
7.2.1 listings
University of Alberta Libraries, Economics: List of databases
Simon Fraser University Library: Gender, Sexuality & Women’s Studies Information Resources: Facts & Data
7.2.2 open data sources
7.2.2.1 socio-economic
SDG Tracker – “a free, open-access resource where users can track and explore global and country-level progress towards each of the 17 Sustainable Development Goals” established by the United Nations.
- The database is compiled by Our World in Data (“Research and data to make progress against the world’s largest problems”). Data are available through curated links.
United Nations World Population Prospects - detailed country population data
- populationpyramid.net uses these data
Gapminder - all indicators displayed in Gapminder World
Data sources for Canadian economists, compiled by Stephen Gordon
“Anyone know of a good migration dataset? Ideally # of people moving from country i to country j by year” (Savage Jim, @jim_savage_, February 10, 2019)
7.2.2.2 science
NASA Goddard Institute for Space Studies—a plethora of data including climate simulations and impacts, Earth observations, and other planets.
7.2.2.3 other
FiveThirtyEight: Our Data – “the data and code behind some of our articles and graphics”.
7.3 R packages
7.3.1 {bcdata}
bcdata – An R package 📦 for searching & retrieving data from the B.C. Data Catalogue.
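As a rough illustration of the search step, the sketch below wraps `bcdc_search()` from {bcdata} in a small helper. The helper name `search_bc_catalogue` and the search term "airports" are assumptions made for this example, and the call needs an internet connection, so it is guarded to run only in an interactive session.

```r
# A minimal sketch, assuming {bcdata} is installed: install.packages("bcdata")
library(bcdata)

# Hypothetical helper (not part of {bcdata}): search the B.C. Data Catalogue
# for a term and return at most `max_results` matching records.
search_bc_catalogue <- function(term, max_results = 5) {
  bcdc_search(term, n = max_results)
}

# The search contacts the catalogue over the network, so it is run here only
# in an interactive session. "airports" is an illustrative search term.
if (interactive()) {
  results <- search_bc_catalogue("airports")
  print(results)
}
```

Once a record of interest has been identified from the search results, `bcdc_get_data()` can be used to retrieve the data itself.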
7.3.3 {cansim}
package
articles
Dmitry Shkolnik (2018-08-01) The CANSIM package, Canadian tourism, and slopegraphs
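A minimal sketch of retrieving a table with {cansim} might look like the following. The helper name `fetch_statcan_table` is hypothetical, and the table number "14-10-0287" (a Labour Force Survey table) is an illustrative assumption rather than a table discussed in this chapter. The download needs an internet connection, so it is guarded to run only in an interactive session.

```r
# A minimal sketch, assuming {cansim} is installed: install.packages("cansim")
library(cansim)

# Hypothetical helper (not part of {cansim}): download one Statistics Canada
# table by its table number and return it as a data frame.
fetch_statcan_table <- function(table_number) {
  get_cansim(table_number)
}

# The download contacts Statistics Canada over the network, so it is run here
# only in an interactive session. The table number is an illustrative choice.
if (interactive()) {
  lfs <- fetch_statcan_table("14-10-0287")
  print(dim(lfs))    # number of rows and columns retrieved
  print(names(lfs))  # column names as delivered by the package
}
```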
7.3.4 {CANSIM2R}
CANSIM2R: Directly Extracts Complete CANSIM Data Tables
github: CANSIM2R
- Andrew Clarke (2017-08-09) StatCan API’s Discovered
7.3.6 {gapminder}
gapminder: Data from Gapminder. An excerpt of the data available at Gapminder.org. For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.
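Because the data ship with the package, they can be explored without a network connection. The sketch below is one way to confirm the coverage described above and to produce a simple summary. The object names `latest` and `life_by_continent` are chosen for this example only.

```r
# A minimal sketch, assuming {gapminder} is installed: install.packages("gapminder")
library(gapminder)

# The main data frame is also called `gapminder`: one row per country per year.
print(dim(gapminder))
print(names(gapminder))

# Number of countries and the years covered
print(length(unique(gapminder$country)))
print(sort(unique(gapminder$year)))

# Mean life expectancy by continent in the most recent year (2007), base R only
latest <- gapminder[gapminder$year == 2007, ]
life_by_continent <- aggregate(lifeExp ~ continent, data = latest, FUN = mean)
print(life_by_continent)
```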
7.3.7 {Lahman}
Lahman: Sean ‘Lahman’ Baseball Database. Provides the tables from the ‘Sean Lahman Baseball Database’ as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2018, as recorded in the 2016 version of the database.
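As with {gapminder}, the tables are bundled as data frames, so no download step is needed once the package is loaded. The sketch below uses the `Batting` table to total home runs by season. The object name `hr_by_year` is an assumption for this example, and the last season shown depends on which version of the package is installed.

```r
# A minimal sketch, assuming {Lahman} is installed: install.packages("Lahman")
library(Lahman)

# Batting has one row per player, season (yearID), and stint with a team.
print(names(Batting))

# Seasons covered by the installed version of the package
print(range(Batting$yearID))

# Total home runs hit in each season, base R only
hr_by_year <- aggregate(HR ~ yearID, data = Batting, FUN = sum)
print(head(hr_by_year))
```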
-30-