Chapter 6 Data Sources and How to Access Them
What is data science without data? Here are a few indexes and compendiums of data sources, including R packages that conveniently either bundle those data or provide mechanisms to access data from remote sources.
- Kim, A. Y., Ismay, C., & Chunn, J. (2018). The fivethirtyeight R Package: “Tame Data” Principles for Introductory Statistics and Data Science Courses. Technology Innovations in Statistics Education, 11(1). Retrieved from https://escholarship.org/uc/item/0rx1231m
University of Alberta Libraries, Economics: List of databases
Simon Fraser University Library: Gender, Sexuality & Women’s Studies Information Resources: Facts & Data
6.2.2 Open data sources
SDG Tracker – “a free, open-access resource where users can track and explore global and country-level progress towards each of the 17 Sustainable Development Goals” established by the United Nations.
- The database is compiled by Our World in Data (“Research and data to make progress against the world’s largest problems”). Data are available through curated links.
United Nations Population Prospects - detailed country population data
- populationpyramid.net uses this data
Gapminder - all indicators displayed in Gapminder World
Jim Savage (@jim_savage_), February 10, 2019: “Anyone know of a good migration dataset? Ideally # of people moving from country i to country j by year”
NASA Goddard Institute for Space Studies—a plethora of data including climate simulations and impacts, Earth observations, and other planets.
6.3 R packages
bcdata – An R package 📦 for searching & retrieving data from the B.C. Data Catalogue.
Dmitry Shkolnik (2018-08-01) The CANSIM package, Canadian tourism, and slopegraphs
- Andrew Clarke (2017-08-09) StatCan API’s Discovered
gapminder: Data from Gapminder. An excerpt of the data available at Gapminder.org. For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.
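As a quick check of that description, here is a minimal sketch of loading the bundled data. It assumes the gapminder package is already installed from CRAN; the object name `canada` is just an illustrative choice.

```r
# Assumption: the package has been installed with install.packages("gapminder")
library(gapminder)

# The main data frame has the same name as the package
n_countries <- length(unique(gapminder$country))  # number of distinct countries
year_range  <- range(gapminder$year)              # first and last survey year

# Base R subsetting: pull the rows for a single country
canada <- gapminder[gapminder$country == "Canada",
                    c("year", "lifeExp", "pop", "gdpPercap")]

print(n_countries)
print(year_range)
print(head(canada, 3))
```

Because the data ship with the package, this runs without a network connection, which makes it handy for teaching examples.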
Lahman: Sean ‘Lahman’ Baseball Database. Provides the tables from the ‘Sean Lahman Baseball Database’ as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2018, as recorded in the 2019 version of the database.
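The tables in Lahman are ordinary data frames, so they can be explored with base R. The following is a minimal sketch, assuming the Lahman package is installed from CRAN; the last season covered depends on the installed version, and `top_hr` is an illustrative name.

```r
# Assumption: the package has been installed with install.packages("Lahman")
library(Lahman)

# Batting holds one row per player, season, and stint
first_season <- min(Batting$yearID)

# Sort by home runs, highest first (rows with missing HR sort to the end)
top_hr <- Batting[order(-Batting$HR),
                  c("playerID", "yearID", "teamID", "HR")]

print(first_season)
print(head(top_hr, 5))
```

Other tables, such as `Pitching`, `Fielding`, and `People`, can be inspected the same way and joined to `Batting` on `playerID`.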