Chapter 7 Data acquisition and extraction
Readings:
7.1 Access protocols and permissions
Reproducible extraction of data from its source location may be complicated by access protocols and permissions, for example:
- access tokens and APIs
- raw data from GitHub in private repositories
- databases
- the httr package, used to access data from websites
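The access-token and private-repository cases above can be sketched with httr. This is a minimal sketch, assuming a GitHub personal access token stored in an environment variable named GITHUB_PAT, and assuming raw.githubusercontent.com accepts that token in an `Authorization: token <PAT>` header. The helper names raw_github_url(), get_token() and fetch_private_raw(), and the repository in the usage line, are hypothetical.

```r
# Hypothetical helper: build the raw-content URL for a file in a GitHub repo
raw_github_url <- function(owner, repo, path, ref = "main") {
  paste("https://raw.githubusercontent.com", owner, repo, ref, path, sep = "/")
}

# Read the access token from an environment variable instead of hard-coding
# it in the script (GITHUB_PAT is an assumed variable name)
get_token <- function(var = "GITHUB_PAT") {
  token <- Sys.getenv(var)
  if (!nzchar(token)) stop("Set the ", var, " environment variable first")
  token
}

# Download a file from a private repo as text, sending the token in the
# Authorization header (needs the httr package installed)
fetch_private_raw <- function(owner, repo, path, ref = "main") {
  resp <- httr::GET(
    raw_github_url(owner, repo, path, ref),
    httr::add_headers(Authorization = paste("token", get_token()))
  )
  httr::stop_for_status(resp)  # fail loudly on 401, 404, etc.
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Usage (hypothetical repository):
# sales <- read.csv(text = fetch_private_raw("my-org", "my-data", "data/raw/sales.csv"))
```

Keeping the token in an environment variable (for example in a `.Renviron` file that is never committed) keeps the secret out of the shared script while the extraction itself stays in code.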
7.2 Accessing databases
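library(DBI)    # dbGetQuery() and the other database functions
library(tibble) # as_tibble()
# `db` is an open DBI connection to the remote database, created beforehand with dbConnect()
esales <- dbGetQuery(db, 'SELECT * FROM eia_elec_sales_va_all_m') # SQL query retrieving every row and column of a table in the remote database
# str(esales)
esales <- as_tibble(esales) # convert the data frame to a tibble for tidyverse work
# str(esales)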
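The query above assumes a connection `db` already exists. A self-contained sketch of the whole DBI cycle (connect, query, disconnect) follows. It uses an in-memory SQLite database from the RSQLite package as a stand-in for the remote server and the built-in `mtcars` data as a stand-in table; both are assumptions for illustration, not the course database.

```r
library(DBI)
library(tibble)

# Stand-in for the remote database: an in-memory SQLite database (needs RSQLite).
# For a real server you would swap in its driver, e.g. RPostgres::Postgres(),
# plus host/dbname arguments and credentials read with Sys.getenv().
db <- dbConnect(RSQLite::SQLite(), ":memory:")

# Load a built-in dataset so there is a table to query
dbWriteTable(db, "mtcars", mtcars)

# Let the database select columns and filter rows; only the result comes into R
cars6 <- dbGetQuery(db, "SELECT mpg, cyl, hp FROM mtcars WHERE cyl = 6")
cars6 <- as_tibble(cars6)  # tibble for tidyverse work

dbDisconnect(db)  # release the connection when finished
```

Doing the selection in SQL means only the needed subset crosses the network, which matters for large remote tables.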
7.3 Other comments
Make your extraction code “as reproducible as possible”, subject to these access constraints. At minimum, document clearly how you obtained the data, so that others can follow your path even where it cannot be expressed purely in code.
Keep your raw data in read-only mode. Don’t edit these files.
Write code to transform the raw data into the form you will use for analysis; don’t do it manually.
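The three practices above can be combined into one short script. This is a minimal sketch, assuming a project layout with hypothetical `data/raw` and `data/processed` folders (created under `tempdir()` here so it runs anywhere) and using `mtcars` as a stand-in for a downloaded file; the folder and file names are illustrative only.

```r
# Hypothetical project layout: raw files are never edited, processed files
# are always regenerated by code
root     <- file.path(tempdir(), "project")
raw_dir  <- file.path(root, "data", "raw")
proc_dir <- file.path(root, "data", "processed")
dir.create(raw_dir,  recursive = TRUE, showWarnings = FALSE)
dir.create(proc_dir, recursive = TRUE, showWarnings = FALSE)

raw_file <- file.path(raw_dir, "cars_raw.csv")
if (!file.exists(raw_file)) {
  # 1. Acquire: save the data exactly as received (stand-in: built-in mtcars)
  write.csv(mtcars, raw_file, row.names = TRUE)
  # 2. Document provenance next to the raw file
  writeLines(c(
    "source: R datasets::mtcars (stand-in for the real download)",
    paste("retrieved:", format(Sys.Date()))
  ), file.path(raw_dir, "README.txt"))
  # 3. Protect: make the raw file read-only for owner, group and others
  Sys.chmod(raw_file, mode = "0444")
}

# 4. Transform in code, never by hand: read raw, derive, write processed
cars <- read.csv(raw_file)
names(cars)[1] <- "model"         # first column holds the original row names
cars$kpl <- cars$mpg * 0.425144   # miles per US gallon -> km per litre
proc_file <- file.path(proc_dir, "cars_clean.csv")
write.csv(cars[, c("model", "mpg", "kpl")], proc_file, row.names = FALSE)
```

Because the raw file is written only when it is missing and is then made read-only, rerunning the script does not overwrite the original data, while the processed file is always rebuilt from it.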