Chapter 6 Data extraction

6.1 Access and permissions

  • Reproducible extraction of data from its source location may be complicated by access protocols. Relevant mechanisms and tools:

    • access tokens and APIs
    • raw data from GitHub for private repos
    • databases
    • httr (an R package for making HTTP requests)
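
As one illustration of token-based access, here is a minimal sketch of fetching a single raw file from a private GitHub repository through the GitHub REST API. It uses only Python's standard library; the same pattern (read the token from the environment, send it in an Authorization header) applies with httr in R. The environment variable name GITHUB_TOKEN, the function names, and the repository and file names in the usage note are assumptions made for illustration, not part of these notes.

```python
import os
import urllib.parse
import urllib.request


def build_raw_request(owner, repo, path, ref, token):
    """Build (but do not send) a request for one raw file in a GitHub repo."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/contents/"
        f"{urllib.parse.quote(path)}?ref={urllib.parse.quote(ref, safe='')}"
    )
    headers = {
        # Ask the contents endpoint for the raw file body rather than JSON metadata.
        "Accept": "application/vnd.github.v3.raw",
        # The token is passed in at call time, never written into the script.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_raw_file(owner, repo, path, ref, dest):
    """Download one file to dest, reading the token from the environment."""
    token = os.environ["GITHUB_TOKEN"]  # assumed variable name; fails loudly if unset
    req = build_raw_request(owner, repo, path, ref, token)
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

A call such as fetch_raw_file("example-org", "example-repo", "data/raw/cases.csv", "main", "cases.csv") would then run unchanged on any machine where the token is set. Keeping the token out of the code means the script can be shared and committed without exposing credentials.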

Make your extraction code “as reproducible as possible”, subject to these access constraints. At minimum, document clearly how you obtained the data, so others could follow your path, even if not via pure code.
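
When part of the path cannot be automated, one way to document it is a small provenance record saved next to each raw file. The sketch below is one possible approach, not a prescribed format: the function name, the field names, and the ".provenance.json" suffix are all assumptions. It records where the file came from, how it was obtained, when, and a SHA-256 checksum so others can confirm they hold the same bytes.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_provenance(data_path, source, method):
    """Write a JSON sidecar describing how a raw data file was obtained."""
    data_path = Path(data_path)
    # The checksum lets a later reader verify the file has not changed.
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    record = {
        "file": data_path.name,
        "source": source,   # e.g. a URL or database name
        "method": method,   # e.g. "manual download after login"
        "retrieved_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sha256": digest,
    }
    sidecar = data_path.with_name(data_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2) + "\n")
    return sidecar
```

The method field is free text, so it can describe steps that are not pure code, such as requesting access or clicking through a portal.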

Reminder: Keep your raw data in read-only mode. Don’t edit these files. Write code to transform the raw data into the form you will use for analysis.
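
The read-only reminder can be enforced at the file-system level. The following is a minimal sketch, assuming the raw files live together under one directory; the function name and the directory layout are hypothetical. It clears the write-permission bits on every file under that directory, so an accidental edit or overwrite fails for ordinary users.

```python
import stat
from pathlib import Path


def make_read_only(raw_dir):
    """Remove write permission from every file under raw_dir."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    changed = []
    for path in sorted(Path(raw_dir).rglob("*")):
        if path.is_file():
            mode = path.stat().st_mode
            # Keep the existing read/execute bits; drop only the write bits.
            path.chmod(mode & ~write_bits)
            changed.append(path)
    return changed
```

Transformation code would then read from the raw directory and write its results somewhere else, for example a separate directory of processed files, leaving the originals untouched.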

----- Forwarded Message -----
From: GitHub <noreply@github.com>
To: Arthur Small <asmall@virginia.edu>
Sent: Sunday, February 21, 2021, 6:20:58 AM EST
Subject: [GitHub] Deprecation Notice

Hi @arthursmalliii,

You recently used a password to access the repository at uva-eng-time-series-sp21/coronato-nicholas with git using git/2.30.0.

Basic authentication using a password to Git is deprecated and will soon no longer work. Visit https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/ for more information around suggested workarounds and removal dates.

Thanks,
The GitHub Team