Chapter 4 Data pipelines

4.1 Retrieving data

4.1.1 From local files

4.1.2 From APIs

4.1.3 From databases

Instructions for database access

  1. Install DBeaver (unless you already have a desktop database client you prefer): Installers are available for both Windows and macOS. We will be using the Community Edition 7.0.0 to access PostgreSQL databases.

The Community Edition is free. DBeaver requires Java, but if you install it via the Windows or macOS installer you don’t need to install Java separately.

  2. Download and install the Cisco Mobility VPN client: see instructions here.

After launch, select the “UVA Anywhere” network.

You need to use the VPN if and only if you are accessing the database from off-Grounds. When you are on-Grounds, skip this step.

  3. Log in to the Postgres database:

    Host: va-energy2.postgres.database.azure.com
    Username: [your UVA ID]
    Password: [contact Chloe Fauvel to get your individual password]
    Port: 5432
    Initial database: postgres

Please don’t share your individual credentials.
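If you would rather query the database from R than from DBeaver, the same connection settings can be passed to `DBI::dbConnect()`. The following is a minimal sketch, not a prescribed setup: it assumes the `DBI` and `RPostgres` packages are installed, and it reads your credentials from two environment variables, `UVA_DB_USER` and `UVA_DB_PASSWORD`. Those variable names and the helper functions `pg_settings()` and `connect_course_db()` are made up for this example. Your server may also need options that are not listed above.

```r
# Sketch: connect to the course PostgreSQL database from R.
# Assumes DBI and RPostgres are installed. UVA_DB_USER and UVA_DB_PASSWORD
# are hypothetical environment variable names chosen for this example
# (set them in ~/.Renviron, for instance) so that credentials never
# appear in a script.

# Collect the connection settings from the login step into one list.
pg_settings <- function(user = Sys.getenv("UVA_DB_USER"),
                        password = Sys.getenv("UVA_DB_PASSWORD")) {
  list(
    host = "va-energy2.postgres.database.azure.com",
    port = 5432,
    dbname = "postgres",  # the "Initial database" in the login step
    user = user,
    password = password
  )
}

# Open a connection, refusing to proceed if the credentials are missing.
connect_course_db <- function(settings = pg_settings()) {
  if (!nzchar(settings$user) || !nzchar(settings$password)) {
    stop("Set UVA_DB_USER and UVA_DB_PASSWORD before connecting.")
  }
  # Pass the driver first, then the named settings, to DBI::dbConnect().
  do.call(DBI::dbConnect, c(list(RPostgres::Postgres()), settings))
}

# Example usage (needs valid credentials and, off-Grounds, the VPN):
# con <- connect_course_db()
# DBI::dbListTables(con)
# DBI::dbDisconnect(con)
```

Keeping the credentials in environment variables rather than in the script itself makes it harder to share them by accident, for example when committing code to a shared repository.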

4.1.4 From web resources

4.2 Data types

4.2.1 Data types in R

4.2.2 Conversion on read-in

4.3 Data wrangling

To learn how to wrangle and visualize data using the Tidyverse packages, you may find it useful to go through the Tidyverse Fundamentals with R modules on DataCamp. DataCamp also offers a range of other learning modules for developing data science skills in R.

4.3.1 Tidy data

4.3.2 Dplyr

4.4 Managing data

4.4.1 DOs and DON’Ts