3.1 Setup & Reading in Data

In this session we will examine several questions about the Sustainable Development Goals (SDGs).

First, we make sure the packages we need are installed and loaded. The chunk below loads the tidyverse with library(). library() only loads a package that is already installed; if the tidyverse is not yet installed, run install.packages("tidyverse") once beforehand.

#--- Load libraries
library(tidyverse)
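To automate making sure packages are installed, here is a minimal sketch of a helper that installs any missing packages and then loads them all. The function name load_packages is hypothetical (it is not part of the tidyverse or of these course materials), and the install step needs an internet connection and a CRAN mirror.

```r
#--- Hypothetical helper: install any missing packages, then load them all
load_packages <- function(pkgs) {
  # requireNamespace() quietly returns FALSE when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  # character.only = TRUE lets library() take a name stored in a variable
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}
```

With this defined, load_packages(c("tidyverse")) would install the tidyverse on a first run and simply load it on later runs.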

To start investigating these questions, we need a dataset. Data often comes to you as a CSV (comma-separated values) file, for example one exported from Excel; we will talk about other formats later. The dataset for this session, sdg.csv, should have come with the downloaded zip file. Make sure it is saved in the same folder as this R file.
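A CSV file is plain text: the first line usually holds the column names, and each following line is one row, with commas separating the values. The sketch below writes a tiny CSV to a temporary file and reads back its raw lines. The column names and values are made up for illustration and are not taken from sdg.csv.

```r
#--- Illustrative only: a tiny made-up CSV, not the real sdg.csv
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,year,value",
             "Country A,2015,12.5",
             "Country B,2015,8.1"),
           csv_path)

#--- Reading the raw text back shows one header line, then one line per row
raw_lines <- readLines(csv_path)
raw_lines
```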

We use a simple function to read in data: read.csv().

#--- Read in the data using a relative path
sdg <- read.csv("./sdg.csv")

#--- This will only work on my computer, so it is commented out
# sdg <- read.csv("E:/LSHTM/Teaching/r4epi/sdg.csv")

#--- Get working directory
getwd()
## [1] "E:/LSHTM/Teaching/r4epi"

Let’s unpack this code. A key idea in R is that objects (such as a dataset) are assigned to names. We read in the data and assigned it the name ‘sdg’, because it holds data on a variety of Sustainable Development Goal subtargets. The assignment is done with the ‘<-’ arrow. The function read.csv() took one argument: the path of the file to read in.
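To see assignment and the single file argument of read.csv() in isolation, the sketch below writes a small made-up CSV to a temporary file, reads it back, and assigns the result to a name. It also shows that assigning a new value to an existing name replaces the old object. The names demo_path and demo are illustrative only.

```r
#--- Illustrative only: a small made-up CSV in a temporary file
demo_path <- tempfile(fileext = ".csv")
writeLines(c("country,year,value", "Country A,2015,12.5", "Country B,2015,8.1"),
           demo_path)

#--- read.csv() takes the file path; <- assigns the result to the name demo
demo <- read.csv(demo_path)
class(demo)   # returns "data.frame"

#--- Assigning to an existing name replaces the old object (here, first row only)
demo <- read.csv(demo_path, nrows = 1)
nrow(demo)    # returns 1
```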

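Note that we did not specify the full path to the data. The “.” in "./sdg.csv" stands for R’s working directory, the folder R uses to resolve relative file paths. You can see which folder this is with getwd() (“wd” stands for “working directory”). When R Markdown runs your code chunks, the working directory is by default the folder the document is saved in; for a plain .R script it is wherever your R session is running, so check it with getwd(). In general, you should rarely need to change the working directory explicitly (for example with setwd()); just save your data in the same place as your analysis file.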
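The sketch below shows that a relative path such as "./sdg.csv" is resolved against whatever getwd() returns. It creates a temporary folder holding an empty file named sdg.csv (both exist only for this demonstration), switches into that folder just long enough to check the relative path, and then restores the original working directory.

```r
#--- Illustrative only: a temporary folder holding an empty file named sdg.csv
demo_dir <- file.path(tempdir(), "wd_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "sdg.csv")))

old_wd <- getwd()                          # remember the starting folder
setwd(demo_dir)                            # changed here only for the demonstration
found_relative <- file.exists("./sdg.csv") # "." now refers to demo_dir
setwd(old_wd)                              # put the working directory back

#--- A full (absolute) path finds the file no matter what getwd() returns
found_absolute <- file.exists(file.path(demo_dir, "sdg.csv"))
c(relative = found_relative, absolute = found_absolute)
```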

After reading in the data, you should see a new object, sdg, in the Environment pane (top right in RStudio).
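The Environment pane is a view of the objects in your workspace, and the same check can be made from the console. A minimal sketch, using a small made-up data frame named sdg_demo as a stand-in for sdg so that it runs without the data file:

```r
#--- Illustrative only: a small made-up data frame standing in for sdg
sdg_demo <- data.frame(country = c("Country A", "Country B"),
                       value = c(12.5, 8.1))

#--- ls() lists the object names in the workspace, as the Environment pane does
"sdg_demo" %in% ls()

#--- exists() checks whether a single name is defined
exists("sdg_demo")
```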

EXERCISE: How many rows does the dataset have? What does each row represent? How many columns does the dataset have? What does each column represent?