Session 5 Getting data in and out of R
We can’t get very far with R if we can’t get our raw data in, and our results out. This session introduces some of the basic techniques to do this.
5.1 Working directory
R can read data from files on disk. First, we need to understand where R will look for data. Running the command getwd() in the R console will print a file path. This is where R looks for a file when you give it only a file name (or a relative path). It is also likely to be the directory displayed in the Files pane (bottom-right corner in RStudio). This directory is called the working directory.
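The short sketch below prints the working directory and shows how a bare file name is resolved against it. The printed path will differ from machine to machine, and "file.csv" does not need to exist for this to run.

```r
# Print the current working directory (the result depends on your machine)
wd <- getwd()
print(wd)

# A bare file name such as "file.csv" is interpreted relative to the
# working directory, so R would look for it at this full path
full_path <- file.path(wd, "file.csv")
print(full_path)
```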
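You can change the working directory in RStudio using Session > Set Working Directory, and the options therein. Alternatively, you can use the setwd() command in the console, supplying a file path, e.g. setwd("C:/my/file/path"). Note that R file paths use forward slashes (/), even on Windows: a single backslash starts an escape sequence in an R string, so it would have to be doubled (\\).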
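A minimal sketch of changing the working directory from code and then changing it back. It uses tempdir() (R's per-session temporary directory) purely as a stand-in so that it runs on any machine; in practice you would pass your own path instead.

```r
# setwd() changes the working directory and invisibly returns the previous one,
# so storing its result lets us switch back afterwards
old_wd <- setwd(tempdir())   # tempdir() stands in for your own path here
print(getwd())

# ... read and write files here ...

# Restore the original working directory
setwd(old_wd)
print(getwd())
```

Saving the old directory and restoring it is a cautious habit when code changes the working directory temporarily, since other code may rely on it.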
We will say more about directories when we introduce Projects later. For now, note that any files you want to read need to be placed in the working directory (or in a subdirectory of it) for R to find them.
In general, it makes sense to also save your script files in the working directory (or a subdirectory of it).
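To check which files R can actually see, base R provides list.files() and file.exists(). The sketch below works in a temporary directory and creates a made-up file, example.csv, so that it does not touch your own files.

```r
# Work in a temporary directory so this example does not touch your own files
old_wd <- setwd(tempdir())

# Create a small placeholder file ("example.csv" is a made-up name)
writeLines(c("a,b", "1,2"), "example.csv")

# list.files() shows the files R can see in the working directory
visible <- list.files()
print(visible)

# file.exists() checks for a single file before you try to read it
found <- file.exists("example.csv")
print(found)   # TRUE

# Return to the original working directory
setwd(old_wd)
```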
Exercise: choose a sensible working directory, and save your current script file to that directory.
5.2 Base R solutions to importing data
Note: ‘Base R’ refers to the functions that come with R as standard, without installing additional packages.
A simple base R solution to reading in data is the read.csv function, which reads data stored in the CSV (comma-separated values) file format. Note that simple Excel spreadsheets can be saved in this format.
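To make the format concrete, here is a sketch of what a small CSV file contains, using hypothetical data. Rather than writing a file, it passes the lines to read.csv() through its text argument, which is convenient for quick experiments.

```r
# What a small CSV file looks like: one record per line, values separated by
# commas, and column names on the first line (hypothetical data)
csv_lines <- c("id,age,smoker",
               "1,54,yes",
               "2,61,no")
cat(csv_lines, sep = "\n")

# read.csv() normally takes a file name, but its 'text' argument lets us
# parse the same content directly
df <- read.csv(text = csv_lines)
str(df)
```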
Once file.csv is placed in the working directory, it can be read into R using x <- read.csv("file.csv"), which reads in the file and assigns it to x. The result, x, is a data frame.
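A self-contained version of this step is sketched below. Because we cannot assume file.csv already exists on your machine, the sketch first writes a tiny file.csv with hypothetical contents into a temporary working directory, then reads it back.

```r
# Work in a temporary directory so the example does not touch your own files
old_wd <- setwd(tempdir())

# Create a small file.csv (hypothetical data) in the working directory
writeLines(c("id,age", "1,54", "2,61"), "file.csv")

# Read it in: the file name is resolved relative to the working directory
x <- read.csv("file.csv")
print(class(x))   # "data.frame"
print(dim(x))     # 2 rows, 2 columns

setwd(old_wd)
```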
Note that you can also place files in a subdirectory of the working directory. For example, it may be convenient to have a data subfolder, in which case you would use x <- read.csv("data/file.csv").
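The sketch below sets up such a data subfolder inside a temporary directory (with a hypothetical file.csv) and reads from it. It also shows file.path(), which builds the same relative path from its parts.

```r
old_wd <- setwd(tempdir())

# Create a data/ subfolder (no warning if it already exists) and a small file in it
dir.create("data", showWarnings = FALSE)
writeLines(c("id,age", "1,54", "2,61"), file.path("data", "file.csv"))

# "data/file.csv" is a path relative to the working directory;
# file.path("data", "file.csv") builds exactly the same path
x1 <- read.csv("data/file.csv")
x2 <- read.csv(file.path("data", "file.csv"))
print(identical(x1, x2))   # TRUE

setwd(old_wd)
```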
Exercise: download the file CHD.csv by right-clicking on the link and selecting ‘Save link as…’. Save or move it to your working directory, and read it into an R object called chd.
5.3 Exporting data
Similarly, you can export data back out (in an appropriate format) using write.csv. Here you need to specify both the object to write and the file to write it to, e.g. write.csv(x, "file.csv").
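One detail worth knowing: by default, write.csv() also writes the data frame's row names as an unnamed first column, which read.csv() then reads back as an extra column called X. The sketch below (hypothetical data, temporary directory) shows this and the row.names = FALSE argument that avoids it.

```r
old_wd <- setwd(tempdir())

x <- data.frame(id = 1:2, age = c(54, 61))   # hypothetical data

# Default: row names are written as an extra, unnamed first column
write.csv(x, "file.csv")
with_rownames <- read.csv("file.csv")
print(names(with_rownames))   # "X" "id" "age"

# row.names = FALSE writes only the data columns
write.csv(x, "file.csv", row.names = FALSE)
y <- read.csv("file.csv")
print(names(y))               # "id" "age"

setwd(old_wd)
```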
Exercise: write the object chd back to disk, in a new file called chd2.csv.
5.4 Further reading
There are many more options for reading and writing data; here are some resources to find out more:
Chapter 11 of R for Data Science introduces the ‘readr’ package (part of the Tidyverse) for importing data.
Chapters 5, 6, 7, 8 of R Programming for Data Science cover options, including handling larger files, and reading data directly from websites.
For further technical details, see the R Data Import/Export manual, which also covers connecting to databases.