Session 5 Getting data in and out of R
We can’t get very far with R if we can’t get our raw data in, and our results out. This session introduces some of the basic techniques to do this.
5.1 Working directory
R can read data from files on disk. First, we need to understand where R will look for data. Running the command getwd() in the R console will reveal a file path, which is where R will look first for files. This is also likely to be the directory displayed in the file pane (bottom right-hand corner in RStudio). This directory is called the working directory.
You can customise the working directory using Session > Set Working Directory and the options therein. Alternatively, you can use the setwd() command in the console, supplying the file path of the new directory as a quoted string.
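As a minimal sketch of these two commands, the snippet below records the current working directory, switches to another one, and then switches back. The target used here, tempdir(), is only a stand-in so that the code runs on any machine; in practice you would pass the path to your own folder.

```r
# Record the current working directory so we can return to it later
old_wd <- getwd()
print(old_wd)

# Switch to another directory. tempdir() is a placeholder that exists on
# every machine; in your own work, supply a path such as "path/to/folder"
setwd(tempdir())
print(getwd())

# Switch back to where we started
setwd(old_wd)
```

On Windows, write paths with forward slashes or doubled backslashes, since a single backslash is an escape character in R strings.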
We will say more about directories when we introduce Projects later. For now, note that files need to be placed in the working directory (or in a subdirectory of it) for R to find them.
In general, it makes sense to also save your script files in the working directory (or a subdirectory of it).
Exercise: choose a sensible working directory, and save your current script file to that directory.
5.2 Base R solutions to importing data
Note: ‘Base R’ refers to the functions that come with R as standard, without installing additional packages.
A simple base R solution to reading in data is to use the read.csv function. This reads data from the CSV (comma-separated values) file format. Note that simple Excel spreadsheets can be saved in this format. If a file called file.csv is placed in the working directory, it can be read into R using x <- read.csv("file.csv"), which reads in the file and assigns the result to x. By default, x will be a data frame.
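The following sketch shows the idea end to end. Because file.csv is only a placeholder name, the code first creates a tiny CSV file with made-up values in a temporary directory (an assumption made so the example runs anywhere), then reads it with read.csv and inspects the result.

```r
# Create a small example CSV file (made-up data, for illustration only)
csv_path <- file.path(tempdir(), "file.csv")
writeLines(c("id,age,group",
             "1,54,a",
             "2,61,b",
             "3,47,a"),
           csv_path)

# Read the file into a data frame; the first line supplies the column names
x <- read.csv(csv_path)

class(x)  # the class of the object
str(x)    # column names and types
head(x)   # the first few rows
```

Checking the result with str() straight after importing is a useful habit, since it shows whether each column has been read as the type you expected.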
Note that you can also place files into a subdirectory of the working directory. For example, it may be convenient to have a data subfolder, in which case you would use x <- read.csv("data/file.csv").
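A short sketch of the subfolder layout follows. To keep it runnable on any machine, it builds a throwaway project folder inside tempdir() and makes that the working directory; the folder name my_project and the file contents are hypothetical.

```r
# Build a throwaway project folder containing a data subfolder
project_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(project_dir, "data"),
           recursive = TRUE, showWarnings = FALSE)

# Make the project folder the working directory; setwd() returns the
# previous working directory, which we keep so we can restore it
old_wd <- setwd(project_dir)

# Put a small made-up CSV file in the data subfolder
writeLines(c("id,value", "1,10", "2,20"), "data/file.csv")

# The path is given relative to the working directory
x <- read.csv("data/file.csv")
print(x)

# Restore the original working directory
setwd(old_wd)
```

Relative paths such as "data/file.csv" keep working if the whole project folder is moved or shared, which is one reason to prefer them over full paths.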
Exercise: download the file CHD.csv. To do this, right-click on the hyperlink and select ‘save link as’. Download or move the file to your working directory, and read it into an R object called chd.
5.3 Exporting data
Similarly, you can export data back out (in an appropriate format) using write.csv. Here you need to specify both the object to write and the file to write it to, in that order.
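A minimal sketch of a round trip is given below, assuming a small made-up data frame called results and an output file in tempdir() (both hypothetical). The row.names = FALSE argument is optional; it stops write.csv from adding a column of row names to the file.

```r
# A small made-up data frame standing in for your results
results <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Write the object to a CSV file: first the object, then the file path
out_path <- file.path(tempdir(), "results.csv")
write.csv(results, out_path, row.names = FALSE)

# Read the file back in to check what was written
check <- read.csv(out_path)
print(check)
```

Note that write.csv overwrites an existing file of the same name without asking, so it is worth choosing a new file name when exporting.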
Exercise: write the object chd back to disk, in a new file.
5.4 Further reading
There are many more options for reading and writing data. Here are some resources to find out more:
Chapter 11 of R for Data Science introduces the ‘readr’ package (part of the Tidyverse).
Chapters 5, 6, 7 and 8 of R Programming for Data Science cover further options, including handling larger files and reading data directly from websites.
For further technical details, see the R Data Import/Export manual, which also includes details of connecting to databases.