4 Tutorial 4: Reading data in/out
After working through Tutorial 4, you’ll…
- understand how to get data into and out of R
For this tutorial, we’ll again use the data set “data_tutorial3.csv” (via OLAT/Materials/Data for R). The data set has already been introduced and explained in Tutorial 3: Objects & structures in R, so I won’t go into detail here.
survey <- read.csv2("data_tutorial3.csv", header = TRUE)
One of the most frequently encountered external data types you’ll have to get into R is the comma-separated values file, or CSV file for short. You may know CSV files from Excel. Such data often consists of observations (in rows) and variables (in columns). Values are separated by a separator (often a comma or a semicolon, depending on your data).
4.1 Getting data into R
This tutorial doesn’t introduce much that is new. In fact, we have already encountered a CSV file when reading in data for Tutorial 3.
What does this command do? Let’s see:
Image: Help for the read.csv2 function
The read.csv2() function is part of the utils package. To read in CSV files, two different functions exist:
- read.csv() reads in CSV files where values are comma separated
- read.csv2() reads in CSV files where values are semicolon separated
Other than that, both functions work the same way and consist of the same arguments:
- file: Here, you need to give the name of the CSV file that should be read in (including the file extension, here .csv). The file should be located in your working directory, otherwise R will not be able to find it: file = "data_tutorial3.csv"
- header: This argument specifies whether the first row of the CSV file contains the variable names. It is set to TRUE by default. Thus, if our data did not contain the name of each variable in its first row, we would need to set this argument to FALSE. In our case, we could either omit this argument (since it defaults to TRUE) or explicitly set it to TRUE. Both lead to the same result.
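To see how the separator and the header argument play together, here is a minimal sketch. It assumes a small, made-up toy file written to a temporary location (the names tmp, toy, wrong_sep and no_header are hypothetical and not part of the tutorial's data).

```r
# Write a tiny semicolon-separated file to a temporary location
# (hypothetical toy data, not the tutorial's survey data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("name;age", "Anna;23", "Ben;31"), tmp)

# read.csv2() expects semicolons and splits the values into two columns
toy <- read.csv2(tmp, header = TRUE)
names(toy)        # "name" "age"
nrow(toy)         # 2

# read.csv() expects commas, so every line ends up in one single column
wrong_sep <- read.csv(tmp, header = TRUE)
ncol(wrong_sep)   # 1

# header = FALSE treats the first row as data; columns are named V1, V2
no_header <- read.csv2(tmp, header = FALSE)
names(no_header)  # "V1" "V2"
nrow(no_header)   # 3
```

If your imported data unexpectedly has only one column, picking the wrong one of the two functions is a likely cause.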
Important: R differentiates between necessary (marked in red) and optional (marked in green) arguments.
Arguments where no default value is given (i.e., those without a =) are necessary. That means you have to specify them when you call the function by passing the respective values to the function. For instance, you cannot read in a CSV file without specifying the argument file, since R would not know which file to read in.
However, arguments where a default value is given can be omitted. If you do not specify values for these arguments yourself, R will simply take the default value. For instance, the read.csv2() function will automatically use the first row of a CSV file as column names, unless you actively set the argument header to FALSE.
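If you prefer checking defaults in the console rather than on the help page, base R's args() and formals() can be used. This is just one possible way to look them up, sketched here for read.csv() and read.csv2():

```r
# args() prints a function's arguments together with their default values
args(read.csv2)

# formals() returns the arguments as a list, so we can list their names ...
names(formals(read.csv2))

# ... and look at single defaults
formals(read.csv2)$header   # TRUE: used whenever header is not specified
formals(read.csv2)$sep      # ";"
formals(read.csv)$sep       # ","
```

The argument file shows up in the list of names but has no default, which is why it always has to be specified.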
Another important thing to know is that you can specify arguments in functions in two ways:
- explicitly by name, for instance by setting file equal to "data_tutorial3.csv"
- implicitly by order, for instance by passing "data_tutorial3.csv" as the first argument to the read.csv2() function
In fact, the following two commands will give the exact same results:
survey <- read.csv2(file = "data_tutorial3.csv")
survey <- read.csv2("data_tutorial3.csv")
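You can check this equivalence yourself with identical(). The sketch below again assumes a small made-up file in a temporary location (tmp, by_name, by_order and reordered are hypothetical names), so it runs without the tutorial's data set.

```r
# Hypothetical toy file; note the decimal comma, which read.csv2() expects
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;score", "1;4,5", "2;3,0"), tmp)

# Arguments matched by name
by_name <- read.csv2(file = tmp, header = TRUE)

# Arguments matched by position: file comes first, header second
by_order <- read.csv2(tmp, TRUE)

# Named arguments may also be given in any order
reordered <- read.csv2(header = TRUE, file = tmp)

identical(by_name, by_order)    # TRUE
identical(by_name, reordered)   # TRUE
```

Naming arguments takes more typing, but it keeps the call readable and protects you from accidentally passing values in the wrong order.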
4.2 Getting data out of R
In some cases, you may want to also get data out of R - for instance, export a newly created data set as a CSV file.
You can easily do that using the write.csv() or the write.csv2() function by specifying the arguments x (i.e., which object should be exported) and file (i.e., how the exported file should be named).
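As a minimal sketch of such an export: the example below creates a made-up data frame (toy) and writes it to a temporary folder. The object and file names are assumptions for illustration. The additional argument row.names = FALSE is not discussed above; it is set here so that R's row numbers are not written as an extra column.

```r
# Hypothetical toy data to export
toy <- data.frame(name = c("Anna", "Ben"), age = c(23, 31))

# Build a path in a temporary folder; in practice, a file name such as
# "my_export.csv" would write into your working directory instead
out_file <- file.path(tempdir(), "toy_export.csv")

# x = which object to export, file = how the exported file should be named
write.csv2(x = toy, file = out_file, row.names = FALSE)

# Check that the file exists and look at its raw content
file.exists(out_file)
readLines(out_file)

# Read the file back in to confirm the round trip works
reimported <- read.csv2(out_file)
reimported
```

Since write.csv2() writes semicolon-separated values, the matching function for reading the file back in is read.csv2(), not read.csv().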
4.3 Other packages for getting data into/out of R
When working with R, there will be many other data types you may wish to import or export. While we won’t cover all of them here, the following packages deliver some very useful additional functions for doing so
4.4 Take Aways
- Importing data: read.csv(), read.csv2()
- Exporting data: write.csv(), write.csv2()
4.5 More tutorials on this
You still have questions? The following tutorials & papers can help you with that:
Let’s keep going: Tutorial 5: Inspecting & transforming objects.