5.2 Importing data

Importing data is one of the most important, but also most mundane steps in analyzing data. Unfortunately, anything that goes wrong at this step is likely to affect everything else that follows.

As we typically deal with tabular (or rectangular) data, the utils package of R contains a range of read.table() functions that read files into data frames from various formats. The most commonly used of these are:

  • read.csv() and read.csv2() for importing comma-separated value (csv) files (with read.csv2() expecting semicolons as separators and commas as decimal marks)

  • read.delim() and read.delim2() for importing other delimited files (e.g., using the TAB character to separate the values of different variables)

  • read.fwf() for reading fixed width format (fwf) files
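
To illustrate how these functions differ, the following minimal sketch writes the same two observations to four small text files and reads each file with the matching function. The file names, variable names, and values are made up for this example, and the files are created in a temporary directory:

```r
# Hypothetical demo files in a temporary directory:
tmp <- tempdir()
csv_file  <- file.path(tmp, "demo.csv")    # "," as separator, "." as decimal mark
csv2_file <- file.path(tmp, "demo_2.csv")  # ";" as separator, "," as decimal mark
tab_file  <- file.path(tmp, "demo.txt")    # TAB as separator
fwf_file  <- file.path(tmp, "demo.fwf")    # fixed widths (3 + 3 characters)

writeLines(c("name,score", "Ann,1.5", "Bob,2.5"), csv_file)
writeLines(c("name;score", "Ann;1,5", "Bob;2,5"), csv2_file)
writeLines(c("name\tscore", "Ann\t1.5", "Bob\t2.5"), tab_file)
writeLines(c("Ann1.5", "Bob2.5"), fwf_file)

# Read each file with the function that matches its format:
df_csv  <- read.csv(csv_file)
df_csv2 <- read.csv2(csv2_file)
df_tab  <- read.delim(tab_file)
df_fwf  <- read.fwf(fwf_file, widths = c(3, 3),
                    col.names = c("name", "score"))

# Compare the resulting data frames:
all.equal(df_csv, df_csv2)
all.equal(df_csv, df_tab)
str(df_fwf)
```

Since a fixed width file contains no header line here, the variable names are supplied by the col.names argument.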

The readr package of the tidyverse provides similar and additional functions for reading (or “parsing”) vectors and importing data files into a simplified type of data frame (known as a “tibble”).

Both the utils and the readr packages also provide a range of write() functions that allow exporting (and storing) data files in various formats. Importing and exporting files also assume some knowledge about how to denote paths to files or computer locations (on a local file system or on remote servers).
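
As a minimal sketch of exporting and re-importing data with base R, the following writes a small data frame to a csv file and reads it back. The data frame df and the file name export_demo.csv are assumptions made for this example:

```r
# A small (hypothetical) data frame:
df <- data.frame(id = 1:3,
                 group = c("a", "b", "a"),
                 value = c(0.5, 1.25, 2))

# Export to a csv file in a temporary directory:
out_file <- file.path(tempdir(), "export_demo.csv")
write.csv(df, file = out_file, row.names = FALSE)  # omit row names from the file

# Import the file again and compare it to the original:
df_back <- read.csv(out_file)
all.equal(df, df_back)
```

Setting row.names = FALSE prevents write.csv() from storing the row names as an additional, unnamed first column.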

As these topics are covered in Chapter 6: Importing data, the rest of this section contains only some excerpts and examples; more details are available in that chapter.

5.2.1 File locations and paths

A well-organized project typically contains various (sub-)directories for storing different types of data. For instance, many projects contain dedicated sub-directories for data, images, or code files.

The fact that not all files are stored in the same directory makes it necessary to know or set one’s current working directory, as well as point to the locations of files in other directories. When working with RStudio projects, R sets a session’s original working directory to the project folder.

File paths are descriptions of locations on a computer, typically encoded as character strings. They usually need to be specified when loading a data file, linking to an image, or referring to other files.

To make an R project as self-contained as possible (i.e., independent of the particular folder structure on our personal computer), all files needed in a project should be stored in the project folder or its sub-directories. When including a file from some folder, always use relative file paths to specify its location.
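
A relative file path can be assembled from its parts with file.path(), as in the following minimal sketch. The directory data and the file name my_data.csv are hypothetical and need not exist:

```r
# Build a relative path from its parts:
data_file <- file.path("data", "my_data.csv")  # hypothetical file
data_file

# Check whether the file exists (relative to the current working directory):
file.exists(data_file)

# Inspect the corresponding absolute path (without requiring that it exists):
normalizePath(data_file, mustWork = FALSE)
```

Only the relative path should be stored in code, as the absolute path differs between computers.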

Key commands for getting and setting file paths in R include:

# (1) Getting and setting the working directory: 
getwd()        # get the current working directory (as an absolute path)
wd <- getwd()  # store this path

setwd(wd)      # set the working directory (here: to the stored path)


# (2) Navigating relative file paths:
setwd(".")       # "." marks current location
setwd("./data")  # move 1 level down into "data" (if "data" exists)

setwd("..")    # move 1 level upwards 
setwd("./..")  # move 1 level upwards (from current location)

# Assuming 2 sub-directories ("./code" and "./data"):
setwd("code")     # move down into directory "code" 
setwd("../data")  # move into parallel directory "data"
setwd("../code")  # move into parallel directory "code"
setwd("..")       # move 1 level up
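
As setwd() changes the state of the entire R session, it is good practice to store the previous location and restore it afterwards. The following minimal sketch assumes a hypothetical project folder my_project with the sub-directories data and code, which is created in a temporary directory to keep the example self-contained:

```r
# Create a hypothetical project layout in a temporary location:
proj <- file.path(tempdir(), "my_project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "code"), showWarnings = FALSE)

old_wd <- setwd(proj)   # setwd() invisibly returns the previous directory

setwd("data")                 # move down into "data"
in_data <- basename(getwd())  # name of the current directory
setwd("../code")              # move into the parallel directory "code"
in_code <- basename(getwd())

setwd(old_wd)           # restore the original working directory
c(in_data, in_code)
```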

The here package (Müller, 2017) simplifies these commands, but also requires an understanding of file paths.
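
As a minimal sketch, here::here() builds a path that starts from the project's top-level directory, regardless of the current working directory. The file data/my_data.csv is hypothetical, and the fallback for a missing here package is only an approximation that assumes the working directory is the project folder:

```r
if (requireNamespace("here", quietly = TRUE)) {
  # Path relative to the project root (as determined by here):
  data_file <- here::here("data", "my_data.csv")
} else {
  # Assumption: the working directory is the project folder
  data_file <- file.path(getwd(), "data", "my_data.csv")
}

data_file
```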

5.2.2 Reading and writing files

The main way to get data into R is by importing (or “reading”) data files. Doing this requires not only the existence of the file, but also knowing its storage location. Storage locations can be local (on our own computer) or remote (on some online server), with various intermediate cases (e.g., on another drive or computer on the same network).

In readr, the functions that read files use a flexible file argument that typically describes a path to a file (as a character string), but can also specify a connection (e.g., to a server), or even literal data (as a single string or a raw vector). The corresponding write functions use their file argument to specify the file or connection to write to.
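
Passing literal data rather than a file path is mostly useful for small examples and tests. The following minimal sketch uses made-up data and assumes that the readr package is installed (the readr call is skipped otherwise). Wrapping the string in I() marks it as literal data, whereas the base R function read.csv() uses a separate text argument for this purpose:

```r
# Literal data as a single string (hypothetical values):
csv_text <- "name,score\nAnn,1.5\nBob,2.5"

# Base R: literal data is passed to the `text` argument
df <- read.csv(text = csv_text)

# readr: literal data is passed to the `file` argument
if (requireNamespace("readr", quietly = TRUE)) {
  # col_types = "cd": 1st column as character, 2nd column as double
  tb <- readr::read_csv(I(csv_text), col_types = "cd")
  print(tb)
}

df
```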

Key readr functions include:

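  • read_csv() vs. read_csv2() for reading data files that use commas vs. semicolons as separators (with read_csv2() expecting the comma as the decimal mark)

  • read_delim() for reading data files with other delimiters (e.g., TAB characters)

  • write_csv() vs. write_csv2() for writing the corresponding comma- vs. semicolon-separated data files

  • write_delim() for writing data files with other delimiters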
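
The following minimal sketch writes the same data with each of the three write functions and reads it back with the matching read function. The data frame df and the file names are made up for this example, and the code assumes that the readr package is installed (it is skipped otherwise):

```r
# A small (hypothetical) data frame and three target files:
df <- data.frame(name = c("Ann", "Bob"), score = c(1.5, 2.5))
csv_file  <- file.path(tempdir(), "readr_demo.csv")
csv2_file <- file.path(tempdir(), "readr_demo_2.csv")
tsv_file  <- file.path(tempdir(), "readr_demo.tsv")

if (requireNamespace("readr", quietly = TRUE)) {
  readr::write_csv(df, csv_file)                  # "," separator, "." decimal mark
  readr::write_csv2(df, csv2_file)                # ";" separator, "," decimal mark
  readr::write_delim(df, tsv_file, delim = "\t")  # TAB separator

  # col_types = "cd": 1st column as character, 2nd column as double
  tb_csv  <- readr::read_csv(csv_file, col_types = "cd")
  tb_csv2 <- readr::read_csv2(csv2_file, col_types = "cd")
  tb_tsv  <- readr::read_delim(tsv_file, delim = "\t", col_types = "cd")

  print(tb_csv)
}
```

Specifying col_types explicitly avoids relying on readr guessing the type of each column.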

Dealing with problems:
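
One way of inspecting values that could not be parsed is the problems() function of readr. The following is only a minimal sketch with made-up data, in which one value of a numeric column is not a number. It assumes that the readr package is installed (and is skipped otherwise):

```r
if (requireNamespace("readr", quietly = TRUE)) {
  # Hypothetical data: the 2nd score is not a number
  csv_text <- "id,score\n1,2.5\n2,not_a_number\n3,4.0"

  # col_types = "id": 1st column as integer, 2nd column as double
  tb <- suppressWarnings(
    readr::read_csv(I(csv_text), col_types = "id")
  )

  tb$score                     # the unparsable value becomes NA
  probs <- readr::problems(tb) # table describing the parsing problems
  print(probs)
}
```

Without suppressWarnings(), readr issues a warning that points to problems() whenever parsing problems occur.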

References

Müller, K. (2017). here: A simpler way to find your files. https://CRAN.R-project.org/package=here
Neth, H. (2022a). Data science for psychologists. Social Psychology; Decision Sciences, University of Konstanz. https://bookdown.org/hneth/ds4psy/
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc. http://r4ds.had.co.nz