Where does data (e.g., a data frame or tibble) come from? If we don’t enter it ourselves (e.g., with the tribble() command; see Chapter 5), we usually import it from an external source. The scope of such sources is vast, and here we only cover the most common candidates: data that is already stored in text form or in other file formats that can easily be coerced into linear or rectangular data structures.
This chapter discusses options and potential pitfalls when using the readr package (Wickham, Hester, et al., 2022) for data import.
readr provides fast and friendly ways for reading vectors and rectangular data files (like fwf) and for writing files in various formats.
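The three kinds of tasks mentioned here (parsing vectors, writing files, and reading files) can be sketched in a few lines. This is only a minimal illustration, assuming that readr is installed; the data frame `df` and the temporary file `tmp` are made up for this example:

```r
library(readr)

# Parse a character vector into numbers:
x <- parse_double(c("1.5", "2", "3.25"))

# Write a small (made-up) data frame to a temporary csv file:
df  <- data.frame(id = 1:3, name = c("a", "b", "c"))
tmp <- tempfile(fileext = ".csv")
write_csv(df, tmp)

# Read the file back in, stating the expected column types explicitly:
df_2 <- read_csv(tmp, col_types = cols(id = col_integer(),
                                       name = col_character()))
```

Specifying `col_types` is optional, but it documents our expectations about the data and keeps readr from guessing the column types.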
After working through this chapter, you will be able to:
- orient yourself on your computer (i.e., know your working directory and specify absolute and relative paths to other directories);
- use readr commands to
  - parse vectors of various data types;
  - import files of various formats;
  - export files in various formats;
- avoid exotic or proprietary file formats (not only in R).
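As a rough sketch of the first objective, base R can report the current working directory and combine it with a relative path into an absolute one. The sub-directory `data` and the file name `my_file.csv` are hypothetical and only serve as an illustration:

```r
# Absolute path of the current working directory:
wd <- getwd()

# A relative path (hypothetical sub-directory and file name):
rel_path <- file.path("data", "my_file.csv")

# The corresponding absolute path:
abs_path <- file.path(wd, rel_path)
```

A relative path is interpreted relative to the working directory, so the same relative path can point to different files when the working directory changes.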
An alternative to using the setwd() function is provided by the here package (Müller, 2017), which answers the question “Where am I?” in a straightforward manner:
here determines the path to our current working directory (or project directory) when it is loaded and provides a here() function that returns the name of this directory or of other directories, whose names can be provided as additional arguments (of type character), separated by commas:

    library(here)  # loads the package

    here()  # the current directory
    #> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book"

    here("data")  # the sub-directory "./data"
    #> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book/data"

    here("_book", "images")  # a sub-sub-directory "./_book/images"
    #> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book/_book/images"

The one simple, but brilliant idea of here is that all paths within a project can be specified relative to our current working directory, which is located and determined by here.
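In practice, this means that a file path can be assembled from its parts, without typing out the absolute path. The following lines are a minimal sketch, assuming that the here package is installed; the file name `my_file.csv` is hypothetical:

```r
library(here)

# Build a path relative to the directory determined by here
# (hypothetical file name in the sub-directory "data"):
my_path <- here("data", "my_file.csv")

# The result is an absolute path that ends in the parts we provided:
basename(my_path)
```

Such a path could then be passed to an import function, so that code does not need to contain user-specific absolute paths.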
As the lubridate package (covered in Chapter 10: Time) also contained a (now deprecated) function named here(), we may have to use here::here() to make explicit that we want to use the here() function from the here package (i.e., its corresponding namespace). If only the here package is loaded, calling here() is sufficient.
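The difference between both ways of calling the function can be sketched as follows (assuming that the here package is installed):

```r
# Explicit namespace: unambiguous, even if another attached package
# also defines a function named here():
path_1 <- here::here("data")

# Short form: sufficient when no other attached package masks here():
library(here)
path_2 <- here("data")

# When nothing masks here(), both calls yield the same path:
identical(path_1, path_2)
```

Using the explicit `here::here()` form is slightly longer, but remains safe when other packages are loaded later.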
In this chapter, we will use a variety of data files. As many of them are stored in non-standard formats,
they are not included in the ds4psy package, but stored on a web server (at http://rpository.com).
Below, we will illustrate how they can be imported directly from their online source.
Alternatively, you can use a web browser to download the files to a directory on your computer
(e.g., in a sub-directory called
data) and import them from there.
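Both options can be sketched as follows. As the names and locations of the actual data files are not specified here, this example creates a small stand-in file in a temporary `data` directory; the file name `my_file.csv`, its contents, and the URL in the final comment are hypothetical placeholders:

```r
library(readr)

# Stand-in for a downloaded file: A sub-directory "data", created inside a
# temporary directory (so that this sketch does not touch your project):
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
my_file <- file.path(data_dir, "my_file.csv")  # hypothetical file name
writeLines(c("id,score", "1,10", "2,20"), my_file)

# Import from the local copy:
dat <- read_csv(my_file, col_types = cols())

# Importing directly from an online source works in the same way, as
# read_csv() also accepts a URL (placeholder address, requires internet access):
# dat <- read_csv("http://rpository.com/<path-to-file>.csv")
```

Working with a local copy has the advantage that the import keeps working without an internet connection.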
This chapter formerly assumed that you have read and worked through Chapter 11: Import data of the r4ds textbook (Wickham & Grolemund, 2017). It now can be read by itself, but reading Chapter 11: Import data of r4ds is still recommended.
Please do the following to get started:
Structure your document by inserting headings and empty lines between different parts. Here’s an example of how your initial file could look:

    ---
    title: "Chapter 6: Importing data"
    author: "Your name"
    date: "2023 January 30"
    output: html_document
    ---

    Add text or code chunks here.

    # Exercises (06: Importing data)

    ## Exercise 1

    ## Exercise 2

    etc.

    <!-- The end (eof). -->

Create an initial code chunk below the header of your .Rmd file that loads the R packages of the tidyverse (and see Section F.3.3 if you want to get rid of the messages and warnings of this chunk in your HTML output).
Save your file (e.g., as 06_import.Rmd in the R folder of your current project) and remember to save and knit it regularly as you keep adding content to it.
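The contents of the initial code chunk mentioned above could be as simple as the following sketch (assuming that the tidyverse package is installed):

```r
# Load the core packages of the tidyverse (including readr):
library(tidyverse)

# Note: In an .Rmd file, setting the knitr chunk options message = FALSE and
# warning = FALSE in the header of this chunk hides its messages and warnings
# in the knitted output.
```

Loading all packages in one chunk at the top of the document makes it easy to see on which packages the rest of the file depends.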
Now that we can orient ourselves on our computers and navigate between various directories, we are ready to read more about reading data with readr.