## 6.1 Introduction

Where does data (e.g., a data frame or tibble) come from? If we don’t enter it ourselves (e.g., with the tibble or tribble commands; see Chapter 5 on tibbles), we usually import it from an external source. The scope of such sources is vast, and here we only cover the most common candidates: data that is already stored in text form or in other file formats that can easily be coerced into linear or rectangular data structures.

This chapter discusses options and potential pitfalls when using the readr package (Wickham et al., 2018) for data import. readr provides fast and friendly ways for reading vectors and rectangular data files (like csv, tsv, and fwf) and writing files in various formats.
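As a first taste of what these functions do, here is a minimal sketch (not taken from the original chapter) that parses two character vectors and sends a tiny data frame through a temporary csv file and back. The object names scores, tmp, and scores_2 are made up for illustration, and the example assumes that the readr package is installed.

```r
library(readr)

# Parse character vectors into typed vectors:
ints <- parse_integer(c("1", "2", "3"))
dbls <- parse_double(c("1.5", "2.25"))
ints
dbls

# Write a small data frame to a temporary csv file and read it back:
scores <- data.frame(id = 1:3, score = c(10.5, 20, 30))  # hypothetical data
tmp <- tempfile(fileext = ".csv")
write_csv(scores, tmp)
scores_2 <- read_csv(tmp)  # column types are guessed from the file content
scores_2
```

Note that the column types of scores_2 are guessed anew when reading the file, so they need not match those of scores exactly.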

### 6.1.1 Objectives

After working through this chapter, you will be able to:

1. orient yourself on your computer (i.e., know your working directory and specify absolute and relative paths to other directories);
2. parse vectors of various data types;
3. import files of various formats;
4. export files in various formats;
5. avoid exotic or proprietary file formats (not only in R).

An important prerequisite for loading data is that we are able to orient ourselves on our computer and can navigate or point to the location at which data files may be stored. The 2 key questions to ask and answer prior to reading or writing data are:

• Where am I?
• Where is my data?

#### Working directory

The 1st question (“Where am I?”) addresses the notion of the current working directory. This is typically the directory on your computer in which you started your R environment, the location of your current R script, or (if you’re working with RStudio projects) the home directory of your current project. Use the function getwd to find out your current working directory:

# Get current working directory (wd):
my_wd <- getwd()
my_wd
#> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book"

Note that the getwd function returns a character string that represents the address of your current working directory. R separates the hierarchy levels of directories with forward slashes (/), even on Windows, where the operating system itself uses backward slashes (\).
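Because paths are ordinary character strings, they can also be assembled from and split into their parts. The following is a small sketch using the base R functions file.path(), basename(), and dirname(); the directory name data and the file name data_t1.csv are placeholders here and need not exist on your machine.

```r
# Assemble a path from its parts (joined by "/"):
data_dir  <- file.path(getwd(), "data")          # hypothetical subdirectory
data_file <- file.path(data_dir, "data_t1.csv")  # hypothetical file
data_file

# Split a path into file name and directory:
basename(data_file)  # the file name only
dirname(data_file)   # the directory that contains the file
```

Building paths with file.path() avoids typing separators by hand, which makes scripts easier to move between operating systems.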

Corresponding to getwd, the function setwd (with its only argument dir specifying a string that points to an existing location on your computer) allows changing your current working directory to dir:

# Set current working directory:
setwd(dir = my_wd)  # set dir to my_wd (set above)
getwd()             # same dir (as set to my_wd)
#> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book"

The function list.files() returns the names of all files and directories in the current working directory (or in any other directory that is passed as its first argument):

# List files and directories:
list.files()       # in current working directory
list.files(my_wd)  # in some specific directory
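Before reading a file, it can help to check that it is where we think it is. The sketch below illustrates this with list.files() and file.exists() from base R. To stay self-contained, it first creates a throwaway directory (wd_demo) with two made-up files inside R's temporary directory; none of these names stem from the chapter's data.

```r
# Set up a throwaway directory with two small files (hypothetical names):
demo_dir <- file.path(tempdir(), "wd_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("id,score", "1,10", "2,20"), file.path(demo_dir, "data_t1.csv"))
writeLines("some notes", file.path(demo_dir, "notes.txt"))

# List only the csv files in this directory:
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
csv_files

# Check for specific files before trying to read them:
has_data    <- file.exists(file.path(demo_dir, "data_t1.csv"))
has_missing <- file.exists(file.path(demo_dir, "missing.csv"))
c(has_data, has_missing)
```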

#### File paths

The 2nd question (“Where is my data?”) implies that data doesn’t necessarily need to be stored at the same location as our current working directory. Let’s suppose that we want to load some data file (called data_t1.csv), which we downloaded from an online source at http://rpository.com/ds4psy/data/data_t1.csv. But rather than saving it in our current working directory my_wd, we stored the file data_t1.csv in a subdirectory of it (called data). In this case, there are 2 principal ways of loading your data file:

1. Change your current working directory to a different directory that contains the data:
# (1) Changing working directory to load data:

# Get working directory:
my_wd <- getwd()
my_wd  # prints the current wd:
#> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book"

# Change the current working directory to the "data" subdirectory:
# setwd("/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book/data")  # absolute path
setwd("./data")  # relative to current directory "."
getwd()  # verify new location:
#> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book/data"

# Read data from the NEW working directory:
t1 <- read_csv("data_t1.csv")  # file name suffices, as the file is in the current wd

setwd(my_wd)  # setwd to original directory
getwd()       # back in my_wd
#> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book"

2. Stay at your current working directory, but read data from a different directory:
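# (2) Reading data from another directory:

# (a) provide absolute/full path of the data file:
t2 <- read_csv(file.path(my_wd, "data", "data_t1.csv"))  # absolute path (as my_wd is absolute)

# (b) provide relative path of the data file:
t3 <- read_csv("./data/data_t1.csv")  # relative to current directory "."

# (c) relative to (platform dependent) home directory:
# (a path starting with "~" would go here; as it differs between users, we copy t3 instead)
t4 <- t3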

# (d) provide path to an online source of the data file:
t0 <- read_csv("http://rpository.com/ds4psy/data/data_t1.csv")

The 2nd method (staying where you are, but importing files from other directories) is typically preferred, as it leaves your working directory, and thus the meaning of all other relative paths in your script, unchanged. Nevertheless, combining changes of your working directory with absolute and relative paths provides many different ways of pointing to the location of a data file. Let’s verify that all of the above methods actually loaded the same data:

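# Check whether t0 to t4 are all equal:
# (all.equal() returns TRUE or a description of differences, hence isTRUE())
isTRUE(all.equal(t0, t1)) &&
  isTRUE(all.equal(t1, t2)) &&
  isTRUE(all.equal(t2, t3)) &&
  isTRUE(all.equal(t3, t4))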
#> [1] TRUE
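If you do opt for the 1st method, it is easy to forget to change back to the original directory, especially when an error interrupts your code. One possible safeguard, sketched here under the assumption that a small helper function is acceptable, is to restore the working directory with on.exit(). The function read_lines_in_dir() and the demo directory are hypothetical and only serve as an illustration; they use base R's readLines() rather than readr.

```r
# Hypothetical helper: read a text file from dir, then return to where we were.
read_lines_in_dir <- function(dir, file) {
  old_wd <- setwd(dir)                  # setwd() invisibly returns the previous wd
  on.exit(setwd(old_wd), add = TRUE)    # restore it on exit, even after an error
  readLines(file)
}

# Try it out on a throwaway directory:
demo_dir <- file.path(tempdir(), "wd_restore_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("first line", "second line"), file.path(demo_dir, "demo.txt"))

wd_before <- getwd()
demo_txt  <- read_lines_in_dir(demo_dir, "demo.txt")
wd_after  <- getwd()

demo_txt
identical(wd_before, wd_after)  # the working directory is unchanged
```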

#### Sharing scripts and data files

To share your R scripts and data files with others, it is best to work with an RStudio project and store all related scripts and files within this project. In R projects, it is customary to save all your R scripts in a specific directory (e.g., called R) and store all data files in a dedicated data directory (e.g., data). Importantly, using only relative file paths (i.e., relative to the current script or to the project’s working directory) also allows archiving or transferring your scripts and data files, as long as the directory structure remains intact. By archiving your entire project folder (e.g., as my_project.zip, or a folder that includes the subfolders R and data), you can transfer your archive to another person or computer and your scripts will keep working.
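The claim that relative paths survive a move of the project folder can be checked with a small experiment. The following sketch builds a throwaway project (with made-up names my_project and scores.csv) in R's temporary directory, reads a file via a relative path, renames the project folder, and reads the file again. It uses base R's write.csv() and read.csv() to avoid any package dependency; this is an illustration of the idea, not a recommended workflow.

```r
old_wd <- getwd()

# A throwaway project with the customary subfolders (hypothetical names):
proj       <- file.path(tempdir(), "my_project")
proj_moved <- file.path(tempdir(), "my_project_moved")
unlink(c(proj, proj_moved), recursive = TRUE)  # start from a clean slate
dir.create(file.path(proj, "R"), recursive = TRUE)
dir.create(file.path(proj, "data"))
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)),
          file.path(proj, "data", "scores.csv"), row.names = FALSE)

# Inside the project, a relative path is enough:
setwd(proj)
before <- read.csv("data/scores.csv")
setwd(old_wd)

# Move the entire project: the same relative path still works.
moved <- file.rename(proj, proj_moved)
setwd(proj_moved)
after <- read.csv("data/scores.csv")
setwd(old_wd)

identical(before, after)
```

An absolute path to the original location would have stopped working after the move, whereas the relative path data/scores.csv only depends on the project's internal structure.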

### 6.1.3 Being here

A modern alternative to using the getwd() and setwd() functions is provided by the here package (Müller, 2017), which answers the question “Where am I?” in a straightforward manner: You are here(). here determines the path to your current working directory (or project directory) when it is loaded and provides a here() function that returns the name of this directory or other directories, whose names are provided as additional arguments (of type character):

library(here)  # loads the package

here::here()         # returns your current main directory
#> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book"
here::here("data")   # returns the subdirectory "/data"
#> [1] "/Users/hneth/Desktop/stuff/Dropbox/GitHub/ds4psy_book/data"

The brilliant idea of here is that all paths within a project can be specified relative to the top-level directory of your project, which is here().

Note: As the R package lubridate also contains a (deprecated) function named here() (see Chapter 10: Time), we are using here::here() here to make explicit that we want to use the function from the here package.
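Since here() accepts several path components, it can point to a specific file in one step. The sketch below assumes that the here package is installed; the subdirectory data and the file data_t1.csv follow the running example of this chapter, but may not exist in your own project, which is why the import itself is only shown as a comment.

```r
# Build the full path to a file in the "data" subdirectory of the project:
data_path <- here::here("data", "data_t1.csv")
data_path

# Check whether the file exists before importing it:
file.exists(data_path)

# If it exists, it could be imported with (assuming readr is installed):
# t5 <- readr::read_csv(data_path)
```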

### 6.1.4 Data used

In this chapter, we will use a variety of data files. As many of them are stored in non-standard formats, they are not included in the ds4psy package, but stored on a web server (at http://rpository.com). Below, we will illustrate how they can be imported directly from their online source. Alternatively, you can use a web browser to download the files to a directory on your computer (e.g., in a sub-directory called data) and import them from there.
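Downloading a file can also be done from within R. As a rough sketch, the hypothetical helper fetch_if_missing() below uses base R's download.file() to store a copy of an online file in a local directory, but only if no such copy exists yet. The function name and its arguments are assumptions made for this illustration; the actual call is shown as a comment, as it requires an internet connection.

```r
# Hypothetical helper: download url to dest, unless dest already exists.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = dest, mode = "wb")  # "wb" keeps the file as is
  }
  dest  # return the local path, so that it can be passed on to an import function
}

# Possible use (requires an internet connection):
# local_file <- fetch_if_missing("http://rpository.com/ds4psy/data/data_t1.csv",
#                                file.path("data", "data_t1.csv"))
# t1 <- readr::read_csv(local_file)
```

Working from a local copy means that later runs of your script do not depend on the web server being available.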

This chapter formerly assumed that you had read and worked through Chapter 11: Import data of the r4ds textbook (Wickham & Grolemund, 2017). It can now be read by itself, but reading Chapter 11: Import data of r4ds is still recommended.

Please do the following to get started:

• Create an R Markdown (.Rmd) document (for instructions, see Appendix E and the templates linked in Section E.2).

• Structure your document by inserting headings and empty lines between different parts. Here’s an example of how your initial file could look:

---
title: "Chapter 6: Importing data"
date: "2020 May 25"
output: html_document
---

Add text or code chunks here.

# Exercises (06: Importing data)

## Exercise 1

## Exercise 2

etc.

<!-- The end (eof). -->

• Create an initial code chunk below the header of your .Rmd file that loads the R packages of the tidyverse (and see Section E.3.3 if you want to get rid of the messages and warnings of this chunk in your HTML output).

• Save your file (e.g., as 06_import.Rmd in the R folder of your current project) and remember to save and knit it regularly as you keep adding content to it.