6.3 Summary

Importing data is a mundane step that we normally take for granted. However, as it is a prerequisite for anything that follows, it is important to get this step right. This chapter discussed essential commands of the readr package (Wickham, Hester, et al., 2024) that allow reading and writing data from a variety of files and formats.

After working through this chapter, you should be able to:

  1. orient yourself on your computer (i.e., know your working directory and specify absolute and relative paths to other directories);
  2. use readr commands to
    • parse vectors of various data types;
    • import files of various formats;
    • export files in various formats;
  3. avoid exotic or proprietary file formats (not only in R).
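The skills listed above can be illustrated in a few lines. The following is a minimal sketch rather than a complete workflow: the file name `demo.csv`, the temporary directory, and the toy data are assumptions made up for this illustration, and the file is written to a temporary location so that nothing in your working directory is changed.

```r
library(readr)

# 1. Orientation: where are we, and how do we point to other locations?
wd <- getwd()                                # current working directory (an absolute path)
tmp_dir  <- tempdir()                        # hypothetical target directory for this sketch
tmp_file <- file.path(tmp_dir, "demo.csv")   # build a path in a platform-independent way

# 2a. Parse character vectors into various data types:
nums  <- parse_double(c("1.5", "2", "3.25"))
ints  <- parse_integer(c("1", "2", "3"))
dates <- parse_date("2024-01-31")            # ISO 8601 dates are parsed by default
lgls  <- parse_logical(c("TRUE", "FALSE"))

# 2b. Export a small (made-up) data frame to a plain-text csv file:
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
write_csv(df, tmp_file)

# 2c. Import the file again (as a tibble), stating the column types explicitly:
df_2 <- read_csv(tmp_file,
                 col_types = cols(id = col_integer(),
                                  name = col_character()))
df_2
```

Using a plain-text format like csv also illustrates the third point: the file can be opened by any text editor or statistics program, not only by R.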

Although mastering the steps and commands of this chapter will cover the most typical cases, keep in mind that there are many other forms and types of data. Perhaps the most common type of data to which we are exposed on a daily basis is text, but this usually does not come neatly packaged into rectangular form. The stringr package (Wickham, 2022) is the tidyverse tool for working with strings.[47] Pointers to resources for importing additional data formats are provided in Sections 6.2.4 and 6.5.

As a summary, Figure 6.2 provides a schematic overview of the roles of the tidyverse packages tibble (discussed in Chapter 5 on Tibbles) and readr (discussed in this chapter) for creating tibbles. Both packages provide tools for creating tibbles, but differ in their input sources. Existing tibbles can be used for data analysis (e.g., for data transformations, visualizations, or statistics) or can be written to files (e.g., to be archived or shared).

Figure 6.2: The readr and tibble packages use different inputs to create a tabular data structure known as a tibble, which is a simpler data frame. Tibbles can then be used for data analysis in R (e.g., for data transformation, visualization, or statistics) or written to a file (e.g., for archival or sharing purposes).
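The two routes to a tibble can be sketched in code. The example below is only an illustration under some assumptions: the column names and values are invented, the output file is a temporary file, and reading literal text with `I()` assumes readr version 2.0 or later.

```r
library(tibble)
library(readr)

# (a) tibble: create tibbles from objects that already exist in R ...
tb_1 <- tibble(id = 1:3, score = c(2.5, 3.0, 4.5))   # column by column

tb_2 <- tribble(                                     # row by row
  ~id, ~score,
   1L,    2.5,
   2L,    3.0,
   3L,    4.5
)

tb_3 <- as_tibble(data.frame(id = 1:3,               # from an existing data frame
                             score = c(2.5, 3.0, 4.5)))

# (b) readr: create a tibble from text (here: literal csv data, marked by I()):
tb_4 <- read_csv(I("id,score\n1,2.5\n2,3.0\n3,4.5"),
                 col_types = "id")                   # i = integer, d = double

# Existing tibbles can be analyzed further or written to a file:
mean_score <- mean(tb_1$score)
out_file <- tempfile(fileext = ".csv")               # hypothetical output location
write_csv(tb_1, out_file)
```

All four objects contain the same data. They differ only in their input source, which is the distinction drawn in Figure 6.2.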

Finally, see the Posit cheatsheets on importing and exporting data to check which readr commands you are now familiar with and which others you can still discover in the future:

Figure 6.3: The [Posit cheatsheets](https://posit.co/resources/cheatsheets/) on importing and exporting data with the readr package.

Let’s test our data import and parsing skills by completing the following exercises.

References

Wickham, H. (2022). stringr: Simple, consistent wrappers for common string operations. Retrieved from https://stringr.tidyverse.org
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Retrieved from http://r4ds.had.co.nz
Wickham, H., Hester, J., & Bryan, J. (2024). readr: Read rectangular text data. Retrieved from https://CRAN.R-project.org/package=readr

  47. See Chapter 14 on Strings of r4ds (Wickham & Grolemund, 2017) for an introduction to parsing strings of text.