7.3 Summary


This chapter revolved around the fact that any non-trivial dataset can be formatted in a variety of ways and introduced the concept of tidy data (defined in Section 7.1.4), which is a fundamental notion of the tidyverse. The tidyr package (Wickham & Henry, 2020) contains essential commands that allow us to separate (or unite) the values of variables and to gather (or spread) values, changing a table from wide to long format (and vice versa).

After working through this chapter, you should now be able to:

  1. describe and organize the layout of data tables;
  2. define the notion of tidy data;

and use tidyr commands to:

  3. separate the values of 1 variable into 2 variables;
  4. unite the values of 2 variables into 1 variable;
  5. gather values distributed over multiple columns into 1 variable;
  6. spread the values of a variable over multiple columns.
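As a brief reminder, the four tidyr commands listed above can be sketched on a toy table. The data and the variable names (`name`, `t_1`, `t_2`, `score`) are hypothetical and not taken from the chapter:

```r
library(tidyr)
library(tibble)

# Hypothetical toy data in wide format: 1 row per person, 1 column per time point
wide <- tibble(
  name = c("Ann_Lee", "Bob_Ray"),
  t_1  = c(10, 20),
  t_2  = c(12, 18)
)

# separate(): split the values of 1 variable into 2 variables
sep <- separate(wide, col = name, into = c("first", "last"), sep = "_")

# unite(): combine the values of 2 variables into 1 variable
uni <- unite(sep, col = "name", first, last, sep = "_")

# gather(): collect values from multiple columns into 1 variable (wide to long)
long <- gather(uni, key = "time", value = "score", t_1, t_2)

# spread(): distribute the values of 1 variable over multiple columns (long to wide)
wide_2 <- spread(long, key = time, value = score)
```

Here, `unite()` undoes `separate()` and `spread()` undoes `gather()`, so `wide_2` contains the same data as `wide`.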

A limitation of the tidyr commands covered so far is that they deal with only 1 dependent variable at a time. Nevertheless, combining several commands in a pipe can overcome this constraint (see Section 7.2.6 for examples).
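One way such a pipe could look is sketched here, assuming a hypothetical table with 2 dependent variables (`rt` and `acc`) that were each measured at 2 time points. The data and column names are illustrative and are not the examples of Section 7.2.6:

```r
library(tidyr)   # also provides the pipe operator %>%
library(tibble)

# Hypothetical wide data: 2 dependent variables (rt, acc) x 2 time points (pre, post)
wide <- tibble(
  id       = 1:2,
  rt_pre   = c(510, 480),
  rt_post  = c(470, 450),
  acc_pre  = c(0.80, 0.90),
  acc_post = c(0.85, 0.95)
)

long <- wide %>%
  # 1. gather all measurement columns into key-value pairs
  gather(key = "key", value = "value", -id) %>%
  # 2. split the key into the measure and the time point
  separate(key, into = c("measure", "time"), sep = "_") %>%
  # 3. spread the measures back into 1 column per dependent variable
  spread(key = measure, value = value)

# long has 1 row per id and time point, with the columns id, time, acc, and rt
```

The intermediate key column carries both pieces of information, so the combination of `gather()`, `separate()`, and `spread()` does what no single command of this family does on its own.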

The RStudio cheat sheet on reshaping data with the tidyr package provides an overview of the tidyr commands you are now familiar with and lets you discover some additional ones:

Figure 7.2: The [RStudio cheat sheet](https://www.rstudio.com/resources/cheatsheets/#import) on reshaping data with the tidyr package (on the back of the Data Import cheat sheet on the readr package).

Overall, the topic of data wrangling is still under active development. The tidyr package discussed in this chapter replaced earlier packages, specifically reshape (2005–2010) and reshape2 (2010–2014), and is itself still changing. Thus, the gather() and spread() commands of tidyr are first steps, rather than the ultimate solution to data wrangling. In 2019, tidyr was complemented by the pivot_longer() and pivot_wider() functions, as well as some unnest_ functions for taming deeply nested lists (like XML data files). See the vignettes on Pivoting and Rectangling for details on these developments.
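To give a flavor of the newer functions, here is a minimal sketch that assumes tidyr version 1.0.0 or later and uses hypothetical data with 2 dependent variables (`rt` and `acc`). The special `".value"` entry in `names_to` lets pivot_longer() handle several dependent variables in a single step:

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: 2 dependent variables (rt, acc) x 2 time points (pre, post)
wide <- tibble(
  id       = 1:2,
  rt_pre   = c(510, 480),
  rt_post  = c(470, 450),
  acc_pre  = c(0.80, 0.90),
  acc_post = c(0.85, 0.95)
)

# Wide to long: the part of each column name before "_" becomes a column of
# its own (".value"), the part after "_" becomes the value of the variable time
long <- pivot_longer(wide, cols = -id,
                     names_to = c(".value", "time"), names_sep = "_")

# Long to wide: recreate 1 column per combination of measure and time point
wide_2 <- pivot_wider(long, names_from = time, values_from = c(rt, acc))
```

In this sketch, pivot_longer() plays the role of gather() and pivot_wider() that of spread(). The Pivoting vignette describes their arguments in detail.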

But rather than waiting for further updates, we should trust that our general insights will still be valuable in the future and test our skills in tidying data by completing the following exercises.

References

Wickham, H., & Henry, L. (2020). tidyr: Easily tidy data with 'spread()' and 'gather()' functions. Retrieved from https://CRAN.R-project.org/package=tidyr