7.5 Resources

This section provides some pointers to additional resources on tidy data.

7.5.1 Help on tidying data

Read Chapter 12: Tidy data in the r4ds textbook (Wickham & Grolemund, 2017).

For additional details on the tidyr package (Wickham & Henry, 2020):

study vignette("tidy-data") and vignette("pivot"), as well as the documentation of ?spread, ?gather, ?separate, ?unite, etc. (since tidyr 1.0.0, spread() and gather() are superseded by pivot_wider() and pivot_longer());
study https://tidyr.tidyverse.org and its examples, as well as the discussions at https://community.rstudio.com/tags/tidyr;
study the RStudio cheatsheet on reshaping data with the tidyr package (on the back of the Data Import cheatsheet):

Figure 7.3: The RStudio cheatsheet on reshaping data with tidyr functions (on the back of the Data Import cheatsheet on the readr package).

7.5.2 Miscellaneous

For background information on the notion of tidy data, see the following paper by Hadley Wickham (2014b):

Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10 (available at https://www.jstatsoft.org/article/view/v059i10)

For a critical view, see the following blog post:

What is “tidy data”? (by John Mount)

The Related work section at https://tidyr.tidyverse.org provides some historical notes (e.g., on the relation between tidyr and reshape), points out corresponding terminology in other frameworks (e.g., SQL), and recommends the following papers:

Wrangler: Interactive visual specification of data transformation scripts
An interactive framework for data cleaning (Potter’s wheel)

A powerful alternative framework for data cleaning and wrangling is provided by the data.table package (Dowle & Srinivasan, 2021).

See https://rdatatable.gitlab.io/data.table/ and the package documentation to get started.

Check out Wikipedia: Tidy data for additional details and links.

For animations of common data transformations (e.g., those performed by spread() and gather()), see

gadenbuie/tidyexplain at GitHub.com

7.5.3 Outlook

The tidyr functions are first steps rather than the ultimate solution to data wrangling. This area is still under active development, and only time will tell which framework will ultimately prevail. Rather than despairing about technological change, we should all feel fortunate, as the Chinese proverb has it, to live in interesting times…

ds4psy

[07_tidy.Rmd updated on 2022-07-15 18:31:57 by hn.]

References

Dowle, M., & Srinivasan, A. (2021). data.table: Extension of ‘data.frame’. Retrieved from https://CRAN.R-project.org/package=data.table

Wickham, H. (2014b). Tidy data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10

Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Retrieved from http://r4ds.had.co.nz

Wickham, H., & Henry, L. (2020). tidyr: Tidy messy data. Retrieved from https://CRAN.R-project.org/package=tidyr