This chapter introduced the pipe operator of magrittr (Bache & Wickham, 2014) and a range of functions for manipulating tables from the tidyverse packages dplyr (Wickham et al., 2021) and tidyr (Wickham & Henry, 2020).
So what is the difference between dplyr and tidyr? If we view the functions of both packages as tools, the boundary between the two packages is fairly arbitrary. Both packages provide functions for manipulating tables of data. In terms of tasks, the dplyr functions mainly aim to explicate and summarize the data contained in a table, whereas the tidyr functions aim to shape or reshape the table. However, dplyr also provides tools for joining multiple tables and tidyr can be used to select, separate, or unite variables of a table. Thus, the functionalities of both packages are similar enough to think of them as elements of a large toolbox for manipulating data tables, which is why they were covered in a single chapter here.
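To make this division of labor concrete, here is a minimal sketch that uses both packages in one pipeline. The data frame `scores` and its column names are hypothetical and serve only as an illustration; they are not taken from the chapter's data sets.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (an assumption for illustration):
# scores of three people on two tests, stored in wide format
scores <- data.frame(
  name = c("Ann", "Bob", "Cem"),
  t1   = c(10, 20, 30),
  t2   = c(14, 18, 34)
)

# tidyr: reshape the table from wide to long format
# (one row per combination of name and test)
scores_long <- scores %>%
  pivot_longer(cols = c(t1, t2), names_to = "test", values_to = "score")

# dplyr: summarize the data contained in the table
# (mean score per test)
test_means <- scores_long %>%
  group_by(test) %>%
  summarise(mean_score = mean(score), .groups = "drop")

test_means
```

Here, tidyr changes the shape of the table without changing its contents, whereas dplyr condenses those contents into a summary. The pipe operator links both steps, so that the packages feel like parts of a single toolbox.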
Here are some pointers to cheatsheets and additional links:
- See the cheatsheet on transforming data with dplyr from RStudio cheatsheets.
- Introduction on dplyr and pipes: The basics (by Sean C. Anderson, 2014-09-13)
- See the cheatsheet on transforming data with tidyr functions (on the back of the Data Import cheatsheet on the readr package) from RStudio cheatsheets.
For background information on the notion of tidy data, see the following paper by Hadley Wickham (2014b):
- Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1–23. doi: 10.18637/jss.v059.i10 (available at https://www.jstatsoft.org/article/view/v059i10)
For a critical view, see the following blog post:
- What is “tidy data?” (by John Mount)