This chapter first introduced the pipe operator of magrittr (Bache & Wickham, 2014) and then a range of functions for transforming data from the tidyverse packages dplyr (Wickham et al., 2021) and tidyr (Wickham & Henry, 2020).
So what is the difference between dplyr and tidyr? If we view the functions of both packages as tools, the boundary between the two packages is fairly arbitrary: both provide functions for manipulating tables of data.
Revisiting our distinction between transformations that reduce data and those that reshape it (from Section 5.1), we see that tidyr mostly deals with reshaping data, whereas dplyr mostly allows on-the-fly data reductions (e.g., selections and summaries). In terms of the tasks addressed, the dplyr functions mainly serve to explicate and understand the data contained in a table, whereas the tidyr functions aim to clean up data by reshaping it. In practice, most dplyr pipes reduce a complex dataset to answer a specific question. By contrast, the output of tidyr pipes typically serves as an input to a more elaborate data analysis. However, dplyr also provides functions for joining tables, and tidyr can be used to select, separate, or unite variables. Thus, the functionalities of both packages are similar enough to think of them as two complementary tools from a larger toolbox for manipulating data tables, which is why both are part of the larger collection of packages provided by the tidyverse (Wickham et al., 2019).
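To make the contrast between reducing and reshaping concrete, here is a minimal sketch. The data frame `scores` and its columns (`name`, `t_1`, `t_2`) are hypothetical and were invented for illustration only; the functions used (`filter()`, `summarise()`, `pivot_longer()`) are the ones provided by dplyr and tidyr.

```r
library(dplyr)  # data reduction; also makes the %>% pipe available
library(tidyr)  # data reshaping

# Hypothetical toy data (an assumption, not from the chapter):
# one row per person, one column per measurement time.
scores <- data.frame(
  name = c("Ann", "Bob", "Cat"),
  t_1  = c(10, 20, 30),
  t_2  = c(12, 18, 36)
)

# dplyr pipe: reduce the table to answer a specific question
# (mean scores at both times for people whose first score exceeds 10).
reduced <- scores %>%
  filter(t_1 > 10) %>%
  summarise(mean_t_1 = mean(t_1),
            mean_t_2 = mean(t_2))

# tidyr pipe: reshape the same table from wide to long format,
# keeping all values but changing how they are arranged.
long <- scores %>%
  pivot_longer(cols = c(t_1, t_2),
               names_to = "time",
               values_to = "score")
```

In this sketch, `reduced` collapses the table into a single summary row, whereas `long` retains every score but arranges the values as one row per person and time. A long table of this kind would typically be passed on to further analysis or visualization steps.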
Here are some pointers to related cheatsheets and additional links:
- See the cheatsheet on transforming data with dplyr from RStudio cheatsheets.
- Introduction on dplyr and pipes: The basics (by Sean C. Anderson, 2014-09-13)
- See the cheatsheet on transforming data with tidyr functions (on the back of the Data Import cheatsheet on the readr package) from RStudio cheatsheets.
For background information on the notion of tidy data, see the following paper by Hadley Wickham (2014):
- Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1–23. doi: 10.18637/jss.v059.i10 (available at https://www.jstatsoft.org/article/view/v059i10)
For a critical view, see the following blog post:
- What is “tidy data?” (by John Mount)