5.5 Conclusion

5.5.1 Summary

This chapter first introduced the pipe operator of the magrittr package (Bache & Wickham, 2014) and then a range of functions for transforming data from the tidyverse packages dplyr (Wickham et al., 2021) and tidyr (Wickham & Henry, 2020).
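As a brief reminder of what the pipe does, the following minimal sketch contrasts a nested function call with its piped equivalent. The vector x and its values are made up for illustration; only the %>% operator from magrittr and base R functions are used.

```r
library(magrittr)  # provides the pipe operator %>%

# Hypothetical toy data (not from the chapter):
x <- c(1, 4, 9)

# Nested call: must be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped call: reads from left to right, one step at a time.
# x %>% f(y) is evaluated as f(x, y).
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)  # both expressions compute the same value
```

Both versions yield the same result; the pipe merely changes the order in which the steps are written and read.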

So what is the difference between dplyr and tidyr? If we view the functions of both packages as tools, the boundary between the two packages is fairly arbitrary: both provide functions for manipulating tables of data.

Recalling our distinction between transformations that reduce and transformations that reshape data (from Section 5.1), we see that tidyr mostly deals with reshaping data, whereas dplyr mostly allows for on-the-fly data reductions (e.g., selections and summaries). In terms of the tasks addressed, the dplyr functions mainly serve to explicate and understand the data contained in a table, whereas the tidyr functions aim to clean up data by reshaping it. In practice, most dplyr pipes reduce a complex dataset to answer a specific question. By contrast, the output of tidyr pipes typically serves as an input to a more elaborate data analysis. However, dplyr also provides functions for joining tables, and tidyr can be used to select, separate, or unite variables. Thus, the functionalities of both packages are similar enough to think of them as two complementary tools out of a larger toolbox for manipulating data tables, which is why they are both part of the larger collection of packages provided by the tidyverse (Wickham et al., 2019).
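The contrast between reducing and reshaping can be illustrated with a minimal sketch. The table scores and all of its values are hypothetical and not taken from the chapter, and the choice of summarise(), pivot_longer(), and pivot_wider() as representative functions is an assumption made for this illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in wide format: scores at two time points
scores <- tibble(
  name  = c("Ann", "Bob", "Cat"),
  group = c("a", "b", "a"),
  t1    = c(10, 20, 30),
  t2    = c(12, 18, 36)
)

# dplyr pipe: reduces the data to answer a specific question
# (here: What is the mean score per group at each time point?)
by_group <- scores %>%
  group_by(group) %>%
  summarise(mean_t1 = mean(t1),
            mean_t2 = mean(t2))

# tidyr pipe: reshapes the data without reducing it
# (from wide to long format: one row per name and time point)
scores_long <- scores %>%
  pivot_longer(cols = c(t1, t2),
               names_to = "time",
               values_to = "score")

# Reshaping is reversible: going back to wide format recovers the original
scores_wide <- scores_long %>%
  pivot_wider(names_from = time,
              values_from = score)

dim(by_group)     # 2 rows: one per group (information was reduced)
dim(scores_long)  # 6 rows: all 6 scores are still present
```

The summary table answers one question but discards the individual scores, whereas the long table contains the same information as the original and would typically be passed on to further analysis or visualization.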

5.5.2 Resources

Here are some pointers to related cheatsheets and additional links:

On dplyr

  • See the cheatsheet on data transformation with dplyr from RStudio cheatsheets:

Figure 5.3: Data transformation with dplyr from RStudio cheatsheets.

On tidyr

  • See the cheatsheet on transforming data with tidyr functions (on the back of the Data Import cheatsheet on the readr package) from RStudio cheatsheets:

Figure 5.4: The RStudio cheatsheet on reshaping data with tidyr functions (on the back of the Data Import cheatsheet on the readr package).

  • For background information on the notion of tidy data, see the paper Tidy data by Hadley Wickham (2014).

5.5.3 Preview

We now have all the ingredients in place for conducting an exploratory data analysis (EDA). Thus, Chapter 6 will be on exploring data.

References

Bache, S. M., & Wickham, H. (2014). magrittr: A forward-pipe operator for R. https://CRAN.R-project.org/package=magrittr
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A grammar of data manipulation. https://CRAN.R-project.org/package=dplyr
Wickham, H., & Henry, L. (2020). tidyr: Tidy messy data. https://CRAN.R-project.org/package=tidyr