Part 4: Wrangling data

Data wrangling is an umbrella term for many different ways in which data can be bent and twisted. Because data usually must be aggregated and transformed before we can make sense of it, we need a range of tools to select, combine, and reshape data structures. In R, these tools take the form of dedicated functions and packages.

This part contains five chapters that can be grouped into three sets:

  • Chapter 11 and Chapter 12 lay the foundations for getting data into R. Solving this rather mundane task involves some familiarity with directory structures and file paths, and either importing data into R or creating particular data structures (tabular data frames or tibbles) from scratch or from other data structures. All these sub-tasks can be tackled with base R (R Core Team, 2023) functions or with the tidyverse (Wickham et al., 2019) packages here, readr, and tibble.
  • Chapter 13 and Chapter 14 introduce packages and tools for transforming data. Transforming data is the core of data wrangling and includes both reshaping and reducing data. Three popular tidyverse packages that support these tasks are magrittr, dplyr, and tidyr.
  • Chapter 15 combines all that we have learned so far into a set of principles and steps for exploring data. John Tukey contrasted exploratory data analysis (EDA) with confirmatory data analysis (CDA) and described EDA as an attitude that resembles detective work. We list some corresponding principles and gather all skills and tools introduced so far to gain some insights into a dataset.

Overall, the chapters on data wrangling provide essential foundations for statistical testing, working with special data types, and other applications.