Part 4: Wrangling data
Data wrangling is an umbrella term for many different ways in which data can be bent and twisted. Because data usually must be transformed before we can make sense of it, we need a range of tools to reshape and reduce data structures. In R, these tools take the form of dedicated functions and packages.
This part contains five chapters that can be grouped into three sets:
- Chapter 11 and Chapter 12 lay the foundations for getting data into R. Solving this rather mundane task involves some familiarity with directory structures and file paths, and either importing data into R or creating particular data structures (tabular data frames or tibbles) from scratch or from other data structures. All these sub-tasks can be tackled with base R (R Core Team, 2023) functions or with the tidyverse (Wickham et al., 2019) packages here, readr, and tibble.
- Chapter 13 and Chapter 14 introduce tools and packages for transforming data. Transforming data is the core of data wrangling and includes both reshaping and reducing data. Three popular tidyverse packages that support these tasks are magrittr, dplyr, and tidyr.
- Chapter 15 combines all that we have learned so far into a set of principles and steps for exploring data. Exploratory data analysis (EDA) has been contrasted with confirmatory data analysis (CDA), described as an attitude, and compared to detective work (e.g., Tukey, 1969, 1977). We list some corresponding principles and assemble the skills and tools from the preceding chapters to gain some insights into a dataset.
Overall, the chapters on data wrangling provide essential foundations for statistical testing, working with special data types, or other applications.