Part 4: Wrangling data

Data wrangling is an umbrella term for many different ways in which data can be bent and twisted. As data usually must be transformed before we can make sense of it, we need a range of tools to reshape and reduce data structures. In R, these tools take the form of dedicated functions and packages.

This part contains five chapters that can be grouped into three blocks:

Chapter 11 and Chapter 12 lay the foundations for getting data into R. Solving this rather mundane task involves some familiarity with directory structures and file paths, and then either importing data into R or creating particular data structures (tabular data frames or tibbles) from scratch or from other data structures. All these sub-tasks can be tackled with base R (R Core Team, 2025b) functions, or with the tidyverse (Wickham et al., 2019) packages here, readr, and tibble.
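As a first impression of these sub-tasks, the following minimal sketch uses only base R functions. The example data, the file name `scores.csv`, and the use of a temporary directory are assumptions made for illustration; the tidyverse packages named above offer analogous tools that are covered in the chapters themselves.

```r
# Create a small data frame from scratch (hypothetical example data):
df <- data.frame(id = 1:3,
                 name = c("Ann", "Ben", "Cat"),
                 score = c(2.5, 3.0, 4.5))

# Build a file path in a platform-independent way
# (here: a file in the temporary directory of the current R session):
my_file <- file.path(tempdir(), "scores.csv")

# Write the data to a CSV file and import it again:
write.csv(df, file = my_file, row.names = FALSE)
df_2 <- read.csv(my_file)

# Check whether the imported data matches the original data:
all.equal(df, df_2)
```

Using `file.path()` rather than pasting strings together avoids hard-coding the path separator of a particular operating system.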

Chapter 13 and Chapter 14 introduce tools and packages for transforming data. Transforming data is the core of data wrangling and includes both reshaping and reducing data. Three popular tidyverse packages that support these tasks are magrittr, dplyr, and tidyr.
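To illustrate the difference between reshaping and reducing data, here is a brief sketch that assumes the dplyr and tidyr packages are installed. The data frame `wide` and its variable names are hypothetical; the pipe operator `%>%` stems from magrittr and is made available by loading dplyr.

```r
library(dplyr)  # data transformation (also provides the pipe operator %>%)
library(tidyr)  # data reshaping

# Hypothetical data in wide format
# (one row per person, one column per time point):
wide <- data.frame(id = 1:3,
                   t1 = c(10, 20, 30),
                   t2 = c(12, 18, 36))

# Reshaping: from wide to long format (one row per measurement):
long <- wide %>%
  pivot_longer(cols = c(t1, t2), names_to = "time", values_to = "value")

# Reducing: summarize all values by time point:
means <- long %>%
  group_by(time) %>%
  summarise(mean_value = mean(value))

means
```

Reshaping changes the form of the data without discarding information, whereas reducing condenses six measurements into two summary values.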

Chapter 15 combines all that we have learned so far into a set of principles and steps for exploring data. Exploratory data analysis (EDA) has been contrasted with confirmatory data analysis (CDA), described as an attitude, and compared to detective work (e.g., Tukey, 1969, 1977). We list some corresponding principles and assemble the skills and tools introduced so far to gain some insights into a set of data.

Overall, our data wrangling skills are often eclipsed by our ability to create visualizations or statistical models. Nevertheless, our ability to competently import, transform, and explore data is a crucial prerequisite for all the more visible fruits of our efforts. Not only is wrangling data arguably the single most important skill of data scientists, but it is also the part on which we spend most of our time. Thus, the chapters on data wrangling provide essential foundations for statistical testing, working with special data types, and other applications.
