Chapter 4 Exploring data

ds4psy: (4) Exploring data

This chapter introduces the notion of exploratory data analysis (EDA), which is a key component of data science and an important prerequisite for statistics. Rather than introducing new R packages, we will now combine what we have learned about the ggplot2 and dplyr packages in Chapters 2 and 3. As you will see, your recently acquired skills in data transformation and visualization provide a powerful foundation for exploring data by asking and answering questions. Ideally, you will soon find that actively engaging in EDA feels like “doing research”: it requires us to question and scrutinize data, and to follow up on the answers and new hypotheses that we gather along the way.


Figure 4.1: Exploratory data analysis (EDA) is the final chapter in this part.
Here, we use the dplyr and ggplot2 packages to explore and understand data.

While any actual EDA is tailored to the specific features of a dataset and to the current research goals, this chapter highlights some common themes that are relevant in most cases. We illustrate the essential steps of an EDA by exploring a real dataset from clinical psychology (Woodworth et al., 2018) that measures the effects of positive psychology interventions (see Section B.1 of Appendix B for details on the data).


Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2018). Data from “Web-based positive psychology interventions: A reexamination of effectiveness”. Journal of Open Psychology Data, 6(1).