Data is the stuff of data science, the raw material that — given the right spells and concoctions of the data alchemist — is to be turned into gold. Although data is increasingly ubiquitous, it is difficult to turn raw data into something that makes sense or creates value. This should not surprise us: It also takes a lot of knowledge, skills, and effort to build a house out of a heap of dirt, stones, and wood. And while data science typically does not get our hands dirty, we should never underestimate the amount of effort and frustration involved in cleaning up some messy pile of data.
This chapter provides some background information on the main datasets used in this book and their sources. Most of the data used throughout this book is already included in R (in the datasets package) or provided by other R packages (e.g., the tidyverse). Occasionally, we create small toy datasets to illustrate a command or technical point, but the vast majority of analyses and visualizations throughout this book use real datasets that people have collected to answer empirical questions.
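As a minimal sketch of how data that ships with R can be inspected, the following uses only base R. The choice of `mtcars` is an assumption made for illustration (it is one of the data frames in the datasets package) and is not one of the datasets this book focuses on.

```r
# List some of the datasets that ship with R's 'datasets' package.
# data(package = ...) returns an object whose 'results' matrix
# holds one row per dataset (columns include "Item" and "Title").
ds_list <- data(package = "datasets")$results
head(ds_list[, c("Item", "Title")])

# Inspect one built-in data frame (chosen here only as an example):
ds_dim <- dim(mtcars)   # number of rows and columns
ds_dim
head(mtcars)            # first six rows
str(mtcars)             # variable names and types
```

Datasets provided by add-on packages can be inspected in the same way once the package has been installed and loaded with `library()`.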
To address the interests of psychologists and social scientists, we focus on people-related data, in which cases represent persons and the variables provide information about them (e.g., characteristics like age and gender, but also choices, opinions, and preferences). Aiming for real data that addresses scientific questions in psychology prompted us to use two datasets that appear frequently throughout this book: the false-positive psychology data of Simmons et al. (2011, 2014) and the data on web-based positive psychology interventions of Woodworth et al. (2017, 2018).
To make it simple to use these datasets in R, we store them in easily accessible formats on a web server (at http://rpository.com/ds4psy/). The following sections describe the context in which the data were collected and provide credit and references to the original sources. The concluding Section B.3 provides pointers to additional sources of data.
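Reading a file from such a server could be sketched as follows. Only the base URL comes from the text. The helper `read_ds()`, the CSV format, and every file name are assumptions made for illustration, so the call against the web server is left commented out. Because `read.csv()` accepts local paths and URLs alike, the helper is demonstrated on a made-up toy file written to a temporary directory.

```r
# Base URL of the web server, as given in the text:
base_url <- "http://rpository.com/ds4psy/"

# Hypothetical helper (not part of any package): paste a file name
# onto a base location and read the result as a CSV file.
read_ds <- function(file_name, base = base_url) {
  read.csv(paste0(base, file_name), stringsAsFactors = FALSE)
}

# Usage against the web server. "some_data.csv" is a placeholder,
# NOT an actual file name, so this line is not run here:
# df <- read_ds("some_data.csv")

# Offline demonstration with a made-up toy dataset:
toy <- data.frame(id = 1:3, age = c(21, 34, 28))
write.csv(toy, file.path(tempdir(), "toy.csv"), row.names = FALSE)

toy_back <- read_ds("toy.csv", base = paste0(tempdir(), "/"))
str(toy_back)
```

Keeping the base location in a single variable means that each import only needs the file name, and the same helper works for local copies of the files.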
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Simmons, J., Nelson, L., & Simonsohn, U. (2014). Data from paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”. Journal of Open Psychology Data, 2(1). https://doi.org/10.5334/jopd.aa
Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2017). Web-based positive psychology interventions: A reexamination of effectiveness. Journal of Clinical Psychology, 73(3), 218–232. https://doi.org/10.1002/jclp.22328
Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2018). Data from “Web-based positive psychology interventions: A reexamination of effectiveness”. Journal of Open Psychology Data, 6(1). https://doi.org/10.5334/jopd.35