7.1 Introduction

This section contrasts exploratory data analysis (EDA) with confirmatory data analysis (CDA).

Please note: This page merely summarizes the longer version at 4.1 Introduction (Neth, 2021a).

7.1.1 What is EDA?

We can distinguish three main interpretations of EDA:

  1. EDA as the opposite of (or complement to) confirmatory data analysis:

Exploratory data analysis (EDA) is a type of data analysis that John Tukey contrasted with confirmatory data analysis (CDA) (e.g., Tukey, 1977, 1980) and likened to the work of a detective (Tukey, 1969). Long before personal computers became ubiquitous, Tukey emphasized the importance of visual displays for detecting patterns or irregularities in data, while most psychologists of the same era were obsessed with statistical rituals of a rather mechanistic and mindless nature (Gigerenzer, 2004), such as null hypothesis significance testing (NHST; see Nickerson, 2000). Irrespective of your stance towards statistics, EDA approaches data in a more curious and open-minded fashion.

  2. EDA as an attitude or mindset:

Exploratory data analysis is an attitude,
a flexibility, and a reliance on display,
NOT a bundle of techniques, and should be so taught.

John W. Tukey (1980, p. 23)

This view echoes the philosophical idea of gaining insight through hermeneutics: EDA is a data scientist’s way of doing hermeneutics to get a grip on some dataset (see the corresponding definitions in Wikipedia or The Stanford Encyclopedia of Philosophy for details).

  3. EDA as an inevitable process to familiarize us with new data, detect potential problems and irregularities, discover patterns, and formulate better questions:

The goal of EDA is to discover patterns in data. (…)
The role of the data analyst is to listen to the data in as many ways as possible
until a plausible “story” of the data is apparent, even if such a description
would not be borne out in subsequent samples.

John T. Behrens (1997, p. 132)

If a good exploratory technique gives you more data, then maybe
good exploratory data analysis gives you more questions, or better questions.
More refined, more focused, and with a sharper point.

Roger Peng (2019), Simply Statistics

Having learned what exploratory data analysis (EDA) is and what it aims to achieve, we can now turn to how to conduct an EDA.

References

Behrens, J. T. (1997). Principles and procedures of exploratory data analysis. Psychological Methods, 2(2), 131–160. https://doi.org/10.1037/1082-989X.2.2.131
Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587–606. https://doi.org/10.1016/j.socec.2004.09.033
Neth, H. (2021a). Data science for psychologists. Social Psychology; Decision Sciences, University of Konstanz. https://bookdown.org/hneth/ds4psy/
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301. https://doi.org/10.1037/1082-989X.5.2.241
Tukey, J. W. (1969). Analyzing data: Sanctification or detective work? American Psychologist, 24(2), 83–91. https://doi.org/10.1037/h0027108
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Tukey, J. W. (1980). We need both exploratory and confirmatory. The American Statistician, 34(1), 23–25. https://www.jstor.org/stable/2682991