As any exploration depends on the data to be explored, there are no general recipes for EDA. However, we can collect a set of good practices and recommendations.
7.2.1 The principles of EDA
As a quick summary, here are the 10 principles of EDA mentioned above:
1. Start with a clean slate and explicitly load all data and all required packages.
2. Structure, document, and comment your analysis.
3. Make copies (and copies of copies) of your data.
4. Know your data (variables and observations).
5. Know and deal with unusual variables and unusual values.
6. Inspect the distributions of variables.
7. Use filter variables to identify and select sub-sets of observations.
8. Inspect relationships between variables.
9. Inspect trends over time or repeated measurements.
10. Create graphs that convey their messages as clearly as possible.
In Section 4.2 Essentials of EDA, these principles are illustrated in the context of some data collected in an investigation on the benefits of positive psychology interventions (described in Appendix B1).
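To make a few of these principles concrete, here is a minimal sketch in Python that uses only the standard library. The data, the variable names (`raw`, `data`, `keep`), and the plausible age range of 0 to 120 are hypothetical and chosen purely for illustration; they are not taken from the positive psychology study. The sketch covers copying the data, getting to know its variables and observations, dealing with unusual values, using a filter variable, and inspecting a distribution.

```python
import copy
import statistics

# Hypothetical toy data (not from the study described in the text):
# one dict per observation, with None marking a missing value.
raw = [
    {"id": 1, "group": "control", "age": 23, "score": 4.0},
    {"id": 2, "group": "control", "age": 31, "score": 3.5},
    {"id": 3, "group": "treatment", "age": 27, "score": None},
    {"id": 4, "group": "treatment", "age": 290, "score": 4.5},  # implausible age
    {"id": 5, "group": "treatment", "age": 35, "score": 5.0},
]

# Make a copy of the data, so that the raw data stays untouched.
data = copy.deepcopy(raw)

# Know your data: which variables exist, and how many observations?
variables = sorted(data[0].keys())
n_obs = len(data)

# Know unusual values: count missing values and flag implausible ages
# (the range 0 to 120 is an assumption made for this example).
n_missing = sum(1 for row in data for value in row.values() if value is None)
unusual = [row["id"] for row in data if not 0 <= row["age"] <= 120]

# Use a filter variable to mark the observations to keep, then select them.
for row in data:
    row["keep"] = row["score"] is not None and row["id"] not in unusual
selected = [row for row in data if row["keep"]]

# Inspect the distribution of a variable within the selected sub-set.
scores = [row["score"] for row in selected]
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "min": min(scores),
    "max": max(scores),
}

print(variables, n_obs, n_missing, unusual)
print(summary)
```

Storing the selection in an explicit `keep` variable, rather than deleting rows, keeps every exclusion visible and reversible, which is the point of working with filter variables and copies.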
7.2.2 Caveat: Explaining vs. predicting in science
Discovering some pattern in data is usually interesting and exciting. However, we must be very careful to avoid drawing premature conclusions from it. Importantly, any observed relationship between variables could be spurious, merely due to chance fluctuations. Hence, before getting carried away by discovering some pattern in our data, we must always keep in mind:
- Science 101: To really find something, we need to predict it — and ideally replicate it under different conditions.
Thus, following the principles of EDA never guarantees any results, but it does provide valuable insights into the structure and contents of a dataset. Gaining such insights before embarking on statistical tests reduces the risk of missing something important or violating key assumptions. However, EDA does not replace solid research design and sound practices of inferential statistics.
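The warning about spurious relationships can be illustrated with a small simulation, sketched here in Python with the standard library only. All numbers are assumptions chosen for illustration: 20 unrelated noise variables, 30 observations each, and a cutoff of |r| > 0.361, which is roughly the conventional two-sided 5% threshold for a correlation based on 30 observations. Because every variable is pure noise, any pair that crosses the cutoff is a chance finding. On average, about 5% of the 190 possible pairs would be expected to do so.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)  # fixed seed, so that the run is repeatable
n_vars, n_obs = 20, 30  # illustrative sizes, not taken from the text

# Every variable is independent random noise: no true relationship exists.
noise = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

threshold = 0.361  # approximate two-sided 5% cutoff for n = 30 (assumption)
pairs = list(itertools.combinations(range(n_vars), 2))

# Collect all pairs of variables that look "related" by this criterion.
strong = []
for i, j in pairs:
    r = pearson(noise[i], noise[j])
    if abs(r) > threshold:
        strong.append((i, j, r))

print(f"{len(strong)} of {len(pairs)} pairs exceed |r| = {threshold}")
```

An explorer who inspects many relationships and reports only the strongest ones would mistake such pairs for discoveries, which is why a pattern found by exploration still needs to be predicted and replicated in new data.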