What is exploratory data analysis (EDA)?
Three possible interpretations are:
- a complement to confirmatory data analysis
- an attitude or mindset (allowing for insights based on hermeneutics)
- an inevitable process to familiarize us with new data, detect patterns, and formulate better questions
Instead of providing a fixed recipe, we collected 10 principles of EDA:
1. Start with a clean slate and explicitly load all data and all required packages.
2. Structure, document, and comment your analysis.
3. Make copies (and copies of copies) of your data.
4. Know your data (variables and observations).
5. Know and deal with unusual variables and unusual values.
6. Inspect the distributions of variables.
7. Use filter variables to identify and select sub-sets of observations.
8. Inspect relationships between variables.
9. Inspect trends over time or repeated measurements.
10. Create graphs that convey their messages as clearly as possible.
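Several of these principles can be sketched in a few lines of base R. The following is only a minimal illustration, not a complete analysis: it assumes the built-in airquality data set as a stand-in for your own data, and the object names (aq, na_counts, aq_complete, r_ozone_temp, temp_by_month) are chosen for this example.

```r
# Load data explicitly (airquality ships with base R; used here for illustration only).
data(airquality)

# Make a copy and leave the original untouched.
aq <- airquality

# Know your data: variables and observations.
str(aq)
dim(aq)

# Unusual values: count missing values per variable.
na_counts <- colSums(is.na(aq))
print(na_counts)

# Inspect the distribution of a variable.
summary(aq$Ozone)
quantile(aq$Ozone, probs = c(0.05, 0.50, 0.95), na.rm = TRUE)

# Filter variable: flag rows without missing values, then select that sub-set.
aq$complete <- complete.cases(aq)
aq_complete <- aq[aq$complete, ]

# Relationship between two variables.
r_ozone_temp <- cor(aq_complete$Ozone, aq_complete$Temp)
print(r_ozone_temp)

# Trend over time: mean temperature per month.
temp_by_month <- aggregate(Temp ~ Month, data = aq, FUN = mean)
print(temp_by_month)
```

Storing the filter as a variable (aq$complete), rather than deleting rows right away, keeps the selection visible and reversible, which fits the advice to work on copies of the data.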
From a technical viewpoint, EDA involves a combination of base R data structures and commands, and is facilitated by additional tools from the dplyr, tidyr, and ggplot2 packages.
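As a rough sketch of how these packages divide the work, the following example summarises, reshapes, and plots the built-in airquality data. It assumes that dplyr (version 1.0.0 or later), tidyr (version 1.0.0 or later), and ggplot2 are installed; the data set and the object names (monthly, aq_long, p_dist) are chosen for illustration only.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Work on a copy of the built-in data (illustrative stand-in for your own data).
aq <- as_tibble(airquality)

# dplyr: filter observations and summarise them by group.
monthly <- aq %>%
  filter(!is.na(Ozone)) %>%
  group_by(Month) %>%
  summarise(n = n(), mean_ozone = mean(Ozone), .groups = "drop")

# tidyr: reshape four measurement columns into long format.
aq_long <- aq %>%
  pivot_longer(cols = c(Ozone, Solar.R, Wind, Temp),
               names_to = "variable", values_to = "value")

# ggplot2: one histogram per variable, with labels that state the message.
p_dist <- ggplot(aq_long, aes(x = value)) +
  geom_histogram(bins = 20, na.rm = TRUE) +
  facet_wrap(~ variable, scales = "free_x") +
  labs(title = "Distributions of four air quality measurements",
       x = "Value", y = "Frequency")

# The plot is stored as an object; call print(p_dist) to draw it.
```

The long format produced by pivot_longer() is what allows a single ggplot() call to draw all four distributions as facets.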
See the pointers to related resources in Section 4.5 Resources.
- Know your data. Really, really, know it (by Randy Au, 2019-02-15)
This concludes the current part (getting, transforming, and exploring data).
We will study specific data types (e.g., text and date-time data) next.