4.1 Exploratory Data Analysis Checklist: A Case Study
In this section we will run through an informal “checklist” of things to do when embarking on an exploratory data analysis. As a running example, I will use a dataset of hourly ozone levels in the United States for the year 2014. The elements of the checklist are:
1. Formulate your question
2. Read in your data
3. Check the packaging
4. Look at the top and the bottom of your data
5. Check your “n”s
6. Validate with at least one external data source
7. Make a plot
8. Try the easy solution first
9. Follow up
Throughout this example we will depict an ongoing analysis with R code and real data. Some of the examples and recommendations here will be specific to the R statistical analysis environment, but most should be applicable to any software system. Being fluent in R is not necessary for understanding the main ideas of the example. Feel free to skip over the code sections.