4.1 Exploratory Data Analysis Checklist: A Case Study

In this section we will run through an informal “checklist” of things to do when embarking on an exploratory data analysis. As a running example I will use a dataset on hourly ozone levels in the United States for the year 2014. The elements of the checklist are:

  1. Formulate your question

  2. Read in your data

  3. Check the packaging

  4. Look at the top and the bottom of your data

  5. Check your “n”s

  6. Validate with at least one external data source

  7. Make a plot

  8. Try the easy solution first

  9. Follow up

Throughout this example we will depict an ongoing analysis with R code and real data. Some of the examples and recommendations here will be specific to the R statistical analysis environment, but most should be applicable to any software system. Fluency in R is not necessary for understanding the main ideas of the example, so feel free to skip over the code sections.