4.5 Resources

Here are some helpful pointers on the attitude and process of EDA in R (R Core Team, 2021) and the tidyverse (Wickham, 2019c), ordered from more general to more specific:

4.5.1 EDA

Background readings

Key references on EDA include the books and papers by John W. Tukey (Tukey, 1969, 1977, 1980). An insightful overview is provided by Behrens (1997).

Breiman (2001) discusses the broader implications of two distinct cultures in the use of statistical modeling, and Donoho (2017) discusses the historical background of and perspectives on data science.

Specific sources

Links to specific resources:

R packages

For practical advice on the tidyverse packages dplyr and ggplot2:

There are many other R packages that may become useful in the context of EDA. Examples include:

  • the codebook package automates the description of data frames

  • the dlookr package supports various tasks of data diagnosis, exploration, and transformation (including visualizations of missing data or outliers)

  • the sjmisc package provides miscellaneous utility functions, supporting data transformation tasks like recoding, dichotomizing or grouping variables

4.5.2 Visualization

Background readings

Given the variety of options, it is often difficult to decide when to use which type of plot (or geom). The landmark publications by Jacques Bertin (e.g., Bertin, 2011) and Edward R. Tufte (Tufte, 2001, 2006; Tufte, Goeler, & Benson, 1990) provide solid advice and many inspiring examples.

Specific sources

More specific resources on the principles of data visualization (with many beautiful or bizarre examples) include:

More recent publications that are geared to the needs of aspiring data scientists include:

4.5.3 Miscellaneous links

The Simply Statistics blog (by Rafa Irizarry, Roger Peng, and Jeff Leek) provides many insightful and inspiring articles. For instance, the following posts relate to EDA:

Know your data. Really, really, know it: An article (by Randy Au, 2019-02-15) on what “knowing your data” means in applied contexts


[04_explore.Rmd updated on 2021-06-15 15:18:55 by hn.]