2.8 Resources

This section contains links to related resources.

2.8.1 Help on visualization

In addition to Chapter 3: Data visualization, Chapter 7: Exploratory data analysis (EDA) (to be covered in 2 weeks) provides further information on data visualization.

The following links provide additional information on using ggplot2:

Data visualization with **ggplot2** summary from [RStudio Cheat Sheets](https://www.rstudio.com/resources/cheatsheets/).

Figure 2.8: Data visualization with ggplot2 summary from RStudio Cheat Sheets.

Books or scripts on data visualization include:

The landmark publications by Jacques Bertin (e.g., Bertin, 2011) and Edward R. Tufte (Tufte, 2001, 2006; Tufte, Goeler, & Benson, 1990) provide solid advice and many inspiring examples.

More recent publications that are geared to the needs of aspiring data scientists include:

More specific resources on the principles of data visualization (with many beautiful or bizarre examples) include:

Inspiration and tools for additional types of visualizations can be found at (from specific to general):

2.8.2 Colors in R

The grDevices package, which is part of base R, provides many options and tools for selecting and modifying colors:

  • Call colors() or demo("colors") in the Console to view the built-in colors of R.

See Appendix D for a primer on using colors in R and Section D.4 for corresponding resources and links.
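The color tools mentioned above can be tried out with a few lines of base R. The following is only a minimal sketch: the color "steelblue" and the object names (all_cols, my_col, my_rgb, my_hex, my_col_50) are chosen here for illustration and do not appear elsewhere in this chapter.

```r
# All functions used here belong to grDevices, which is attached by default.
all_cols <- colors()        # character vector of R's named colors
head(all_cols)              # peek at the first few color names

# Example color (an arbitrary choice for illustration):
my_col <- "steelblue"

# Convert the color name into its red, green, and blue components (0-255):
my_rgb <- col2rgb(my_col)   # 3 x 1 matrix with rows "red", "green", "blue"
my_rgb

# Convert the RGB components back into a hexadecimal color code:
my_hex <- rgb(my_rgb["red", 1], my_rgb["green", 1], my_rgb["blue", 1],
              maxColorValue = 255)
my_hex                      # "#4682B4"

# Create a semi-transparent variant (alpha.f scales the opacity):
my_col_50 <- adjustcolor(my_col, alpha.f = 0.5)
my_col_50
```

Any of these values (a color name, a hexadecimal code, or the transparent variant) can be passed to the color arguments of plotting functions.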

2.8.3 Explanation: Same stats, different data

The introductory data examined above (in Section 2.1 and plotted in Figure 2.2) is known as Anscombe’s quartet (Anscombe, 1973) and is included in R as anscombe in the datasets package (R Core Team, 2020). It contains 4 sets of x-y coordinates that have nearly identical summary statistics (means, standard deviations, correlations, and regression lines), yet look quite different when plotted. (See ?anscombe for more information.)
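The claim that the 4 sets share their statistical properties can be checked directly. The following base R sketch assumes only the column names x1 to x4 and y1 to y4 of anscombe; the object name ans_stats and the labels set_1 to set_4 are made up for this illustration.

```r
# Compute the same summary statistics for each of the 4 x-y sets:
ans_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)  # simple linear regression of y on x
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    sd_x      = sd(x),
    sd_y      = sd(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]))
})
colnames(ans_stats) <- paste0("set_", 1:4)

# One row per statistic, one column per set:
round(ans_stats, 2)
```

Rounded to 2 decimals, the 4 columns should show the same values in every row, even though the raw data (and the corresponding plots) differ markedly.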

Related web links include:


[02_visualize.Rmd updated on 2020-07-30 20:24:57 by hn.]


Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17–21. https://doi.org/10.2307/2682899

Bertin, J. (2011). Semiology of graphics: Diagrams, networks, maps (Vol. 1). ESRI Press.

Cairo, A. (2012). The functional art: An introduction to information graphics and visualization. Berkeley, CA: New Riders.

Cairo, A. (2016). The truthful art: Data, charts, and maps for communication. Berkeley, CA: New Riders.

Locke, S., & D’Agostino McGowan, L. (2018). datasauRus: Datasets from the datasaurus dozen. Retrieved from https://CRAN.R-project.org/package=datasauRus

Matejka, J., & Fitzmaurice, G. (2017). Same stats, different graphs: Generating datasets with varied appearance and identical statistics through simulated annealing. Proceedings of the 2017 CHI conference on human factors in computing systems, 1290–1294. https://doi.org/10.1145/3025453.3025912

R Core Team. (2020). R: A language and environment for statistical computing. Retrieved from https://www.R-project.org

Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Cheshire, CT: Graphics Press.

Tufte, E. R. (2006). Beautiful evidence (Vol. 1). Cheshire, CT: Graphics Press.

Tufte, E. R., Goeler, N. H., & Benson, R. (1990). Envisioning information (Vol. 126). Cheshire, CT: Graphics Press.

Yau, N. (2011). Visualize this: The FlowingData guide to design, visualization, and statistics. Hoboken, NJ: John Wiley & Sons.

Yau, N. (2013). Data points: Visualization that means something. Hoboken, NJ: John Wiley & Sons.