This section contains links to related resources.
2.8.1 Help on visualization
The following links provide additional information on using ggplot2:
study vignette("ggplot") and the documentation for ggplot and various geoms;
study https://ggplot2.tidyverse.org/reference/ and its examples;
study the RStudio cheatsheet on data visualization.
Books or scripts on data visualization include:
More recent publications that are geared to the needs of aspiring data scientists include:
Data Visualization. A practical introduction (by Kieran Healy) is beautiful, informative, and elegant.
Fundamentals of Data Visualization (by Claus O. Wilke) provides many instructive examples and helps distinguish good graphs from bad and ugly ones.
R Graphics Cookbook (by Winston Chang) provides hands-on advice on using ggplot2 and many useful recipes for data transformation.
Data Visualization with R (by Rob Kabacoff) relies heavily on the ggplot2 package, but also covers other approaches.
More specific resources on the principles of data visualization (with many beautiful or bizarre examples) include:
Data visualization principles (by Rafael A. Irizarry)
Data visualization: Basic principles (by Peter Aldhous)
Inspiration and tools for additional types of visualizations can be found at (from specific to general):
2.8.2 Colors in R
The grDevices component of R comes with many options and tools for selecting and modifying colors:
Run demo("colors") in the Console to view the built-in colors of R.
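As a minimal sketch of how the color tools in grDevices can be explored from the Console, the following lines list the built-in color names, look up the RGB values of one color, and create a semi-transparent variant. The color "steelblue" and the object names all_cols, blues, my_col, and my_col_50 are chosen here for illustration only and do not appear in the chapter.

```r
# grDevices is attached by default in a standard R session.

# All built-in color names (a character vector):
all_cols <- colors()
head(all_cols)

# Search the names for a pattern, here all colors containing "blue":
blues <- grep("blue", all_cols, value = TRUE)
head(blues)

# Look up the RGB values (0-255) of a named color:
col2rgb("steelblue")

# Define the same color from its RGB values (returns a hex code):
my_col <- rgb(70, 130, 180, maxColorValue = 255)
my_col

# Create a semi-transparent version of a color (alpha factor of 0.5):
my_col_50 <- adjustcolor("steelblue", alpha.f = 0.5)
my_col_50
```

Any of these values (a color name or a hex code) can be passed to the color arguments of base R or ggplot2 plotting functions.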
2.8.3 Explanation: Same stats, different data
The introductory data examined above (in Section 2.1 and plotted in Figure 2.2) is known as Anscombe’s quartet (Anscombe, 1973) and is included in R as anscombe in the datasets package (R Core Team, 2020). It contains four sets of x-y coordinates that have nearly identical summary statistics (mean, SD, correlation, and regression line), yet look quite different when plotted. (See ?anscombe for more information.)
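The claim about matching statistics can be checked directly. The following sketch computes the mean, SD, correlation, and regression coefficients for each of the four sets in anscombe. The object name stats and the row and column labels are chosen here for illustration.

```r
# anscombe is a data frame with columns x1-x4 and y1-y4
# (datasets is attached by default in a standard R session).

stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)  # simple linear regression of y on x
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    sd_x      = sd(x),
    sd_y      = sd(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]))
})
colnames(stats) <- paste0("set", 1:4)

# One column per set, one row per statistic:
round(stats, 2)
```

Rounded to two decimals, the four columns agree on every statistic, which is why only a plot of the raw data reveals how different the sets are.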
Related web links include:
A recent CHI paper (Matejka & Fitzmaurice, 2017) is available at https://www.autodeskresearch.com/publications/samestats
Blog posts by Alberto Cairo (http://www.thefunctionalart.com/2016/08/download-datasaurus-never-trust-summary.html) and David Smith (https://blog.revolutionanalytics.com/2017/05/the-datasaurus-dozen.html)
[02_visualize.Rmd updated on 2020-10-22 16:50:43 by hn.]
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17–21. https://doi.org/10.2307/2682899
Bertin, J. (2011). Semiology of graphics: Diagrams, networks, maps (Vol. 1). ESRI Press.
Cairo, A. (2012). The functional art: An introduction to information graphics and visualization. Berkeley, CA: New Riders.
Cairo, A. (2016). The truthful art: Data, charts, and maps for communication. Berkeley, CA: New Riders.
Locke, S., & D’Agostino McGowan, L. (2018). datasauRus: Datasets from the datasaurus dozen. Retrieved from https://CRAN.R-project.org/package=datasauRus
Matejka, J., & Fitzmaurice, G. (2017). Same stats, different graphs: Generating datasets with varied appearance and identical statistics through simulated annealing. Proceedings of the 2017 CHI conference on human factors in computing systems, 1290–1294. https://doi.org/10.1145/3025453.3025912
R Core Team. (2020). R: A language and environment for statistical computing. Retrieved from https://www.R-project.org
Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Cheshire, CT: Graphics Press.
Tufte, E. R. (2006). Beautiful evidence (Vol. 1). Cheshire, CT: Graphics Press.
Tufte, E. R., Goeler, N. H., & Benson, R. (1990). Envisioning information (Vol. 126). Cheshire, CT: Graphics Press.
Yau, N. (2011). Visualize this: The FlowingData guide to design, visualization, and statistics. Hoboken, NJ: John Wiley & Sons.
Yau, N. (2013). Data points: Visualization that means something. Hoboken, NJ: John Wiley & Sons.