Introduction

Sources: Original material; Healy (2018); Tufte (2001)

1 What is data visualization?

  • Infographics are graphic visual representations of information, data, or knowledge intended to present information quickly and clearly
  • Data visualization: “is a collection of methods that use visual representations to explore, make sense of, and communicate quantitative data.” (Stephen Few, Blog, Books)



  • More definitions…
    • Statistical graph (Definition) (wow!)
    • “Data visualization is the graphic representation of data. It involves producing images that communicate relationships among the represented data to viewers of the images. This communication is achieved through the use of a systematic mapping between graphic marks and data values in the creation of the visualization. This mapping establishes how data values will be represented visually, determining how and to what extent a property of a graphic mark, such as size or color, will change to reflect changes in the value of a datum.” (Wikipedia)
    • The “main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information” (Friedman)

2 Why look?

2.1 Anscombe’s quartet (1)

  • Table 1 displays Anscombe’s quartet (Anscombe 1973), a dataset (or rather four small datasets) often used to illustrate the usefulness of visualization (the xs and the ys have the same means in all four sets)
    • Q: What does the table reveal about the data? Is it easy to read?
Table 1: Anscombe’s quartet: Data
x1 y1 x2 y2 x3 y3 x4 y4
10 8.04 10 9.14 10 7.46 8 6.58
8 6.95 8 8.14 8 6.77 8 5.76
13 7.58 13 8.74 13 12.74 8 7.71
9 8.81 9 8.77 9 7.11 8 8.84
11 8.33 11 9.26 11 7.81 8 8.47
14 9.96 14 8.10 14 8.84 8 7.04
6 7.24 6 6.13 6 6.08 8 5.25
4 4.26 4 3.10 4 5.39 19 12.50
12 10.84 12 9.13 12 8.15 8 5.56
7 4.82 7 7.26 7 6.42 8 7.91
5 5.68 5 4.74 5 5.73 8 6.89
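
The claim that the xs and ys share the same means is hard to verify by eye from Table 1, so a small check can help. The following is a minimal sketch using only Python's standard library; the names `QUARTET`, `summarize`, and `summaries` are illustrative assumptions, and the numbers are copied from Table 1.

```python
from statistics import mean, variance

# Anscombe's quartet, copied from Table 1 (datasets 1-3 share the same x values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "Dataset 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Dataset 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Dataset 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Dataset 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means and sample variances of x and y, rounded to 2 decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),
        "var_y": round(variance(ys), 2),
    }


# One summary per dataset; printed so the four rows can be compared directly.
summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, stats)
```

With the values from Table 1, every dataset reports a mean of 9 for x and a mean of about 7.50 for y, which is exactly why the raw table and its summary statistics reveal so little.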

2.2 Anscombe’s quartet (2)

  • Table 2 shows results from a linear regression based on Anscombe’s quartet (Anscombe 1973)
    • Q: What do we see now?
Table 2: Linear models based on sets of Anscombe’s quartet
             y1 (Dataset 1)  y2 (Dataset 2)  y3 (Dataset 3)  y4 (Dataset 4)
(Intercept)  3.000           3.001           3.002           3.002
             (1.125)         (1.125)         (1.124)         (1.124)
x1           0.500
             (0.118)
x2                           0.500
                             (0.118)
x3                                           0.500
                                             (0.118)
x4                                                           0.500
                                                             (0.118)
Notes: some notes...
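
The coefficients in Table 2 can be reproduced by hand with the closed-form least-squares formulas (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)). The following is a minimal sketch under that assumption; `fit_line`, `QUARTET`, and `fits` are illustrative names, not the code used to produce the table, and standard errors are not computed.

```python
# Anscombe's quartet, copied from Table 1 (datasets 1-3 share the same x values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "Dataset 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Dataset 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Dataset 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Dataset 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


fits = {name: fit_line(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (intercept, slope) in fits.items():
    print(f"{name}: intercept={intercept:.3f}, slope={slope:.3f}")
```

All four fits come out at an intercept of roughly 3.00 and a slope of roughly 0.50, so the regression output alone cannot distinguish the four datasets.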

2.3 Anscombe’s quartet (3)

  • Figure 1 finally visualizes the data underlying those models
    • Q: What do we see here? What is the insight?
Figure 1: Anscombe’s quartet: Visualization

2.4 The Datasaurus Dozen

Figure 2: The Datasaurus Dozen animated by Tom Westlake

2.5 Voter turnout and income inequality

Source (Healy 2018, Ch 1, Figure 1.2: Seeing the effect of an outlier on a regression line.): https://socviz.co/lookatdata.html

3 Exercise: The “best” graph ever drawn

  • Use Ctrl + “+” (or the mouse wheel) to zoom into Figure 3 below (either the English or the French version) and discuss the following questions in groups
    1. What is shown on the graph? What story does it tell us?
    2. How many dimensions/variables are visualized? How are they encoded?
    3. What do you like about the graph, what don’t you like? (Source: Wikipedia)
Figure 3: Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard
Figure 4: Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

4 First lessons on what is a good graph…

  • Tufte: Charles Joseph Minard’s graph (check out the other graphs)
    • “may well be the best statistical graphic ever drawn”
    • “tells a rich, coherent story with its multivariate data […]. Six variables are plotted: the size of the army, its location (longitude/latitude) on a two-dimensional surface, direction of the army’s movement, and temperature on various dates during the retreat from Moscow” (Tufte 2001, 2:40)
  • “Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design….[It] consists of complex ideas communicated with clarity, precision, and efficiency….[It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space….[It] is nearly always multivariate….And graphical excellence requires telling the truth about the data” (Tufte 2001, 2:51; as cited in Healy and Moody 2014, 109)
  • Pragmatism: Healy and Moody (2014, 109): A tour de force such as Minard’s “can be […] admired, but there are no compositional principles on how to create that one wonderful graphic in a million.” (Tufte 2001, 2:177)
    • The best one can do for “more routine, workaday designs” is to suggest some guidelines such as “have a properly chosen format and design,” “use words, numbers, and drawing together,” “display an accessible complexity of detail,” and “avoid content-free decoration, including chartjunk” (Tufte 2001, 2:177)
  • Re-visions of Minard

5 Minard in ggplot

The ggplot2 version of Minard’s graph and the tables below illustrate that the underlying data are spread across several datasets (and temperature is not shown in the graph!)

Source: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

  • See folder “data/Minard” and subsets of the data below.

long  lat   survivors  direction  group
24.0  54.9  340000     A          1
24.5  55.0  340000     A          1
25.5  54.5  340000     A          1
26.0  54.7  320000     A          1
27.0  54.8  300000     A          1
28.0  54.9  280000     A          1

long  lat   city
24.0  55.0  Kowno
25.3  54.7  Wilna
26.4  54.4  Smorgoni
26.8  54.3  Moiodexno
27.7  55.2  Gloubokoe
27.6  53.9  Minsk

long  temp  month  day  date
37.6  0     Oct    18   NA
36.0  0     Oct    24   NA
33.2  -9    Nov    9    1812-11-09
32.0  -21   Nov    14   1812-11-14
29.2  -11   Nov    24   1812-11-24
28.5  -20   Nov    28   1812-11-28
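
Minard encodes the size of the army as the width of the path. As a rough illustration of such a mapping from data values to a property of a graphic mark, the sketch below scales the `survivors` values from the troop rows shown above linearly to a line width. The function `linear_scale`, the domain (0 to 340000), and the maximum width of 20 units are assumptions chosen for illustration; they are not taken from the ggplot code behind the figure.

```python
# First rows of the troops table shown above: (long, lat, survivors).
TROOPS = [
    (24.0, 54.9, 340000),
    (24.5, 55.0, 340000),
    (25.5, 54.5, 340000),
    (26.0, 54.7, 320000),
    (27.0, 54.8, 300000),
    (28.0, 54.9, 280000),
]

MAX_SURVIVORS = 340000  # assumed upper end of the data domain
MAX_WIDTH = 20.0        # hypothetical maximum line width (arbitrary units)


def linear_scale(value, domain, output_range):
    """Map value from the data domain linearly onto the visual output range."""
    d_min, d_max = domain
    r_min, r_max = output_range
    return r_min + (value - d_min) / (d_max - d_min) * (r_max - r_min)


# One width per path vertex: the path narrows as the army shrinks.
widths = [
    linear_scale(survivors, (0, MAX_SURVIVORS), (0.0, MAX_WIDTH))
    for _, _, survivors in TROOPS
]
for (lon, lat, survivors), width in zip(TROOPS, widths):
    print(f"long={lon}, lat={lat}, survivors={survivors}, width={width:.2f}")
```

Starting the domain at zero keeps the width proportional to the number of survivors, so a path half as wide represents an army half as large.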

References

Anscombe, F. J. 1973. “Graphs in Statistical Analysis.” Am. Stat. 27 (1): 17–21.
Healy, Kieran. 2018. Data Visualization: A Practical Introduction. Princeton University Press.
Healy, Kieran, and James Moody. 2014. “Data Visualization in Sociology.” Annu. Rev. Sociol. 40 (July): 105–28.
Tufte, Edward R. 2001. The Visual Display of Quantitative Information. Vol. 2. Cheshire, CT: Graphics Press.