Chapter 6 Graphing Principles

Some visual tasks are easier than others because our visual processing evolved to do certain tasks. Our visual cortex is well developed to compare heights and spatial relationships. We do well comparing 3 or 4 colors, but are terrible distinguishing 8 or more colors. We are also bad at comparing areas and volumes as well as comparing heights when there isn’t a common reference point.

6.1 Elementary Perception Tasks

We can break basic graph reading tasks into a hierarchy of tasks that range between very quick and precise to extremely difficult. These EPTs will be how we encode continuous variables. For example, using the length of a bar or area of a region.

From Alberto Cairo’s “The Truthful Art”

Ideally we will create a graph that utilizes tasks that are easier compared to tasks that are harder.

Which graph is Type is larger? Is it Type B perhaps? Between the other two, which is larger? This is not easy to tell, but in the following graph, the answer is obvious because we can compare the heights of the bars relative to a common axis.

6.2 Groupings / Gestalt

To encode categorical variables into the group, we can utilize a number of grouping techniques. Appropriate choice of grouping structure allows us to form a hierarchy of groupings that encourage the reader to see trends within and between groups.

The way we form groupings can be in one of the following ways, (where higher grouping methods produce a stronger grouping effect.

  1. Enclosures
  2. Connections
  3. Proximity (often this is the sort order on the x or y axis)
  4. Similarity (color is better than shape)

The idea is that within an enclosure we can easily compare points, but between enclosures we will consider some aggregate aspect of the graph and compare that aspect between graphs in other enclosures. Therefore if you want to compare to points, dots, (generically marks) you want them in the same enclosure.

6.2.1 Grouping Effects

We stated that some grouping effects were stronger than others. We’ll now demonstrate that fact.

In the following example, find all the fives:

If instead, we form a grouping by adding color, then the group of fives stands out prominently. This example was inspired by an example by Alberto Cairo.

This makes it clear that color is a stronger grouping mechanism than shape.

6.2.2 Grouping Examples

In the following examples, ask yourself what the visual grouping the graph is encouraging? Are the rows or columns grouped? These are taken from a presentation by Todd Iverson and Silas Bergen.

Notice in this example, we have two levels of grouping.

Some grouping is stronger than others

6.2.3 Example: Warpbreaks

While spinning wool into thread, if the tension on the wool isn’t correctly set, the thread can break. Here we compare two different types of wool at three different tensions.

6.2.4 Example - Federal Spending over Time

In the following figure, we’ve graphed the amount spent by the US government on four different programs over four years (2012-2015).

For these two graphs, the exact same data is used.

During the creation of these graphs, we might have been primarily interested in different facts. For each fact, indicate which graph is better and explain why using grouping and EPT principles.

a)  The amount of money spent on Social Security has increased over
    time (2012-2015).  
    
    *For this fact, graph B, with Social Security in its own enclosure,*
    *is best because we want to compare the bars for 2012-2015 and the*
    *closer those bars are to each other the easier the comparison is.*
    
b)  The amount spent on Social Security has increased over time
    (2012-2015) relative to other major spending categories.
    
    *Here the comparison is how much larger the Social Security*
    *bar is relative to the others each year. So we want to notice the*
    *difference in 2012 and then compare that difference to the other*
    *years. This is exactly the use case for enclosures so graph A*
    *is much better to demonstrate this fact.*

6.3 “Color” Scales

Defining Color really has three different attributes (From Wikipedia).

6.3.0.1 HSV Scale

  • Hue: The attribute of a visual sensation according to which an area appears to be similar to one of the perceived colors: red, yellow, green, and blue, or to a combination of two of them.
  • Saturation: The “colorfulness of a stimulus relative to its own brightness”
  • Value: The “brightness relative to the brightness of a similarly illuminated white”

HSV Cylinder from Wikipedia

  • Hue is appropriate for categorical variables.
  • Saturation and/or Value is appropriate for a quantitative variable scale.

Neither R nor Tableau make it particularly easy to map these aspects, so we won’t get too deep into it.

6.4 Examples

6.4.1 RobinHood App

The brokerage firm RobinHood is a low cost stock trading platform that purports to make the stock market accessible to everyone. However in the middle of the GameStop short-squeeze, it blocked users from buying GameStop shares and the users retaliated by leaving negative reviews on the Apple App store.

What EPTs are being used? What Grouping structure is being used?

There are actually two EPTs being used. First, as almost always, we have two axes (so common axis comparisons) which encodes time (along the x-axis) and the number of reviews. So it is very easy to see the January 7th peak positive number of reviews. We also have that same information encoded using the length of the bar. Bar charts are easy to read because the same information is encoded in the graph two different ways. There are two types of grouping going on. First, the color grouping is encoding if the review was positive or negative. Second, there is also a day grouping that is encoded by the space between the bars.

6.4.2 Coffee Varieties & Origins

In this example, we have information about the country of origin and coffee types. This is an interactive graph and reader can select a country or variety and see the corresponding item.

What EPTs are being used? What Grouping structure is being used?

While it isn’t stated anywhere in the graph, I think the length of the bars is related to the amount of coffee exported. I initially thought that perhaps the color of the country was related to the county continent or region, but on deeper inspection I think the color is just randomly assigned to country. The color does basic grouping for the lines traversing from a coffee type to the country.

6.4.3 Trade with Britain

In this scatter plot example, the x and y-axes are a country’s GDP growth versus the its growth in trade with Britain. So data points above the 1-to-1 line have increased their economic ties with Britain and points below have reduced their ties.

What EPTs are being used? What Grouping structure is being used?

In this example the primary relationship being explored is the change in economic ties with Britain. This is encoded using the common scales EPT. The overall size of the country’s trade with Britain is encoded using the point area. It is hard to detect fine differences (for example it is unclear if Poland, Turkey, or India has more trade with Britain. There isn’t any grouping structure in this graph.

6.5 Exercises

  1. In soccer, Kylian Mbappé is being referred to as the next world dominating player. The graphic below compares Mbappé to soccer greats Cristiano Ronaldo and Lionel Messi. Below is a graph showing the number of goals scored during various parts of their career (first 100 professional games, second 100, etc).

    Mbappé vs Messi and Ronaldo

    1. Identify the EPT the reader is required to perform and the grouping structures used. Notice the grouping structure is nested. This forces the user first look at the trends within a top level group and then ask if the trend remains the same across the different groups.
    2. Comment on how the grouping was effective or if you think some change might be better.
  2. In NBA basketball, shot accuracy and distance from the net is important. In the following graph, we can investigate how often shots are taken from different distances. In the first graph, the y-axis is the cumulative percentage of all shots taken. So a steep slope means that lots of shots are taken at that distance and a shallow slope means very few shots are taken.

    Shot distribution in the NBA

    1. Identify the EPT the reader is required to perform in the top and bottom graph. Comment on which graph you understood more easily and why.
    2. Comment on how the grouping of distance and which was more effective and informative.
  3. The following graphic calculates difference between a month’s temperature compared to the monthly average temperature across the years since 1850. For example, this graph compares the average January 2014 temperature to the average January temperature across the last 170 Januaries.)

    Average World Temperature Since 1850

    1. Identify the EPT the reader is required to perform and the grouping structures used.
    2. Comment on how the grouping was effective or if you think some change might be better.
  4. Find a graphic “in the wild” that you think is interesting. Turn in both the original graphic as well as your answers to the following:

    1. Identify the EPT the reader is required to perform and the grouping structures used.
    2. Comment on how the graph was effective or if you think some change might improve the graph.