# Interactive data visualization (IDV)

- Learning outcomes: Learn…
  - …about the potential of interactive visualizations for data exploration and explanation.

Sources: Original material; Sievert (2020)

# 1 Readings

Readings in descending order of importance.

- Interactive visualizations with R - a minireview (January 2015!)
- Kirk (2016) and the accompanying webpage
- Wikipedia page on interactive data visualization
- Interactive web-based data visualization with R, plotly, and shiny

# 2 Some interactive visualizations

For each example think about the following questions: *What is the aim of this visualization? What are the interactive elements in this visualization, i.e. how can users interact with this visualization? What data lies behind this visualization?*

- Lobbyradar: Search for ‘Angela Merkel’ and ‘Krauss-Maffei Wegmann GmbH & Co. KG’ (was stopped…)
- Google Ngram Viewer
- Google Trends and an example (try Michael Jackson)
- PreisKaleidoskop
- D3 gallery (new link) and Force-Directed Graph and Matrix Diagram for character co-occurrences in Victor Hugo’s Les Misérables

Finally, as a preview for **Shiny**:

- Shiny Gallery
- My own early examples: one, two, etc.

# 3 Interactivity: Theory & Concepts

- “**interactive** data visualization enables direct actions on a plot to change elements and link between multiple plots” (Swayne 1999) (Wikipedia)
- Interactivity revolutionizes the way we work with data
- Revolutionizes perception of data (cf. Cleveland and McGill 1984)
- Started ~last quarter of the 20th century, PRIM-9 (1974) (Friendly 2006, 23, see also Cleveland and McGill, 1988, Young et al. 2006)
- We have come a long way… PRIM-9 (Tukey, inventor of the boxplot)
- More history
- Interactivity allows for…
  - …making sense of big data (more dimensions)
  - …exploring data
  - …making data accessible to those without stats background (dashboards!)
- Online publishing, Interactive analysis/reading
- Past projects: www.digitalpolitics.info

- Example: Check out the datablogs of various newspapers… data journalism!

# 4 Elements of a graph

- Plot combines…
  - …**data**
  - …the **scales/coordinate system**, which generates axes and legends so that we can read values from the graph
  - …plot **annotations**, such as the background and plot title (Wickham 2010, 5)
- *Which of these three elements can we modify in an interactive graph?*

# 5 Elements & Interactivity

- Any of these elements can be subject to interactivity
  - Take a subset of the **data**
  - Zoom in on **scales**
- Conceptual differentiation sometimes unclear, e.g., painting = subsetting in Google Ngram Viewer (hover names)?
- Best to differentiate according to the elements that are manipulated! **Data** or **scales** or **annotations**
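
The difference between manipulating the data and manipulating the scales can be sketched in base R with the built-in `swiss` data. This is only an illustration: the cutoff of 50 is an arbitrary, hypothetical value. Both panels show the same region of the plot, but only the first one actually drops rows.

```r
# Assumption: the cutoff (50) is arbitrary and only serves as an illustration.
cutoff <- 50

# Manipulate the DATA: keep only provinces with Agriculture >= cutoff
high_agri <- swiss[swiss$Agriculture >= cutoff, ]

op <- par(mfrow = c(1, 2))  # two panels side by side
plot(Fertility ~ Agriculture, data = high_agri, main = "Subset of the data")

# Manipulate the SCALES: all rows are kept, only the x axis is restricted
plot(Fertility ~ Agriculture, data = swiss,
     xlim = c(cutoff, max(swiss$Agriculture)), main = "Zoom on the x scale")
par(op)  # restore the previous graphics settings

nrow(high_agri)  # fewer rows than the full data
nrow(swiss)      # 47 rows, unchanged by zooming
```

An interactive tool does the same thing behind the scenes: a filter widget changes the data object, whereas a zoom gesture only changes the axis limits.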

# 6 Some concepts of interaction

Some fundamental concepts (adapted from Wikipedia)…

- **Brushing**: Paintbrushing (pointer, rectangle, lasso) data, directly changing the color or glyph of plot elements
  - **Painting** (= persistent brushing): Group points into clusters and proceed to other operations, such as comparing groups
  - **Identification** (labeling, label brushing): Bringing the cursor near a point or edge in a scatterplot etc. causes a label to appear that identifies the plot element (also called **mouseover** or **hover**)
- **Scaling**: Scales map data onto the window; zooming in; changing the aspect ratio
- **Linking**: Connects elements selected in one plot with elements in another plot (e.g., Fig. 16.4)
  - **One-to-one**: Both plots show different projections of the same data; a point in one plot corresponds to exactly one point in the other
  - **Area plots**: Brushing any part of an area has the same effect as brushing it all and is equivalent to selecting all cases in the corresponding category
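
As a rough sketch of **identification**, the following uses the plotly R package (assumed to be installed) to attach a hover label to every point of a scatterplot of the `swiss` data. The column name `province` is introduced here only for illustration.

```r
library(plotly)

# Assumption: province names are taken from the row names of swiss;
# the column name "province" is hypothetical.
d <- data.frame(province = rownames(swiss), swiss)

p <- plot_ly(
  data = d,
  x = ~Agriculture,
  y = ~Fertility,
  type = "scatter",
  mode = "markers",
  text = ~province,    # label shown when the cursor is near a point
  hoverinfo = "text"   # show only that label on hover
)

# print(p) renders the plot in the viewer or browser
```

The rendered plot also lets the user zoom by dragging a rectangle, which is an instance of **scaling** rather than of identification.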

## 6.1 Example/Exercise (1)

*What kind of interactive operations do the plots below allow you to do? On what graph element do they operate? What would you do to give the impression of a stronger relationship?*

## 6.2 Example/Exercise (2)

*What kind of interactive operations do the plots below allow you to do? On what graph element do they operate? What is the difference between the two plots below?* (With interactivity, 3D plots become powerful!)

# 7 Data

`?swiss`

: Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888 (see here and Mosteller and Tukey 1977) (cognitive load)

- Let’s take a step back…

Fertility | Agriculture | Examination | Examination_cat |
---|---|---|---|
80.2 | 17.0 | 15 | lower |
83.1 | 45.1 | 6 | lowest |
92.5 | 39.7 | 5 | lowest |
85.8 | 36.5 | 12 | lowest |
76.9 | 43.5 | 17 | higher |

- Dataset is a matrix with **columns = variables** and **rows = units**
- Data consist of **numbers (e.g., Fertility)** and **letters (e.g., Examination_cat)**
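
A small sketch of how such a table could be produced from the built-in `swiss` data. The source does not say how `Examination_cat` was derived, so the quartile-based recoding below is an assumption, and the label `"highest"` for the top category is hypothetical.

```r
# swiss ships with base R (package datasets)
d <- swiss[, c("Fertility", "Agriculture", "Examination")]

# Assumption: Examination_cat is built from the quartiles of Examination;
# the label "highest" is hypothetical (it does not appear in the table above).
d$Examination_cat <- cut(
  d$Examination,
  breaks = quantile(d$Examination),
  labels = c("lowest", "lower", "higher", "highest"),
  include.lowest = TRUE
)

dim(d)            # rows = units (provinces), columns = variables
sapply(d, class)  # numeric/integer columns vs. a categorical (factor) column
head(d, 5)        # first five units
```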

# 8 Data: Subsetting

*What subsets (filtering) can we choose of the data below?* *How can we select those subsets?*

Fertility | Agriculture | Examination | Examination_cat |
---|---|---|---|
80.2 | 17.0 | 15 | lower |
83.1 | 45.1 | 6 | lowest |
92.5 | 39.7 | 5 | lowest |
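
One possible set of answers, sketched in base R on the three rows shown above. The object names and the numeric thresholds are chosen only for illustration.

```r
# The three rows from the table above
d <- data.frame(
  Fertility       = c(80.2, 83.1, 92.5),
  Agriculture     = c(17.0, 45.1, 39.7),
  Examination     = c(15, 6, 5),
  Examination_cat = c("lower", "lowest", "lowest")
)

# Rows (units) selected by a condition on a categorical variable
rows_lowest <- d[d$Examination_cat == "lowest", ]

# Rows selected by a condition on a numeric variable (hypothetical threshold)
rows_fertile <- d[d$Fertility > 82, ]

# Columns (variables) selected by name
cols_two <- d[, c("Fertility", "Agriculture")]

# A single cell selected by row position and column name
cell <- d[1, "Fertility"]

# Rows and columns at once with subset()
both <- subset(d, Agriculture > 30, select = c(Fertility, Examination_cat))
```

In an interactive graph, a slider or dropdown typically supplies the condition, and the plot is redrawn from the resulting subset.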

# 9 Data: Manipulation/Creation

- We can also **manipulate existing** or **create/add/simulate** new data
- Applying **mathematical functions** to existing data
  - Multiply by 10 and add; Recode to and add dummy; Add mean; Calculate and add correlation

Examination_cat | Fertility | Agriculture | F.10 | F.d | F.mean | F.A.corr |
---|---|---|---|---|---|---|
lower | 80.2 | 17.0 | 802 | 0 | 85.26667 | 0.5387801 |
lowest | 83.1 | 45.1 | 831 | 1 | 85.26667 | 0.5387801 |
lowest | 92.5 | 39.7 | 925 | 1 | 85.26667 | 0.5387801 |

- New data either appended to matrix (column or row) or stored in new object
- **Statistical models** (regression): Mathematical functions applied to portions of data that are reduced to fewer estimates
- Interactivity often involves **manipulation, creation or reduction of underlying data**
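
The derived columns in the table above can be reproduced along these lines. The source does not state the rule behind the dummy `F.d`, so the cutoff of 82 is a hypothetical value picked only because it yields the pattern 0/1/1. The regression at the end is likewise just one example of reducing data to a few estimates.

```r
# The three rows from the table above
d <- data.frame(
  Examination_cat = c("lower", "lowest", "lowest"),
  Fertility       = c(80.2, 83.1, 92.5),
  Agriculture     = c(17.0, 45.1, 39.7)
)

d$F.10 <- d$Fertility * 10                      # multiply by 10 and add as a column

# Assumption: hypothetical cutoff (82), chosen only to reproduce 0/1/1
d$F.d <- as.integer(d$Fertility > 82)           # recode to a dummy

d$F.mean   <- mean(d$Fertility)                 # mean, repeated for every row
d$F.A.corr <- cor(d$Fertility, d$Agriculture)   # correlation, repeated for every row

print(d)

# Reduction: a regression on the full swiss data condenses
# 47 rows into two estimates (intercept and slope)
fit <- lm(Fertility ~ Agriculture, data = swiss)
coef(fit)
```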

# 10 Tools

- Scientists (public servants) should work with **open-source software**!
- R: Free, community, powerful, online-documentation/help, pioneers, interdisciplinarity, object-oriented, popularity, workflow (empirical reports)
- RStudio IDE (integrated development environment): Productive user interface for R (powerful, free, open source, works on various systems)
- Shiny
  - A web application framework for R to turn analyses into interactive web applications
  - Attractive because…
    - …no need to really learn HTML, CSS or JavaScript (htmlwidgets)
    - …aims at data analysts who are not programmers
    - …easy upload from RStudio
    - …it is developing so fast
- Shinyapps.io
  - A platform/server running R as a service for hosting Shiny web apps (free account with 5 apps, 25 active hours)
- Plotly (see GitHub repository)
  - “Built on top of d3.js and stack.gl, plotly.js is a high-level, declarative charting library. plotly.js ships with 20 chart types, including 3D charts, statistical graphs, and SVG maps.”
  - Why? Open-source, high-level, fast etc. (Who?)

- plotly R package
  - “Plotly for R is an interactive, browser-based charting library built on the **open source javascript graphing library, plotly.js**. It works entirely locally, through the HTML widgets framework” (Who?)
- Other tools we skip: htmlwidgets, D3
- **YOUR FRIENDS**: http://stackexchange.com/ and http://stackoverflow.com/ & LLMs such as ChatGPT
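
To give an impression of what “turning an analysis into an interactive web application” looks like in Shiny, here is a minimal sketch, assuming the shiny package is installed. The input id `min_exam`, the output id `scatter`, and the choice of filter variable are hypothetical. The slider manipulates the **data** (a subset), while the fixed axis limits keep the **scales** constant.

```r
library(shiny)

# User interface: one slider (input) and one plot (output)
ui <- fluidPage(
  sliderInput("min_exam", "Minimum value of Examination",
              min = min(swiss$Examination),
              max = max(swiss$Examination),
              value = min(swiss$Examination)),
  plotOutput("scatter")
)

# Server logic: re-draw the plot whenever the slider changes
server <- function(input, output) {
  output$scatter <- renderPlot({
    # Subset the data according to the slider
    d <- swiss[swiss$Examination >= input$min_exam, ]
    # Keep the scales fixed so that only the data change
    plot(Fertility ~ Agriculture, data = d,
         xlim = range(swiss$Agriculture),
         ylim = range(swiss$Fertility))
  })
}

app <- shinyApp(ui, server)
# runApp(app) launches the app locally in a browser
```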

## References

Cleveland, William S, and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” *Journal of the American Statistical Association* 79 (387): 531–54.

Friendly, Michael. 2006. “A Brief History of Data Visualization.” In *Handbook of Data Visualization*, 15–56. Springer Handbooks Comp.statistics. Springer Berlin Heidelberg.

Kirk, Andy. 2016. *Data Visualisation: A Handbook for Data Driven Design*. SAGE.

Mosteller, Frederick, and John Wilder Tukey. 1977. “Data Analysis and Regression: A Second Course in Statistics.” *Addison-Wesley Series in Behavioral Science: Quantitative Methods*.

Sievert, Carson. 2020. *Interactive Web-Based Data Visualization with R, Plotly, and Shiny*. 1st edition. Chapman and Hall/CRC.

Swayne, Deborah. 1999. “Introduction to the Special Issue on Interactive Graphical Data Analysis: What Is Interaction?” *Computational Statistics* 14 (1): 1–6.

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” *Journal of Computational and Graphical Statistics: A Joint Publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America* 19 (1): 3–28.