Interactive data visualization (IDV)

Sources: Original material; Sievert (2020)

1 Readings

Readings in descending order of importance.

2 Some interactive visualizations

For each example think about the following questions: What is the aim of this visualization? What are the interactive elements in this visualization, i.e. how can users interact with this visualization? What data lies behind this visualization?

Finally, as a preview for Shiny:

3 Interactivity: Theory & Concepts

  • “interactive data visualization enables direct actions on a plot to change elements and link between multiple plots” (Swayne 1999) (Wikipedia)
  • Interactivity revolutionizes the way we work with data
  • Revolutionizes perception of data (cf. Cleveland and McGill 1984)
  • Started in roughly the last quarter of the 20th century with PRIM-9 (1974) (Friendly 2006, 23; see also Cleveland and McGill 1988; Young et al. 2006)
  • We have come a long way since PRIM-9 (Tukey, also the inventor of the boxplot)
  • More history
  • Interactivity allows for…
    • …making sense of big data (more dimensions)
    • …exploring data
    • …making data accessible to those without stats background (dashboards!)
    • Online publishing, Interactive analysis/reading
    • Past projects: www.digitalpolitics.info
  • Example: Check out the datablogs of various newspapers… data journalism!

4 Elements of a graph

  • Plot combines…
    • data
    • …the scales/coordinate system, which generates axes and legends so that we can read values from the graph
    • …plot annotations, such as the background and plot title (Wickham 2010, 5)
  • Which of these three elements can we modify in an interactive graph?

5 Elements & Interactivity

  • Any of these elements can be subject to interactivity
    • Take a subset of the data
    • Zoom in on scales
  • Conceptual differentiation is sometimes unclear, e.g., is painting the same as subsetting in the Google Ngram Viewer (hover over names)?
  • Best to differentiate according to the elements that are manipulated!
    • Data or scales or annotations
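
The difference between operating on the data and operating on the scales can be sketched in base R with the built-in swiss data. This is only an illustration: the cutoff of 50 and all object names are assumptions, not part of the original material.

```r
# Minimal sketch (base R only): two ways to "focus" a scatterplot.
# The cutoff 50 and all object names are illustrative assumptions.
data(swiss)

# (a) Operate on the DATA: keep only provinces with Agriculture > 50.
swiss_sub <- subset(swiss, Agriculture > 50)

# (b) Operate on the SCALES: keep all rows, but restrict the x-axis range.
zoom_xlim <- c(50, max(swiss$Agriculture))

op <- par(mfrow = c(1, 2))
plot(swiss_sub$Agriculture, swiss_sub$Fertility,
     xlab = "Agriculture", ylab = "Fertility", main = "Subset of the data")
plot(swiss$Agriculture, swiss$Fertility, xlim = zoom_xlim,
     xlab = "Agriculture", ylab = "Fertility", main = "Zoom on the x scale")
par(op)

# The underlying data differ: (a) has fewer rows, (b) is unchanged.
c(subset = nrow(swiss_sub), zoomed = nrow(swiss))
```

The two panels look similar, but any statistic computed from swiss_sub uses fewer rows, whereas zooming leaves the data untouched.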

6 Some concepts of interaction

Some fundamental concepts (adapted from Wikipedia)…

  • Brushing : Paintbrushing (pointer, rectangle, lasso) data, directly changing color or glyph of elements of plot
  • Painting (= persistent brushing): Group points into clusters and proceed to other operations, such as comparing groups
  • Identification (labeling, label brushing): Bringing the cursor near a point or edge in a scatterplot etc. causes a label to appear that identifies the plot element (also called mouseover or hover)
  • Scaling: Scales map data onto window; Zooming in; Change aspect ratio
  • Linking: Connects elements selected in one plot with elements in another plot (e.g., Fig. 16.4)
    • One-to-one: Both plots show different projections of same data; Point in one plot corresponds to exactly one point in the other
    • Area plots: Brushing any part of an area has the same effect as brushing it all and is equivalent to selecting all cases in the corresponding category
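
Brushing, identification and one-to-one linking can be imitated in a static way with base R, which may help to see what an interactive tool does behind the scenes. This is a rough sketch: the brush rectangle, the colours and the choice of variables for the second plot are assumptions made for illustration.

```r
# Minimal sketch (base R only) of brushing plus one-to-one linking.
# The brush rectangle, colours and variable choices are assumptions.
data(swiss)

# A rectangular "brush" drawn on plot 1 (Agriculture vs. Fertility).
brush <- list(xmin = 60, xmax = 90, ymin = 60, ymax = 95)

# Brushing: which rows (units) fall inside the rectangle?
selected <- swiss$Agriculture >= brush$xmin & swiss$Agriculture <= brush$xmax &
            swiss$Fertility   >= brush$ymin & swiss$Fertility   <= brush$ymax

# Brushed points change colour.
point_col <- ifelse(selected, "red", "grey60")

# Linking: the same row-wise selection colours a second projection.
op <- par(mfrow = c(1, 2))
plot(swiss$Agriculture, swiss$Fertility, col = point_col, pch = 19,
     xlab = "Agriculture", ylab = "Fertility", main = "Plot 1 (brushed)")
rect(brush$xmin, brush$ymin, brush$xmax, brush$ymax, border = "red")
plot(swiss$Education, swiss$Infant.Mortality, col = point_col, pch = 19,
     xlab = "Education", ylab = "Infant.Mortality", main = "Plot 2 (linked)")
par(op)

# Identification: names of the brushed units.
rownames(swiss)[selected]
```

Because both plots are projections of the same rows, one logical vector is enough to carry the selection from one plot to the other.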

6.1 Example/Exercise (1)

  • What kind of interactive operations do the plots below allow you to do? On what graph element do they operate? What would you do to give the impression of a stronger relationship?

6.2 Example/Exercise (2)

  • What kind of interactive operations do the plots below allow you to do? On what graph element do they operate? What is the difference between the two plots below? (With interactivity, 3D becomes powerful!)

7 Data

  • ?swiss: Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888 (see here and Mosteller and Tukey 1977) (cognitive load)
  • Let’s take a step back…
Fertility  Agriculture  Examination  Examination_cat
80.2       17.0         15           lower
83.1       45.1         6            lowest
92.5       39.7         5            lowest
85.8       36.5         12           lowest
76.9       43.5         17           higher
  • Dataset is a matrix with columns = variables and rows = units
  • Data consist of numbers (e.g., Fertility) and letters (e.g., Examination_cat)
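
The structure of the dataset can be inspected directly in R. The sketch below uses the built-in swiss data. How Examination_cat was created is not stated in these notes, so the quartile-based recoding is an assumption, chosen because it is consistent with the five rows shown above.

```r
# Minimal sketch (base R only). The quartile-based recoding is an ASSUMPTION;
# the notes do not say how Examination_cat was actually constructed.
data(swiss)

d <- swiss[, c("Fertility", "Agriculture", "Examination")]
d$Examination_cat <- as.character(cut(
  d$Examination,
  breaks = quantile(d$Examination),   # min, quartiles, max
  labels = c("lowest", "lower", "higher", "highest"),
  include.lowest = TRUE
))

dim(d)            # rows = units (provinces), columns = variables
sapply(d, class)  # numeric/integer columns vs. a character column
head(d, 5)        # first five units
```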

8 Data: Subsetting

  • What subsets (filtering) can we choose of the data below?
  • How can we select those subsets?
Fertility  Agriculture  Examination  Examination_cat
80.2       17.0         15           lower
83.1       45.1         6            lowest
92.5       39.7         5            lowest
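
A few ways to select such subsets in base R are sketched here. The values are copied from the three rows above. The object name d and the thresholds are illustrative assumptions.

```r
# Minimal sketch (base R only). Values are copied from the table above;
# the object name `d` and the thresholds are illustrative assumptions.
d <- data.frame(
  Fertility       = c(80.2, 83.1, 92.5),
  Agriculture     = c(17.0, 45.1, 39.7),
  Examination     = c(15, 6, 5),
  Examination_cat = c("lower", "lowest", "lowest")
)

d[d$Examination_cat == "lowest", ]          # rows by a condition on text
d[d$Fertility > 82 & d$Agriculture < 45, ]  # rows by numeric conditions
d[, c("Fertility", "Agriculture")]          # columns by name
d[2:3, 1:2]                                 # rows and columns by position
subset(d, Fertility > 82, select = c(Fertility, Examination_cat))
```

In an interactive graph, a slider or a brush typically does nothing more than supply the condition inside the square brackets.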

9 Data: Manipulation/Creation

  • We can also manipulate existing or create/add/simulate new data
  • Applying mathematical functions to existing data
    • Multiply by 10 and add (F.10); recode to a dummy and add (F.d); add the mean (F.mean); calculate and add the correlation (F.A.corr)
Examination_cat  Fertility  Agriculture  F.10  F.d  F.mean    F.A.corr
lower            80.2       17.0         802   0    85.26667  0.5387801
lowest           83.1       45.1         831   1    85.26667  0.5387801
lowest           92.5       39.7         925   1    85.26667  0.5387801
  • New data either appended to matrix (column or row) or stored in new object
  • Statistical models (regression): Mathematical functions applied to portions of data that are reduced to fewer estimates
  • Interactivity often involves manipulation, creation or reduction of underlying data
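
The derived columns in the table above can be recreated with a few lines of base R, as sketched below. The recoding rule behind F.d is not stated in these notes, so the rule used here (1 if Examination_cat is "lowest") is an assumption that merely matches the values shown. The object names d and fit are also illustrative.

```r
# Minimal sketch (base R only). Values are copied from the table above.
d <- data.frame(
  Examination_cat = c("lower", "lowest", "lowest"),
  Fertility       = c(80.2, 83.1, 92.5),
  Agriculture     = c(17.0, 45.1, 39.7)
)

d$F.10     <- d$Fertility * 10                           # multiply by 10
d$F.d      <- as.numeric(d$Examination_cat == "lowest")  # dummy (ASSUMED rule)
d$F.mean   <- mean(d$Fertility)                          # one value, recycled over rows
d$F.A.corr <- cor(d$Fertility, d$Agriculture)            # one value, recycled over rows
d

# Reduction: a linear regression condenses the rows into two estimates.
fit <- lm(Fertility ~ Agriculture, data = d)
coef(fit)
```

The new columns are appended to d, while the regression result is stored in a separate object, which mirrors the two storage options listed above.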

10 Tools

  • Scientists (public servants) should work with open-source software!
  • R: Free, community, powerful, online-documentation/help, pioneers, interdisciplinarity, object-oriented, popularity, workflow (empirical reports)
  • RStudio IDE (integrated development environment): Productive user interface for R (powerful, free, open source, works on various systems)
  • Shiny
    • A web application framework for R to turn analyses into interactive web applications
    • Attractive because…
      • …no need to really learn html, css or javascript (htmlwidgets)
      • …aims at data analysts who are not programmers
      • …easy upload from RStudio
      • …it is developing so fast
  • Shinyapps.io
    • A platform/server running R as a service for hosting Shiny web apps (free account with 5 apps, 25 active hours)
  • Plotly (see Github repository)
    • “Built on top of d3.js and stack.gl, plotly.js is a high-level, declarative charting library. plotly.js ships with 20 chart types, including 3D charts, statistical graphs, and SVG maps.”
    • Why? Open-source, high-level, fast etc. (Who?)
  • plotly R package
    • “Plotly for R is an interactive, browser-based charting library built on the open source javascript graphing library, plotly.js. It works entirely locally, through the HTML widgets framework” (Who?)
  • Other tools we skip: htmlwidgets, D3
  • YOUR FRIENDS: http://stackexchange.com/ and http://stackoverflow.com/ & LLMs such as ChatGPT

References

Cleveland, William S, and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54.
Friendly, Michael. 2006. “A Brief History of Data Visualization.” In Handbook of Data Visualization, 15–56. Springer Handbooks of Computational Statistics. Springer Berlin Heidelberg.
Kirk, Andy. 2016. Data Visualisation: A Handbook for Data Driven Design. SAGE.
Mosteller, Frederick, and John Wilder Tukey. 1977. “Data Analysis and Regression: A Second Course in Statistics.” Addison-Wesley Series in Behavioral Science: Quantitative Methods.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. 1st edition. Chapman and Hall/CRC.
Swayne, Deborah. 1999. “Introduction to the Special Issue on Interactive Graphical Data Analysis: What Is Interaction?” Computational Statistics 14 (1): 1–6.
Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics: A Joint Publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America 19 (1): 3–28.