Introduction

In an experiment conducted by researchers from University College London (Mcmanus and Gesiak 2014), 277 participants were asked to look at several pairs of paintings: one painting in each pair was an original by the abstract painter Piet Mondrian, and the other was a fake version that closely resembled it.1 The participants were asked:

When looking at the pictures you should decide overall which you think looks better, in that it looks nicer, it looks better organised, or it looks better balanced.

The results suggested that people could identify the originals with some degree of accuracy (\(\mu\) = 54.7%, SE = 0.40). That is reliably better than chance! The experiment aimed to compare two methods in empirical aesthetics: the method of choice and the method of production. This choice experiment “implies people know something about what makes a real Mondrian.” In other words, we have an intuition for proportion and beauty.
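As a rough check on the "better than chance" claim, the reported figures can be compared with the 50% expected from guessing between two paintings. The sketch below assumes that the standard error is expressed in percentage points and that a normal approximation is reasonable; neither assumption comes from the original study, and the variable names are only illustrative.

```r
# Reported summary statistics (Mcmanus and Gesiak 2014)
mean_correct <- 54.7  # mean percentage of originals identified
se <- 0.40            # standard error, ASSUMED to be in percentage points
chance <- 50          # expected percentage when guessing between two paintings

# Distance from chance, measured in standard errors
z <- (mean_correct - chance) / se

# Approximate 95% confidence interval (normal approximation, an assumption)
ci <- mean_correct + c(-1, 1) * qnorm(0.975) * se

z
ci
```

Under these assumptions the mean lies roughly 11.75 standard errors above 50%, and the whole interval sits above chance.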

Figure 0.1: An attempt at a Mondrian using ggplot2
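A figure of this kind could be drawn along the following lines. This is a minimal sketch and not necessarily the code behind the figure above: the `mondrian` data frame, its coordinates and its colours are hypothetical, chosen only so that the rectangles tile a 10 x 10 canvas.

```r
library(ggplot2)

# Hypothetical layout: each row is one rectangle on a 10 x 10 canvas
mondrian <- data.frame(
  xmin = c(0, 0, 3, 0, 3, 9, 9),
  xmax = c(3, 3, 10, 3, 9, 10, 10),
  ymin = c(7, 3, 3, 0, 0, 1.5, 0),
  ymax = c(10, 7, 10, 3, 3, 3, 1.5),
  fill = c("white", "white", "red", "blue", "white", "white", "yellow")
)

p <- ggplot(mondrian) +
  geom_rect(
    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = fill),
    colour = "black",
    linewidth = 2  # use size = 2 with ggplot2 versions before 3.4.0
  ) +
  scale_fill_identity() +  # use the colour names in `fill` as they are
  coord_fixed() +          # keep the canvas square
  theme_void()             # drop axes, grid and background

p
```

Storing one rectangle per row keeps the composition easy to change: moving a line only means editing the coordinates it touches.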

Why Visualize?

Data visualization plays numerous roles in the social sciences, from summarizing large amounts of information in a small space, to supporting claims about patterns and relationships among a vast array of indicators of human behavior.

Zinovyev (2010) identifies four types of visualizations in political science:

  • Statistical graphics and infographics with extensive use of color, form, size, shape and style to superimpose many quantitative variables in the same chart or diagram
  • Geographical information systems (GIS) to visualize geographically-linked data
  • Graph visualization or network maps for representing relations between objects
  • Projection of multidimensional data on low-dimensional screens with further visualization, data cartography

Throughout this book, we will cover examples from all of these types of visuals.

About this Book

If you are in the Data Visualization course at IBEI, you will need it. If you are not, it might be useful anyway!

To keep things as simple as possible, the book follows these conventions:

  • packages are placed inside a shaded box (e.g. ggplot2),
  • common functions() are also inside a shaded box, and followed by parentheses (e.g. mutate() from tidyverse or class() from base R),
  • less common functions() are the same, but the package is called explicitly with :: (e.g. reshape2::melt()),
  • short R commands (e.g. %in%) are also shaded, while non-R commands are in bold (e.g. Ctrl + p),
  • the common pipe operator %>% will be used when possible in the code2 (i.e., we will mostly use the tidyverse syntax over base R).
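The conventions above can be seen together in a short example. It is only an illustrative sketch: it uses the built-in `mtcars` data, the object names `small_cars` and `long` are made up for the occasion, and it assumes that the dplyr and reshape2 packages are installed.

```r
library(dplyr)  # tidyverse package providing mutate(), filter() and %>%

# class() comes from base R, so no package needs to be loaded
class(mtcars)

# Tidyverse syntax: chain steps with %>% and test membership with %in%
small_cars <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%
  mutate(wt_lbs = wt * 1000)  # wt is recorded in thousands of pounds

# A less common function, called with its package made explicit via ::
# (requires the reshape2 package to be installed)
long <- reshape2::melt(small_cars, id.vars = "cyl", measure.vars = c("mpg", "hp"))

head(long)
```

Writing `reshape2::melt()` instead of loading the whole package makes it clear where a rarely used function comes from.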

Prerequisites

You are not expected to have any familiarity with R at the beginning of the course, though some knowledge of statistics will be very helpful. We will cover the basics of working with R and RStudio during the first few sessions. Some tutorial videos on the basics of working with RStudio are available here. Similarly, you may also consult the following open-source books on R:3

Acknowledgements

I would like to thank the 2020-21 IBEI MA cohort, as their suffering throughout the first iteration of this course was instrumental in the development of this e-book.

Contact

This book is in open review. If you have any questions, comments or suggestions, please contact me by email.

References

Chang, Winston. 2018. R Graphics Cookbook: Practical Recipes for Visualizing Data. Second. Sebastopol, California: O’Reilly Media. https://r-graphics.org/.

Gillespie, Colin, and Robin Lovelace. 2016. Efficient R Programming: A Practical Guide to Smarter Programming. Sebastopol, California: O’Reilly Media. https://csgillespie.github.io/efficientR/.

Grolemund, Garrett. 2014. Hands-on Programming with R: Write Your Own Functions and Simulations. Sebastopol, California: O’Reilly Media. https://rstudio-education.github.io/hopr/.

Grolemund, Garrett, and Hadley Wickham. 2016. R for Data Science. Sebastopol, California: O’Reilly Media. https://r4ds.had.co.nz/.

Long, JD, and Paul Teetor. 2019. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics. Second. Sebastopol, California: O’Reilly Media. https://rc2e.com/.

Mcmanus, Ian, and Paul Gesiak. 2014. “Experimenting with Mondrian: Comparing the Method of Production with the Method of Choice.” In. https://doi.org/10.13140/2.1.1561.2967.

Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. Sebastopol, California: O’Reilly Media. https://www.tidytextmining.com/.

Venables, W. N., D. M. Smith, and the R Core Team. 2021. An Introduction to R. https://cran.r-project.org/doc/manuals/R-intro.pdf.

Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. Sebastopol, California: O’Reilly Media. https://clauswilke.com/dataviz/.

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown.

Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.

Zinovyev. 2010. “Data Visualization in Political and Social Sciences.” In. https://arxiv.org/abs/1008.1188.


  1. The pseudo-Mondrians were created by jittering all the lines in the original but keeping the same relative positions.

  2. For Windows users, the %>% shortcut in RStudio is Ctrl + Shift + M and for Mac users it is Cmd + Shift + M.

  3. For a comprehensive list of R-related books, consult the R-Project Website.