1 Introduction
“It is better to be looked over than overlooked.”
Graphics are important: they influence people’s opinions and they are memorable. Where do people learn how to interpret and draw graphics? There are many books, there is a lot on the web, and there are many examples in the media. Unlike most books on graphics, this one is about interpreting graphics rather than drawing them. It is also more about exploratory graphics than presentation graphics.
Presentation graphics, those in published material, in reports, and, of course, in presentations, are for conveying known information, attracting, impressing, and influencing viewers. A single presentation graphic, particularly if it appears on television, may be seen by millions of viewers. These graphics should be carefully designed and reproduced and should convey their message clearly and crisply. Viewers expect to see something immediately in a presentation graphic.
Exploratory graphics are drawn to support investigations, find out information, and gain understanding. They require work, time, and effort to ensure that the information in them is uncovered. They may be seen by only one person or by a small group of people. Exploratory graphics are rarely published. Books, articles, reports, and software emphasise single presentation graphics. Very many exploratory graphics may be drawn to explore data and find out what information is there. There is strength in numbers of exploratory graphics.
Whereas presentation graphics may be closed, designed around one message (or a limited set of messages), exploratory graphics should be open and flexible, helping to identify details that may or may not be important. These could be minor features hinting at some local irregularities or issues of data quality, or they could be evidence of something substantial. Presentation graphics should smooth over unimportant variation and not distract with irrelevant details. Exploratory graphics should, at least initially, bring details to light, suggesting possible ideas to check. And checking is the key word. Ideas are generated by exploratory work, but many will be discarded after thorough checking. Perhaps the lack of a theory of graphics, the effort involved in examining graphics, and the feeling that graphics are a matter of common sense have all led to emphasising presentation graphics over exploratory graphics. Graphics are not just common sense: there is much more to them than that, even if we could agree on what common sense might be.
The approach taken in this work is to start with graphics in action, case studies illustrating what features can be seen and what they might imply. The data used are neither raw nor fully cleaned. Even after some correcting, their likely quality has to be kept in mind. Many graphics are drawn to investigate various aspects of the data and to understand them in context. Each single graphic is part of a larger analysis. There are no explicit exercises, but there is plenty to check and try for yourself, and there are many open questions.
The second part of the book discusses the principles underlying graphical analysis. A central theme is that it is too optimistic to think there might be a single ‘optimal’ display. It is better to look for a collection of displays that together reveal what information there is in the data. This book recommends considering many displays and many different displays, both variations of plots and distinct alternatives. Additional viewpoints provide additional insights. Nowadays it is easy to draw informative graphics quickly and to vary them flexibly. Powerful modern software supports the exploration of multiple graphics. There is no need to be restricted to a single, mythical, ‘optimal’ graphic.
The emphasis is on using several graphical displays together and is more on general principles than on specific graphics. It is not about new graphic forms, more about making the best use of known forms. It is about what features can be seen in graphics and how they might be interpreted, not so much about how to draw the graphics, more about what can be got out of them. A major principle is that any interpretations have to be checked, assessed, and evaluated. This is unlike p-hacking or data dredging. There is no stopping when possible results are found; all must be carefully examined and reviewed. There are many ways of checking ideas arising out of graphical analyses and all should be pursued (cf. §32.3). Jumping to conclusions is ill-advised; cautious scepticism is better.
The case studies use larger datasets than are commonly found in books on graphics, larger in terms of both numbers of observations and numbers of variables. Consequently, analysis includes more data wrangling, cleaning, and reorganising. It is misleading to talk of single datasets, as most analyses involve more than one. Sometimes several subsets or transformations of original datasets are analysed, and sometimes associated data are added, enlarging the original dataset. The datasets are all real, not invented or simulated but based on data collected in practice.
The case studies cover a range of topics and have been chosen in the hope that there is something of interest for everyone. There are two on politics (Chapters 4 and 26), several on sport (Chapters 6, 8, 15, 16, 20, and 21), three on birds (Chapters 14, 18, and 23), two on cars (Chapters 13 and 17), and there are others, including ones on facial recognition (Chapter 22), demographics (Chapter 2), and the movies (Chapter 3).
(Almost) all the datasets can be readily found in R, in one of its packages, or on the web, which was another reason for selecting them. Some of the datasets are made available in the R package GmooG that accompanies the book. Anyone interested in trying out the ideas discussed here for themselves or in experimenting with alternative graphics is encouraged to do so. Reading advice may be helpful; trying to apply it is definitely helpful.