This book is a practical introduction to creating effective visualizations using ggplot2. There are many excellent resources for learning ggplot2, including the following:
- Hadley Wickham and Garrett Grolemund’s R for Data Science (R4DS)
- The ggplot2 website
- RStudio’s Data visualization with ggplot2 cheat sheet
We intend this book as a complement to these resources, building on what they teach about ggplot2, and we will link to them often.
R4DS, the website, and the cheat sheet mostly cover the mechanics of ggplot2. They teach you how to build plots in ggplot2, but the practice of creating effective visualizations is generally outside their scope. There is also a wealth of resources devoted to teaching effective visualization techniques, which we call visualization wisdom. These resources include:
- William Cleveland’s books and papers, including The Elements of Graphing Data
- Edward Tufte’s books, including The Visual Display of Quantitative Information
- Claus Wilke’s Fundamentals of Data Visualization
Our goal is to combine ggplot2 mechanics and visualization wisdom into a single book. We hope readers come away with a solid grounding in ggplot2 and the ability to create effective visualizations for common situations. We can’t cover all types of visualizations, or all possible strategies for visualizing a given relationship. Instead, we hope to introduce readers to a way of thinking about data visualization that will help them approach and construct effective visualizations.
How to read this book
This book is organized thematically, with each chapter building upon the previous ones, and we recommend reading the chapters in order. For the most part, each chapter discusses a category of visualization (e.g., time series, distributions) and contains both ggplot2 mechanics and visualization wisdom. We’ve woven ggplot2 mechanics through the book so that readers quickly learn the most important skills, and then acquire more advanced skills as they become relevant to the visualization tasks at hand.
An advantage of the thematic approach is that readers just starting out with both ggplot2 and data visualization can simultaneously learn how to create plots in ggplot2 and how to make those plots effective. The thematic organization also enables readers to use the book as a visualization category reference. For example, if you know you want to visualize a distribution, you can go to the Distributions chapter and remind yourself of the various strategies.
A disadvantage of this organization is that the book does not function well as a reference for looking up ggplot2 mechanics (e.g., when you forget how scales work and want to find the relevant section). The ggplot2 cheat sheet is a better resource for looking up mechanics, and we mention its relevant sections so that readers can familiarize themselves with the cheat sheet for later use.
An evolving book
This book is not intended to be static. Starting in January 2019, we use this book to teach data visualization in the Stanford Data Challenge Lab (DCL) course. The DCL functions as a testing ground for educational materials, as our students give us routine feedback on what they read and do in the course. We use this feedback to continually improve our materials, including this book. The source for the book is also available on GitHub, where we welcome suggestions for improvements.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.