1 Overview
The R programming language gives researchers access to a wide range of fully customisable data visualisation options that are typically not available in point-and-click software. These visualisations are not only visually appealing but can also increase transparency about the distribution of the underlying data, rather than relying only on commonly used visualisations of aggregated summaries such as means.
In this introductory section of our course, we will provide a practical introduction to using R, focusing on how to visualise data, a skill you will use throughout the course. First, we will explain the rationale behind using R for data visualisation with the ggplot2 package. This package will allow us to begin with common plot types such as histograms and boxplots, and then extend to the more complex structures used in spatial data visualisation.
1.1 The ggplot2 package
There are many options for data visualisation in R. In this course, we will mainly use the ggplot2 package, which forms part of the larger tidyverse collection of packages that provide functions for efficient data management in R. We will also use other tidyverse packages during the course.
A grammar of graphics is a standardised way to describe the components of a graphic. ggplot2 uses a layered grammar of graphics, in which plots are built up in a series of layers. It may be helpful to think of any picture as having multiple elements that sit semi-transparently over one another.
Figure \(\ref{fig:layer}\) shows the evolution of a simple scatterplot using this layered approach. First, the plot space is built (layer 1); next, the variables are specified (layer 2); then the type of visualisation desired for these variables is specified (layer 3), in this case geom_point(), which is called to visualise individual data points; a second geom layer is added to include a line of best fit (layer 4); the axis labels are edited for readability (layer 5); and finally, a theme is applied to change the overall appearance of the plot (layer 6).
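The six layers described above can be sketched in code, adding one layer at a time with the + operator. This is a minimal illustration only: it assumes R's built-in mtcars dataset as example data, and the chosen variables (wt, mpg), axis labels and theme_minimal() are illustrative stand-ins rather than the exact code used to produce the figure.

```r
library(ggplot2)

# Layer 1: build an empty plot space from the data
# (mtcars is used here purely as illustrative example data)
p <- ggplot(data = mtcars)

# Layer 2: specify the variables mapped to the x and y axes
p <- p + aes(x = wt, y = mpg)

# Layer 3: draw each observation as an individual point
p <- p + geom_point()

# Layer 4: add a second geom, a linear line of best fit
p <- p + geom_smooth(method = "lm", formula = y ~ x)

# Layer 5: edit the axis labels for readability
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Layer 6: apply a theme to change the overall appearance
p <- p + theme_minimal()

print(p)
```

Writing each step as a separate assignment makes the layering explicit; in everyday use the same plot is usually written as a single chain of ggplot(...) + ... calls.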
Each layer is independent and individually customisable. For example, the size, colour and position of each component can be adjusted. The use of layers makes it easy to build up complex plots step-by-step, and to adapt or extend plots from existing code.
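To illustrate that layers can be customised independently and that plots can be extended from existing code, the sketch below stores a base plot and then builds on it. As above, the mtcars data and the particular size, colour and jitter values are illustrative assumptions, not settings taken from this course.

```r
library(ggplot2)

# A base plot that can be stored and reused
base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Extend the existing plot: add a fitted line and clearer labels
extended <- base +
  geom_smooth(method = "lm", formula = y ~ x, colour = "black") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Customise only the point layer: larger, blue points,
# nudged horizontally by a small reproducible jitter
custom <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, colour = "steelblue",
             position = position_jitter(width = 0.05, height = 0, seed = 1))

print(extended)
print(custom)
```

Adding layers returns a new plot object rather than modifying the original, so base still contains only its single point layer and can be reused as the starting point for other variants.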