5.6 Plotting with ggplot2
ggplot2 is a plotting package that makes it simple to create complex plots from data stored in a data frame. It provides a programmatic interface for specifying which variables to plot, how they are displayed, and general visual properties. Therefore, we need only minimal changes if the underlying data change or if we decide to switch from a bar plot to a scatterplot. This helps in creating publication-quality plots with minimal adjustment and tweaking.
First, let’s load ggplot2.
library(ggplot2)
ggplot2 functions work best with data in the ‘long’ format, i.e., a column for every dimension and a row for every observation. Well-structured data will save you lots of time when making figures with ggplot2.
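To make the ‘long’ format concrete, here is a minimal sketch of reshaping a wide table into a long one. It assumes the tidyr package is installed; the data frame wide_counts and its values are made up for illustration and are not part of the lesson data.

```r
library(tidyr)

# Hypothetical wide data: one column per year (invented counts)
wide_counts <- data.frame(
  State = c("A", "B"),
  `1960` = c(120, 80),
  `1961` = c(150, 95),
  check.names = FALSE
)

# Reshape so that each row is one State-Year observation
long_counts <- pivot_longer(
  wide_counts,
  cols = c(`1960`, `1961`),
  names_to = "Year",
  values_to = "Count"
)

# Column names arrive as text, so convert Year to a number for plotting
long_counts$Year <- as.integer(long_counts$Year)

long_counts
```

In the long version, Year and Count are ordinary columns, so they can be mapped directly to plot axes.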
ggplot2 graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.
Each chart built with ggplot2 must include the following:
- Data
- Aesthetic mapping (aes)
  - Describes how variables are mapped onto graphical attributes
  - Visual attributes of the data include x-y axes, color, fill, shape, and alpha
- Geometric objects (geom)
  - Determine how values are rendered graphically, as bars (geom_bar), points (geom_point), lines (geom_line), etc.
Thus, the template for a graphic in ggplot2 is:
<DATA> %>%
  ggplot(aes(<MAPPINGS>)) +
  <GEOM_FUNCTION>()
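As a rough illustration, the template might be filled in like this. The data frame toy_counts and its values are invented for this sketch, and we assume dplyr is loaded to provide the %>% pipe.

```r
library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Hypothetical toy data, invented for illustration only
toy_counts <- data.frame(
  Year = c(1960, 1961, 1962, 1963),
  TotalCount = c(400, 420, 480, 380)
)

toy_plot <- toy_counts %>%                 # <DATA>
  ggplot(aes(x = Year, y = TotalCount)) +  # <MAPPINGS>
  geom_point()                             # <GEOM_FUNCTION>

toy_plot
```

Swapping geom_point() for another geom changes the chart type without touching the data or the mapping.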
Remember from the last lesson that the pipe operator %>% places the result of the previous line(s) into the first argument of the function. ggplot() is a function that expects a data frame as its first argument. This allows us to pipe the data into the function instead of specifying the data = argument within ggplot().
- use the ggplot() function and bind the plot to a specific data frame.
yearly_count %>% ggplot()
is the same as
ggplot(data = yearly_count)
- define a mapping (using the aesthetic (aes) function) by selecting the variables to be plotted and specifying how to present them in the graph, e.g. as x/y positions or characteristics such as size, shape, color, etc.
yearly_count %>%
  ggplot(aes(x = Year, y = TotalCount))
This can also be written more concisely:
yearly_count %>%
  ggplot(aes(Year, TotalCount))
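Aesthetics are not limited to x and y positions. The sketch below maps a third variable to color so that each group gets its own line. The data frame toy_by_state, its State column, and its values are hypothetical and only serve to show the idea.

```r
library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Hypothetical data with a grouping column, invented for illustration
toy_by_state <- data.frame(
  Year = rep(c(1960, 1961, 1962), times = 2),
  State = rep(c("A", "B"), each = 3),
  Count = c(100, 120, 90, 60, 75, 50)
)

# Mapping State to color draws one colored line per state
state_lines <- toy_by_state %>%
  ggplot(aes(x = Year, y = Count, color = State)) +
  geom_line()

state_lines
```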
- add ‘geoms’ – graphical representations of the data in the plot (points, lines, bars).
ggplot2 offers many different geoms for common graph types. To add a geom to the plot, use the + operator. Note too that you can save plots as objects.
year_total_line <- yearly_count %>%
  ggplot(aes(x = Year, y = TotalCount)) +
  geom_line()
year_total_line
It might be nice to see where each data point falls on the line. To do this we can add another geometry layer.
year_total_line <- yearly_count %>%
  ggplot(aes(x = Year, y = TotalCount)) +
  geom_line() +
  geom_point()
year_total_line
The + in the ggplot2 package is particularly useful because it allows you to modify existing ggplot objects. This means you can easily set up plot templates and conveniently explore different types of plots, so the above plot can also be generated with code like this:
year_total_line <- year_total_line + geom_point()
year_total_line
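One way to use this as a plot template is to store the data and mapping once, then try different geoms on top. This is only a sketch: toy_counts and its values are invented, and the object names are placeholders.

```r
library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Hypothetical toy data, invented for illustration only
toy_counts <- data.frame(
  Year = c(1960, 1961, 1962, 1963),
  TotalCount = c(400, 420, 480, 380)
)

# The template: data and aesthetic mapping, but no geom yet
base_plot <- toy_counts %>%
  ggplot(aes(x = Year, y = TotalCount))

# Reuse the same template with different geoms
line_version <- base_plot + geom_line()
bar_version <- base_plot + geom_col()

line_version
bar_version
```

Because base_plot itself is never overwritten, each variation starts from the same unmodified template.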
There are many ways to customize your plot, like changing the color or line type, adding labels and annotations. One thing that would make our graph easier to read is tick marks at each decade on the x-axis.
year_total_line <- yearly_count %>%
  ggplot(aes(x = Year, y = TotalCount)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = seq(1900, 2000, 10))
year_total_line
Perhaps we would like to add some more information to our graph for context. Let’s draw a reference line at 1963 to clearly indicate when the vaccine was introduced. To do this we add geom_vline() for the line and annotate() for the label. There are multiple ways of adding lines and text to a plot, but these will serve us well in this case. Note that you can change features of lines such as color, type, and size.
year_total_line <- yearly_count %>%
  ggplot(aes(x = Year, y = TotalCount)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = seq(1900, 2000, 10)) +
  geom_vline(xintercept = 1963, color = "red", linetype = "dashed") +
  annotate(geom = "label", x = 1963, y = 80000, label = "1963: vaccine introduced")
year_total_line
Next, let’s add a title and axis labels to our plot with the labs() function. Note that axis labels are supplied automatically from the column names, but you can use this function to override those defaults.
year_total_line <- yearly_count %>%
  ggplot(aes(x = Year, y = TotalCount)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = seq(1900, 2000, 10)) +
  geom_vline(xintercept = 1963, color = "red", linetype = "dashed") +
  annotate(geom = "label", x = 1963, y = 800000, label = "1963: vaccine introduced") +
  labs(
    title = "Measles Cases Decrease After Vaccine Introduced",
    x = "Year",
    y = "Total Measles Case Count"
  )
year_total_line
Finally, let’s save our plot to a PNG file so we can share it or put it in reports. To do this we use the ggsave() function.
ggsave("images/yearly_measles_count.png", plot = year_total_line)
## Saving 6 x 3.5 in image
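The "Saving 6 x 3.5 in image" message appears because ggsave() falls back to the size of the current graphics device when no dimensions are given. If you want the same output size every time, you can set it explicitly. The sketch below uses an invented toy plot and writes to a temporary file; both are placeholders for illustration, so substitute your own plot and path.

```r
library(ggplot2)

# Hypothetical toy plot, invented for illustration only
toy_counts <- data.frame(
  Year = c(1960, 1961, 1962, 1963),
  TotalCount = c(400, 420, 480, 380)
)
toy_plot <- ggplot(toy_counts, aes(x = Year, y = TotalCount)) +
  geom_line()

# Write to a temporary location so the sketch runs anywhere
out_file <- file.path(tempdir(), "toy_plot.png")

# Explicit width, height, units, and resolution make the output reproducible
ggsave(out_file, plot = toy_plot, width = 6, height = 3.5, units = "in", dpi = 300)
```

Fixing the dimensions in code means the saved figure no longer depends on how large the plot window happened to be.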