Chapter 14 Customizing/Polishing Graphics
library(tidyverse) # loading ggplot2 and dplyr
library(patchwork) # arranging multiple graphs into 1 figure
library(viridis) # The viridis color schemes
library(latex2exp) # For plotting math notation
library(plotly) # for interactive hover-text
I have a YouTube Video Lecture for this chapter.
We have already seen how to create many basic graphs using the ggplot2 package.
However, we haven’t addressed many common scenarios. In this chapter we cover
several graphing tasks that come up frequently.
14.1 Multi-plots
There are several cases where we need to take several, possibly unrelated,
graphs and put them together into a single larger figure. This is not possible
using facet_wrap or facet_grid, because they are intended to make multiple
highly related graphs. We could get into the details of how ggplot2 works using
the grid system, but I find it more helpful to use packages that have already
solved this problem. The two packages below, cowplot and patchwork, are not the
only ones that do this; they are simply two solutions I’ve used in the past.
Another option is the ggpubr package.
14.1.1 cowplot package
Claus O. Wilke wrote a lovely book about data visualization and also wrote an R
package to help him tweak his plots. One of the functions in his cowplot package
is plot_grid, which takes in any number of plots and lays them out on a grid.
P1 <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=3) + labs(title='P1 - Iris')
P2 <- ggplot(trees, aes(x=Height, y=Volume, color=Girth)) +
  geom_point() + labs(title='P2 - Cherry Trees')
P3 <- ggplot(iris, aes(x=Sepal.Length)) +
  geom_histogram(bins=30) + labs(title='P3 - Iris')
P4 <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) +
  geom_boxplot() + labs(title='P4 - Iris')
cowplot::plot_grid(P1, P2, P3, P4)
Notice that by default the graphs are arranged in a 2x2 grid. We could adjust
the number of rows/columns using the nrow and ncol arguments.
Furthermore, we could add labels to each graph so that the figure caption can
refer to “Panel A” or “Panel B” as appropriate, using the labels option.
cowplot::plot_grid(P2, P3, P4, nrow=1, labels=c('A','B','C'))
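plot_grid() can also give panels unequal sizes. The sketch below is a minimal illustration that rebuilds two of the iris plots so it runs on its own. It uses cowplot's rel_widths argument to make the boxplot twice as wide as the histogram. The 1:2 ratio is just an illustrative choice.

```r
library(ggplot2)
library(cowplot)

# Rebuild two of the plots from above so this block is self-contained
P3 <- ggplot(iris, aes(x=Sepal.Length)) +
  geom_histogram(bins=30) + labs(title='P3 - Iris')
P4 <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) +
  geom_boxplot() + labs(title='P4 - Iris')

# One row, two panels; rel_widths sets the relative panel widths (here 1:2)
fig <- plot_grid(P3, P4, nrow=1, labels=c('A','B'), rel_widths=c(1, 2))
fig
```

rel_heights works the same way when panels are stacked in a single column.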
14.1.2 patchwork package
This package is quite powerful and is written by one of the maintainers of
ggplot2. Having worked with both, I find it quite a bit more powerful and far
more flexible than cowplot. The package documentation goes into far more detail
than we will in the following examples.
# Just adding plots defaults to equally sized grids
P1 + P2 + P3 + P4

# grouping with parentheses. This induces a hierarchy of the graph
# Notice the amount of space allocated is set to be equal among the
# groups at the same level of the hierarchy.
# division produces new rows.
(P1 | P2) / P3

# A bar produces a new column
(P1 + P2) | P3
# We can add a title to the composite graph as well as add
# panel identifiers post-hoc. Notice this has to apply to
# the top level of the hierarchy.
(P1 | (P2/P4)) +
  plot_annotation(
    title = 'Composite Graph Title',
    subtitle = 'With a compelling subtitle',
    caption = 'Iris Data from Edgar Anderson (1935)'
  )
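The panel identifiers mentioned in the comment above can be added through plot_annotation()'s tag_levels argument. Here is a minimal sketch, rebuilding P1, P2, and P4 so the block is self-contained. The tag_suffix value is only an illustrative choice.

```r
library(ggplot2)
library(patchwork)

P1 <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=3) + labs(title='P1 - Iris')
P2 <- ggplot(trees, aes(x=Height, y=Volume, color=Girth)) +
  geom_point() + labs(title='P2 - Cherry Trees')
P4 <- ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) +
  geom_boxplot() + labs(title='P4 - Iris')

# tag_levels = 'A' labels the panels A, B, C; tag_suffix appends ')' to each tag
fig <- (P1 | (P2/P4)) +
  plot_annotation(title = 'Composite Graph Title',
                  tag_levels = 'A', tag_suffix = ')')
fig
```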
The last trick is to recognize that several graphs often share the same legend, and it would be nice to collect them all into one spot. Let’s collect them into one area using plot_layout(guides = 'collect').
(P1 | (P2/P4)) +
  plot_annotation(title = 'Composite Graph Title') +
  plot_layout(guides = 'collect')
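Notice we still have both a fill guide (from P4) and a color guide (from P1) for Species. They are different guides as far as ggplot2 is concerned, so they are not merged, but it would be nice to remove one of them. Below we drop the legend from P1.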
((P1 + theme(legend.position = 'none')) | (P2/P4)) +
  plot_annotation(
    title = 'Composite Graph Title',
    subtitle = 'With a compelling subtitle',
    caption = 'Iris Data from Edgar Anderson (1935)') +
  plot_layout(guides = 'collect')
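Besides collecting guides, plot_layout() can also set the number of columns and the relative panel widths (or heights). The following sketch rebuilds two of the iris plots so it runs on its own. The 2:1 width ratio is only an illustrative choice.

```r
library(ggplot2)
library(patchwork)

P1 <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=3) + labs(title='P1 - Iris')
P3 <- ggplot(iris, aes(x=Sepal.Length)) +
  geom_histogram(bins=30) + labs(title='P3 - Iris')

# Two columns, with the scatter plot twice as wide as the histogram
fig <- P1 + P3 + plot_layout(ncol = 2, widths = c(2, 1))
fig
```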
14.2 Customized Scales
While ggplot typically makes very reasonable choices for the axis scales and for
the colors used by the color and fill aesthetics, we often want to tweak them.
14.2.1 Color Scales
14.2.1.1 Manually Select Colors
For an individual graph, we might want to set the colors manually. Within
ggplot there are a number of scale_XXX_ functions, where XXX is either color
or fill.
cowplot::plot_grid(
  P1 + scale_color_manual( values=c('red','navy','forest green') ),
  P2 + scale_color_gradient(low = 'blue', high='red')
)
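An unnamed values vector is matched to the factor levels in order. Naming the vector makes the mapping explicit, so each species keeps its color even if the level order changes. Below is a minimal sketch that uses layer_data() to confirm which colors ggplot2 actually assigned.

```r
library(ggplot2)

P1 <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=3)

# Named values: each species is tied to a specific color
P1_named <- P1 + scale_color_manual(
  values = c(setosa='red', versicolor='navy', virginica='forestgreen'))

# layer_data() returns the computed data for the first layer, including colour
pts <- layer_data(P1_named)
head(pts$colour)
```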
For continuous color and fill scales, there is also a scale_XXX_gradient2()
function, which produces a diverging scale where you set the low and high
colors as well as the midpoint color and value. There is also a
scale_XXX_gradientn() function that allows you to set as many colors as you
like to move between.
cowplot::plot_grid(
  P2 + scale_color_gradient2( low = 'black', mid='white', midpoint=14, high='red' ),
  P2 + scale_color_gradientn(colors = c('red','orange','yellow','green','blue','violet') )
)
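The midpoint argument of scale_color_gradient2() is given in data units. One option is to compute it from the data rather than hard-coding a value such as 14. The sketch below uses the median girth, which is an illustrative choice and not the only sensible center.

```r
library(ggplot2)

# Center the diverging scale on the median Girth instead of a fixed number
mid <- median(trees$Girth)
P2 <- ggplot(trees, aes(x=Height, y=Volume, color=Girth)) +
  geom_point() +
  scale_color_gradient2(low='blue', mid='white', high='red', midpoint=mid)

# One computed colour per tree
pts <- layer_data(P2)
P2
```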
Generally I find that I make poor choices when picking colors manually, but there are times that it is appropriate.
14.2.1.2 Palettes
In choosing color schemes, a good approach is to use a color palette that has already been created by people who understand how colors are displayed and which types of color blindness occur. We’ll discuss two palette options, but a variety of other palettes are available in other packages.
14.2.1.2.1 RColorBrewer palettes
Using the ggplot2::scale_XXX_brewer() functions, we can easily work with the
package RColorBrewer, which provides a nice set of color palettes. These
palettes are separated by purpose.
Qualitative palettes employ different hues to create visual differences between classes. These palettes are suggested for nominal or categorical data sets.
Sequential palettes progress from light to dark. When used with interval data, light colors represent low data values and dark colors represent high data values.
Diverging palettes are composed of darker colors of contrasting hues on the high and low extremes and lighter colors in the middle.
To use one of these palettes, we just need to pass the palette name to
scale_color_brewer or scale_fill_brewer.
cowplot::plot_grid(
  P1 + scale_color_brewer(palette='Dark2'),
  P4 + scale_fill_brewer(palette='Dark2')
)
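RColorBrewer ships a lookup table, brewer.pal.info, that records each palette's category (qual, seq, or div), its maximum number of colors, and whether it is flagged as colorblind-friendly. The sketch below filters that table for qualitative, colorblind-flagged palettes. It then pulls three explicit hex codes with brewer.pal().

```r
library(RColorBrewer)

info <- brewer.pal.info
# Qualitative palettes that RColorBrewer flags as colorblind-friendly
qual_cb <- info[info$category == 'qual' & info$colorblind, ]
rownames(qual_cb)

# Three explicit hex colors from the Dark2 palette
brewer.pal(3, 'Dark2')
```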
14.2.1.2.2 viridis palettes
The package viridis sets up a few different color palettes that have been well
thought out and maintain contrast for people with a variety of color-blindness
types, as well as when converted to grey-scale.
cowplot::plot_grid(
  P1 + scale_color_viridis_d(option='plasma'),   # _d for discrete
  P2 + scale_color_viridis_c(option='viridis') ) # _c for continuous
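The viridis() function also returns a palette as a plain vector of hex codes. This is handy when a function needs explicit colors, such as scale_color_manual(). Here is a minimal sketch using the plasma option.

```r
library(ggplot2)
library(viridis)

# Three hex colors (with an alpha suffix) from the plasma palette
cols <- viridis(3, option='plasma')
cols

P1 <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=3) +
  scale_color_manual(values = cols)
P1
```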
There are a bunch of other packages that manage color palettes, such as
paletteer, ggsci, and wesanderson.
14.2.2 Setting major & minor ticks
For continuous variables, we need to be able to control which tick marks and
grid lines are displayed. In ggplot, there are major and minor breaks: the major
breaks are labeled, and the minor breaks fall in between the major breaks. The
labels at the break points can also be set.
ggplot(trees, aes(x=Height, y=Volume)) + geom_point() +
scale_x_continuous( breaks=seq(65,90, by=5), minor_breaks=65:90 ) +
scale_y_continuous( breaks=c(30,50), labels=c('small','big') )
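The labels argument also accepts a function that receives the break values and returns their text. This avoids listing every label by hand. The helper ft_label() below is hypothetical and exists only for this sketch.

```r
library(ggplot2)

# Hypothetical labeling helper: append a unit to each break value
ft_label <- function(x) paste0(x, ' ft')

p <- ggplot(trees, aes(x=Height, y=Volume)) + geom_point() +
  scale_x_continuous(breaks=seq(65, 90, by=5), minor_breaks=65:90,
                     labels=ft_label)
p
```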
14.2.3 Log Scales
For this example, we’ll use the ACS data from the Lock5Data package, which has
information about Income (in thousands of dollars) and Age. Let’s make a scatter
plot of the data.
# Import the data, drop rows with missing values, and drop zero incomes
# (log10(0) is -Inf)
data('ACS', package='Lock5Data')
ACS <- ACS %>%
  drop_na() %>% filter(Income > 0)

cowplot::plot_grid(
  ggplot(ACS, aes(x=Age, y=Income)) +
    geom_point(),
  ggplot(ACS, aes(x=Age, y=log10(Income))) +
    geom_point()
)
Plotting the raw data results in an ugly graph because six observations dominate the graph and the bulk of the data (income < $100,000) is squished together. One solution is to plot income on the \(\log_{10}\) scale. The second graph does that, but the labeling is all done on the log-scale and most people have a hard time thinking in terms of logs.
This works quite well to show the trend of peak earnings happening in a person’s 40s and 50s, but the scale is difficult for me to understand (what does \(\log_{10}\left(X\right)=1\) mean here? Oh right, that is \(10^{1}=X\), so that is the $10,000 line). It would be really nice if we could do the transformation but have the labels on the original scale.
cowplot::plot_grid(
  ggplot(ACS, aes(x=Age, y=Income)) +
    geom_point() +
    scale_y_log10(),
  ggplot(ACS, aes(x=Age, y=Income)) +
    geom_point() +
    scale_y_log10(breaks=c(1,10,100),
                  minor_breaks=c(1:10,
                                 seq( 10, 100,by=10 ),
                                 seq(100,1000,by=100))) +
    ylab('Income (1000s of dollars)')
)
While it is certainly possible to specify everything explicitly, I prefer to
use a little bit of code to define the minor breaks, so that all I have to
specify is the minimum and maximum order of magnitude. In the following code,
that is the 0:2 component. In the minor breaks, by=1 or by=2 specifies whether
we get 8 or 4 minor breaks between each pair of consecutive major breaks.
cowplot::plot_grid(
  ggplot(ACS, aes(x=Age, y=Income)) +
    geom_point() +
    scale_y_log10(
      breaks = 10^(0:2),
      minor_breaks = outer(seq(0,10,by=1), 10^(0:2)) %>% as.vector()
    ) +
    ylab('Income (1000s of dollars)'),
  ggplot(ACS, aes(x=Age, y=Income)) +
    geom_point() +
    scale_y_log10(
      breaks = 10^(0:2),
      minor_breaks = outer(seq(0,10,by=2), 10^(0:2)) %>% as.vector()
    ) +
    ylab('Income (1000s of dollars)')
)
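The outer() idiom can be wrapped in a small helper so the minor breaks for any range of decades come from one call. The function log_minor_breaks() below is a hypothetical sketch, not part of any package. It also drops the zero and duplicated values that outer() produces, since a zero break cannot be drawn on a log axis. The usage line applies it to the trees data so the block runs without Lock5Data.

```r
library(ggplot2)

# Hypothetical helper: minor breaks from 10^min_mag to 10^(max_mag+1)
# 'by' = 1 gives multiples 1..10 per decade; 'by' = 2 gives 2, 4, 6, 8, 10
log_minor_breaks <- function(min_mag, max_mag, by = 1) {
  v <- as.vector(outer(seq(0, 10, by = by), 10^(min_mag:max_mag)))
  unique(v[v > 0])   # drop the zeros and the repeated decade boundaries
}

log_minor_breaks(0, 1, by = 2)

p <- ggplot(trees, aes(x=Height, y=Volume)) + geom_point() +
  scale_y_log10(breaks = 10^(1:2), minor_breaks = log_minor_breaks(1, 2))
p
```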
In these graphs the y-axis is in the original units (thousands of dollars), and
we have customized the tick marks. In the explicit version, the major break
points (the white lines that have numerical labels) are at 1, 10, and 100
thousand dollars. The minor break points run from 1 to 10 thousand dollars in
steps of 1 thousand, then from 10 thousand to 100 thousand in steps of 10
thousand, and finally above 100 thousand in steps of 100 thousand.
14.3 Themes
Many fine-tuning settings in ggplot2 can be manipulated using the theme()
function. I’ve used it previously to move the legend position, but there are
many other options.
p1 <- ggplot(ChickWeight, aes(x=Time, y=weight, colour=Diet, group=Chick)) +
  geom_line() + labs(title='Chick Weight: Birth to 21 days')

# Two common examples of things to change
cowplot::plot_grid(nrow=2,
  p1 + theme(plot.title = element_text(hjust = 0.5, size=25)), # centered, larger title
  p1 + theme(legend.position = 'bottom')                        # legend to bottom
)
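Most theme() arguments take an element object: element_text() for text, element_line() for lines, element_rect() for boxes, and element_blank() to remove an element entirely. Here is a minimal sketch combining a few of these. The particular settings are illustrative.

```r
library(ggplot2)

p1 <- ggplot(ChickWeight, aes(x=Time, y=weight, colour=Diet, group=Chick)) +
  geom_line() + labs(title='Chick Weight: Birth to 21 days')

p1_tweaked <- p1 + theme(
  panel.grid.minor = element_blank(),          # remove the minor grid lines
  axis.title = element_text(face = 'bold'),    # bold axis titles
  legend.title = element_text(size = 14))      # larger legend title
p1_tweaked
```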
There are many things to tweak using the theme() command; to get a better idea
of what is possible, I recommend visiting the ggplot2 documentation web pages
and examples.
Notably, the theme() command does NOT change the color scales.
A great deal of thought went into the default settings of ggplot2 to maximize
the visual clarity of the graphs. However, some people believe the defaults for
many of the tiny graphical settings are poor. You can modify each of these
individually, but it is often easier to modify them all at once by selecting a
different theme. The ggplot2 package includes several, with theme_bw() and
theme_minimal() being the two that I use most often. Other packages, such as
cowplot and ggthemes, have a bunch of other themes that you can select from.
Below are a few examples:
Rmisc::multiplot( cols = 2,
  p1 + theme_bw() + labs(title='theme_bw'), # Black and white
  p1 + theme_minimal() + labs(title='theme_minimal'),
  p1 + theme_dark() + labs(title='theme_dark'),
  p1 + theme_light() + labs(title='theme_light')
)
Rmisc::multiplot( cols = 2,
  p1 + cowplot::theme_cowplot() + labs(title='cowplot::theme_cowplot()'),
  p1 + cowplot::theme_minimal_grid() + labs(title='cowplot::theme_minimal_grid()'),
  p1 + ggthemes::theme_stata() + labs(title='ggthemes::theme_stata()'),
  p1 + ggthemes::theme_tufte() + labs(title='ggthemes::theme_tufte()'),
  p1 + ggthemes::theme_economist() + labs(title='ggthemes::theme_economist()'),
  p1 + ggthemes::theme_fivethirtyeight() + labs(title='ggthemes::theme_fivethirtyeight()'),
  p1 + ggthemes::theme_excel_new() + labs(title='ggthemes::theme_excel_new()'),
  p1 + ggthemes::theme_gdocs() + labs(title='ggthemes::theme_gdocs()')
)
Finally, we might want to select a theme for all subsequent plots or modify a specific aspect of the theme.
| Command | Result |
|---|---|
| theme_set( theme_bw() ) | Set the default theme to be the theme_bw() theme. |
| theme_update( ... ) | Update the current default theme. |
This allows you to set the graphing options at the start of your Rmarkdown/R-script document. However, the one thing it does not do is change the default color scales; we still have to set those for each graph.
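A typical pattern is to set the default theme once, adjust it, and optionally restore the previous default later. theme_set() invisibly returns the theme it replaced, which is what makes the restore possible. Here is a minimal sketch.

```r
library(ggplot2)

old <- theme_set(theme_bw())               # new default; keep the previous one
theme_update(legend.position = 'bottom')   # tweak the current default

p <- ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point()
print(p)   # drawn with theme_bw() and the legend at the bottom

theme_set(old)                             # restore the original default theme
```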
14.4 Mathematical Notation
It would be nice to be able to include mathematical formulas and notation on
plot axes, titles, and text annotations. R plotting has a notation scheme which
it calls expressions. You can learn more about how R expressions are defined by
looking at the plotmath help page. They are similar to LaTeX but different
enough that they can be frustrating to use. It is particularly difficult to mix
character strings and math symbols. I recommend not bothering to learn R
expressions, but instead learning LaTeX and using the R package latex2exp, which
converts character strings written in LaTeX into R’s expressions.
LaTeX is an extremely common typesetting program for mathematics. The key idea is that $
opens and closes LaTeX math mode, and within math mode a backslash signals that something special should be done. For example, just typing $alpha$
produces \(alpha\), but putting a backslash in front means that we should interpret it as the Greek letter alpha. So in LaTeX, $\alpha$
is rendered as \(\alpha\). We’ve already seen an introduction to LaTeX in the Rmarkdown Tricks chapter.
However, we write the LaTeX as R character strings, and R also uses the backslash to represent special characters inside strings. So to get a literal backslash into the string, we need to double it. This is the same double-backslash trick we used with regular expressions in the string manipulation section.
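The doubling only affects how the string is typed, not what it contains. As an alternative, R 4.0 and later support raw strings, written r"(...)", in which backslashes are kept literally. The sketch below checks that both spellings produce the same string and that latex2exp::TeX() turns it into an expression.

```r
library(latex2exp)

s1 <- '$\\hat{\\beta}$'    # ordinary string: each backslash is doubled
s2 <- r'($\hat{\beta}$)'   # raw string (R >= 4.0): backslashes kept as typed
identical(s1, s2)          # the two strings have the same contents
cat(s1, '\n')              # prints the single-backslash LaTeX

e <- TeX(s2)               # convert the LaTeX into an R plotmath expression
```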
seed <- 7867
set.seed(seed)                                # make the simulated data reproducible
N <- 20
data <- data.frame(x=runif(N, 1, 10)) %>%     # create a data set to work with
  mutate(y = 12 - 1*x + rnorm(N, sd=1))       # with columns x, y
model <- lm( y ~ x, data=data)                # Fit a regression model
data <- data %>%                              # save the regression line yhat points
  mutate(fit=fitted(model))

ggplot(data, aes(x=x)) +
  geom_point(aes(y=y)) +
  geom_line(aes(y=fit), color='blue') +
  annotate('text', x=9, y=9.5,
           label=latex2exp::TeX('$\\hat{\\rho}$ = 0.916') ) +   # always double \\
  labs( x=latex2exp::TeX('X-axis $\\alpha$'),
        y=latex2exp::TeX('Y: $\\log_{10}$(Income)'),
        title=latex2exp::TeX('Linear Models use: $\\hat{\\beta} = (X^TX)^{-1}X^Ty$'))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type
## 'expression'
The warning message is coming from ggplot, and I haven’t figured out how to
avoid it. Because it still gives us the graph we want, I’m just going to ignore
the warning for now.
One issue is how to add expressions to a data frame. Unfortunately, neither
data.frame nor tibble will allow a column of expressions, so instead we have to
store them as character strings. Below, we create three character strings using
standard LaTeX syntax and then convert each one to a character string that
represents the corresponding R expression. Finally, in ggplot, we tell the
geom_label layer to parse the label and interpret it as an R expression.
foo <- data.frame( x=c(1,2,3), y=c(2,2,2) ) %>%
  mutate( label1 = paste('$\\alpha$ = ', x) ) %>%  # label1 is a TeX character string
  mutate( label2 = latex2exp::TeX(label1, output = 'character') ) # label2 is an expression character string

ggplot(foo, aes(x=x, y=y) ) +
  geom_label( aes(label=label2), parse=TRUE )  # parse=TRUE forces an expression interpretation
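Setting parse=TRUE only works if every entry of the label column is valid R expression code. The sketch below rebuilds foo and parses each converted string with base R's parse(). This gives a quick check to run before plotting.

```r
library(dplyr)
library(latex2exp)

foo <- data.frame(x=c(1,2,3), y=c(2,2,2)) %>%
  mutate(label1 = paste('$\\alpha$ = ', x),
         label2 = TeX(label1, output = 'character'))

# Each label2 entry should parse into an R expression, as parse=TRUE requires
ok <- sapply(foo$label2, function(s) is.expression(parse(text = s)))
ok
```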
14.5 Interactive plots with plotly
Plotly is a technical computing company that specializes in data visualization. They have created an open-source JavaScript library to produce graphs, which is confusingly also called plotly. Because plotly is JavaScript, and RStudio’s Viewer pane is actually a web browser, it easily provides interactive graphs in RStudio’s Viewer pane.
A good tutorial book about using plotly was written by Carson Sievert.
The simple version is that you can take a ggplot graph and pipe it into the
ggplotly() function, and it will be rendered as an interactive version of the graph.
P1 <- trees %>%
  ggplot(aes(y=Volume, x=Height)) +
  geom_point()
P1 %>% plotly::ggplotly()
We can use the widgets to zoom in and out of the graph. Notice that as we hover
over a point, it tells us the x/y coordinates. To add information to the
hover-text, we just need to add a text aesthetic mapping to the plot.
P1 <- trees %>%
  mutate(Obs_ID = row_number()) %>%
  ggplot(aes(y=Volume, x=Height,
             text=paste('Girth: ', Girth, '\n',     # add some extra text
                        'Obs #: ', Obs_ID, sep=''))) + # to the hover information
  geom_point()
P1 %>% ggplotly()
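By default the hover-text shows every mapped aesthetic along with the extra text. ggplotly()'s tooltip argument restricts it to chosen aesthetics. The sketch below shows only the text mapping.

```r
library(dplyr)
library(ggplot2)
library(plotly)

P1 <- trees %>%
  mutate(Obs_ID = row_number()) %>%
  ggplot(aes(y=Volume, x=Height,
             text=paste0('Girth: ', Girth, '\nObs #: ', Obs_ID))) +
  geom_point()

# Only the 'text' aesthetic appears in the hover box
w <- ggplotly(P1, tooltip = 'text')
w
```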
14.6 Exercises
The infmort data set from the package faraway gives the infant mortality rate for a variety of countries. The information is relatively out of date (from the 1970s?), but will be fun to graph. Visualize the data by creating scatter plots of mortality vs income while faceting using region and setting color by oil export status. Utilize a \(\log_{10}\) transformation for both the mortality and income axes. This can be done either by doing the transformation inside the aes() command or by utilizing the scale_x_log10() or scale_y_log10() layers. The critical difference is whether the axes are labeled on the original or the log-transformed scale. Experiment with both and see which you prefer.
- The rownames() of the table gives the country names, and you should create a new column that contains the country names.
- Create scatter plots with the log10() transformation inside the aes() command.
- Create the scatter plots using the scale_x_log10() and scale_y_log10() layers. Set the major and minor breaks to be useful and aesthetically pleasing. Comment on which version you find easier to read.
- The package ggrepel contains functions geom_text_repel() and geom_label_repel() that mimic the basic geom_text() and geom_label() functions in ggplot2, but work to make sure the labels don’t overlap. Select 10-15 countries to label and do so using the geom_text_repel() function.
Using the datasets::trees data, complete the following:
- Create a regression model for \(y=\) Volume as a function of \(x=\) Height.
- Using the summary command, get the y-intercept and slope of the regression line.
- Using ggplot2, create a scatter plot of Volume vs Height.
- Create a nice white filled rectangle to add text information to by adding the following annotation layer.
  annotate('rect', xmin=65, xmax=75, ymin=60, ymax=74, fill='white', color='black') +
- Add some annotation text to write the equation of the line \(\hat{y}_i = -87.12 + 1.54 * x_i\) in the text area.
- Add annotation to add \(R^2 = 0.358\).
- Add the regression line in red. The most convenient layer function to use is geom_abline(). It appears that annotate() doesn’t work with geom_abline(), so you’ll have to call it directly.
The datasets::Titanic table summarizes the survival of passengers aboard the ocean liner Titanic. It includes information about passenger class, sex, and age (adult or child). Create a bar graph showing the number of individuals that survived based on the passenger Class, Sex, and Age variable information. You’ll need to use faceting and/or color to get all four variables on the same graph. Make sure that differences in survival among different classes of children are perceivable. Unfortunately, the data is stored as a table, and to expand it to a data frame, the following code can be used.
  Titanic <- Titanic %>% as.data.frame()
- Make this graph using the default theme. If you use color to denote survivorship, modify the color scheme so that a cold color denotes death.
- Make this graph using the theme_bw() theme.
- Make this graph using the cowplot::theme_minimal_hgrid() theme.
- Why would it be beneficial to drop the vertical grid lines?