R has a number of graphing libraries, including base graphics that are installed whenever you install R.
ggplot2 is a graphing library in R that makes beautiful graphs. Its syntax can be formidably complex, with a somewhat steep learning curve.
That being said, learning ggplot2 is worth the effort for several reasons. First, the graphs are beautiful. Second, ggplot2’s syntax, though seemingly arcane at times, forces you to think about the nature of your data and the ideas that you are graphing. Lastly, a little bit of knowledge about ggplot2 can go a long way, and can build a powerful foundation for future learning.
The intent of this tutorial is to build the foundation of this idea that:
A little bit of ggplot can go a long way
and to give you a simple introduction to the idea that any ggplot graph is composed of:
an aesthetic + a geom or two + other optional elements like titles and themes.
So, as a quick and simple example…
library(ggplot2)
ggplot(my_demo_data, # the data that I am using
aes(x = my_outcome)) + # aesthetic: what I am graphing
geom_histogram(fill = "red", # geom: how I am graphing it
color = "black")
And now, with titles…
ggplot(my_demo_data, # the data that I am using
aes(x = my_outcome)) + # aesthetic: what I am graphing
geom_histogram(fill = "red", # geom: how I am graphing it
color = "black") +
labs(title = "Your Title Here",
subtitle = "Your Subtitle Here",
caption = "A Caption, If You Want One",
x = "my outcome",
y = "count")
This document is a very brief introduction to the basic ideas of ggplot2. More information about ggplot can be found here. More ggplot2 examples can be found here.
You will need a few R libraries to work in ggplot. You may only need library(ggplot2), but some other libraries that appear later in this tutorial, such as ggthemes, ggbeeswarm, and pander, may also be helpful.
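The original list of libraries is not reproduced here. As a rough sketch, assuming the optional packages are the ones named elsewhere in this tutorial (ggthemes, ggbeeswarm, and pander), the setup might look something like the following. The variable name optional_pkgs is hypothetical.

```r
# Load the core graphing library.
library(ggplot2)

# Optional packages named elsewhere in this tutorial (an assumed list).
optional_pkgs <- c("ggthemes", "ggbeeswarm", "pander")

# Attach each optional package only if it is already installed;
# otherwise report which one is missing.
for (pkg in optional_pkgs) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
  } else {
    message("Package not installed: ", pkg)
  }
}
```

Any package reported as missing can be installed with install.packages().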
In this example, we simulate some data. But your own learning of ggplot will progress more quickly if you use data that you have access to, on an issue that you care about.
Here are the first few rows of simulated data:
# simulated data
N <- 500 # set sample size
predictor <- rnorm(n = N, mean = 100, sd = 25) # n, mean, sd
group <- rbinom(n = N, size = 1, prob = 0.5) # n, number of trials, probability
outcome <- predictor +
  10 * group +
  rnorm(n = N,
        mean = 0,
        sd = 15) # outcome is a function of predictor + group + error
group <- factor(group) # treat group as categorical
mydata <- data.frame(predictor, outcome, group) # make data frame
library(pander) # provides pander() for formatted tables
pander(head(mydata, 10)) # nice looking table of first few rows of data
predictor | outcome | group |
---|---|---|
81.64 | 50.28 | 0 |
112.7 | 126.2 | 0 |
92.26 | 116.6 | 0 |
115.5 | 125.7 | 0 |
71.83 | 54.52 | 0 |
97.61 | 101.1 | 0 |
109.8 | 104 | 1 |
129.3 | 153.9 | 1 |
91.53 | 110.7 | 0 |
122.2 | 119.4 | 0 |
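Because the simulation above does not set a random seed, the values will differ from run to run. A minimal sketch of a reproducible version follows. The seed value is arbitrary and is an assumption of this example, so it will not reproduce the table above.

```r
# Fix the random number generator so the simulation is repeatable.
set.seed(1234) # arbitrary seed chosen for illustration

N <- 500 # sample size
predictor <- rnorm(n = N, mean = 100, sd = 25)
group <- rbinom(n = N, size = 1, prob = 0.5)
outcome <- predictor + 10 * group + rnorm(n = N, mean = 0, sd = 15)

# Assemble the data frame, storing group as a factor.
mydata <- data.frame(predictor, outcome, group = factor(group))

str(mydata) # compact summary of the structure of the data
```

Re-running this block with the same seed should give the same data each time.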
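There are 3 essential elements to any ggplot call: an aesthetic (what you are graphing), a geom (how you are graphing it), and other optional elements such as titles and themes.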
For one variable:
p <- ggplot(mydata, aes(x = ...))
This says there is only one variable running along the horizontal x axis in the aesthetic.
The p <- ... means that we are assigning this graph aesthetic to plot p. We can then add other features to plot p as we continue our work. This iterative nature of ggplot2 is one of the things that makes it so powerful. As your workflow and your documents become more complex, you can build a simple consistent foundation[1] for your graphs, then add something simple to make a first graph, and a different something simple to make a second graph.
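The iterative workflow described above can be sketched as follows. This is only an illustration: demo_data, p_base, p_hist, and p_dens are hypothetical names, and theme_minimal() (which ships with ggplot2) is an arbitrary choice of theme.

```r
library(ggplot2)

# Small simulated data frame standing in for your own data.
set.seed(1)
demo_data <- data.frame(predictor = rnorm(200, mean = 100, sd = 25))

# Foundation: an aesthetic, a theme, and an axis label, stored once.
p_base <- ggplot(demo_data, aes(x = predictor)) +
  theme_minimal() +
  labs(x = "predictor")

# First graph: add a histogram geom to the foundation.
p_hist <- p_base + geom_histogram(bins = 30, fill = "blue")

# Second graph: add a density geom to the same foundation.
p_dens <- p_base + geom_density(fill = "gold")

p_hist
p_dens
```

Both graphs share the same look and feel because the theme and labels live in p_base rather than being repeated in each graph.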
For two variables:
p <- ggplot(mydata, aes(x = ..., y = ...))
This says there are two variables in the aesthetic: one for the horizontal x axis, and another for the vertical y axis.
We can then add different geometries to our plot:
For one variable:
+ geom_dotplot()
This says add a dotplot geometry to the graph.
+ geom_histogram()
This says add a histogram geometry to the graph.
+ geom_violin()
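This says add a violin plot geometry to the graph. Note that geom_violin() requires a y aesthetic in addition to x, so a single variable is usually mapped to y, with a constant used for x.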
+ geom_beeswarm()
This says add a beeswarm geometry to the graph.
A beeswarm is a creative layout of points that intuitively lets you understand the distribution of a quantity. The beeswarm geometry requires separate installation of the ggbeeswarm package. You also need to call library(ggbeeswarm) to use this geometry.
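As a hedged illustration of the violin and beeswarm geometries for a single variable, the sketch below maps the variable to y and uses an empty string as a constant x. The names demo_data, p_violin, and p_bee are hypothetical, and the beeswarm part runs only if ggbeeswarm is installed.

```r
library(ggplot2)

# Small simulated data frame standing in for your own data.
set.seed(1)
demo_data <- data.frame(predictor = rnorm(200, mean = 100, sd = 25))

# Violin plot of one variable: constant x, variable on y.
p_violin <- ggplot(demo_data, aes(x = "", y = predictor)) +
  geom_violin(fill = "red") +
  labs(x = NULL, title = "Violin plot of predictor")
p_violin

# Beeswarm plot of the same variable, only if ggbeeswarm is available.
if (requireNamespace("ggbeeswarm", quietly = TRUE)) {
  p_bee <- ggplot(demo_data, aes(x = "", y = predictor)) +
    ggbeeswarm::geom_beeswarm() +
    labs(x = NULL, title = "Beeswarm plot of predictor")
  print(p_bee)
}
```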
For two variables:
+ geom_point()
This says add a point (scatterplot) geometry to the graph.
+ geom_smooth()
This says add a smoother to the graph.
# call ggplot2 where aesthetic is: x uses our predictor variable
p1 <- ggplot(mydata,
aes(x = predictor))
p1 +
geom_dotplot(dotsize = .15, fill="red") + # add dotplot geom in red
labs(title ="Dotplot of predictor") # Add title
p1 + geom_histogram(fill = "blue") + # add histogram geom in blue
labs(title ="Histogram of predictor") # Add title
p1 + geom_density(fill = "gold") + # add density geom in gold
labs(title ="Density of predictor") # Add title
The easiest way to represent a single categorical variable is likely a bar graph.
Here bars represent the count of observations in each group.
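The code for this count-based bar chart is not shown in the text. A minimal sketch, assuming a factor variable named group as in the simulated data, might look like this. The names demo_data and p_barchart are hypothetical.

```r
library(ggplot2)

# Simulated categorical variable standing in for the group variable.
set.seed(1)
demo_data <- data.frame(
  group = factor(rbinom(n = 500, size = 1, prob = 0.5))
)

# geom_bar() counts the observations in each level of x by default.
p_barchart <- ggplot(demo_data, aes(x = group)) +
  geom_bar(fill = "blue") +
  labs(title = "Bar chart of group")
p_barchart
```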
Changing the aesthetic slightly results in a stacked bar chart. Since all groups are stacked in one bar, we have to specify the colors that distinguish the groups.
p_stacked_barchart <- ggplot(mydata,
aes(x = 1,
fill = group)) +
geom_bar() +
scale_fill_manual(values = c("red", "blue"))
p_stacked_barchart
Here bars represent the average value of our outcome variable for members of each group.
p_barchart_of_mean <- ggplot(mydata,
aes(x = group, # slightly different aesthetic
y = outcome)) +
stat_summary(fun = mean, # take the mean of the data (this argument was fun.y before ggplot2 3.3.0)
fill = "blue", # fill color
geom = "bar") # we want to summarize data with bars
p_barchart_of_mean
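One way to check what the bars represent is to compute the group means directly and compare them with the values that stat_summary() plots. The sketch below is illustrative only: it simulates its own data under the same assumptions as above, uses the hypothetical names demo_data, group_means, and p_means, and assumes ggplot2 3.3.0 or later for the fun argument.

```r
library(ggplot2)

# Simulate data with the same structure as the tutorial's data.
set.seed(1)
N <- 500
predictor <- rnorm(n = N, mean = 100, sd = 25)
group <- rbinom(n = N, size = 1, prob = 0.5)
outcome <- predictor + 10 * group + rnorm(n = N, mean = 0, sd = 15)
demo_data <- data.frame(outcome, group = factor(group))

# Mean of outcome within each group, computed with base R.
group_means <- aggregate(outcome ~ group, data = demo_data, FUN = mean)
group_means

# The same summary drawn as bars.
p_means <- ggplot(demo_data, aes(x = group, y = outcome)) +
  stat_summary(fun = mean, geom = "bar", fill = "blue")

# Bar heights as computed by ggplot2, one per group.
layer_data(p_means)$y
```

The bar heights returned by layer_data() should match the outcome column of group_means.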
# call ggplot2 where aesthetic uses both predictor and outcome
p4 <- ggplot(mydata,
aes(x = predictor,
y = outcome)) # set up aesthetic
p4 + geom_point() # add point geom (scatterplot)
p4 + # start with basic plot that has only an aesthetic
geom_point(color = "blue") + # add point geom in blue
labs(title ="Scatterplot of Outcome by Predictor") # add title
p4 + geom_density2d(color = "blue") + # add density geom
labs(title ="Density Plot of Outcome by Predictor") # add title
While not strictly necessary, the use of scale_fill_gradient seems to improve the presentation. You can choose your own colors.
p4 +
stat_density_2d(aes(fill = after_stat(level)), # after_stat(level) replaces the older ..level.. notation
geom = "polygon") + # add filled density geom
scale_fill_gradient(low = "blue",
high = "red") +
labs(title ="Density Plot of Outcome by Predictor") # add title
geom_hex may be a useful visualization, especially when there is the possibility of over-plotting due to a large number of points. It requires the hexbin package to be installed.
p4 +
geom_hex() +
scale_fill_gradient(low = "blue",
high = "red") +
labs(title ="Hexagon Plot of Outcome by Predictor") # add title
ggplot2
ggthemes()
The themes below make use of library(ggthemes), which you will need to install.
p4 + geom_point() + # point geom
geom_smooth() + # add smooth geom
labs(title ="Scatterplot And Smoother of Outcome \nby Predictor") + # add title
theme_fivethirtyeight() + # "538"-like theme
scale_color_fivethirtyeight() # "538"-like colors
p4 + geom_point() + # point geom
geom_smooth() + # add smooth geom
labs(title ="Scatterplot And Smoother of Outcome \nby Predictor") + # add title
theme_solarized() + # solarized light theme
scale_colour_solarized() # solarized colors
p4 + geom_point() + # point geom
geom_smooth() + # add smooth geom
labs(title ="Scatterplot And Smoother of Outcome \nby Predictor") + # add title
theme_solarized(light = FALSE) + # solarized dark theme
scale_colour_solarized("blue") # solarized dark color palette
p4 + geom_point() + # point geom
geom_smooth() + # add smooth geom
labs(title ="Scatterplot And Smoother of Outcome \nby Predictor") + # add title
theme_economist() + # Economist magazine theme
scale_colour_economist() # Economist magazine colors
p5 <- ggplot(mydata,
aes(x = predictor, y = outcome,
color = group)) # aesthetic includes color by group
p5 + geom_point() +
geom_smooth() +
theme_economist() +
scale_color_economist() +
labs(title ="Scatterplot And Smoother of Outcome \nby Predictor")
p5 + geom_point() +
geom_smooth() +
facet_wrap(~group) + # facets or "small multiples" by group
theme_economist() +
scale_color_economist() +
labs(title ="Scatterplot And Smoother of Outcome \nby Predictor")
More information can be found at ggplot2.
More ggplot2 examples can be found here.
Graphics made with the ggplot2 graphing library created by Hadley Wickham.
Available online at https://agroganweb.wordpress.com/data-visualization-dataviz/
Quick Introduction to ggplot2 by Andrew Grogan-Kaylor is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Last updated: February 05 2019 at 10:55
[1] By way of illustration, this foundation could be just an aesthetic (e.g. aes(...)) alone, or possibly an aesthetic plus a theme (e.g. theme_tufte()), plus axis labels to create a consistent look and feel for your graphs across a report.