Data Visualization

This chapter will cover plotting functions from the ggplot2 package.

Create a Plot

We first create an initial ggplot object with the ggplot() function. We declare the data to plot within ggplot(), then add layers and other components (such as geom functions) to the plot with the + operator.

To construct a scatterplot, for instance, we add geom_point() with an aesthetic mapping as its argument. The aesthetic mapping, created with aes(), describes how variables in the data are mapped to visual properties of geometric objects (geoms).

## ggplot(data = data_object, mapping = aes(global aesthetics)) +
##   <geom_function>(mapping = aes(specific aesthetics))

ggplot(data = hsb) +   # create a ggplot
  geom_point(mapping = aes(ses, mathach))  # add a layer of points (scatterplot)

# or 

ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_point()   

## You can also use the pipe operator (%>% comes from magrittr and is available once dplyr or tidyverse is loaded)

hsb %>%    # data 
  ggplot() +     # create a ggplot 
  geom_point(mapping = aes(ses, mathach)) # add geom_function()
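
Because ggplot() returns an object, a plot can be stored in a variable and extended later. The sketch below illustrates this under one assumption: hsb_demo is a small made-up data frame standing in for hsb (its values are simulated, not the real data). It uses layer_data() to check that the point layer received one row per observation.

```r
library(ggplot2)

# Hypothetical stand-in for hsb: simulated values, not the real data
set.seed(1)
hsb_demo <- data.frame(
  ses     = rnorm(50),
  mathach = rnorm(50, mean = 13, sd = 6)
)

# Store the base plot (data + global aesthetics), then add a layer to it
p_base    <- ggplot(data = hsb_demo, mapping = aes(ses, mathach))
p_scatter <- p_base + geom_point()

# layer_data() returns the data ggplot2 computed for a given layer
points_drawn <- layer_data(p_scatter, 1)
nrow(points_drawn)  # one row per observation in hsb_demo

# Typing p_scatter at the console (or calling print(p_scatter)) draws the plot
```

Storing the base plot this way makes it easy to try several geoms on the same data and aesthetics without retyping the ggplot() call.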

Geom Functions

Frequently used geom functions are introduced in this section. More functions can be found here: https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf

< One variable >

  • geom_histogram() for a continuous variable
ggplot(data = hsb) +   
  geom_histogram(mapping = aes(mathach), bins = 25) 
#or 
ggplot(data = hsb, mapping = aes(mathach)) +   
  geom_histogram(bins = 25) # can set the number of bins

  • geom_density() for a continuous variable
ggplot(data = hsb) +   
  geom_density(mapping = aes(mathach)) 
# or 
ggplot(data = hsb, mapping = aes(mathach)) +   
  geom_density() 

  • geom_bar() for a discrete variable
ggplot(data = hsb, mapping = aes(female)) + 
  geom_bar() 

< Two variables >

  • geom_boxplot() for discrete X, continuous Y
ggplot(data = hsb, mapping = aes(female, mathach)) + 
  geom_boxplot() 

  • geom_point() for continuous X, Y
ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_point() 

  • geom_text() for continuous X, Y
## need 'label' aesthetic
ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_text(mapping = aes(label=id)) # plot labels 

## Add labels to a scatterplot
ggplot(data = hsb, mapping = aes(ses, mathach)) +  
  geom_point() +  # draw a scatterplot
  geom_text(mapping = aes(label=id), nudge_x = 0.08) # add 'id' labels to points

  • geom_smooth() for continuous X, Y
ggplot(data = hsb, mapping = aes(ses, mathach)) +  
  geom_point() +  # scatterplot
  geom_smooth() # add a smooth line fitted to the data
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

< Line segments >

  • geom_abline()
ggplot(data = hsb, mapping = aes(ses, mathach)) +  
  geom_point() +  # scatterplot
  geom_abline(intercept = 13.349, slope = 3.224) # add a regression line (constants go outside aes() so one line is drawn)

## To add the regression equation 
ggplot(data = hsb, mapping = aes(ses, mathach)) +  
  geom_point() +  # scatterplot
  geom_abline(intercept = 13.349, slope = 3.224) + # add a regression line
  annotate("text", x = 0.93, y = 13.5, 
           label = "Mathach = 13.35 + 3.22 * SES + e") # add the equation once, not once per row
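
The intercept and slope above are typed in by hand. As a minimal sketch, they could instead be taken from a fitted model so the line and the label always agree with the data. The example assumes a simulated stand-in data frame, hsb_demo (not the real hsb data), and uses lm() for the fit and annotate() to draw the equation as a single label; the object names are illustrative only.

```r
library(ggplot2)

# Hypothetical stand-in for hsb: simulated values, not the real data
set.seed(1)
hsb_demo <- data.frame(ses = rnorm(100))
hsb_demo$mathach <- 13 + 3 * hsb_demo$ses + rnorm(100, sd = 6)

# Fit the simple regression and extract the two coefficients
fit       <- lm(mathach ~ ses, data = hsb_demo)
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

# Build the equation label from the estimates
eq_label <- sprintf("Mathach = %.2f + %.2f * SES + e", intercept, slope)

p_reg <- ggplot(data = hsb_demo, mapping = aes(ses, mathach)) +
  geom_point() +                                        # scatterplot
  geom_abline(intercept = intercept, slope = slope) +   # fitted line
  annotate("text", x = 0, y = max(hsb_demo$mathach),
           label = eq_label)                            # equation label
```

Passing intercept and slope as plain arguments (rather than inside aes()) gives the line layer a single row of data, so exactly one line is drawn.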

Color/Size/Shape/Linetype

  • Color/Opacity

To map color or opacity to a variable, put color/fill or alpha (which controls transparency) inside aes(). To set color or opacity to a fixed value instead, put the argument outside aes().

## Map colors/opacity to a variable
ggplot(data = hsb, mapping = aes(ses, mathach)) +   # create a ggplot object
  geom_point(aes(color = female))   # map colors to 'female' (discrete)

ggplot(data = hsb, mapping = aes(ses, mathach)) +   # create a ggplot object
  geom_point(aes(color = ses))   # map colors to 'ses' (continuous)

ggplot(data = hsb, mapping = aes(ses, mathach)) +  
  geom_point(aes(alpha = ses))  # map opacity to 'ses' (continuous)

ggplot(data = hsb, mapping = aes(mathach)) +   
  geom_density(aes(fill=female))  # map colors to 'female'

ggplot(data = hsb, mapping = aes(mathach)) +   
  geom_density(aes(fill=female), alpha = 0.3) # map colors to 'female' & set alpha to 0.3

## Set colors to a fixed value
ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_point(color = "orange")  # set color of points to orange

ggplot(data = hsb, mapping = aes(mathach)) +   
  geom_density(fill = "pink")  # fill density area with pink
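
A common slip is to put a fixed color inside aes(). The sketch below contrasts the two cases on a simulated stand-in data frame (hsb_demo is made up for illustration, not the real hsb data). It inspects the colors ggplot2 actually assigns with layer_data(): when "orange" is placed inside aes(), it is treated as a one-level variable, so the points get the default palette color and a legend appears.

```r
library(ggplot2)

# Hypothetical stand-in for hsb: simulated values, not the real data
set.seed(1)
hsb_demo <- data.frame(
  ses     = rnorm(30),
  mathach = rnorm(30, mean = 13, sd = 6)
)

# Setting: the color is a fixed value, outside aes()
p_set <- ggplot(hsb_demo, aes(ses, mathach)) +
  geom_point(color = "orange")

# Mapping by mistake: "orange" becomes a one-level variable
p_mapped <- ggplot(hsb_demo, aes(ses, mathach)) +
  geom_point(aes(color = "orange"))

# Colors actually assigned to the points in each plot
colours_set    <- unique(layer_data(p_set)$colour)
colours_mapped <- unique(layer_data(p_mapped)$colour)

colours_set     # the fixed value that was requested
colours_mapped  # a color chosen by the default scale instead
```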

  • Size/Shape/Linetype

Likewise, we can map a variable to, or set a fixed value for, the size and type of points and lines using the size, shape, and linetype arguments. Below are some examples of mapping these aesthetics to a variable.

## Map size/shape/linetype to a variable
ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_point(aes(size = ses)) # map size of points to 'ses'

ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_point(aes(shape = female)) # map shape of points to 'female'

ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_smooth(aes(linetype = female)) # map type of lines to 'female'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Again, to set the size and type of points/lines to fixed values, we place the arguments outside aes(). To identify the type of points/lines, use the numbers/names shown in the figure below.

To learn more, visit https://ggplot2.tidyverse.org/articles/ggplot2-specs.html

## set size/type of points/lines 
ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_point(size = 3) # increase the point size

ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_point(shape = 5) # change the shape of points

ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_smooth(linewidth = 3) # increase the line width (ggplot2 >= 3.4.0; older versions use size = 3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_smooth(linetype = "longdash") # change the line type
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Facet

ggplot2::facet_wrap() and ggplot2::facet_grid() divide a plot into subplots based on the values of one or more discrete variables. facet_grid() is useful when you have two faceting variables. To facet the plot by a single variable with many levels, it is better to use facet_wrap().

## facet_wrap(vars(var_name)) 

ggplot(data=hsb, mapping = aes(ses, mathach)) +
  geom_point() +
  facet_wrap(vars(female), labeller = "label_both") 

ggplot(data=hsb, mapping = aes(ses, mathach)) +
  geom_point() +
  facet_wrap(vars(female), labeller = "label_both", nrow = 2)

## facet_grid(rows = vars(var1_name), cols = vars(var2_name)) 

ggplot(data=hsb, mapping = aes(ses, mathach)) +
  geom_point() +
  facet_grid(cols = vars(female), labeller = "label_both") 

ggplot(data=hsb, mapping = aes(ses, mathach)) +
  geom_point() +
  facet_grid(rows = vars(minority), cols = vars(female), labeller = "label_both")

We can also use a formula instead of vars() arguments as below.

ggplot(data=hsb, mapping = aes(ses, mathach)) +
  geom_point() +
  facet_grid(minority~female, labeller = "label_both")

ggplot(data=hsb, mapping = aes(ses, mathach)) +
  geom_point() +
  facet_grid(.~female, labeller = "label_both")

ggplot(data=hsb, mapping = aes(ses, mathach)) +
  geom_point() +
  facet_wrap(~female, labeller = "label_both") 
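
To see how the faceting variables determine the number of subplots, the sketch below builds both kinds of facets on a simulated stand-in data frame (hsb_demo is made up, with two-level female and minority factors; it is not the real hsb data) and counts the panels via the PANEL column returned by layer_data().

```r
library(ggplot2)

# Hypothetical stand-in for hsb: every female x minority combination, 10 rows each
set.seed(1)
hsb_demo <- expand.grid(
  female   = factor(c(0, 1)),
  minority = factor(c(0, 1)),
  rep      = 1:10
)
hsb_demo$ses     <- rnorm(nrow(hsb_demo))
hsb_demo$mathach <- rnorm(nrow(hsb_demo), mean = 13, sd = 6)

# Two faceting variables: one panel per row/column combination
p_grid <- ggplot(hsb_demo, aes(ses, mathach)) +
  geom_point() +
  facet_grid(rows = vars(minority), cols = vars(female), labeller = label_both)

# One faceting variable: one panel per level
p_wrap <- ggplot(hsb_demo, aes(ses, mathach)) +
  geom_point() +
  facet_wrap(vars(female), labeller = label_both)

# Each point is assigned to a panel; count the distinct panels
n_panels_grid <- length(unique(layer_data(p_grid)$PANEL))
n_panels_wrap <- length(unique(layer_data(p_wrap)$PANEL))
```

With two levels in each variable, the grid has 2 x 2 panels and the wrapped plot has 2.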

Add Titles/Labels

We can add or edit axis labels and titles using xlab(), ylab(), and ggtitle(), or all at once with the labs() function.

ggplot(data = hsb, mapping = aes(ses, mathach)) +  
  geom_point() +  
  xlab("Socioeconomic Status (SES)") +  # add x-axis label
  ylab("Math achievement") +           # add y-axis label
  ggtitle("Scatterplot of Math Achievement versus SES") # add title

# or 

## labs(x="xlab", y="ylab", title="title")
ggplot(data = hsb, mapping = aes(ses, mathach)) +   
  geom_point() +  
  labs(x = "Socioeconomic Status (SES)", 
       y = "Math achievement", 
       title = "Scatterplot of Math Achievement versus SES")

Save Plot

We can conveniently save a plot using ggsave(). By default, it saves the last plot displayed, and the file format is determined by the file extension.

ggsave("plotname.jpg")

## You can also set the output folder and the plot size (in inches by default)
ggsave("plotname.pdf", path = "folder_path", width = 4, height = 4)
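
Relying on "the last plot" can save the wrong figure when several plots are drawn in a row. As a minimal sketch, a stored plot object can be passed explicitly through the plot argument. The example assumes a simulated stand-in data frame (hsb_demo, not the real hsb data) and writes to a temporary file, which is a choice made here only to keep the example tidy.

```r
library(ggplot2)

# Hypothetical stand-in for hsb: simulated values, not the real data
set.seed(1)
hsb_demo <- data.frame(
  ses     = rnorm(30),
  mathach = rnorm(30, mean = 13, sd = 6)
)

p_scatter <- ggplot(hsb_demo, aes(ses, mathach)) +
  geom_point()

# Temporary output path; replace with a real file name in practice
out_file <- tempfile(fileext = ".pdf")

ggsave(
  filename = out_file,
  plot     = p_scatter,  # save this object, not whatever was drawn last
  width    = 4,
  height   = 4,
  units    = "in"
)

file.exists(out_file)  # check that the file was written
```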