GGPLOT2

Somsak Chanim 🤓

International College of Digital Innovation, CMU

2023-09-16

ggplot2 package

Created by Hadley Wickham in 2005 and released in 2007, ggplot2 is a crucial data visualization package for the R programming language.

ggplot2 can serve as a replacement for R's base graphics and is part of the tidyverse collection of packages. A large ecosystem of extension packages now builds on it.

ggplot2 logo

Note: The “gg” in ggplot2 stands for “grammar of graphics”.

The structure of ggplot2

Each ggplot2 plot can have up to seven layers.

The first three layers (data + aesthetics + geometries) are required to produce a plot.

A plot is built from the following layers:

  1. Data (ggplot()): The data you want to plot (a data frame is recommended).

  2. Aesthetics (aes()): Which variables map to the x and y axes, fill, color, group, or alpha (character, numeric, or date variables).

  3. Geometries (geom_XXX()): The type of plot, such as a scatter plot, line plot, bar plot, or a combination.

  4. Facets (facet_XXX()): Generate multiple small plots, each showing a different subset of the data.

  5. Statistics (stat_XXX()): Apply a statistical transformation to the data before drawing (for example, an empirical distribution function or a probability density estimate).

  6. Coordinates (coord_XXX()): Combine the two position aesthetics into a 2D position, using a linear or non-linear coordinate system.

  7. Theme: Set the background, title, legend, axes, fonts, etc.
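As a minimal sketch of how the seven layers combine, the following builds one plot from the built-in mtcars data. The variables and the specific choices of facet_wrap(), stat_smooth(), coord_cartesian(), and theme_minimal() are illustrative assumptions; any other member of each layer family could take their place.

```r
library(ggplot2)

p <- ggplot(data = mtcars) +                          # 1. data
  aes(x = wt, y = mpg, color = factor(cyl)) +         # 2. aesthetics
  geom_point() +                                      # 3. geometry
  facet_wrap(~ am) +                                  # 4. facet: one panel per transmission type
  stat_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                           # 5. statistics: linear fit per color group
  coord_cartesian(xlim = c(1, 6)) +                   # 6. coordinates
  theme_minimal()                                     # 7. theme
p
```

Removing layers 4 to 7 still gives a valid plot, because only the first three are required.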

How to install

install.packages("ggplot2")
## or
install.packages("tidyverse")

How to use ggplot2

library(ggplot2)
## or
library(tidyverse)

In this note, we build every plot from a data frame.

Bar plot: One dimensional categorical data

set.seed(1)
LETTER <- sample(LETTERS[1:5], size = 500, replace = TRUE)
letter <- sample(letters[1:5], size = 500, replace = TRUE)
Sample <- data.frame(LETTER, letter)
ggplot(data = Sample) + 
  aes(x=LETTER) + 
  geom_bar()
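geom_bar() counts the rows in each category, so the bar heights should agree with a frequency table. The sketch below checks this with layer_data(), a ggplot2 helper that returns the data computed for a layer. The object names p_bar, counts, and bar_heights are chosen here for illustration.

```r
library(ggplot2)

set.seed(1)
LETTER <- sample(LETTERS[1:5], size = 500, replace = TRUE)
Sample <- data.frame(LETTER)

p_bar <- ggplot(data = Sample) +
  aes(x = LETTER) +
  geom_bar()

# Frequency table computed by hand
counts <- table(Sample$LETTER)

# Bar heights computed by geom_bar()
bar_heights <- layer_data(p_bar)$count

all(bar_heights == as.vector(counts))
```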

Add color

ggplot(data = Sample)+ 
   aes(x = LETTER, fill = LETTER)+ 
   geom_bar()

Add color at border

ggplot(data = Sample)+ 
  aes(x = LETTER, color = LETTER) +  
  geom_bar()

labs() command: Add title or subtitle

ggplot(data = Sample)+
  aes(x = LETTER, fill = LETTER, color = letter) +
  geom_bar() + 
  labs(title = "Bar Graph", subtitle = "random data")

labs() command: Change the name of x or y axis

ggplot(data = Sample)+
  aes(x = LETTER, fill = LETTER) + 
  geom_bar() +
  labs( title = "Bar Graph", subtitle = "random data", 
        x = "sample", y = "count number")

or

ggplot(data = Sample)+
  aes(x = LETTER, fill = LETTER) + 
  geom_bar() +
  labs( title = "Bar Graph", subtitle = "random data")+ 
  labs( x = "sample", y = "count number")

labs() command: Change the title of fill or color legend.

ggplot(data = Sample)+
  aes(x = LETTER, fill = LETTER) + 
  geom_bar() +
  labs( title = "Bar Graph", subtitle = "random data")+ 
  labs( x = "sample", y = "count number") + 
  labs(fill ="Letter A-E", color ="letter a-e")

Rotation graph

Move the variable from x axis to y axis

ggplot(data = Sample)+
  aes(y = LETTER, fill = LETTER) + 
  geom_bar() +
  labs( title = "Bar Graph", subtitle = "random data")+ 
  labs( x = "count number", y = "sample") + 
  labs(fill ="Letter A-E", color ="letter a-e")

Stacked barplot for two dimensional categorical data

Map a second categorical variable to the fill argument.

ggplot(data = Sample)+
  aes(x = LETTER, fill = letter)+ 
  geom_bar()

A stacked bar graph is a bar graph in which each bar is subdivided by an additional character or factor variable.

Separate the values within each bar

Set geom_bar(position = "dodge") or geom_bar(position = "dodge2").

ggplot(data = Sample)+
  aes(x = LETTER, fill = letter)+ 
  geom_bar(position = "dodge2") 

Diverging bar graph

Example data

set.seed(1)
DF <- data.frame(Group = letters[1:15], 
                 Value = rt(15, df = 4))
ggplot(data = DF)+
  aes(x = Group, y = Value) + 
  geom_bar(stat = "identity")

Reorder the values

Example data

ggplot(data = DF)+ 
  aes(x = reorder(Group, Value), y = Value)+ 
  geom_bar(stat = "identity")
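reorder() works by turning Group into a factor whose levels are sorted by Value, and ggplot2 draws discrete axes in level order. A small sketch, using only base R, that shows the resulting level order (ordered_group is a name introduced here):

```r
set.seed(1)
DF <- data.frame(Group = letters[1:15],
                 Value = rt(15, df = 4))

# Factor whose levels follow increasing Value
ordered_group <- reorder(DF$Group, DF$Value)
levels(ordered_group)

# Same order as sorting the groups by Value directly
identical(levels(ordered_group), DF$Group[order(DF$Value)])
```

Using reorder(Group, -Value) instead would sort the bars from largest to smallest.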

Histogram

The histogram is the most commonly used graph for showing frequency distributions (discrete or continuous). It looks like a bar plot, but there is an essential difference: a histogram groups numeric values into bins, whereas a bar plot counts categories. A histogram is useful when:

  • The data are numerical.

  • You want to see the shape of the data’s distribution.

  • You want to choose a parametric distribution to test against the data.

  • You want to determine whether the distributions of two or more variables differ.

  • You wish to communicate data distribution quickly and efficiently to others.

Example

Example data

set.seed(1)
data3 <- data.frame(score = rnorm(1000, mean = 70, sd = 2),
         gender = sample(c("M","F"),1000, replace = TRUE))
ggplot(data = data3)+
  aes(x = score) + 
  geom_histogram(fill = "red", color = "black")

Change the number of bins

By default, geom_histogram() uses 30 bins. You can change this with the bins argument of geom_histogram(), which accepts any positive integer.

ggplot(data = data3)+
  aes(x = score) + 
  geom_histogram(fill = "red", color = "black",bins = 50)
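Instead of the number of bins, geom_histogram() also accepts a binwidth argument that fixes the width of each bin in data units. The following sketch assumes a width of 0.5 score points (an arbitrary choice for illustration) and checks the computed bins with layer_data().

```r
library(ggplot2)

set.seed(1)
data3 <- data.frame(score = rnorm(1000, mean = 70, sd = 2))

p_bw <- ggplot(data = data3) +
  aes(x = score) +
  geom_histogram(fill = "red", color = "black", binwidth = 0.5)

# Inspect the bins that ggplot2 computed
bins_used <- layer_data(p_bw)
widths <- bins_used$xmax - bins_used$xmin
range(widths)          # every bin has the requested width
sum(bins_used$count)   # all observations are counted
```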

Rescale to density

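Map y to the computed density with y = after_stat(density) in aes(). (Older code writes this as y = ..density.., which ggplot2 3.4.0 and later flag as deprecated.)

ggplot(data = data3) +
  aes(x = score, y = after_stat(density)) + 
  geom_histogram(fill ="red", color = "black")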
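On the density scale, the bar areas (height times width) add up to 1, which is what makes the histogram comparable to a probability density function. A minimal sketch of that check, assuming ggplot2 3.3.0 or later so that after_stat() is available; p_dens, d, and total_area are names introduced here.

```r
library(ggplot2)

set.seed(1)
data3 <- data.frame(score = rnorm(1000, mean = 70, sd = 2))

p_dens <- ggplot(data = data3) +
  aes(x = score, y = after_stat(density)) +
  geom_histogram(fill = "red", color = "black", bins = 30)

# Sum of bar areas: density (height) times bin width
d <- layer_data(p_dens)
total_area <- sum(d$density * (d$xmax - d$xmin))
total_area
```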

Show multiple histograms by a categorical variable

To draw one histogram per category, set fill = variable in aes() and color = "any color" in geom_histogram(). By default the histograms are stacked on top of each other.

ggplot(data = data3)+ 
         aes(x = score, fill = gender) + 
         geom_histogram(color = "black")

Transparent color

## alpha in [0,1]
ggplot(data = data3)+ 
  aes(x = score, fill = gender) + 
  geom_histogram(color = "black" ,alpha = .5)

Facet by column

## alpha in [0,1]
ggplot(data = data3)+ 
  aes(x = score, fill = gender) + 
  geom_histogram(color = "black", alpha = .8 ) + 
  facet_wrap(gender ~ .)

Facet by row

## alpha in [0,1]
ggplot(data = data3) + 
  aes(x = score, fill = gender) + 
  geom_histogram(color ="black", alpha = .8) +
  facet_grid(gender ~ .)

Convert a character variable to a factor to control the panel order

data3$gender <- factor(data3$gender, levels = c("M", "F"))
ggplot(data = data3) + 
  aes(x = score, fill = gender) + 
  geom_histogram(color = "black", alpha = .5) +
  facet_grid(gender ~ .)

Density plot

A density plot is a smoothed version of the histogram and serves the same purpose. It represents the distribution of a numeric variable, using a kernel density estimate to show the variable's probability density function.

ggplot(data = data3)+ 
  aes(x = score, fill = gender) + 
  geom_density(kernel = "gaussian", alpha = .5)
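To see what a kernel density estimate looks like as numbers, the sketch below uses base R's density() on the same kind of simulated scores. It illustrates the idea behind geom_density(); that both use identical defaults (bandwidth, grid) is an assumption not verified here.

```r
set.seed(1)
score <- rnorm(1000, mean = 70, sd = 2)

# Gaussian kernel density estimate on an equally spaced grid
kde <- density(score, kernel = "gaussian")

# The estimate is a curve: x positions and density heights
head(data.frame(x = kde$x, density = kde$y))

# Approximate area under the curve (should be close to 1)
step <- diff(kde$x)[1]
area <- sum(kde$y) * step
area

# Location of the peak (should be near the mean of 70)
peak <- kde$x[which.max(kde$y)]
peak
```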

Graph for two continuous variables

In ggplot2, a graph of two continuous variables is used to examine the relationship between them, for example their correlation. A scatter plot can carry more information by using color to display a character or factor variable and point size to show a numeric value; the latter is called a bubble plot. A ggplot2 scatter plot can also include a prediction line from linear or nonlinear regression, with or without a confidence interval, and ellipses that display a bivariate normal distribution.

Scatter plot

This example uses the mtcars dataset.

ggplot(data = mtcars)+
  aes(x = disp, y = mpg) + 
  geom_point()

Bubble plot(add size)

The variable mapped to the size argument should be numeric.

ggplot(data = mtcars)+ 
  aes(x = disp, y = mpg)+ 
  aes(size = hp) + 
  geom_point()

Add color

For the variable mapped to the color argument, we recommend a character or factor variable; alternatively, you can set up the color palette yourself.

ggplot(data = mtcars)+ 
  aes(x = disp, y = mpg)+ 
  aes(size = hp) + 
  aes(color = as.character(cyl))+
  geom_point()

Change the shape

ggplot2 provides 26 point shapes, numbered 0 to 25. The default shape is number 19, and a shape's color is set by the color argument. For shapes 21-25, color sets the border and the fill argument sets the interior.
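The shape numbers can be previewed in a single chart. This sketch lays the shapes 0 to 25 out on a grid and uses scale_shape_identity() so that each number is drawn as the shape it names; the grid layout, the orange fill, and the data frame name shapes are arbitrary choices made here.

```r
library(ggplot2)

shape_id <- 0:25
shapes <- data.frame(
  shape = shape_id,
  x = shape_id %% 7,       # column position on the grid
  y = -(shape_id %/% 7)    # row position, top to bottom
)

p_shapes <- ggplot(data = shapes) +
  aes(x = x, y = y, shape = shape) +
  geom_point(size = 5, color = "black", fill = "orange") +  # fill shows only on 21-25
  geom_text(aes(label = shape), nudge_y = 0.35, size = 3) +
  scale_shape_identity() +
  theme_void()
p_shapes
```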

Add shape in basic ggplot

We recommend mapping a character or factor variable to the shape argument. Expert users can consult the documentation for scale_shape_identity() and supply integer shape numbers from 0 to 25 directly.

ggplot(data = mtcars)+ 
  aes(x = disp, y = mpg)+ 
  aes(size = hp) + 
  aes(color = as.character(cyl))+
  aes(shape = as.character(gear)) + 
  geom_point()

Modify the shape type by shape number

Add the scale_shape_manual() command with the values argument.

ggplot(data = mtcars)+ 
  aes(x = disp, y = mpg)+ 
  aes(size = hp) + 
  aes(color = as.character(cyl))+
  aes(shape = as.character(gear)) + 
  geom_point() + 
  scale_shape_manual(values = c(1,5,10))

Add a smoothed prediction line

ggplot(data = mtcars)+ 
  aes( x = disp, y = mpg, size = gear) + 
  aes(color = as.character(cyl)) +
  geom_point() + 
  geom_smooth()

Add a linear prediction line

ggplot(data = mtcars)+ 
  aes( x = disp, y = mpg, size = gear) + 
  aes(color = as.character(cyl)) +
  geom_point() +  
  geom_smooth(method = lm)

Remove confidence interval

ggplot(data = mtcars)+ 
  aes( x = disp, y = mpg, size = gear) + 
  aes(color = as.character(cyl)) +
  geom_point() +  
  geom_smooth(method = lm, se = FALSE)
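Because color is mapped in the plot-level aes(), geom_smooth() fits one line per cylinder group. If a single line through all points is wanted, one option is to map color inside geom_point() only. The sketch below compares the two; the names p_groups and p_single are introduced here, and the size mapping is left out to keep the example short.

```r
library(ggplot2)

# Color mapped for every layer: one regression line per cyl group
p_groups <- ggplot(data = mtcars) +
  aes(x = disp, y = mpg, color = as.character(cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Color mapped for the points only: one overall regression line
p_single <- ggplot(data = mtcars) +
  aes(x = disp, y = mpg) +
  geom_point(aes(color = as.character(cyl))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Number of fitted lines in each plot (layer 2 is the smooth)
n_lines_groups <- length(unique(layer_data(p_groups, 2)$group))
n_lines_single <- length(unique(layer_data(p_single, 2)$group))
c(groups = n_lines_groups, single = n_lines_single)
```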

Add ellipse

ggplot(data = mtcars) + 
  aes(x = disp, y = mpg, size = gear) + 
  aes(color = as.character(cyl)) + 
  geom_point() + 
  stat_ellipse()

Change Fonts

By default, ggplot2 may not render text in non-Latin scripts such as Chinese or Thai correctly; whether it does depends on the fonts available to the graphics device.

ggplot(data = mtcars) + 
  aes(x = disp, y = mpg) + 
  geom_point() + 
  labs(title = "Cars (รถยนต์)")

Solve this problem

We can solve this problem by setting the font family in the theme() function. The font (here TH Sarabun New) must be available to your graphics device.

ggplot(data = mtcars) + 
  aes(x = disp, y = mpg) + 
  geom_point() + 
  labs(title = "Cars (รถยนต์)") +
  theme(text = element_text(family = "TH Sarabun New"))

Line plot

A line plot, or line chart, is handy for displaying time series data. In R, we recommend learning the structure of ts, zoo, and xts objects. However, if you keep your data in a data frame with at least one date variable and one numeric variable, you can use ggplot2 to draw the line chart.

data("economics")
str(economics)
spc_tbl_ [574 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ date    : Date[1:574], format: "1967-07-01" "1967-08-01" ...
 $ pce     : num [1:574] 507 510 516 512 517 ...
 $ pop     : num [1:574] 198712 198911 199113 199311 199498 ...
 $ psavert : num [1:574] 12.6 12.6 11.9 12.9 12.8 11.8 11.7 12.3 11.7 12.3 ...
 $ uempmed : num [1:574] 4.5 4.7 4.6 4.9 4.7 4.8 5.1 4.5 4.1 4.6 ...
 $ unemploy: num [1:574] 2944 2945 2958 3143 3066 ...

Line plot with economics dataset

In ggplot2 we use geom_line().

ggplot(data = economics)+ 
  aes(x= date, y = uempmed) + 
  geom_line(color = "green")

Line plot modifications

Line plot with area under the line

We can combine a line chart with an area chart (geom_area()).

ggplot(data = economics)+ 
  aes(x= date, y = uempmed) + 
  geom_line(color = "green")+
  geom_area(fill = "green", alpha =.2)

Line plot with a smoothed or linear trend line

Use stat_smooth() and set its arguments.

ggplot(data = economics, aes(x = date, y = uempmed))+
  geom_line(color = "green")+
  stat_smooth(color = "blue")+ ## smoothed line
  stat_smooth(color = "red", method ="lm") ## linear line

Add horizontal, vertical, or diagonal lines

In ggplot2 we can add horizontal, vertical, or diagonal lines with the following geoms and their arguments.

  • geom_hline(yintercept = ): for a horizontal line.

  • geom_vline(xintercept = ): for a vertical line.

  • geom_abline(intercept = , slope = ): for a diagonal line.

Example

ggplot(data = economics)+ 
  aes(x= date, y = uempmed)+
  geom_line(color = "green")+
  geom_hline(yintercept = 15, color ="red" )+
  geom_vline(xintercept = as.numeric(economics$date[500]), 
             color = "blue")+
  geom_abline(intercept = 5, 
              slope = (20-5)/(as.numeric(economics$date[500])-
                              as.numeric(economics$date[1])),
              color = "black")
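On a date axis, the x values are the numeric form of the dates, that is, days since 1970-01-01. The intercept of geom_abline() is therefore the height of the line at 1970-01-01, not at the first date in the data. The sketch below works through the arithmetic; line_at() and intercept_adj are hypothetical helpers introduced here to show how to make the line pass through 5 at the first date and 20 at the 500th.

```r
library(ggplot2)  # provides the economics dataset

# Dates are stored as days since 1970-01-01
as.numeric(as.Date("1970-01-01"))

x_first <- as.numeric(economics$date[1])    # negative: before 1970
x_500   <- as.numeric(economics$date[500])

# Slope used in the example: a rise of 15 units between the two dates
slope <- (20 - 5) / (x_500 - x_first)

# Height of the line intercept + slope * x at a given x
line_at <- function(x, intercept) intercept + slope * x

# With intercept = 5 the line is below 5 at the first date
line_at(x_first, intercept = 5)

# Intercept that makes the line pass through (first date, 5)
intercept_adj <- 5 - slope * x_first
line_at(x_first, intercept_adj)
line_at(x_500, intercept_adj)
```

Passing intercept = intercept_adj to geom_abline() would draw that adjusted line.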

Put the text or rectangle with text to the graph

We can use geom_text() to add plain text and geom_label() to add text inside a rectangle. Both geoms take the label aesthetic for the text, together with x and y in aes() for its position. We recommend labeling only the important points that you want to focus on. Example: we focus on these two cars.

which(row.names(mtcars)=="Chrysler Imperial")
[1] 17
which(row.names(mtcars)=="Toyota Corolla")
[1] 20

Example code

ggplot(data = mtcars) +
 aes(x = mpg, y = wt, color = gear)+
 geom_point()+ 
 geom_text(  aes(x = mpg[17], y = wt[17],
                 label = row.names(mtcars)[17]))+
 geom_label(aes(x = mpg[20], y = wt[20], 
                 label = row.names(mtcars)[20]))
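An alternative sketch passes a small data frame containing only the highlighted rows to the label layer, so each label is drawn exactly once and no row numbers need to be looked up. The names cars, model, and highlight are introduced here for illustration.

```r
library(ggplot2)

# Keep the car names as a regular column
cars <- mtcars
cars$model <- row.names(mtcars)

# Rows to highlight, selected by name
highlight <- cars[cars$model %in% c("Chrysler Imperial", "Toyota Corolla"), ]

p_lab <- ggplot(data = cars) +
  aes(x = mpg, y = wt) +
  geom_point(aes(color = gear)) +
  geom_label(data = highlight, aes(label = model), vjust = -0.5)
p_lab
```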

Theme in ggplot2

The default theme in ggplot2 is theme_gray() (no need to add this command). You can also use another theme:

  • theme_bw(): The classic dark-on-light ggplot2 theme.

  • theme_linedraw(): A theme with only black lines of various widths on white backgrounds, reminiscent of a line drawing.

  • theme_light(): A theme similar to theme_linedraw() but with light grey lines and axes, to direct more attention towards the data.

  • theme_dark(): The dark cousin of theme_light(), with similar line sizes but a dark background.

  • theme_minimal(): A minimalistic theme with no background annotations.

  • theme_void(): A completely empty theme.

Try the different themes yourself.
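A theme is applied by adding it to a plot with +. The sketch below applies each of the themes listed above to the same scatter plot so they can be compared; base_plot, themes, and themed_plots are names chosen here.

```r
library(ggplot2)

base_plot <- ggplot(data = mtcars) +
  aes(x = disp, y = mpg) +
  geom_point()

# The built-in themes from the list above
themes <- list(
  bw       = theme_bw(),
  linedraw = theme_linedraw(),
  light    = theme_light(),
  dark     = theme_dark(),
  minimal  = theme_minimal(),
  void     = theme_void()
)

# One themed copy of the plot per theme
themed_plots <- lapply(themes, function(th) base_plot + th)

themed_plots$dark   # print any one of them
```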

For extra themes, you need to install the “ggthemes” package.

library(ggthemes)

Use a GUI to help you make a ggplot

install.packages("esquisse")


Multiple plots with patchwork package

## install.packages("patchwork")
library(patchwork)
## Build three plots to combine
p1 <- ggplot(mtcars, aes(x = disp, y = mpg, size = gear,
             color = cyl)) +
      geom_point() + geom_smooth()
p2 <- ggplot(mtcars, aes(x = disp, y = mpg, size = gear,
             color = cyl)) +
      geom_point() + geom_smooth(method = lm)
p3 <- ggplot(mtcars, aes(x = disp, y = mpg, size = gear,
             color = as.character(cyl))) +
      geom_point() +geom_smooth(method = lm, se = FALSE)

Two plots in one row

p1 + p2

Two plots in one column

p1 / p3

One plot in the first row and two plots in the second row

p3 / (p1 + p2)

Three plots in one row

p1 + p2 + p3

Three plots in one column

p1 / p2 / p3

Another operator for combining plots

p1 | p2

One plot in the first column and two plots in the second column

p1 | (p2 / p3)
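Beyond the operators, patchwork also offers plot_layout() to control the grid and plot_annotation() to add a title to the combined figure. A brief sketch, using three simple plots (pa, pb, pc) defined here so the example stands alone; the two-column layout and the title text are arbitrary.

```r
library(ggplot2)
library(patchwork)

pa <- ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point()
pb <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
pc <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Three plots arranged in two columns, with a shared title
combined <- pa + pb + pc +
  plot_layout(ncol = 2) +
  plot_annotation(title = "mtcars: mpg against three predictors")
combined
```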

Interactive plot (plotly package)

## install.packages("plotly")
library(plotly)

We can turn a static ggplot into an interactive plot with the plotly package. Interactive plots are displayed in HTML. The basic workflow has two steps:

Step 1: Make a ggplot and assign it to a variable (for example p1, p2, or p3 from the previous topic).

Step 2: Pass the variable to the ggplotly() command.

Interactive plot example

p4 <- ggplot(data = mtcars) + 
  aes(x = disp, y = mpg, size = gear) +
  aes(color = as.character(cyl)) +    
  geom_point()
ggplotly(p4)
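Since the interactive plot is an HTML widget, it can also be written to an HTML file for sharing. The sketch below uses saveWidget() from the htmlwidgets package (installed together with plotly); the file name and the use of a temporary directory are assumptions for illustration, and selfcontained = FALSE is chosen here so that pandoc is not required.

```r
library(ggplot2)
library(plotly)

p_static <- ggplot(data = mtcars) +
  aes(x = disp, y = mpg, color = as.character(cyl)) +
  geom_point()

# Convert the static plot to an interactive widget
p_interactive <- ggplotly(p_static)

# Write the widget to an HTML file (supporting files go in a sibling folder)
out_file <- file.path(tempdir(), "mtcars_interactive.html")
htmlwidgets::saveWidget(p_interactive, file = out_file, selfcontained = FALSE)

file.exists(out_file)
```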