2 Plots from Slide 2.1 with ggplot and plotly

This section shows one way to generate the plots from Slide 2.1 with ggplot2 and plotly.

There is usually more than one way to achieve the same result. Ask your tutor or search online for more resources.

2.1 Set-up

We use ggplot2 (one of the tidyverse packages) and plotly to generate all the plots from Slide 2.1, and the rest of the tidyverse to manipulate the data. The table formatting also uses kableExtra.

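library(tidyverse)   # ggplot2 for plotting, tibble/tidyr for reshaping, %>% pipe
library(plotly)      # ggplotly() and config() for interactive plots
library(kableExtra)  # column_spec() for formatting the tables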

Generate the two-way table from the slides.[2]

# use the same data as in the slides; byrow = TRUE fills the matrix one row (status) at a time
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE)
colnames(relationship) <- c('Female', 'Male', 'Non-Binary')
rownames(relationship) <- c('In a relationship', "It's complicated", 'Single')

The two-way table looks like this:

knitr::kable(relationship,
             align = "ccc",
             caption = "Relation Status and Gender.") %>%
  column_spec(
    column = 2:3,
    border_left = TRUE,
    border_right = TRUE)
Table 2.1: Relation Status and Gender.

                     Female   Male   Non-Binary
In a relationship      32      10        2
It’s complicated       12       7        1
Single                 63      45        2

However, ggplot works most naturally with data in long (tidy) format, so we turn the matrix into a data frame with one row per combination of relationship status and gender.

df <- as.data.frame(relationship)
df <- rownames_to_column(df, "Status") %>%  # turn the row names (statuses) into a column
  gather(Gender, Count, 2:4)                # reshape from wide to long: one row per Status-Gender pair

Now the table looks like this:

knitr::kable(df,
             align = "ccc",
             caption = "Relation Status and Gender.") %>%
  column_spec(
    column = 2,
    border_left = TRUE,
    border_right = TRUE)
Table 2.2: Relation Status and Gender.

Status              Gender       Count
In a relationship   Female         32
It’s complicated    Female         12
Single              Female         63
In a relationship   Male           10
It’s complicated    Male            7
Single              Male           45
In a relationship   Non-Binary      2
It’s complicated    Non-Binary      1
Single              Non-Binary      2
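If you would rather not rely on gather() (recent versions of tidyr mark it as superseded in favour of pivot_longer()), the same wide-to-long reshape can be sketched in base R by converting the matrix to a table and then to a data frame. The dimension names Status and Gender and the object name df_long are our own choices for illustration. Because the first dimension varies fastest, the rows should come out in the same order as Table 2.2.

```r
# Two-way table from the slides, with named dimensions so that the
# long-format columns get meaningful names (names chosen for illustration)
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE,
                       dimnames = list(
                         Status = c("In a relationship", "It's complicated", "Single"),
                         Gender = c("Female", "Male", "Non-Binary")))

# as.table() keeps the dimnames; as.data.frame() on a table gives one row
# per cell, with the first dimension (Status) varying fastest
df_long <- as.data.frame(as.table(relationship),
                         responseName = "Count",
                         stringsAsFactors = FALSE)
print(df_long)
```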

2.2 Side-by-side plot

In a side-by-side bar chart, a separate cluster of bars is drawn for each category of one categorical variable, and the height of each bar is the corresponding count in the two-way table. We can choose which categorical variable defines the clusters.
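Because the bar heights are exactly the cells of the two-way table, base R's barplot() can draw a quick side-by-side chart straight from the matrix, which can serve as a sanity check on the ggplot version. This is a base-R sketch, not the approach used in this section: with beside = TRUE, each column (gender) becomes one cluster and each row (status) one bar within it. The pdf(NULL) call only discards the drawing so the snippet runs without a screen; the object name mids is our own.

```r
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE,
                       dimnames = list(c("In a relationship", "It's complicated", "Single"),
                                       c("Female", "Male", "Non-Binary")))

pdf(NULL)  # draw to a null device; omit this line to see the plot
# beside = TRUE: one cluster per column (gender), one bar per row (status)
mids <- barplot(relationship, beside = TRUE, legend.text = TRUE,
                col = c("#55aaff", "#85c285", "#f46d75"),
                main = "Side-by-side bar chart, by Gender")
invisible(dev.off())

# barplot() returns the bar midpoints: one row per status, one column per gender
print(dim(mids))
```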

2.2.1 by gender

In the first plot, the clustering variable is gender (the three clusters are male, female and non-binary).

We set the bar colours manually to match the plot in the lecture slides.

p <- ggplot(data = df, aes(x = Gender, y = Count, fill = Status)) +
  geom_col(position = position_dodge()) +              # bar heights are the Count values
  scale_fill_manual(breaks = rownames(relationship),   # fill is Status: use the status levels
                    values = c("#55aaff", "#85c285", "#f46d75")) +
  labs(title = "Side-by-side Bar chart, by Gender")

# convert to an interactive plotly chart and hide the toolbar
p <- ggplotly(p, width = 800, height = 400) %>%
  config(displayModeBar = FALSE)
p
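Another option for fixed colours is to give scale_fill_manual() a named vector: ggplot2 then matches each colour to the level with the same name, so the mapping does not depend on the order of breaks and values lining up. A minimal sketch that builds such vectors from the table's row and column names (the names status_colours and gender_colours are our own; the three colours are the ones used above):

```r
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE,
                       dimnames = list(c("In a relationship", "It's complicated", "Single"),
                                       c("Female", "Male", "Non-Binary")))
palette3 <- c("#55aaff", "#85c285", "#f46d75")

# Name each colour after the level it should be used for
status_colours <- setNames(palette3, rownames(relationship))  # for fill = Status
gender_colours <- setNames(palette3, colnames(relationship))  # for fill = Gender
print(status_colours)
print(gender_colours)
```

Each vector could then be passed as, for example, scale_fill_manual(values = status_colours) when fill = Status, or values = gender_colours when fill = Gender.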

2.2.2 by status

The second plot uses relationship status as the clustering variable (the three clusters are in a relationship, it’s complicated and single).

p <- ggplot(data = df, aes(x = Status, y = Count, fill = Gender)) +
  geom_col(position = position_dodge()) +
  # fill is Gender here, so the breaks must be the gender levels (column names)
  scale_fill_manual(breaks = colnames(relationship),
                    values = c("#55aaff", "#85c285", "#f46d75")) +
  labs(title = "Side-by-side Bar chart, by Status")

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p
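Switching the clustering variable amounts to transposing the two-way table: in t(relationship) the rows are genders and the columns are statuses, so a chart that clusters by column (such as base R's barplot(t(relationship), beside = TRUE)) now clusters by status. A short base-R check of that idea, independent of ggplot (the name by_status is our own):

```r
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE,
                       dimnames = list(c("In a relationship", "It's complicated", "Single"),
                                       c("Female", "Male", "Non-Binary")))

by_status <- t(relationship)  # rows = gender, columns = status
print(by_status)

# The "Single" cluster holds one bar per gender
print(by_status[, "Single"])
```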

2.3 Segmented/Stacked Bar Chart

In a segmented/stacked bar chart, the height of each bar represents the number of cases in each category, and colour indicates how many of those cases fall into each category of another categorical variable. As with the side-by-side bar chart, we can choose which categorical variable defines the bars.
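Since the segments of a stacked bar add up, the total height of each bar is a marginal total of the two-way table: the column sums when the bars are defined by gender, and the row sums when they are defined by status. A quick base-R sketch of those heights (object names are our own):

```r
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE,
                       dimnames = list(c("In a relationship", "It's complicated", "Single"),
                                       c("Female", "Male", "Non-Binary")))

# Bar heights in the stacked chart by gender (one bar per column)
gender_totals <- colSums(relationship)
# Bar heights in the stacked chart by status (one bar per row)
status_totals <- rowSums(relationship)

print(gender_totals)
print(status_totals)
```

Both sets of totals add up to the same grand total of 174 respondents.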

2.3.1 by gender

In the first plot, the bars are defined using gender.

p <- ggplot(data = df, aes(x = Gender, y = Count, fill = Status)) +
  geom_col() +                           # default position = "stack"
  scale_fill_brewer(palette = 'Set1') +
  labs(title = "Stacked Bar chart, by Gender")

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

2.3.2 by status

In the second plot, the bars are defined using relationship status.

p <- ggplot(data = df, aes(x = Status, y = Count, fill = Gender)) +
  geom_col() +                           # default position = "stack"
  scale_fill_brewer(palette = 'Set2') +
  labs(title = "Stacked Bar chart, by Status")

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

These plots are useful for comparisons both between and within groups. For example, more females than males or non-binary respondents were included in the survey, and single was the most common status among both males and females.
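Both observations can be checked directly from the table: the column totals give the number of respondents of each gender, and which.max() applied to each column picks out the most common status within that gender. A base-R sketch (object names are our own):

```r
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE,
                       dimnames = list(c("In a relationship", "It's complicated", "Single"),
                                       c("Female", "Male", "Non-Binary")))

# Gender with the most respondents
totals <- colSums(relationship)
top_gender <- names(which.max(totals))
print(top_gender)

# Most common status within each gender (which.max() returns the first maximum on ties)
most_common <- rownames(relationship)[apply(relationship, 2, which.max)]
names(most_common) <- colnames(relationship)
print(most_common)
```

For non-binary respondents, "In a relationship" and "Single" are tied at 2, so which.max() reports the first of the two rather than a unique mode.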

2.4 100% Stacked Bar Chart

When there are big differences in the number of cases in each category, segmented bar charts can be difficult to interpret: the categories with the most cases dominate the plot, and those with relatively few cases are hard to read. Scaling every bar to the same total height (100%), so that each segment shows a proportion rather than a count, lets us compare between and within groups even when the category sizes differ greatly.
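With position = 'fill', each bar is rescaled to a height of 1, so each segment is a proportion of its bar. The same proportions can be computed in base R with prop.table(): margin = 2 divides each column (gender) by its total, matching the by-gender chart, and margin = 1 divides each row (status) by its total, matching the by-status chart. A sketch (object names are our own):

```r
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE,
                       dimnames = list(c("In a relationship", "It's complicated", "Single"),
                                       c("Female", "Male", "Non-Binary")))

# Proportions within each gender: each column sums to 1
by_gender <- prop.table(relationship, margin = 2)
# Proportions within each status: each row sums to 1
by_status <- prop.table(relationship, margin = 1)

print(round(by_gender, 3))
print(round(by_status, 3))
```

For instance, 63/107 ≈ 0.59 of females and 45/62 ≈ 0.73 of males are single, a difference that is easier to see once all bars have the same height.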

2.4.1 by gender

p <- ggplot(data = df, aes(x = Gender, y = Count, fill = Status)) +
  geom_col(position = 'fill') +          # rescale every bar to a total height of 1
  scale_fill_brewer(palette = 'Set1') +
  labs(title = "100% Stacked Bar chart, by Gender", y = 'Proportion')

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

2.4.2 by status

p <- ggplot(data = df, aes(x = Status, y = Count, fill = Gender)) +
  geom_col(position = 'fill') +          # rescale every bar to a total height of 1
  scale_fill_brewer(palette = 'Set2') +
  labs(title = "100% Stacked Bar chart, by Status", y = 'Proportion')

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

  2. Slide 2.1, page 17