2 Plots in Slide 2.1 with ggplot and plotly

This section shows one way to generate the plots from Slide 2.1 with ggplot and plotly.

There is always more than one way to achieve the same result. Ask your tutor or search online for more resources.

2.1 Set-up

We will use ggplot2 (one of the tidyverse packages) and plotly to generate all the plots in Slide 2.1, and the rest of tidyverse to manipulate the data.

library(tidyverse)
library(plotly)
library(kableExtra) # provides column_spec(), used to style the tables below

Generate the two-way table².

# use the same data from slides
relationship <- matrix(c(32,10,2,12,7,1,63,45,2),ncol=3,byrow=TRUE)
colnames(relationship) <- c('Female', 'Male', 'Non-Binary')
rownames(relationship) <- c('In a relationship', "It's complicated", 'Single')

The two-way table looks like this:

knitr::kable(relationship,
             align = "ccc",
             caption = "Relation Status and Gender.") %>%
  column_spec(
    column = 2:3,
    border_left = TRUE,
    border_right = TRUE)

Table 2.1: Relation Status and Gender.
	Female	Male	Non-Binary
In a relationship	32	10	2
It’s complicated	12	7	1
Single	63	45	2

However, it may be easier to turn it into a relational table while using ggplot.

df <- as.data.frame(relationship) 
df <- rownames_to_column(df, "Status") %>% # turn the row name(index) into a column
  gather(Gender, Count, c(2:4)) # reshape the dataframe from wide to long
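gather() still works, but tidyr marks it as superseded, so the same reshape can also be sketched with pivot_longer(). This is an alternative rather than the approach used in the slides, and the names wide and df_long are introduced here only for illustration. Note that pivot_longer() returns the rows in a different order (all genders for the first status, then the next status), which does not change the plots.

```r
library(tidyr)   # pivot_longer()
library(tibble)  # rownames_to_column()

# same two-way table as above
relationship <- matrix(c(32,10,2,12,7,1,63,45,2), ncol = 3, byrow = TRUE)
colnames(relationship) <- c('Female', 'Male', 'Non-Binary')
rownames(relationship) <- c('In a relationship', "It's complicated", 'Single')

# turn the row names into a column, as before
wide <- rownames_to_column(as.data.frame(relationship), "Status")

# reshape from wide to long: one row per Status/Gender pair
df_long <- pivot_longer(wide,
                        cols = -Status,        # every column except Status
                        names_to = "Gender",
                        values_to = "Count")
df_long
```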

Now the table looks like this:

knitr::kable(df,
             align = "ccc",
             caption = "Relation Status and Gender.") %>%
  column_spec(
    column = 2,
    border_left = TRUE,
    border_right = TRUE)

Table 2.2: Relation Status and Gender.
Status	Gender	Count
In a relationship	Female	32
It’s complicated	Female	12
Single	Female	63
In a relationship	Male	10
It’s complicated	Male	7
Single	Male	45
In a relationship	Non-Binary	2
It’s complicated	Non-Binary	1
Single	Non-Binary	2

2.2 Side-by-side plot

In a side-by-side chart, separate bar charts are given for each category in one of the categorical variables, where the heights of the bars correspond to the elements of a two-way table. We can choose which categorical variable is clustered.

2.2.1 by gender

In the first plot, the clustering variable is gender (the three clusters are male, female and non-binary).

We manually set the bar colors to match the plot in the lecture slides.

p <- ggplot(data=df, aes(x=Gender, y=Count, fill=Status)) +
  geom_bar(stat ='identity', position=position_dodge())+
  scale_fill_manual(breaks = rownames(relationship), 
                       values=c("#55aaff", "#85c285", "#f46d75")) +
  labs(title="Side-by-side Bar chart, by Gender")

p <- ggplotly(p, width = 800, height = 400) %>% 
  config(displayModeBar = FALSE) 
p
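The manual color mapping above relies on the order of breaks matching the order of values. A slightly more defensive variant, sketched here as one option, names each color by the category it belongs to. The vector name status_colours is an assumption made for this example, and geom_col() is used as a shorthand for geom_bar(stat = 'identity').

```r
library(ggplot2)

# long-format data typed out so the example runs on its own
df <- data.frame(
  Status = rep(c('In a relationship', "It's complicated", 'Single'), times = 3),
  Gender = rep(c('Female', 'Male', 'Non-Binary'), each = 3),
  Count  = c(32, 12, 63, 10, 7, 45, 2, 1, 2)
)

# each color is tied to a category by name, not by position
status_colours <- c('In a relationship' = "#55aaff",
                    "It's complicated"  = "#85c285",
                    'Single'            = "#f46d75")

p <- ggplot(df, aes(x = Gender, y = Count, fill = Status)) +
  geom_col(position = position_dodge()) +
  scale_fill_manual(values = status_colours) +
  labs(title = "Side-by-side Bar chart, by Gender")
p
```

With a named vector, a category name that does not match the data shows up as a missing color instead of silently shifting the other colors.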

2.2.2 by status

The second plot uses relationship status (the three clusters are in a relationship, it’s complicated and single) as the clustering variable.

p <- ggplot(data=df, aes(x=Status, y=Count, fill=Gender)) +
  geom_bar(stat ='identity', position=position_dodge())+
  # fill is Gender here, so the breaks are the column names of the two-way table
  scale_fill_manual(breaks = colnames(relationship), 
                       values=c("#55aaff", "#85c285", "#f46d75")) +
  labs(title="Side-by-side Bar chart, by Status")

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

2.3 Segmented/Stacked Bar Chart

In a segmented/stacked bar chart, the height of each bar represents the number of cases in each category, and colour is used to indicate how many cases of each type were in categories of another categorical variable. Like the side-by-side bar chart, we can choose which categorical variable is used for the bars.

2.3.1 by gender

In the first plot, the bars are defined using gender.

p <- ggplot(data=df, aes(x=Gender, y=Count, fill=Status)) +
  geom_bar(stat ='identity')+
  scale_fill_brewer(palette = 'Set1')+
  labs(title="Stacked Bar chart, by Gender")

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

2.3.2 by status

In the second plot the bars are defined using relationship status.

p <- ggplot(data=df, aes(x=Status, y=Count, fill=Gender)) +
  geom_bar(stat ='identity')+
  scale_fill_brewer(palette = 'Set2')+
  labs(title="Stacked Bar chart, by Status")

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

These plots are useful for comparing between groups and within groups. For example, females were the largest gender group in the survey, and single was the largest status group for both males and females.
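The comparisons read off the stacked bars can be checked directly against the two-way table. The following is a small sketch using base R only; the names gender_totals and largest_status are introduced here for illustration.

```r
# same two-way table as in the set-up
relationship <- matrix(c(32,10,2,12,7,1,63,45,2), ncol = 3, byrow = TRUE)
colnames(relationship) <- c('Female', 'Male', 'Non-Binary')
rownames(relationship) <- c('In a relationship', "It's complicated", 'Single')

# total height of each bar in the "by gender" stacked chart
gender_totals <- colSums(relationship)
gender_totals

# largest status group within each gender
# (which.max() returns the first maximum, so ties go to the earlier row)
largest_status <- apply(relationship, 2,
                        function(counts) names(which.max(counts)))
largest_status
```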

2.4 100% Stacked Bar Chart

When there are big differences in the number of cases in each category, segmented bar charts can be difficult to interpret: the categories with the most cases dominate the plot and the categories with relatively few cases are unreadable. Scaling each bar to the same total height (100%) lets us compare proportions between groups and within groups even when the number of cases differs greatly between categories.
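The scaling that position='fill' applies in the plots of this section can be reproduced by hand with prop.table(). This is only a sketch to show what the bar segments represent; by_gender and by_status are names chosen for this example.

```r
# same two-way table as in the set-up
relationship <- matrix(c(32,10,2,12,7,1,63,45,2), ncol = 3, byrow = TRUE)
colnames(relationship) <- c('Female', 'Male', 'Non-Binary')
rownames(relationship) <- c('In a relationship', "It's complicated", 'Single')

# bars defined by gender: divide each column by its column total
by_gender <- prop.table(relationship, margin = 2)

# bars defined by status: divide each row by its row total
by_status <- prop.table(relationship, margin = 1)

round(by_gender, 3)
round(by_status, 3)
```

Each column of by_gender and each row of by_status sums to 1, which is why every bar in a 100% stacked chart has the same height.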

2.4.1 by gender

p <- ggplot(data=df, aes(x=Gender, y=Count, fill=Status)) +
  geom_bar(stat ='identity', position='fill')+
  scale_fill_brewer(palette = 'Set1')+
  labs(title="100% Stacked Bar chart, by Gender", y='Proportion')

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

2.4.2 by status

p <- ggplot(data=df, aes(x=Status, y=Count, fill=Gender)) +
  geom_bar(stat ='identity', position='fill')+
  scale_fill_brewer(palette = 'Set2')+
  labs(title="100% Stacked Bar chart, by Status", y='Proportion')

p <- ggplotly(p, width = 800, height = 400) %>% config(displayModeBar = FALSE)
p

² Slide 2.1, page 17