2 Plots in Slide 2.1 with ggplot or plotly
This section shows one way to generate the plots in Slide 2.1 with ggplot and plotly.
There is always more than one way to achieve the same result. Ask your tutor or search online for more resources.
2.1 Set-up
We will use ggplot2 (one of the packages in the tidyverse) and plotly to generate all the plots in Slide 2.1, and the tidyverse to manipulate the data.
library(tidyverse)
library(plotly)
Generate the two-way table [2].
# use the same data as the slides
relationship <- matrix(c(32,10,2,12,7,1,63,45,2),ncol=3,byrow=TRUE)
colnames(relationship) <- c('Female', 'Male', 'Non-Binary')
rownames(relationship) <- c('In a relationship', "It's complicated", 'Single')
The two-way table looks like this:
library(kableExtra) # provides column_spec()
knitr::kable(relationship,
             align = "ccc",
             caption = "Relation Status and Gender.") %>%
  column_spec(
    column = 2:3,
    border_left = TRUE,
    border_right = TRUE)
 | Female | Male | Non-Binary |
---|---|---|---|
In a relationship | 32 | 10 | 2 |
It’s complicated | 12 | 7 | 1 |
Single | 63 | 45 | 2 |
However, ggplot is easier to use with a relational (long-format) table, with one row per combination of status and gender, so we reshape the two-way table.
df <- as.data.frame(relationship)
df <- rownames_to_column(df, "Status") %>% # turn the row names (index) into a column
  gather(Gender, Count, c(2:4)) # reshape the data frame from wide to long
Now the table looks like this:
# column_spec() is provided by the kableExtra package
knitr::kable(df,
             align = "ccc",
             caption = "Relation Status and Gender.") %>%
  column_spec(
    column = 2,
    border_left = TRUE,
    border_right = TRUE)
Status | Gender | Count |
---|---|---|
In a relationship | Female | 32 |
It’s complicated | Female | 12 |
Single | Female | 63 |
In a relationship | Male | 10 |
It’s complicated | Male | 7 |
Single | Male | 45 |
In a relationship | Non-Binary | 2 |
It’s complicated | Non-Binary | 1 |
Single | Non-Binary | 2 |
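The same reshaping can also be sketched with `pivot_longer()`, the newer tidyr function that supersedes `gather()`. This is only an alternative under stated assumptions: the name `df_long` is introduced here for illustration, and the `arrange()` call is added so the rows come out in the same order as the table above.

```r
library(tidyverse)

# Rebuild the two-way table so this block runs on its own
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE)
colnames(relationship) <- c("Female", "Male", "Non-Binary")
rownames(relationship) <- c("In a relationship", "It's complicated", "Single")

# df_long is a hypothetical name used only in this sketch
df_long <- as.data.frame(relationship) %>%
  rownames_to_column("Status") %>%            # row names become a Status column
  pivot_longer(cols = -Status,                # every column except Status
               names_to = "Gender",
               values_to = "Count") %>%
  arrange(Gender, Status)                     # sort by gender, then status
df_long
```

Either function produces one row per status and gender combination, which is the shape `aes()` expects.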
2.2 Side-by-side plot
In a side-by-side chart, separate bar charts are given for each category in one of the categorical variables, where the heights of the bars correspond to the elements of a two-way table. We can choose which categorical variable is clustered.
2.2.1 by gender
In the first plot, the clustering variable is gender (the three clusters are male, female and non-binary).
We manually changed the colour of the bars to match the plot in the lecture slides.
p <- ggplot(data=df, aes(x=Gender, y=Count, fill=Status)) +
  geom_bar(stat ='identity', position=position_dodge())+ # dodge places the bars side by side
  scale_fill_manual(breaks = rownames(relationship),
                    values=c("#55aaff", "#85c285", "#f46d75")) +
  labs(title="Side-by-side Bar chart, by Gender")
# convert the ggplot object to an interactive plotly figure and hide the mode bar
p <- ggplotly(p,width = 800,height = 400) %>%
  config(displayModeBar = FALSE)
p
2.2.2 by status
The second plot uses relationship status (the three clusters are in a relationship, it’s complicated and single) as the clustering variable.
p <- ggplot(data=df, aes(x=Status, y=Count, fill=Gender)) +
  geom_bar(stat ='identity', position=position_dodge())+
  # fill is Gender here, so the breaks must be the column names of the table
  scale_fill_manual(breaks = colnames(relationship),
                    values=c("#55aaff", "#85c285", "#f46d75")) +
  labs(title="Side-by-side Bar chart, by Status")
p <- ggplotly(p,width = 800,height = 400) %>% config(displayModeBar = FALSE)
p
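The side-by-side chart can also be drawn with plotly directly, without going through ggplot. The following is a minimal sketch rather than the method used in the slides; the data frame is rebuilt by hand so the block runs on its own, and `fig` is a name introduced here for illustration.

```r
library(plotly)

# Long-format data, typed in from the table in Section 2.1
df <- data.frame(
  Status = rep(c("In a relationship", "It's complicated", "Single"), times = 3),
  Gender = rep(c("Female", "Male", "Non-Binary"), each = 3),
  Count  = c(32, 12, 63, 10, 7, 45, 2, 1, 2)
)

# One bar trace per Status; barmode = "group" places the traces side by side
fig <- plot_ly(df, x = ~Gender, y = ~Count, color = ~Status, type = "bar") %>%
  layout(barmode = "group",
         title = "Side-by-side bar chart, by Gender")
fig
```

Swapping `~Gender` and `~Status` clusters the bars by relationship status instead.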
2.3 Segmented/Stacked Bar Chart
In a segmented/stacked bar chart, the height of each bar represents the number of cases in each category, and colour is used to indicate how many cases of each type were in the categories of another categorical variable. As with the side-by-side bar chart, we can choose which categorical variable is used for the bars.
2.3.1 by gender
In the first plot, the bars are defined using gender.
p <- ggplot(data=df, aes(x=Gender, y=Count, fill=Status)) +
  geom_bar(stat ='identity')+ # bars are stacked by default
  scale_fill_brewer(palette = 'Set1')+
  labs(title="Stacked Bar chart, by Gender")
p <- ggplotly(p,width = 800,height = 400) %>% config(displayModeBar = FALSE)
p
2.3.2 by status
In the second plot the bars are defined using relationship status.
p <- ggplot(data=df, aes(x=Status, y=Count, fill=Gender)) +
  geom_bar(stat ='identity')+ # bars are stacked by default
  scale_fill_brewer(palette = 'Set2')+
  labs(title="Stacked Bar chart, by Status")
p <- ggplotly(p,width = 800,height = 400) %>% config(displayModeBar = FALSE)
p
These plots are useful for comparing between groups and within groups. For example, more females than males were included in the survey, and single was the largest group for both males and females.
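Both observations can be checked against the two-way table itself. A minimal base R sketch; the names `gender_totals` and `largest_status` are introduced here for illustration only.

```r
# Rebuild the two-way table so this block runs on its own
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE)
colnames(relationship) <- c("Female", "Male", "Non-Binary")
rownames(relationship) <- c("In a relationship", "It's complicated", "Single")

# Between groups: total number of respondents for each gender
gender_totals <- colSums(relationship)
gender_totals

# Within groups: the status with the highest count for each gender
# (which.max() returns the first maximum, so ties go to the earlier row)
largest_status <- apply(relationship, 2,
                        function(counts) rownames(relationship)[which.max(counts)])
largest_status
```

Note that the non-binary column has a tie between two statuses, so only the female and male results are clear-cut.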
2.4 100% Stacked Bar Chart
When there are big differences in the number of cases in each category, segmented bar charts can be difficult to interpret: the categories with the most cases dominate the plot and the categories with relatively few cases are unreadable. Scaling every bar to the same height (100%) allows us to make comparisons between groups and within groups even when the number of cases differs greatly between categories.
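One possible way to draw such a chart with ggplot2 is `position = "fill"`, which rescales every bar to a height of 1. This is a hedged sketch, not necessarily the code behind the slides; the data are rebuilt by hand, and `props` and `p_fill` are names introduced here for illustration.

```r
library(ggplot2)

# Rebuild the two-way table and its long format so this block runs on its own
relationship <- matrix(c(32, 10, 2, 12, 7, 1, 63, 45, 2), ncol = 3, byrow = TRUE)
colnames(relationship) <- c("Female", "Male", "Non-Binary")
rownames(relationship) <- c("In a relationship", "It's complicated", "Single")
df <- data.frame(
  Status = rep(rownames(relationship), times = 3),
  Gender = rep(colnames(relationship), each = 3),
  Count  = as.vector(relationship)   # column-major order: Female, Male, Non-Binary
)

# Proportion of each status within each gender (every column sums to 1)
props <- prop.table(relationship, margin = 2)
props

# position = "fill" stacks the counts and rescales each bar to height 1
p_fill <- ggplot(df, aes(x = Gender, y = Count, fill = Status)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(labels = scales::percent) +   # show the axis as percentages
  labs(title = "100% Stacked Bar chart, by Gender", y = "Proportion")
p_fill
```

With every bar the same height, the small non-binary group becomes as readable as the larger groups, at the cost of hiding the group sizes.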
[2] Slide 2.1, page 17