8 Descriptive stats with R and ggplot2
In this chapter you will practice exploratory data analysis and get to know the ggplot2 package. ggplot2 is a graphing package that lets you create all kinds of fancy-looking plots. It is important that you replicate the code and graphs, and that you experiment with changing parameters. We also use the dplyr package, which lets us easily handle and arrange data. If you have not installed both packages yet, install them (for example with install.packages()), then load them with library().
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.3.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.3.2
We will use the Titanic data set “titanic_333.csv”, which contains data on the passengers on the Titanic. I removed the passengers’ home towns and some other non-pertinent information. If you are interested, the full data set can be found at https://www.kaggle.com/datasets/vinicius150987/titanic3?resource=download.
8.1 Exploring the data
First, we read in the data set. Note that read.csv() only finds the file if you have downloaded it into your working directory first. The functions str() and glimpse() give us a first look at the data.
titanic <- read.csv("titanic_333.csv")
str(titanic)
#> 'data.frame': 1309 obs. of 9 variables:
#> $ X : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ pclass : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ survived: int 1 1 0 0 0 1 1 0 1 0 ...
#> $ name : chr "Allen, Miss. Elisabeth Walton" "Allison, Master. Hudson Trevor" "Allison, Miss. Helen Loraine" "Allison, Mr. Hudson Joshua Creighton" ...
#> $ sex : chr "female" "male" "female" "male" ...
#> $ age : num 29 0.917 2 30 25 ...
#> $ sibsp : int 0 1 1 1 1 0 1 0 2 0 ...
#> $ parch : int 0 2 2 2 2 0 0 0 0 0 ...
#> $ fare : num 211 152 152 152 152 ...
glimpse(titanic)
#> Rows: 1,309
#> Columns: 9
#> $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13…
#> $ pclass <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ survived <int> 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1,…
#> $ name <chr> "Allen, Miss. Elisabeth Walton", "Allison…
#> $ sex <chr> "female", "male", "female", "male", "fema…
#> $ age <dbl> 29.0000, 0.9167, 2.0000, 30.0000, 25.0000…
#> $ sibsp <int> 0, 1, 1, 1, 1, 0, 1, 0, 2, 0, 1, 1, 0, 0,…
#> $ parch <int> 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ fare <dbl> 211.3375, 151.5500, 151.5500, 151.5500, 1…
summary() gives the five-number summary (minimum, first quartile, median, third quartile, maximum) plus the mean for each numeric variable. We see that some observations are missing entries: 263 passengers have no age and one has no fare (the NA's rows). We need to keep that in mind.
summary(titanic)
#> X pclass survived
#> Min. : 1 Min. :1.000 Min. :0.000
#> 1st Qu.: 328 1st Qu.:2.000 1st Qu.:0.000
#> Median : 655 Median :3.000 Median :0.000
#> Mean : 655 Mean :2.295 Mean :0.382
#> 3rd Qu.: 982 3rd Qu.:3.000 3rd Qu.:1.000
#> Max. :1309 Max. :3.000 Max. :1.000
#>
#> name sex age
#> Length:1309 Length:1309 Min. : 0.1667
#> Class :character Class :character 1st Qu.:21.0000
#> Mode :character Mode :character Median :28.0000
#> Mean :29.8811
#> 3rd Qu.:39.0000
#> Max. :80.0000
#> NA's :263
#> sibsp parch fare
#> Min. :0.0000 Min. :0.000 Min. : 0.000
#> 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.: 7.896
#> Median :0.0000 Median :0.000 Median : 14.454
#> Mean :0.4989 Mean :0.385 Mean : 33.295
#> 3rd Qu.:1.0000 3rd Qu.:0.000 3rd Qu.: 31.275
#> Max. :8.0000 Max. :9.000 Max. :512.329
#> NA's :1
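It is easy to overlook the NA's rows in the summary. A compact way to count missing values in every column is colSums(is.na(...)). The sketch below uses a small hypothetical data frame, not the Titanic file, so that it runs on its own. On the real data you would call colSums(is.na(titanic)), and the age and fare entries should match the NA's rows above (263 and 1).

```r
# Hypothetical toy data frame with a few missing values (not the real file)
toy <- data.frame(
  age  = c(29, NA, 2, NA, 25),
  fare = c(211.34, 151.55, NA, 151.55, 26.00),
  sex  = c("female", "male", "female", "male", "male")
)

# is.na() returns a TRUE/FALSE matrix; colSums() counts the TRUEs per column
colSums(is.na(toy))
# age: 2, fare: 1, sex: 0

# complete.cases() flags rows with no missing value in any column
sum(complete.cases(toy))
# 2
```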
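This looks pretty good, except for two things. survived is stored as 0/1 integers, which would be clearer as a logical or a yes/no character variable. Passenger class should be a factor (a categorical variable) rather than a number. Let’s change that using the pipe operator %>%.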
titanic <- titanic %>%
mutate(survived = ifelse(survived==0, "No","Yes"))%>%
mutate(pclass = as.factor(pclass))
str(titanic)
#> 'data.frame': 1309 obs. of 9 variables:
#> $ X : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ pclass : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
#> $ survived: chr "Yes" "Yes" "No" "No" ...
#> $ name : chr "Allen, Miss. Elisabeth Walton" "Allison, Master. Hudson Trevor" "Allison, Miss. Helen Loraine" "Allison, Mr. Hudson Joshua Creighton" ...
#> $ sex : chr "female" "male" "female" "male" ...
#> $ age : num 29 0.917 2 30 25 ...
#> $ sibsp : int 0 1 1 1 1 0 1 0 2 0 ...
#> $ parch : int 0 2 2 2 2 0 0 0 0 0 ...
#> $ fare : num 211 152 152 152 152 ...
summary(titanic)
#> X pclass survived
#> Min. : 1 1:323 Length:1309
#> 1st Qu.: 328 2:277 Class :character
#> Median : 655 3:709 Mode :character
#> Mean : 655
#> 3rd Qu.: 982
#> Max. :1309
#>
#> name sex age
#> Length:1309 Length:1309 Min. : 0.1667
#> Class :character Class :character 1st Qu.:21.0000
#> Mode :character Mode :character Median :28.0000
#> Mean :29.8811
#> 3rd Qu.:39.0000
#> Max. :80.0000
#> NA's :263
#> sibsp parch fare
#> Min. :0.0000 Min. :0.000 Min. : 0.000
#> 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.: 7.896
#> Median :0.0000 Median :0.000 Median : 14.454
#> Mean :0.4989 Mean :0.385 Mean : 33.295
#> 3rd Qu.:1.0000 3rd Qu.:0.000 3rd Qu.: 31.275
#> Max. :8.0000 Max. :9.000 Max. :512.329
#> NA's :1
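ifelse() turns survived into a plain character vector, so R orders its values alphabetically. An optional alternative, not what this chapter uses, is factor() with explicit levels and labels. A factor records the order of the categories, and ggplot2 uses that order for the axis and the legend. The sketch below uses a short made-up 0/1 vector in place of titanic$survived.

```r
# Short hypothetical 0/1 vector standing in for titanic$survived
survived01 <- c(1, 1, 0, 0, 0, 1)

# levels = the codes in the data, labels = the names to show instead
survived <- factor(survived01, levels = c(0, 1), labels = c("No", "Yes"))

levels(survived)   # "No" "Yes"
table(survived)    # No: 3, Yes: 3

# Reordering the levels would reorder the bars in a ggplot
survived_rev <- factor(survived, levels = c("Yes", "No"))
levels(survived_rev)   # "Yes" "No"
```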
8.2 Making a bar graph with ggplot2
You already saw how to use base R to create plots, so we will use ggplot2 this time. There are basically two approaches to getting started with ggplot2. You could use one of the many good guides available online, for example https://ggplot2-book.org. Or you can start with a code chunk and modify it until you are satisfied. Either way, you will most likely reach a point where you need help. There is a very active R community online, for example at https://community.rstudio.com/t/welcome-to-the-rstudio-community/8 or on Stack Overflow (https://stackoverflow.com/search?q=). Post your question to one of the forums, or google the answer. It is perfectly legit to learn from other people instead of reinventing the wheel. Be aware, though, that answers are often needlessly complicated (people showing off). And remember: if you are submitting work for credit in this class, cite your source.
Every ggplot has three components:
- the data to use;
- a set of aesthetic mappings between variables in the data and visual properties such as position, color, or fill;
- at least one layer, usually created with a geom_ function, that describes how to draw the observations.

Here our data is titanic, we want to use the variable survived, and we want a bar graph. For geom_bar, the height of a bar is the number of cases in the corresponding group. You can rotate the graph either by mapping the variable to y instead of x, or by adding + coord_flip().
ggplot(titanic,aes(x=survived)) +
geom_bar() +
coord_flip()
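geom_bar() counts the rows in each group for you, and you can check the bar heights with table(). So that the sketch runs without the CSV file, it rebuilds the survived column from totals taken from this chapter's output: 500 survivors and 809 non-survivors. These come from adding up the female and male rows of set_two further down, and they agree with the mean of 0.382 in the summary above.

```r
# Rebuild the survival column as a Yes/No vector: 809 "No", 500 "Yes"
survived <- c(rep("No", 809), rep("Yes", 500))

# These counts are the bar heights geom_bar() draws
counts <- table(survived)
counts
# No: 809, Yes: 500

# The total matches the 1309 rows of the data set
sum(counts)
```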
We can add some color, titles, and labels, and we can vary the opacity of the fill (alpha), the width of the outlines (linewidth), and the width of the bars (width). A list of R color names can be found here: http://sape.inf.usi.ch/quick-reference/ggplot2/colour
ggplot(titanic,aes(x=survived)) +
geom_bar( color="blue", linewidth = 2, fill="lightgoldenrod1", alpha=0.5, width=1)+
labs(title = "number of people surviving titanic",x="survived?", y="total count")
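R can also list the color names it accepts. colors(), from the grDevices package that is loaded by default, returns all of them. You can use it to check a name before you use it, or combine it with grep() to search for a shade. A small sketch:

```r
# All built-in color names (several hundred of them)
all_cols <- colors()

# Is the fill color used above a valid name?
"lightgoldenrod1" %in% all_cols   # TRUE

# Find every shade of "goldenrod"
grep("goldenrod", all_cols, value = TRUE)

# col2rgb() shows the red/green/blue components of a color name
col2rgb("lightgoldenrod1")
```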
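Here we tweak the code to show the proportion of people surviving instead of the count. after_stat(prop) sets each bar's height to its share of the total. group = 1 puts all passengers into one group, so the shares are computed over everyone. Without it, each bar would be its own group and would have a proportion of 1.
ggplot(titanic,aes(x=survived,y=after_stat(prop), group=1)) +
geom_bar( color="blue", linewidth = 2, fill="lightgoldenrod1", alpha=0.5, width=1)+
labs(title = "proportion of people surviving titanic",x="survived?", y="proportion of passengers")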
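after_stat(prop) returns a proportion between 0 and 1, not a percent. prop.table() gives the same numbers, and multiplying by 100 turns them into percentages, as set_one does in the next section. The sketch reuses the 809/500 split from the chapter's output. If you want the axis itself labeled in percent, one option is to add scale_y_continuous(labels = scales::percent) to the plot. The scales package is installed along with ggplot2.

```r
survived <- c(rep("No", 809), rep("Yes", 500))

# Share of all passengers in each group, as after_stat(prop) with group = 1 plots it
props <- prop.table(table(survived))
round(props, 3)
# No: 0.618, Yes: 0.382

# The same shares as percentages
round(100 * props, 1)
# No: 61.8, Yes: 38.2
```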
We can also assign colors and fills by group, as the plots in the next section show.
8.3 Grouping data by passenger class
First, let’s examine the survival rate by passenger class. We use the pipe operator to group and summarize the data by class. I have added comments to this code chunk using #. I want you to do the same for the rest of the code chunks in this chapter.
set_one <- titanic %>% #start with the data frame titanic
group_by(pclass)%>% #group the data by passenger class
count(survived) %>% #count the number n of passengers who did and did not survive, within each class
mutate(percent = n/sum(n)*100) #add a variable percent; the data are still grouped, so sum(n) is the class total
set_one #print out set_one
#> # A tibble: 6 × 4
#> # Groups: pclass [3]
#> pclass survived n percent
#> <fct> <chr> <int> <dbl>
#> 1 1 No 123 38.1
#> 2 1 Yes 200 61.9
#> 3 2 No 158 57.0
#> 4 2 Yes 119 43.0
#> 5 3 No 528 74.5
#> 6 3 Yes 181 25.5
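If you want to double-check what count() and mutate() computed, base R gives the same row percentages with table() and prop.table(..., margin = 1). margin = 1 divides each row by its row total, just as sum(n) does inside each pclass group. So that it runs without the file, the sketch rebuilds one row per passenger from the counts printed in set_one above.

```r
# Rebuild one row per passenger from the counts printed in set_one
n_rows   <- c(123, 200, 158, 119, 528, 181)
pclass   <- rep(c(1, 1, 2, 2, 3, 3), times = n_rows)
survived <- rep(c("No", "Yes", "No", "Yes", "No", "Yes"), times = n_rows)

counts <- table(pclass, survived)   # 3 x 2 table of counts
counts

# margin = 1: percentages within each row (passenger class)
round(100 * prop.table(counts, margin = 1), 1)
# class 1: No 38.1, Yes 61.9
# class 2: No 57.0, Yes 43.0
# class 3: No 74.5, Yes 25.5
```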
Below are point graphs (geom_point) and line graphs (geom_line). You should see that neither is appropriate for this situation. Also note that we can choose the color of a point based on a variable, here survived. ggplot allows us to easily arrange plots on a grid if we name them (I used p1, p2, p3, p4) and use the function grid.arrange() from the gridExtra package. Install gridExtra first if you don't have it.
p1 <- ggplot(set_one, aes(x=pclass, y=n, color=survived))+geom_point()+
labs(title="Point plot, number of people surviving", subtitle = "by passenger class", x="passenger class",y="number of passengers")
p2 <- ggplot(set_one, aes(x=pclass, y=percent, color=survived))+geom_point()+
labs(title="Point plot, percent of people surviving", subtitle = "by passenger class", x="passenger class",y="percent surviving")
p3 <- ggplot(set_one, aes(x=as.numeric(pclass), y=n, color=survived))+geom_line()+
labs(title="Line plot, number of people surviving", subtitle = "by passenger class", x="passenger class",y="number of passengers")
p4 <- ggplot(set_one, aes(x=as.numeric(pclass), y=percent, color=survived))+geom_line()+
labs(title="Line plot, percent of people surviving", subtitle = "by passenger class", x="passenger class",y="percent surviving")
library(gridExtra)
#>
#> Attaching package: 'gridExtra'
#> The following object is masked from 'package:dplyr':
#>
#> combine
grid.arrange(p1, p2, p3, p4, nrow=2)
Bar graphs show what is going on more clearly. You can stack the bars or place them side by side, and you can shade them by passenger class or by survival; that choice is a matter of personal preference.
p1 <- ggplot(set_one, aes(x=pclass, y=percent, fill=survived)) +
geom_bar(position="dodge", stat="identity") +
labs(title = "percent of people surviving",subtitle = "by passenger class")
p2 <- ggplot(set_one, aes(x=survived, y=n, fill=pclass)) +
geom_bar(position="dodge", stat="identity")+
labs(title = "number of people surviving",subtitle = "by passenger class")
p3 <-ggplot(set_one, aes(x=pclass, y=percent, fill=survived)) +
geom_bar(position="stack", stat="identity") +
labs(title = "percent of people surviving ",subtitle = "by passenger class")+
scale_fill_manual(values = c("black","pink"))
p4 <-ggplot(set_one, aes(x=survived, y=n, fill=pclass)) +
geom_bar(position="stack", stat="identity")+
labs(title = "number of people surviving ",subtitle = "by passenger class")+
scale_fill_manual(values = c("black","grey","white"))
grid.arrange(p1, p2, p3, p4, nrow=2)
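geom_bar(stat = "identity") tells ggplot2 to use the y values as bar heights instead of counting rows. ggplot2 also has geom_col(), which does exactly that, so the two forms should draw the same bars. The sketch rebuilds set_one from the printed counts so it runs without the CSV file. It then compares the two plots with layer_data(), which returns the numbers ggplot2 actually draws and is a handy way to check a plot.

```r
library(ggplot2)

# set_one rebuilt from the printed output (counts per class and survival status)
set_one_demo <- data.frame(
  pclass   = factor(c(1, 1, 2, 2, 3, 3)),
  survived = c("No", "Yes", "No", "Yes", "No", "Yes"),
  n        = c(123, 200, 158, 119, 528, 181)
)
# Percent within each class, like mutate(percent = n/sum(n)*100) on grouped data
set_one_demo$percent <- 100 * set_one_demo$n /
  ave(set_one_demo$n, set_one_demo$pclass, FUN = sum)

p_bar <- ggplot(set_one_demo, aes(x = pclass, y = percent, fill = survived)) +
  geom_bar(position = "dodge", stat = "identity")
p_col <- ggplot(set_one_demo, aes(x = pclass, y = percent, fill = survived)) +
  geom_col(position = "dodge")

# Both layers draw the same bar heights
all.equal(layer_data(p_bar)$y, layer_data(p_col)$y)
```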
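Below we read the data in again and code survival as "alive"/"dead" instead of "No"/"Yes". We do not convert pclass to a factor this time, so it stays an integer (see <int> under pclass in the output).
titanic <- read.csv("titanic_333.csv")
titanic <- titanic %>%
mutate(survived = ifelse(survived==1, "alive","dead"))
set_one <- titanic %>% #start with the data frame titanic
group_by(pclass)%>% #group the data by passenger class
count(survived) %>% #count the number n of passengers who did and did not survive, within each class
mutate(percent = n/sum(n)*100) #add a variable percent; the data are still grouped, so sum(n) is the class total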
set_one #print out set_one
#> # A tibble: 6 × 4
#> # Groups: pclass [3]
#> pclass survived n percent
#> <int> <chr> <int> <dbl>
#> 1 1 alive 200 61.9
#> 2 1 dead 123 38.1
#> 3 2 alive 119 43.0
#> 4 2 dead 158 57.0
#> 5 3 alive 181 25.5
#> 6 3 dead 528 74.5
ggplot(set_one, aes(x=pclass, y=percent, fill=survived)) +
geom_bar(position="dodge", stat="identity") +
labs(title = "percent of people surviving",subtitle = "by passenger class")
Assignment 1
What is the best graph for this situation and why?
What graph is totally unsuited?
What does the data tell you about survival?
8.4 Your turn
Now that you have the basics, you can work through the rest of the chapter yourself. Make sure you work all the assignments.
Assignment 2 Insert comments into the code chunk below.
set_two <- titanic %>%
group_by(sex)%>%
count(survived) %>%
mutate(percent = n/sum(n)*100)
set_two
#> # A tibble: 4 × 4
#> # Groups: sex [2]
#> sex survived n percent
#> <chr> <chr> <int> <dbl>
#> 1 female alive 339 72.7
#> 2 female dead 127 27.3
#> 3 male alive 161 19.1
#> 4 male dead 682 80.9
Assignment 3 You pick and code a graph that you think looks good. Or, if you prefer, you can start with this one and pretty it up with labels etc. What does the data tell you about survival?
Assignment 4 Insert comments into the code chunk below.
titanic <- read.csv("titanic_333.csv")
titanic <- titanic %>%
mutate(survived = ifelse(survived==0, "No","Yes"))%>%
mutate(pclass = as.factor(pclass))
str(titanic)
#> 'data.frame': 1309 obs. of 9 variables:
#> $ X : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ pclass : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
#> $ survived: chr "Yes" "Yes" "No" "No" ...
#> $ name : chr "Allen, Miss. Elisabeth Walton" "Allison, Master. Hudson Trevor" "Allison, Miss. Helen Loraine" "Allison, Mr. Hudson Joshua Creighton" ...
#> $ sex : chr "female" "male" "female" "male" ...
#> $ age : num 29 0.917 2 30 25 ...
#> $ sibsp : int 0 1 1 1 1 0 1 0 2 0 ...
#> $ parch : int 0 2 2 2 2 0 0 0 0 0 ...
#> $ fare : num 211 152 152 152 152 ...
no_age <- which(is.na(titanic$age))
titanic <- titanic[-no_age,]
summary(titanic)
#> X pclass survived
#> Min. : 1.0 1:284 Length:1046
#> 1st Qu.: 299.2 2:261 Class :character
#> Median : 575.5 3:501 Mode :character
#> Mean : 600.2
#> 3rd Qu.: 875.5
#> Max. :1309.0
#>
#> name sex age
#> Length:1046 Length:1046 Min. : 0.1667
#> Class :character Class :character 1st Qu.:21.0000
#> Mode :character Mode :character Median :28.0000
#> Mean :29.8811
#> 3rd Qu.:39.0000
#> Max. :80.0000
#>
#> sibsp parch fare
#> Min. :0.0000 Min. :0.0000 Min. : 0.00
#> 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 8.05
#> Median :0.0000 Median :0.0000 Median : 15.75
#> Mean :0.5029 Mean :0.4207 Mean : 36.69
#> 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.: 35.50
#> Max. :8.0000 Max. :6.0000 Max. :512.33
#> NA's :1
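Removing rows with negative indexing, as in titanic[-no_age,], works here because there are 263 missing ages. The pattern has a known trap, though. If which() finds nothing, it returns an empty integer vector, and x[-integer(0), ] selects zero rows instead of all rows. Indexing with the logical vector !is.na(...) avoids the problem; dplyr users can write filter(!is.na(age)) instead. The sketch uses a small hypothetical data frame with no missing ages.

```r
# Hypothetical data frame in which no age is missing
toy <- data.frame(age = c(29, 2, 30), fare = c(211.34, 151.55, 151.55))

no_age <- which(is.na(toy$age))   # integer(0): nothing to remove
nrow(toy[-no_age, ])              # 0: every row is lost!

nrow(toy[!is.na(toy$age), ])      # 3: logical indexing keeps all rows
```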
set_three <- titanic %>%
mutate(child = age < 16) %>%
group_by(child)%>%
count(survived) %>%
mutate(percent = n/sum(n)*100)
set_three
#> # A tibble: 4 × 4
#> # Groups: child [2]
#> child survived n percent
#> <lgl> <chr> <int> <dbl>
#> 1 FALSE No 570 61.2
#> 2 FALSE Yes 361 38.8
#> 3 TRUE No 49 42.6
#> 4 TRUE Yes 66 57.4
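Why drop the missing ages before building child? A comparison with NA gives NA, not TRUE or FALSE. Passengers with unknown age would therefore end up in a third, NA group when you group by child. The sketch shows this on a short made-up vector of ages.

```r
age <- c(29, NA, 2, 15, 16)   # hypothetical ages, one missing
child <- age < 16
child
# FALSE NA TRUE TRUE FALSE

table(child)                  # the NA is silently left out of the counts
table(child, useNA = "ifany") # ...unless you ask for it
```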
Assignment 5 Pick and code a graph that you think looks good. What does the data tell you about survival?
Assignment 6 Insert comments into the code chunk below.
set_four <- titanic %>%
mutate(senior = ifelse( age >= 60, "senior","")) %>%
group_by(senior) %>%
count(survived) %>%
mutate(percent = n/sum(n)*100)
set_four
#> # A tibble: 4 × 4
#> # Groups: senior [2]
#> senior survived n percent
#> <chr> <chr> <int> <dbl>
#> 1 "" No 591 58.7
#> 2 "" Yes 415 41.3
#> 3 "senior" No 28 70
#> 4 "senior" Yes 12 30
Assignment 7 Pick and code a graph that you think looks good. What does the data tell you about survival?
8.5 Now let’s combine all of the above
Assignment 8 Insert comments into the code chunk below.
n_obs <- nrow(titanic)
age_group <- rep("adult",n_obs)
for (i in 1:n_obs) {
age_group[i] <- if (titanic$age[i]<16) "child" else if (titanic$age[i]>60) "senior" else "adult"
}
titanic <- data.frame(age_group,titanic)
set_five <- titanic %>%
# mutate(child = age < 16,senior = age >= 60) %>%
mutate(group_name = paste("class", as.character(pclass),",", sex,",", age_group))%>%
group_by(pclass, sex, age_group, group_name)%>%
count(survived) %>%
mutate(percent= n/sum(n)*100)
set_five<- set_five %>%
arrange(desc(n))
set_five
#> # A tibble: 30 × 7
#> # Groups: pclass, sex, age_group, group_name [17]
#> pclass sex age_group group_name survived n percent
#> <fct> <chr> <chr> <chr> <chr> <int> <dbl>
#> 1 3 male adult class 3 ,… No 256 84.8
#> 2 2 male adult class 2 ,… No 129 92.1
#> 3 1 female adult class 1 ,… Yes 121 97.6
#> 4 1 male adult class 1 ,… No 84 64.1
#> 5 2 female adult class 2 ,… Yes 76 87.4
#> 6 3 female adult class 3 ,… No 62 54.4
#> 7 3 female adult class 3 ,… Yes 52 45.6
#> 8 1 male adult class 1 ,… Yes 47 35.9
#> 9 3 male adult class 3 ,… Yes 46 15.2
#> 10 3 male child class 3 ,… No 29 69.0
#> # ℹ 20 more rows
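The for loop above classifies one passenger at a time. R can do the same in a single vectorized step with nested ifelse(); inside mutate(), dplyr::case_when() is a common alternative. Note the cutoffs. The loop labels someone a senior only if age > 60, whereas set_four used age >= 60, so a 60-year-old counts as a senior in set_four but as an adult here. The sketch uses hypothetical ages that include the boundary values.

```r
# Hypothetical ages, chosen to include the boundary values 16 and 60
age <- c(0.9167, 2, 15, 16, 29, 60, 61, 80)

# Loop version: same rule as the chunk above (child < 16, senior > 60)
age_group_loop <- rep("adult", length(age))
for (i in seq_along(age)) {
  age_group_loop[i] <- if (age[i] < 16) "child" else if (age[i] > 60) "senior" else "adult"
}

# Vectorized version: nested ifelse() works on the whole vector at once
age_group_vec <- ifelse(age < 16, "child", ifelse(age > 60, "senior", "adult"))

identical(age_group_loop, age_group_vec)   # TRUE
age_group_vec[age == 60]                   # "adult" under this rule
```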
ggplot(set_five, aes(x=group_name, y=n, fill=survived)) +
geom_bar(position="dodge", stat="identity") +
labs(title = "Number of people surviving Titanic", x="Group",y="Number of passengers in group")+
scale_fill_manual(values = c("red","green"))+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
ggplot(data=set_five, aes(x=reorder(group_name,n,decreasing = TRUE),y=n))+
geom_bar(stat="identity", fill="blue") +
labs(title = "Number of passengers per group",x="Group",y="Total passengers in group")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
set_five<- set_five %>%
arrange(percent)
set_five
#> # A tibble: 30 × 7
#> # Groups: pclass, sex, age_group, group_name [17]
#> pclass sex age_group group_name survived n percent
#> <fct> <chr> <chr> <chr> <chr> <int> <dbl>
#> 1 1 female adult class 1 ,… No 3 2.42
#> 2 1 male senior class 1 ,… Yes 1 6.67
#> 3 2 male adult class 2 ,… Yes 11 7.86
#> 4 2 male child class 2 ,… No 1 8.33
#> 5 2 female adult class 2 ,… No 11 12.6
#> 6 3 male adult class 3 ,… Yes 46 15.2
#> 7 1 female senior class 1 ,… No 1 16.7
#> 8 2 male senior class 2 ,… Yes 1 16.7
#> 9 3 male child class 3 ,… Yes 13 31.0
#> 10 1 female child class 1 ,… No 1 33.3
#> # ℹ 20 more rows
ggplot(set_five, aes(x=group_name, y=percent, fill=survived)) +
geom_bar(position=position_dodge(preserve="single"), stat="identity",width=0.7) +
labs(title = "Percent of people surviving Titanic",x="Group",y="Percent of group")+
scale_fill_manual(values = c("red","green"))+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
keep <- which(set_five$survived=="Yes")
set_five_survived <- set_five[keep,]
set_five_survived
#> # A tibble: 16 × 7
#> # Groups: pclass, sex, age_group, group_name [16]
#> pclass sex age_group group_name survived n percent
#> <fct> <chr> <chr> <chr> <chr> <int> <dbl>
#> 1 1 male senior class 1 ,… Yes 1 6.67
#> 2 2 male adult class 2 ,… Yes 11 7.86
#> 3 3 male adult class 3 ,… Yes 46 15.2
#> 4 2 male senior class 2 ,… Yes 1 16.7
#> 5 3 male child class 3 ,… Yes 13 31.0
#> 6 1 male adult class 1 ,… Yes 47 35.9
#> 7 3 female adult class 3 ,… Yes 52 45.6
#> 8 3 female child class 3 ,… Yes 19 51.4
#> 9 1 female child class 1 ,… Yes 2 66.7
#> 10 1 female senior class 1 ,… Yes 5 83.3
#> 11 2 female adult class 2 ,… Yes 76 87.4
#> 12 2 male child class 2 ,… Yes 11 91.7
#> 13 1 female adult class 1 ,… Yes 121 97.6
#> 14 2 female child class 2 ,… Yes 16 100
#> 15 1 male child class 1 ,… Yes 5 100
#> 16 3 female senior class 3 ,… Yes 1 100
ggplot(data=set_five_survived, aes(x=reorder(group_name,percent,decreasing = TRUE),y=percent))+
geom_bar(stat="identity", fill="lightblue") +
labs(title = "Percent of people surviving Titanic",x="Group",y="Percent survivors in group")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
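Compare the outputs: set_five has 17 groups (see the Groups line), but set_five_survived has only 16 rows. In a group where nobody survived there is no "Yes" row at all, because count() only returns combinations that occur in the data. That group therefore disappears from the survivor-only plot instead of showing a bar at 0 percent. Base R's table() lists every combination of the values it sees and reports a missing one as a zero. The sketch uses hypothetical groups "A" and "B".

```r
# Hypothetical passengers: nobody in group "A" survived
group    <- c("A", "A", "A", "B", "B")
survived <- factor(c("No", "No", "No", "No", "Yes"), levels = c("No", "Yes"))

# table() lists every combination, including the empty one (A, Yes)
tab <- as.data.frame(table(group, survived))
tab
# A/No 3, B/No 1, A/Yes 0, B/Yes 1

# Keeping only the "Yes" rows still includes group A, with 0 survivors
tab[tab$survived == "Yes", ]
```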
Assignment 9 Have a closer look at the three graphs below. Pick the one you like best, and change the colors to something better.
ggplot(set_five, aes(x=group_name, y=percent, fill=survived, label = paste(round(percent),"%,", n))) +
geom_bar(position="stack", stat="identity") +
labs(title = "percent of people surviving titanic",x="",y="percent")+
scale_fill_manual(values = c("black","Pink"))+
geom_text(size = 3, position = position_stack(vjust = 0.5), color="white")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
ggplot(set_five, aes(x=group_name, y=percent, fill=survived, label = paste(round(percent),"%,", n))) +
geom_bar(position="stack", stat="identity") +
labs(title = "percent of people surviving titanic",x="",y="Percent")+
scale_fill_manual(values = c("black","Pink"))+
geom_text(size = 3, position = position_stack(vjust = 0.5), color="magenta")+
theme(axis.text.x = element_text( vjust = 1))+coord_flip()
ggplot(set_five, aes(x=group_name, y=percent, fill=survived, label = paste(round(percent),"%,", n))) +
geom_bar(position="stack", stat="identity") +
labs(title = "percent of people surviving titanic",x="",y="Percent")+
scale_fill_manual(values = c("black","Pink"))+
geom_text(size = 3, position = position_stack(vjust = 0.5), color="magenta")+
theme(axis.text.x = element_text( vjust = 1), axis.text.y=element_text(hjust=0))+coord_flip()
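One tip for Assignment 9: when values is an unnamed vector, scale_fill_manual() assigns the colors to the categories in order ("No" before "Yes"). Naming the vector makes the mapping explicit, so it still holds if the order changes. The sketch runs on set_one counts rebuilt from the output earlier in the chapter. The colors grey40 and steelblue are only an example.

```r
library(ggplot2)

plot_demo <- data.frame(
  pclass   = factor(c(1, 1, 2, 2, 3, 3)),
  survived = c("No", "Yes", "No", "Yes", "No", "Yes"),
  n        = c(123, 200, 158, 119, 528, 181)
)

p <- ggplot(plot_demo, aes(x = pclass, y = n, fill = survived)) +
  geom_col(position = "stack") +
  # Named vector: each category gets its color by name, not by position
  scale_fill_manual(values = c(No = "grey40", Yes = "steelblue"))

# The fill colors ggplot2 actually used for the bar segments
unique(layer_data(p)$fill)
```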
Assignment 10 Fix the graph below. It should look roughly like this:
ggplot(set_five, aes(x=group_name, y=percent, fill=survived, label = paste("this is weird",round(percent),"%,", n))) +
geom_bar(position="stack", stat="identity",alpha=0.1) +
labs(title = "Bad graph,percent of people surviving titanic",x="Should this be here?",y="Percent")+
scale_fill_manual(values = c("darkgreen","darkgreen"))+
geom_text(size = 3, position = position_stack(vjust = 0.5), color="white")+
theme(axis.text.x = element_text(vjust = -4, hjust=12, color="Magenta"),
axis.text.y=element_text(hjust=2,face="bold.italic",color="blue",size=16))+
coord_flip(xlim=c(0,2000), ylim=c(0,10))
Assignment 11
What is your final answer? What does the data tell you about survival?
Discuss the reliability of your answer. You can find more information on the Titanic here: https://www.historyonthenet.com/how-many-people-were-on-the-titanic.