Exercises

  1. Visualising prediction results is a useful way to find problems. Use the ‘Rtsne’ package in R to visualise the predictions of decision tree model2 for both the left branch and the right branch, then compare them.
library(Rtsne)
library(ggplot2)

# Features passed to t-SNE
features <- c("Sex", "Fare_pp", "Pclass", "Title", "Age_group", "Group_size", "Ticket_class", "Embarked")

# Left branch: passengers whose title is "Mr"
Tree.left <- train[train$Title == "Mr", ]

# t-SNE is stochastic, so fix the seed for reproducibility
set.seed(984357)
tsne.left <- Rtsne(Tree.left[, features], check_duplicates = FALSE)

ggplot(NULL, aes(x = tsne.left$Y[, 1], y = tsne.left$Y[, 2],
                 color = factor(Tree.left$Survived))) +
  geom_point() +
  labs(x = "t-SNE dimension 1", y = "t-SNE dimension 2", color = "Survived") +
  ggtitle("Visualization of left branch of tree where title is 'Mr'")

# Right branch: passengers with any other title
Tree.right <- train[train$Title != "Mr", ]

set.seed(984357)
tsne.right <- Rtsne(Tree.right[, features], check_duplicates = FALSE)

ggplot(NULL, aes(x = tsne.right$Y[, 1], y = tsne.right$Y[, 2],
                 color = factor(Tree.right$Survived))) +
  geom_point() +
  labs(x = "t-SNE dimension 1", y = "t-SNE dimension 2", color = "Survived") +
  ggtitle("Visualization of right branch of the tree")
  2. Consider re-engineering the features for passengers who share the same ticket.
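
One possible starting point for the shared-ticket exercise is sketched below. It is only an illustration under stated assumptions: the data frame `toy` is a made-up stand-in for `train`, and the column names `Ticket_group_size`, `Shared_ticket` and `Fare_per_ticket_holder` are hypothetical, chosen here for the example. It also assumes the data has a `Ticket` column and a `Fare` column that holds the total fare of the ticket.

```r
# Toy stand-in for the training data (made-up values, for illustration only)
toy <- data.frame(
  PassengerId = 1:6,
  Ticket = c("A1", "A1", "B2", "C3", "C3", "C3"),
  Fare = c(20, 20, 8, 30, 30, 30),
  stringsAsFactors = FALSE
)

# Count how many passengers hold each ticket
toy$Ticket_group_size <- ave(toy$PassengerId, toy$Ticket, FUN = length)

# Flag passengers who share their ticket with at least one other passenger
toy$Shared_ticket <- toy$Ticket_group_size > 1

# Split the ticket fare evenly across its holders
# (assumes Fare is the total for the ticket, not a per-person price)
toy$Fare_per_ticket_holder <- toy$Fare / toy$Ticket_group_size

print(toy)
```

With the real data, the same three steps could be applied to `train` directly, and the resulting columns compared against the existing `Group_size` and `Fare_pp` features to see whether they add any information.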