Exercises

  1. Prove there is overfitting with model3 and model4. The idea is to take certain samples from the training dataset and form a validate dataset. Remove the value of attribute survived from the validate dataset. Using models to a prediction on validate dataset. Compare their results. If model3 performs better than model2 on the validate dataset but not on the test dataset (which is given by Kaggle), it shows the model has an overfitting problem. Prove this exists on both model3 and model4.

  2. Build a decision tree model using different predictors.

  3. Investigate the predictor’s number and the same number but different predictors to find the highest prediction accuracy on validation dataset.