9.1 Retake: Simple setup to build predictive model

  • A simple setup to build a predictive model might look as follows:
  1. Randomly split data into one training dataset and one validation dataset
  2. Train model based on training data
  3. Predict outcome in training data and calculate training error rate
  4. If unhappy, change model (e.g. select more features) and redo (2) and (3)
  5. If happy, use trained model to predict outcome in validation dataset and calculate test error rate
  • Model tuning
    • e.g., parameter tuning, feature selection, up-/down-sampling of imbalanced data prior to training
  • Sometimes we might want to use different datasets for model tuning vs. calculating true/test error rate
    • …split validation dataset into one used for tuning (often still called validation dataset) and test dataset
    • Training, validation and test dataset (see here vs. James et al. (2013))
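
The workflow above, including the three-way split into training, validation and test data, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the data are synthetic, the "model" is a one-feature threshold rule standing in for any real learner, "tuning" is reduced to choosing which feature to use, and all names (make_data, split_data, train_stump, error_rate) are hypothetical helpers invented for this example.

```python
import random


def make_data(n, seed=0):
    """Synthetic data (assumption): feature 0 is noise, feature 1 drives the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = 1 if x[1] + rng.gauss(0, 0.3) > 0 else 0
        rows.append((x, y))
    return rows


def split_data(rows, fractions, seed=0):
    """Shuffle the rows, then cut them into consecutive parts of the given fractions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    parts, start = [], 0
    for i, frac in enumerate(fractions):
        # The last part takes whatever is left, so no row is dropped.
        last = i == len(fractions) - 1
        end = len(rows) if last else start + round(frac * len(rows))
        parts.append(rows[start:end])
        start = end
    return parts


def predict(model, x):
    """A model is a pair (feature index, threshold): predict 1 if the feature exceeds it."""
    feature, threshold = model
    return 1 if x[feature] > threshold else 0


def error_rate(rows, model):
    """Share of rows whose predicted outcome differs from the observed outcome."""
    return sum(predict(model, x) != y for x, y in rows) / len(rows)


def train_stump(rows, feature):
    """Pick the threshold on one feature that minimises the error on the given rows."""
    thresholds = sorted({x[feature] for x, _ in rows})
    return min(((feature, t) for t in thresholds), key=lambda m: error_rate(rows, m))


# Randomly split into training, validation (tuning) and test data.
data = make_data(300)
train, validation, test = split_data(data, (0.6, 0.2, 0.2))

# Train each candidate on the training data and record its training error
# and its validation error (the validation data are used for tuning only).
results = {}
for feature in (0, 1):
    model = train_stump(train, feature)
    results[feature] = (model, error_rate(train, model), error_rate(validation, model))

# Keep the candidate with the lowest validation error.
best_feature = min(results, key=lambda f: results[f][2])
best_model = results[best_feature][0]

# The test data are touched once, after all tuning decisions are made.
test_error = error_rate(test, best_model)

for feature, (_, train_err, val_err) in results.items():
    print(f"feature {feature}: training error {train_err:.3f}, validation error {val_err:.3f}")
print(f"selected feature {best_feature}, test error {test_error:.3f}")
```

Because the validation data were used to choose between candidates, the validation error of the selected model tends to be optimistic. That is why the sketch reports a separate test error, computed on data that played no role in training or tuning.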

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.