9.1 Retake: Simple setup to build predictive model | Computational Social Science

9.1 Retake: Simple setup to build predictive model

A simple setup to build a predictive model might look as follows:

1. Randomly split the data into one training dataset and one validation dataset
2. Train the model on the training data
3. Predict the outcome in the training data and calculate the training error rate
4. If unhappy, change the model (e.g. select more features) and repeat steps 2 and 3
5. If happy, use the trained model to predict the outcome in the validation dataset and calculate the test error rate
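
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the setup used in the course: the data are synthetic (one numeric feature, a binary outcome), the "model" is a toy threshold classifier, and all function names (`split_train_validation`, `train_threshold_model`, `error_rate`) are hypothetical helpers introduced here.

```python
import random


def split_train_validation(rows, validation_share=0.3, seed=42):
    """Step 1: randomly split rows into a training and a validation list."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_share)
    return shuffled[n_validation:], shuffled[:n_validation]


def train_threshold_model(rows):
    """Step 2: toy model, the midpoint between the two class means of x.

    Assumes both classes occur in rows and class 1 has the larger mean.
    """
    x0 = [x for x, y in rows if y == 0]
    x1 = [x for x, y in rows if y == 1]
    return (sum(x0) / len(x0) + sum(x1) / len(x1)) / 2


def predict(threshold, x):
    """Predict class 1 if x lies above the learned threshold."""
    return 1 if x > threshold else 0


def error_rate(threshold, rows):
    """Share of rows whose predicted class differs from the observed class."""
    wrong = sum(predict(threshold, x) != y for x, y in rows)
    return wrong / len(rows)


# Synthetic data (an assumption for illustration): (feature, outcome) pairs
rng = random.Random(0)
data = [(rng.gauss(0, 1), 0) for _ in range(100)]
data += [(rng.gauss(2, 1), 1) for _ in range(100)]

train, validation = split_train_validation(data)   # step 1
model = train_threshold_model(train)               # step 2
training_error = error_rate(model, train)          # step 3
# Step 4 (change the model and retrain if unhappy) is skipped here.
test_error = error_rate(model, validation)         # step 5

print(f"training error rate: {training_error:.3f}")
print(f"test error rate:     {test_error:.3f}")
```

The point of the sketch is that the validation rows never enter `train_threshold_model`, so the error computed on them is an estimate for unseen data.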

Model tuning
- e.g., parameter tuning, feature selection, up-/down-sampling of imbalanced data prior to training
Sometimes we might want to use different datasets for model tuning vs. calculating the true/test error rate
- …split validation dataset into one dataset used for tuning (often still called validation dataset) and one test dataset
- Training, validation and test dataset (see here vs. James et al. (2013))
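
A three-way split with tuning on the validation dataset might look like the following sketch. It is an illustration under assumptions rather than a prescribed procedure: the data are synthetic, the 60/20/20 shares are arbitrary, the tuned parameter is the number of neighbours `k` in a toy one-dimensional nearest-neighbour classifier, and the helper names (`three_way_split`, `knn_predict`, `error_rate`) are hypothetical.

```python
import random


def three_way_split(rows, shares=(0.6, 0.2), seed=1):
    """Randomly split rows into training, validation and test lists.

    shares gives the training and validation shares; the rest is test data.
    """
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * shares[0])
    n_val = round(len(shuffled) * shares[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (k odd)."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return 1 if 2 * votes > k else 0


def error_rate(train, rows, k):
    """Share of rows misclassified by k-nearest neighbours fit on train."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in rows)
    return wrong / len(rows)


# Synthetic data (an assumption for illustration): (feature, outcome) pairs
rng = random.Random(0)
data = [(rng.gauss(0, 1), 0) for _ in range(100)]
data += [(rng.gauss(2, 1), 1) for _ in range(100)]

train, validation, test = three_way_split(data)

# Tuning: compare candidate values of k on the validation dataset only
candidate_ks = [1, 5, 15]
validation_errors = {k: error_rate(train, validation, k) for k in candidate_ks}
best_k = min(validation_errors, key=validation_errors.get)

# The test dataset is used once, after tuning is finished
test_error = error_rate(train, test, best_k)

print("validation error by k:", validation_errors)
print("chosen k:", best_k, "test error rate:", round(test_error, 3))
```

Because `best_k` is chosen to minimise the validation error, that error tends to be optimistic; the untouched test dataset gives the less biased estimate.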

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.