6.24 Resampling methods (3): Validation set approach

  • Validation set approach (see Fig. 5.1, James et al. 2013, Ch. 5.1.1)
    • Q: James et al. (2013) use Figure 5.1 [p. 177] to explain the validation set approach. Please inspect the figure and explain it in a few words.



  • Involves randomly dividing the original set of observations into two parts, a training set and a validation set/hold-out set (e.g., 50%/50%)
    • Model fit on training set and used to predict responses in validation set
    • Important: We don’t have new data, we just use a part of the original data!
    • Validation set error rate = estimate of test error rate
    • We did this in Lab: Predicting recidivism (Classification)
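
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code from the lab: the data are simulated, the model is a simple one-predictor linear regression, and all names (`make_data`, `fit_line`, `validation_split`, `mse`) are hypothetical helpers introduced here. For a quantitative outcome the validation set error is measured as mean squared error (MSE); for classification it would be the misclassification rate.

```python
import random


def make_data(n, seed=0):
    """Simulate a hypothetical data set: y = 2 + 3x + noise (assumption, not lab data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Least squares fit with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(xs, ys, intercept, slope):
    """Mean squared prediction error of the fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def validation_split(n, train_fraction=0.5, seed=1):
    """Randomly divide the indices 0..n-1 into a training and a validation part."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * train_fraction)
    return idx[:cut], idx[cut:]


xs, ys = make_data(200)
train_idx, valid_idx = validation_split(len(xs))

# Fit on the training set only.
intercept, slope = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])

# Predict the held-out responses; the resulting error estimates the test error.
valid_mse = mse([xs[i] for i in valid_idx], [ys[i] for i in valid_idx], intercept, slope)
print(f"validation MSE: {valid_mse:.3f}")
```

Note that no new data are collected: the validation observations come from the same original data set and are simply withheld from model fitting.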


  • Q: What could be disadvantages of this approach? (think of data set sizes/splitting!)

  • Disadvantages

    • Estimate of test error rate can be highly variable, depending on which observations are included in training set/validation set
    • Only a subset of observations (those in the training set) is used to fit the model, but statistical methods tend to perform worse when trained on fewer observations
      • Validation set error rate may therefore overestimate test error rate for the model fit on the entire data set
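
The first disadvantage can be made visible by repeating the random split several times on the same data and comparing the resulting error estimates. The sketch below assumes a small simulated data set and a one-predictor linear model; the helper names (`simulate`, `fit_line`, `validation_mse`) and the choice of 40 observations and 10 splits are illustrative assumptions, not taken from the lab or from James et al. (2013).

```python
import random
import statistics


def simulate(n, seed=0):
    """Simulate hypothetical data: y = 2 + 3x + noise (illustration only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 3.0 * x + rng.gauss(0, 2) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Least squares fit with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def validation_mse(xs, ys, seed):
    """One 50%/50% validation split: fit on one half, return MSE on the other half."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = len(idx) // 2
    train, valid = idx[:cut], idx[cut:]
    intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return sum((ys[i] - (intercept + slope * xs[i])) ** 2 for i in valid) / len(valid)


xs, ys = simulate(40)

# Same data, same model, ten different random splits.
errors = [validation_mse(xs, ys, seed) for seed in range(10)]

print("lowest estimate :", round(min(errors), 3))
print("highest estimate:", round(max(errors), 3))
print("std. deviation  :", round(statistics.stdev(errors), 3))
```

The spread between the lowest and highest estimate comes only from which observations happened to land in the training and validation sets. Each fit also uses just 20 of the 40 observations, which is the source of the second disadvantage.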

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.