9.7 Resampling methods (6): k-Fold Cross-Validation

  • James et al. (2013) use Figure 5.5 [p. 179] to explain k-fold cross-validation. Please inspect the figure and explain it in a few words.





  • LOOCV is a special case of k-fold cross-validation in which k is set equal to n

  • Error rate: $\mathrm{CV}_{(k)} = \frac{1}{k}\sum_{i=1}^{k}\mathrm{Err}_i$
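The error-rate formula above can be sketched in code. The following is a minimal illustration (not from James et al.), using a toy 1-nearest-neighbour classifier on synthetic data; the classifier and data are assumptions made only so the sketch runs end to end.

```python
import numpy as np

def k_fold_cv_error(X, y, k, fit_predict):
    """CV_(k) = (1/k) * sum of Err_i: split the data into k folds,
    hold each fold out in turn, and average the k held-out error rates.
    (In practice the indices would usually be shuffled first; here the
    synthetic data is already in random order.)"""
    n = len(y)
    idx = np.arange(n)
    folds = np.array_split(idx, k)          # k roughly equal-sized folds
    errs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        errs.append(np.mean(y_pred != y[test_idx]))   # Err_i: misclassification rate
    return np.mean(errs)                    # CV_(k)

def one_nn(X_train, y_train, X_test):
    """Hypothetical 1-NN classifier, included only to make the sketch runnable."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

cv10 = k_fold_cv_error(X, y, k=10, fit_predict=one_nn)   # 10-fold CV estimate
```

Setting `k=len(y)` in the same function yields the LOOCV estimate, which makes the point of L9 concrete: LOOCV is just k-fold CV with k = n.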

  • Advantages

    • Q: What is the advantage of using k=5 or k=10 rather than k=n? [computation!]
  • Bias-variance trade-off (James et al. 2013, Ch. 5.1.4)

    • Q: k-Fold CV may give more accurate estimates of test error rate than LOOCV: Why?
      • Training sets are smaller than in LOOCV, but larger than in the validation set approach
        • LOOCV is best from a bias perspective (each training set contains almost all observations), but the n fitted models are trained on nearly identical observations, so their outputs are highly correlated
          • The mean of many highly correlated quantities has higher variance than the mean of many quantities that are not as highly correlated → the test error estimate from LOOCV has higher variance than the test error estimate from k-fold CV
    • Recommendation: use k-fold cross-validation with k = 5 or k = 10
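The computational and accuracy arguments above can be illustrated with a small simulation (my own sketch, not from the book): for ordinary least squares on synthetic data, 10-fold CV and LOOCV produce very similar test-MSE estimates, but 10-fold CV needs only 10 model fits instead of n.

```python
import numpy as np

def cv_mse(X, y, k):
    """k-fold CV estimate of test MSE for OLS: CV_(k) = (1/k) * sum of Err_i,
    where Err_i is the mean squared error on the i-th held-out fold."""
    n = len(y)
    idx = np.arange(n)
    folds = np.array_split(idx, k)
    errs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        errs.append(np.mean((X[test_idx] @ beta - y[test_idx]) ** 2))  # Err_i
    return np.mean(errs)

# Synthetic linear data: y = 1 + 2x + noise (noise sd = 0.5, so true MSE ~ 0.25)
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

mse_10 = cv_mse(X, y, k=10)   # 10 model fits
mse_n  = cv_mse(X, y, k=n)    # n model fits (LOOCV)
```

Both estimates target the same test MSE, so they land close together here, while LOOCV refits the model 20 times more often; this is the computational case for k = 5 or k = 10.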

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.