9.7 Resampling methods (6): k-Fold Cross-Validation

  • James et al. (2013) use Figure 5.5 [p.179] to explain k-Fold Cross-Validation. Please inspect the figure and explain it in a few words.





  • LOOCV is a special case of k-Fold Cross-Validation in which \(k\) is set equal to \(n\)

  • Error rate: \(CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k}Err_{i}\), where \(Err_{i}\) is the test error (e.g., the misclassification rate) computed on the \(i\)-th held-out fold (see the sketch below)
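
A minimal sketch of this computation, assuming a simulated binary outcome and a logistic regression fit (the data and all variable names are illustrative, not taken from James et al. 2013):

```r
# k-fold CV error rate, computed by hand
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))           # simulated binary outcome
df <- data.frame(x = x, y = y)

k <- 10
folds <- sample(rep(1:k, length.out = n))  # random fold labels 1..k

err <- numeric(k)
for (i in 1:k) {
  fit  <- glm(y ~ x, data = df[folds != i, ], family = binomial)
  phat <- predict(fit, newdata = df[folds == i, ], type = "response")
  err[i] <- mean((phat > 0.5) != df$y[folds == i])  # Err_i on held-out fold i
}
mean(err)  # CV_(k): average of the k fold-specific error rates
```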

  • Advantages

    • Q: What is the advantage of using \(k = 5\) or \(k = 10\) rather than \(k = n\)? [computation: the model is refit only \(k\) times instead of \(n\) times; see the sketch after this list]
  • Bias-variance trade-off (James et al. 2013, Ch. 5.1.4)

    • Q: k-Fold CV may give more accurate estimates of the test error rate than LOOCV: Why?
      • Training sets are smaller than in LOOCV, but larger than in the validation set approach
        • LOOCV is best from a bias perspective (each training set contains almost all observations), but the \(n\) models are fit on nearly identical observations, so their outputs are highly correlated
          • The mean of many highly correlated quantities has higher variance than the mean of many quantities that are not as highly correlated → the test error estimate from LOOCV has higher variance than the test error estimate from k-fold CV
    • Recommendation: use k-fold cross-validation with k = 5 or k = 10 (see the sketch below)
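
To make the computational and variance arguments concrete, here is a minimal sketch using cv.glm() from the boot package (the function used in the Ch. 5 lab of James et al. 2013); the simulated data, sample sizes, and replication count are illustrative assumptions:

```r
# Compare 10-fold CV and LOOCV on simulated linear-regression data
library(boot)
set.seed(42)

sim_data <- function(n) {
  x <- rnorm(n)
  data.frame(x = x, y = 2 * x + rnorm(n))
}

## 1) Computation: LOOCV refits the model n times, 10-fold CV only 10 times
df  <- sim_data(500)
fit <- glm(y ~ x, data = df)           # glm() without a family argument = least squares
system.time(cv.glm(df, fit, K = 10))   # 10 model fits
system.time(cv.glm(df, fit, K = 500))  # n = 500 model fits: noticeably slower
cv.glm(df, fit, K = 10)$delta[1]       # 10-fold CV estimate of the test MSE

## 2) Variance: spread of the two estimators across repeated datasets
## (the argument above predicts a larger spread for LOOCV)
est <- replicate(200, {
  d <- sim_data(100)
  f <- glm(y ~ x, data = d)
  c(loocv = cv.glm(d, f, K = 100)$delta[1],
    kfold = cv.glm(d, f, K = 10)$delta[1])
})
apply(est, 1, sd)  # sampling standard deviation of each estimator
```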

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.