9.7 Resampling methods (6): k-Fold Cross-Validation
- James et al. (2013) use Figure 5.5 [p. 179] to explain k-Fold Cross-Validation. Please inspect the figure and explain it in a few words.
k-Fold Cross-Validation randomly divides the observations into \(k\) folds of roughly equal size; LOOCV is the special case in which \(k\) is set equal to \(n\)
Error rate: \(CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k}Err_{i}\), where \(Err_{i}\) is the error computed on the \(i\)-th held-out fold
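A minimal sketch of this computation: a hand-rolled 10-fold split and a logistic regression on synthetic data (the data frame `df`, its variables, and all settings are illustrative assumptions, not from the text):

```r
## Hand-rolled k-fold CV: CV_(k) as the mean of per-fold error rates Err_i
set.seed(1)
n  <- 200
df <- data.frame(x = rnorm(n))
df$y <- rbinom(n, 1, plogis(2 * df$x))      # synthetic binary response

k     <- 10
folds <- sample(rep(1:k, length.out = n))   # random fold assignment

err <- sapply(1:k, function(i) {
  fit  <- glm(y ~ x, family = binomial, data = df[folds != i, ])
  prob <- predict(fit, newdata = df[folds == i, ], type = "response")
  mean((prob > 0.5) != df$y[folds == i])    # Err_i: misclassification on fold i
})
mean(err)                                   # CV_(k) = (1/k) * sum(Err_i)
```

Each observation lands in exactly one fold, so every \(Err_{i}\) is computed on data the corresponding model never saw during fitting.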
Advantages
- Q: What is the advantage of using \(k = 5\) or \(k = 10\) rather than \(k = n\)? [computation!]
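One way to see the computational gap is to time both with `boot::cv.glm`, the function James et al. (2013, Ch. 5.3) use for cross-validation: LOOCV refits the model \(n\) times, 10-fold CV only 10 times (the data here are synthetic and the settings illustrative):

```r
## Timing sketch: LOOCV (K = n) vs. 10-fold CV with boot::cv.glm
library(boot)
set.seed(1)
n  <- 1000
df <- data.frame(x = rnorm(n))
df$y <- 2 * df$x + rnorm(n)

fit <- glm(y ~ x, data = df)               # gaussian glm, equivalent to lm
system.time(cv.glm(df, fit)$delta)         # LOOCV: n = 1000 refits
system.time(cv.glm(df, fit, K = 10)$delta) # 10-fold CV: only 10 refits
```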
Bias-variance trade-off (James et al. 2013, Ch. 5.1.4)
- Q: k-Fold CV may give more accurate estimates of the test error rate than LOOCV. Why?
- Training sets are smaller than in LOOCV, but larger than in the validation set approach → intermediate level of bias
- LOOCV is best from the bias perspective (each training set contains almost all \(n\) observations), but the \(n\) fitted models are trained on nearly identical observations, so their outputs are highly correlated
- The mean of many highly correlated quantities has higher variance than the mean of many quantities that are not as highly correlated → the test error estimate from LOOCV has higher variance than the test error estimate from k-fold CV (see the simulation sketch after this list)
- Recommendation: use k-fold cross-validation with \(k = 5\) or \(k = 10\)
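A small simulation sketch to probe the variance claim above: draw many synthetic data sets, compute the LOOCV and 10-fold CV estimates on each, and compare their spreads (the data-generating process and all settings are illustrative assumptions, not from the text):

```r
## Compare the spread of LOOCV vs. 10-fold CV estimates across data sets
library(boot)
set.seed(1)

cv_mse <- function(K) {
  df <- data.frame(x = rnorm(100))
  df$y <- 2 * df$x + rnorm(100)
  fit <- glm(y ~ x, data = df)
  cv.glm(df, fit, K = K)$delta[1]          # raw CV estimate of test MSE
}

loocv <- replicate(200, cv_mse(K = 100))   # K = n -> LOOCV
kfold <- replicate(200, cv_mse(K = 10))    # 10-fold CV
c(sd_loocv = sd(loocv), sd_kfold = sd(kfold))
```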
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.