Out-of-Bag (OOB) Error Estimation
- OOB error estimation: a straightforward way to estimate the test error of a bagged model, with no need for cross-validation or a separate validation set
- On average, each bagged tree makes use of around two-thirds of the observations (a short derivation follows this list)
- The remaining one-third of the observations not used to fit a given bagged tree are referred to as the out-of-bag (OOB) observations
- Predict outcome for the \(i\)th observation using each of the trees in which that observation was OOB
- Will yield around \(B/3\) predictions for the \(i\)th observation
- Then average these predicted responses (if regression is the goal) or take a majority vote (if classification is the goal)
- Leads to a single OOB prediction for the \(i\)th observation
- An OOB prediction can be obtained in this way for each of the \(n\) observations, from which the overall OOB classification error (or, for regression, the OOB MSE) can be computed; a code sketch follows this list
- The resulting OOB error is a valid estimate of the test error for the bagged model
- Because the response for each observation is predicted using only the trees that were not fit on that observation
- It can be shown that with \(B\) sufficiently large, OOB error is virtually equivalent to leave-one-out cross-validation error.
- The OOB approach to estimating the test error is particularly convenient for large data sets, for which cross-validation would be computationally onerous; libraries such as scikit-learn can compute it during fitting (see the example below)
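Why two-thirds: each bootstrap sample draws \(n\) observations with replacement, so the probability that a particular observation is never drawn in a given sample is

\[
\left(1 - \frac{1}{n}\right)^{n} \;\longrightarrow\; e^{-1} \approx 0.368 \quad \text{as } n \to \infty.
\]

Each observation therefore appears in roughly \(1 - e^{-1} \approx 63.2\%\) of the bootstrap samples and is OOB for the remaining \(\approx 36.8\%\) of the trees, which is also why each observation accumulates around \(B/3\) OOB predictions.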
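To make the procedure concrete, here is a minimal sketch of OOB error estimation for a bagged regression-tree ensemble, assuming NumPy and scikit-learn's `DecisionTreeRegressor`; the function name `oob_mse` and its defaults are illustrative, not from the source.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def oob_mse(X, y, B=500, random_state=0):
    """Estimate the test MSE of a bagged tree ensemble via OOB predictions."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    pred_sum = np.zeros(n)    # running sum of OOB predictions per observation
    pred_count = np.zeros(n)  # number of trees for which observation i was OOB

    for b in range(B):
        idx = rng.integers(0, n, size=n)       # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)  # the ~n/3 observations left out
        if oob.size == 0:                      # vanishingly rare, but be safe
            continue
        tree = DecisionTreeRegressor(random_state=b).fit(X[idx], y[idx])
        pred_sum[oob] += tree.predict(X[oob])  # predict only where this tree is "blind"
        pred_count[oob] += 1

    seen = pred_count > 0                         # each count is ~B/3 for large B
    oob_pred = pred_sum[seen] / pred_count[seen]  # average the OOB predictions
    return np.mean((y[seen] - oob_pred) ** 2)     # the OOB MSE
```

For classification, the averaging step would be replaced by a majority vote over each observation's OOB predictions.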
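In practice this bookkeeping usually comes built in. Assuming a recent scikit-learn, `BaggingRegressor` with `oob_score=True` stores the averaged OOB prediction for each training observation in `oob_prediction_` and an OOB \(R^2\) in `oob_score_`; the synthetic data below is purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Illustrative synthetic regression data
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# The default base estimator is a decision tree, so this is ordinary bagging
bag = BaggingRegressor(n_estimators=500, oob_score=True, random_state=0).fit(X, y)

oob_mse = np.mean((y - bag.oob_prediction_) ** 2)  # OOB MSE from stored predictions
print(f"OOB R^2: {bag.oob_score_:.3f}  OOB MSE: {oob_mse:.1f}")
```

Because the OOB predictions fall out of the trees that were fit anyway, no additional model fits are needed, which is exactly the computational advantage over cross-validation noted above.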