12.7 Out-of-Bag (OOB) Error Estimation

  • OOB error estimation: a straightforward way to estimate the test error of a bagged model (no need for cross-validation)
    • On average each bagged tree makes use of around two-thirds of the observations
    • The remaining one-third of observations, not used to fit a given bagged tree, are referred to as the out-of-bag (OOB) observations for that tree
  • Predict outcome for the \(i\)th observation using each of the trees in which that observation was OOB
    • Will yield around \(B/3\) predictions for the \(i\)th observation
    • Then average these predicted responses (if regression is the goal) or take a majority vote (if classification is the goal)
      • Leads to a single OOB prediction for the \(i\)th observation
      • OOB prediction can be obtained in this way for each of the \(n\) observations, from which the overall OOB classification error can be computed (Regression: OOB MSE)
  • The resulting OOB error is a valid estimate of the test error for the bagged model
    • Because outcome/response for each observation is predicted using only the trees that were not fit using that observation
  • It can be shown that with \(B\) sufficiently large, OOB error is virtually equivalent to leave-one-out cross-validation error.
  • The OOB approach to estimating test error is particularly convenient for large data sets, for which cross-validation would be computationally onerous
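The procedure above can be sketched directly: fit each tree on a bootstrap sample, predict only on the observations left out of that sample, then average each observation's roughly \(B/3\) OOB predictions and compute the OOB MSE. The sketch below is a minimal illustration using scikit-learn's `DecisionTreeRegressor` as the base learner and simulated data; the data-generating function and all parameter choices (`n`, `B`, noise level) are illustrative assumptions, not from the source.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, B = 200, 100  # illustrative choices: n observations, B bagged trees

# Simulated regression data (assumed example, not from the source)
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

pred_sum = np.zeros(n)    # running sum of OOB predictions per observation
pred_count = np.zeros(n)  # number of trees for which each observation was OOB

for b in range(B):
    # Bootstrap sample: draw n indices with replacement (~2/3 unique obs)
    boot_idx = rng.integers(0, n, size=n)
    # OOB observations: the ~1/3 not used to fit this tree
    oob_idx = np.setdiff1d(np.arange(n), boot_idx)
    tree = DecisionTreeRegressor().fit(X[boot_idx], y[boot_idx])
    pred_sum[oob_idx] += tree.predict(X[oob_idx])
    pred_count[oob_idx] += 1

# With B = 100, every observation is OOB for some trees with overwhelming
# probability, so pred_count stays positive; average the ~B/3 predictions.
oob_pred = pred_sum / pred_count
oob_mse = np.mean((y - oob_pred) ** 2)
print(f"mean OOB count per observation: {pred_count.mean():.1f}")
print(f"OOB MSE: {oob_mse:.3f}")
```

Since each observation is OOB for roughly \(B(1 - 0.632) \approx 0.37B\) trees, the printed mean count should be near 37 here. For classification, the averaging step would be replaced by a majority vote over each observation's OOB predictions.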