12.7 Out-of-Bag (OOB) Error Estimation

  • OOB error estimation: a straightforward way to estimate the test error of a bagged model (no need for cross-validation)
    • On average each bagged tree makes use of around two-thirds of the observations
    • The remaining one-third of observations, not used to fit a given bagged tree, are referred to as the out-of-bag (OOB) observations for that tree
  • Predict outcome for the \(i\)th observation using each of the trees in which that observation was OOB
    • Will yield around \(B/3\) predictions for the \(i\)th observation
    • Then average these predicted responses (if regression is the goal) or take a majority vote (if classification is the goal)
      • Leads to a single OOB prediction for the \(i\)th observation
      • OOB prediction can be obtained in this way for each of the \(n\) observations, from which the overall OOB classification error can be computed (Regression: OOB MSE)
  • The resulting OOB error is a valid estimate of the test error for the bagged model
    • Because outcome/response for each observation is predicted using only the trees that were not fit using that observation
  • It can be shown that with \(B\) sufficiently large, OOB error is virtually equivalent to leave-one-out cross-validation error.
  • The OOB approach to estimating test error is particularly convenient for large data sets, for which cross-validation would be computationally onerous
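The procedure above can be sketched directly: fit each tree on a bootstrap sample, predict only on the observations left out of that sample, then average each observation's roughly \(B/3\) OOB predictions and compute the OOB MSE. The sketch below is a minimal illustration using scikit-learn's `DecisionTreeRegressor` as the base learner and simulated data; the data-generating function and all parameter choices (`n`, `B`, noise level) are illustrative assumptions, not from the source.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, B = 200, 100  # illustrative choices: n observations, B bagged trees

# Simulated regression data (assumed example, not from the source)
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

pred_sum = np.zeros(n)    # running sum of OOB predictions per observation
pred_count = np.zeros(n)  # number of trees for which each observation was OOB

for b in range(B):
    # Bootstrap sample: draw n indices with replacement (~2/3 unique obs)
    boot_idx = rng.integers(0, n, size=n)
    # OOB observations: the ~1/3 not used to fit this tree
    oob_idx = np.setdiff1d(np.arange(n), boot_idx)
    tree = DecisionTreeRegressor().fit(X[boot_idx], y[boot_idx])
    pred_sum[oob_idx] += tree.predict(X[oob_idx])
    pred_count[oob_idx] += 1

# With B = 100, every observation is OOB for some trees with overwhelming
# probability, so pred_count stays positive; average the ~B/3 predictions.
oob_pred = pred_sum / pred_count
oob_mse = np.mean((y - oob_pred) ** 2)
print(f"mean OOB count per observation: {pred_count.mean():.1f}")
print(f"OOB MSE: {oob_mse:.3f}")
```

Since each observation is OOB for roughly \(B(1 - 0.632) \approx 0.37B\) trees, the printed mean count should be near 37 here. For classification, the averaging step would be replaced by a majority vote over each observation's OOB predictions.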