12.6 Bagging

  • Single decision trees may suffer from high variance, i.e., the fitted tree (and its predictions) can change substantially across different random splits of the data
    • Why?
  • We want a low-variance procedure: one that yields similar results when applied repeatedly to distinct datasets (e.g., linear regression)
  • Bagging: construct \(B\) regression trees using \(B\) bootstrapped training sets, and average the resulting predictions
    • Bootstrap: take repeated samples, with replacement, from the (single) training set; build a predictive model on each sample and average the predictions. "Bagging" is short for bootstrap aggregation
    • Classification: for a given test observation, record the class predicted by each of the \(B\) trees and take a majority vote: the overall prediction is the most commonly occurring class among the \(B\) predictions
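The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the base learner here is a hypothetical depth-1 regression "stump" (invented for this example, splitting 1-D inputs at the sample mean of \(x\)) standing in for a full regression tree, and `majority_vote` shows the classification rule. Function names and the choice of `B` are assumptions for illustration.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(xs, ys, rng):
    # Draw n pairs with replacement from the single training set.
    n = len(xs)
    idx = [rng.randrange(n) for _ in range(n)]
    return [xs[i] for i in idx], [ys[i] for i in idx]

def fit_stump(xs, ys):
    # Toy base learner (assumption: stands in for a regression tree):
    # a depth-1 split of 1-D inputs at the sample mean of x,
    # predicting the mean response on each side.
    split = mean(xs)
    left = [y for x, y in zip(xs, ys) if x <= split]
    right = [y for x, y in zip(xs, ys) if x > split]
    left_pred = mean(left) if left else mean(ys)
    right_pred = mean(right) if right else mean(ys)
    return lambda x: left_pred if x <= split else right_pred

def bagged_predict(xs, ys, x_new, B=25, seed=0):
    # Bagging for regression: fit B models on B bootstrapped
    # training sets and average the B predictions.
    rng = random.Random(seed)
    preds = []
    for _ in range(B):
        bx, by = bootstrap_sample(xs, ys, rng)
        preds.append(fit_stump(bx, by)(x_new))
    return mean(preds)

def majority_vote(class_preds):
    # Bagging for classification: the overall prediction is the
    # most commonly occurring class among the B predictions.
    return Counter(class_preds).most_common(1)[0][0]
```

For example, on a step-shaped training set such as `xs = [1, 2, 3, 8, 9, 10]`, `ys = [1, 1, 1, 5, 5, 5]`, `bagged_predict(xs, ys, 9.0)` averages many noisy stump predictions into a more stable estimate near 5; each individual stump varies with its bootstrap sample, which is exactly the variance that averaging reduces.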