12.6 Bagging

  • Single decision trees may suffer from high variance, i.e., the fitted tree (and its predictions) can change substantially across different random splits of the data
    • Why?
  • We want a low-variance procedure: one that yields similar results when applied repeatedly to distinct datasets (e.g., linear regression)
  • Bagging: construct \(B\) regression trees using \(B\) bootstrapped training sets, and average the resulting predictions
    • Bootstrap: take repeated samples, with replacement, from the (single) training set; build a predictive model on each sample and average the predictions. "Bagging" is short for bootstrap aggregation
    • Classification: for a given test observation, record the class predicted by each of the \(B\) trees and take a majority vote: the overall prediction is the most commonly occurring class among the \(B\) predictions
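The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the base learner here is a hypothetical depth-1 regression "stump" (invented for this example, splitting 1-D inputs at the sample mean of \(x\)) standing in for a full regression tree, and `majority_vote` shows the classification rule. Function names and the choice of `B` are assumptions for illustration.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(xs, ys, rng):
    # Draw n pairs with replacement from the single training set.
    n = len(xs)
    idx = [rng.randrange(n) for _ in range(n)]
    return [xs[i] for i in idx], [ys[i] for i in idx]

def fit_stump(xs, ys):
    # Toy base learner (assumption: stands in for a regression tree):
    # a depth-1 split of 1-D inputs at the sample mean of x,
    # predicting the mean response on each side.
    split = mean(xs)
    left = [y for x, y in zip(xs, ys) if x <= split]
    right = [y for x, y in zip(xs, ys) if x > split]
    left_pred = mean(left) if left else mean(ys)
    right_pred = mean(right) if right else mean(ys)
    return lambda x: left_pred if x <= split else right_pred

def bagged_predict(xs, ys, x_new, B=25, seed=0):
    # Bagging for regression: fit B models on B bootstrapped
    # training sets and average the B predictions.
    rng = random.Random(seed)
    preds = []
    for _ in range(B):
        bx, by = bootstrap_sample(xs, ys, rng)
        preds.append(fit_stump(bx, by)(x_new))
    return mean(preds)

def majority_vote(class_preds):
    # Bagging for classification: the overall prediction is the
    # most commonly occurring class among the B predictions.
    return Counter(class_preds).most_common(1)[0][0]
```

For example, on a step-shaped training set such as `xs = [1, 2, 3, 8, 9, 10]`, `ys = [1, 1, 1, 5, 5, 5]`, `bagged_predict(xs, ys, 9.0)` averages many noisy stump predictions into a more stable estimate near 5; each individual stump varies with its bootstrap sample, which is exactly the variance that averaging reduces.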