12.9 Random forests

  • Random Forests (RFs) provide an improvement over bagged trees
    • Decorrelating the trees leads to a reduction in both test and OOB error relative to bagging
  • RFs also build decision trees on bootstrapped training samples but…
  • …add predictor subsetting:
    • Each time a split in a tree is considered, a random sample of \(m\) predictors is chosen as split candidates from the full set of \(p\) predictors
    • The split is only allowed to use one of these \(m\) predictors
    • A fresh sample of \(m\) predictors is taken at each split (typically not all predictors, but \(m \approx \sqrt{p}\)!)
      • On average, \((p-m)/p\) of the splits won’t even consider a strong predictor (decorrelating the trees)
    • Objective: avoid a single strong predictor always being used for the first split & decrease the correlation between trees (see the first sketch after this list)
  • Main difference between bagging and random forests: the choice of predictor subset size \(m\)
    • An RF built using \(m = p\) is equivalent to bagging
  • Recommendation: Vary \(m\) and use a small \(m\) when predictors are highly correlated (compare the second sketch after this list)
  • Finally, boosting (skipped here) grows trees sequentially using information from previously grown trees; see Chapter 8.2.3 (James et al. 2013, 321)
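
A minimal sketch of the per-split predictor subsetting, assuming a regression setting with an RSS split criterion; the helper names (best_subset_split, _split_cost) and defaults are illustrative, not part of any library API:

```python
# Sketch only: illustrates sampling m predictors at each split, not a full RF.
import numpy as np

def _split_cost(y_left, y_right):
    """RSS of a candidate split (regression setting)."""
    cost = 0.0
    for part in (y_left, y_right):
        if len(part) > 0:
            cost += np.sum((part - part.mean()) ** 2)
    return cost

def best_subset_split(X, y, m=None, rng=None):
    """Find the best split, considering only a fresh random sample of m predictors."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    m = max(1, int(np.sqrt(p))) if m is None else m    # typical default: m ~ sqrt(p)
    candidates = rng.choice(p, size=m, replace=False)  # fresh sample at *each* split
    best = (np.inf, None, None)                        # (cost, feature index, threshold)
    for j in candidates:                               # split may only use these m predictors
        for t in np.unique(X[:, j]):
            mask = X[:, j] <= t
            cost = _split_cost(y[mask], y[~mask])
            if cost < best[0]:
                best = (cost, j, t)
    return best
```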
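
A second sketch comparing different values of \(m\) via the OOB error, using scikit-learn's RandomForestRegressor, where max_features plays the role of \(m\); the make_friedman1 data set and the specific \(m\) values are illustrative assumptions, not from the text:

```python
# Sketch: vary the predictor subset size m; max_features = p reproduces bagging.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

X, y = make_friedman1(n_samples=500, n_features=10, random_state=0)  # illustrative data
p = X.shape[1]

for m in (p, p // 2, int(np.sqrt(p))):           # m = p corresponds to bagging
    rf = RandomForestRegressor(
        n_estimators=500,
        max_features=m,                          # predictors sampled at each split
        oob_score=True,                          # out-of-bag performance estimate
        bootstrap=True,
        random_state=0,
    )
    rf.fit(X, y)
    print(f"m = {m:2d}  OOB R^2 = {rf.oob_score_:.3f}")
```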

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.