12.3 Tree-based methods

  • See James et al. (2013, chap. 8)
  • “involve stratifying or segmenting the predictor space into a number of simple regions” (James et al. 2013, 303)
    • Region: e.g., splitting on age (< 25 vs. ≥ 25) and previous offenses (< 1 vs. ≥ 1) → 4 regions/data subsets (illustrated in the classification-tree sketch after this list)
    • See James et al. (2013, 313), Figure 8.6.
      • Outcome: Heart disease (Yes, No)
  • Prediction for observation i
    • Based on the mean (regression) or mode (classification) of the training observations in the region to which it belongs
  • Decision tree methods: the name comes from the splitting rules, which can be summarized in a tree
    • Can be applied to both regression and classification problems (see the regression-tree sketch below)
    • Predictive accuracy of a single decision tree is typically not very good (not competitive with other supervised learning approaches)
  • Bagging, random forests, and boosting
    • These approaches produce multiple trees, which are then combined to yield a single consensus prediction (“average prediction”); see the ensemble sketch below
    • Combining a large number of trees can dramatically improve predictive accuracy, but interpretation becomes more difficult (black box! Why? There is no longer a single tree whose splitting rules can be read off)
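
To make the idea concrete, here is a minimal R sketch of a single classification tree, assuming the rpart package and data simulated purely for illustration (not the Heart data of Figure 8.6): recursive binary splits on age and previous offenses stratify the predictor space into regions, and a new observation is assigned the mode of the training observations in its region.

  ## Simulated example, not the Heart data; rpart assumed to be installed
  library(rpart)

  set.seed(1)
  n      <- 500
  age    <- runif(n, 18, 60)                                 # predictor 1
  priors <- rpois(n, 1)                                      # predictor 2: previous offenses
  p      <- plogis(-2 + 0.05 * (60 - age) + 0.8 * priors)    # assumed relationship, illustration only
  y      <- factor(rbinom(n, 1, p), labels = c("No", "Yes"))
  dat    <- data.frame(y, age, priors)

  ## Recursive binary splits stratify the (age, priors) space into regions
  tree_fit <- rpart(y ~ age + priors, data = dat, method = "class")
  print(tree_fit)   # the splitting rules, summarized as a tree

  ## Prediction = mode (majority class) of the training observations in the
  ## terminal-node region the new observation falls into
  predict(tree_fit, newdata = data.frame(age = 22, priors = 2), type = "class")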
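
A regression tree works the same way, except that the prediction is the mean of the training responses in a region; a short sketch on the built-in mtcars data (an assumed example, again via rpart):

  library(rpart)

  ## Regression tree: predict fuel consumption (mpg) from weight and horsepower
  reg_tree <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")
  print(reg_tree)   # splitting rules for the continuous outcome

  ## Prediction = mean mpg of the training cars in the region the new
  ## observation falls into
  predict(reg_tree, newdata = data.frame(wt = 3.0, hp = 150))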
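
Finally, an ensemble sketch, assuming the randomForest package and simulated data with an arbitrary nonlinear class boundary: a random forest combines the (“averaged”) predictions of many trees, which typically improves accuracy over a single tree but leaves no single tree to interpret.

  library(rpart)
  library(randomForest)

  set.seed(2)
  n   <- 1000
  x1  <- runif(n)
  x2  <- runif(n)
  y   <- factor(ifelse(x1^2 + x2^2 + rnorm(n, sd = 0.3) > 1, "Yes", "No"))
  sim <- data.frame(y, x1, x2)
  train <- sample(n, 700)

  single_tree <- rpart(y ~ x1 + x2, data = sim[train, ], method = "class")
  rf_fit      <- randomForest(y ~ x1 + x2, data = sim[train, ], ntree = 500)

  ## Test-set accuracy: one tree vs. the consensus of 500 trees
  mean(predict(single_tree, sim[-train, ], type = "class") == sim$y[-train])
  mean(predict(rf_fit,      sim[-train, ])                 == sim$y[-train])

  ## Interpretation cost: there is no single tree to read off; variable
  ## importance measures are the usual way to peek into the black box
  importance(rf_fit)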

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.