12.4 Classification trees

  • Predict qualitative outcomes rather than a quantitative one (vs. regression trees)
  • Prediction: Unseen (test) observation i belongs to the most commonly occurring class of training observations (the mode) in the region to which it belongs
    • Region: Young people (<25) with 3 previous offences
    • Training observations: Most individuals in region re-offended
    • Unseen/test observation in that region: prediction is that they also re-offend
  • To grow a classification tree we use recursive binary splitting/partitioning
    • Splitting the training data into sub-populations through a sequence of binary (dichotomous) splits on the independent variables
    • Criterion for making binary splits: Classification error rate (vs. RSS in regression tree)
      • Minimize the classification error rate (CER): the fraction of training observations in a region that do not belong to the most common class in that region
      • \(E = 1 - \max\limits_{k}(\hat{p}_{mk})\), where \(\hat{p}_{mk}\) is the proportion of training observations in the \(m\)th region that are from the \(k\)th class
      • In practice: Use measures more sensitive to node purity, such as the Gini index or cross-entropy (cf. James et al. 2013, 312)
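The node-level quantities above can be sketched in a few lines of code. This is a minimal illustration, not the book's R code: `node_impurity` and `predict_class` are hypothetical helper names, and cross-entropy is computed with the natural log (the base only rescales the measure).

```python
import numpy as np

def node_impurity(labels):
    """Impurity measures for one tree node (region).

    labels: class labels of the training observations in the region.
    Returns (error_rate, gini, entropy).
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()             # \hat{p}_{mk}: class proportions
    error = 1.0 - p.max()                 # classification error rate E
    gini = float(np.sum(p * (1.0 - p)))   # Gini index
    entropy = float(-np.sum(p * np.log(p)))  # cross-entropy (natural log)
    return error, gini, entropy

def predict_class(labels):
    """Majority-vote prediction: the mode of the region's training labels."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

# Region where most individuals re-offended (1 = re-offended):
region = [1, 1, 1, 0]
print(predict_class(region))   # a test observation here is predicted 1
print(node_impurity(region))   # E = 0.25, Gini = 0.375
```

A pure node (all labels identical) gives zero for all three measures; the Gini index and cross-entropy fall faster than the error rate as a node approaches purity, which is why they are preferred for growing the tree.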

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. New York: Springer.