12.4 Classification trees

  • Predict a qualitative outcome rather than a quantitative one (vs. regression trees)
  • Prediction: An unseen (test) observation i is assigned to the most commonly occurring class (the mode) of the training observations in the region to which it belongs (see the fitting sketch below this list)
    • Region: Young people (<25) with 3 previous offences
    • Training observations: Most individuals in region re-offended
    • Unseen/test observation in this region: predicted to have also re-offended
  • To grow a classification tree we use recursive binary splitting/partitioning
    • Splitting the training data into sub-populations through successive binary splits on the independent variables
    • Criterion for making binary splits: Classification error rate (vs. RSS in regression trees)
      • Minimize the classification error rate: the fraction of training observations in the region that do not belong to the most common class in that region
      • E = 1 - \max_k(\hat{p}_{mk}), where \hat{p}_{mk} is the proportion of training observations in the mth region that are from the kth class
      • In practice: Use measures more sensitive to node purity, i.e., the Gini index or cross-entropy (cf. James et al. 2013, 312); a numerical sketch of these measures follows below
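
The following is a minimal fitting sketch, assuming scikit-learn is available; the data, variable names, and the re-offending example are hypothetical and only illustrate the idea: the tree is grown by recursive binary splitting, and a test observation is assigned the mode of the training observations in its region.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [age, number of previous offences] -> re-offended (1) or not (0).
X_train = np.array([[19, 3], [22, 4], [24, 3], [30, 0], [45, 1], [52, 0]])
y_train = np.array([1, 1, 1, 0, 0, 0])

# Grow the classification tree by recursive binary splitting,
# using the Gini index as the split criterion.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X_train, y_train)

# A new observation (23 years old, 3 previous offences) falls into the
# "young with several previous offences" region, so the prediction is the
# most common class of the training observations in that region.
print(tree.predict([[23, 3]]))  # -> [1], i.e. predicted to re-offend
```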
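
As a small numerical sketch (not taken from James et al.), the snippet below computes the classification error rate, the Gini index, and the cross-entropy for a single hypothetical node from its class proportions \hat{p}_{mk}; the 80/20 split is made up for illustration.

```python
import numpy as np

def classification_error_rate(p):
    """E = 1 - max_k(p_mk): share of training observations in the node
    that do not belong to the node's most common class."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - p.max())

def gini_index(p):
    """G = sum_k p_mk * (1 - p_mk): small when the node is pure."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

def cross_entropy(p):
    """D = -sum_k p_mk * log(p_mk): also small when the node is pure."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)))

# Hypothetical node: 80% of training observations re-offended, 20% did not.
proportions = [0.8, 0.2]
print(classification_error_rate(proportions))  # 0.2
print(gini_index(proportions))                 # 0.32
print(cross_entropy(proportions))              # ~0.50
```

All three measures approach zero as the node becomes pure; the Gini index and cross-entropy react more strongly to changes in purity, which is why they are preferred for growing the tree.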

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.