12.4 Classification trees

Predict a qualitative outcome rather than a quantitative one (vs. regression trees)
Prediction: An unseen (test) observation \(i\) is assigned to the most commonly occurring class (the mode) of the training observations in the region to which it belongs
- Region: Young people (<25) with 3 previous offences
- Training observations: Most individuals in region re-offended
- Unseen/test observations: Predicted to have re-offended as well
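
The majority-vote rule above can be sketched in a few lines of Python. This is only an illustration: the labels in `region_labels` are made-up values for a hypothetical region (young people with 3 previous offences), and `predict_region` is a helper name introduced here, not part of any library.

```python
from collections import Counter

# Hypothetical training labels for one region (e.g. age < 25, 3 previous offences).
region_labels = ["re-offended", "re-offended", "not re-offended", "re-offended"]

def predict_region(labels):
    """Return the most common class (the mode) among a region's training labels."""
    # most_common(1) gives [(label, count)]; ties go to the label seen first.
    return Counter(labels).most_common(1)[0][0]

# Every unseen observation falling into this region receives the same prediction.
prediction = predict_region(region_labels)
print(prediction)
```

Because three of the four training observations in this toy region re-offended, any test observation that lands in the region is predicted to re-offend.
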
To grow a classification tree, we use recursive splitting/partitioning
- Splitting training data into sub-populations through a sequence of binary (yes/no) splits on the independent variables
- Criterion for making binary splits: Classification error rate (vs. RSS in regression tree)
  - Minimize the classification error rate: Fraction of training observations in a region that do not belong to the most common class in that region
  - \(E=1-\max\limits_{k}(\hat{p}_{mk})\) where \(\hat{p}_{mk}\) is the proportion of training observations in the \(m\)th region that are from the \(k\)th class
  - In practice: Use measures more sensitive to node purity, i.e., Gini index or cross-entropy (cf. James et al. 2013, 312)
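
To make the splitting criteria concrete, here is a minimal Python sketch of the three node-impurity measures and a search for the best binary split on a single numeric predictor. It uses the definitions from James et al. (2013): the Gini index \(G=\sum_{k}\hat{p}_{mk}(1-\hat{p}_{mk})\) and the cross-entropy \(D=-\sum_{k}\hat{p}_{mk}\log\hat{p}_{mk}\). The function names, the toy `ages` and `reoffended` data, and the choice to score a split by the size-weighted average impurity of its two child regions are assumptions made for illustration, not a reference implementation.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Proportion of observations in a region that belong to each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def classification_error(labels):
    """E = 1 - max_k p_mk."""
    return 1.0 - max(class_proportions(labels))

def gini_index(labels):
    """G = sum_k p_mk * (1 - p_mk)."""
    return sum(p * (1.0 - p) for p in class_proportions(labels))

def cross_entropy(labels):
    """D = -sum_k p_mk * log(p_mk); only observed classes enter, so p > 0."""
    return -sum(p * math.log(p) for p in class_proportions(labels))

def best_split(values, labels, impurity):
    """Try every threshold on one numeric predictor and return the
    (threshold, weighted impurity) pair with the lowest weighted impurity."""
    n = len(labels)
    best = (None, float("inf"))
    # Skip the smallest value so that both child regions are non-empty.
    for threshold in sorted(set(values))[1:]:
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: age and whether the person re-offended (1) or not (0).
ages = [19, 22, 24, 31, 40, 52]
reoffended = [1, 1, 1, 0, 0, 1]

threshold, score = best_split(ages, reoffended, gini_index)
print(threshold, round(score, 3))
```

On this toy data the Gini-based search picks the split `age < 31`, which isolates a pure region of three re-offenders. A full tree would apply the same search recursively within each resulting region and across all predictors.
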

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.