12.4 Classification trees
- Predict a qualitative outcome rather than a quantitative one (vs. regression trees)
- Prediction: An unseen (test) observation i is assigned to the most commonly occurring class (the mode) of the training observations in the region to which it belongs (see the scikit-learn sketch at the end of this section)
- Example region: young people (<25) with 3 previous offences
- Training observations: most individuals in this region re-offended
- Unseen/test observation: prediction is that it also re-offends
- To grow a classification tree we use recursive binary splitting/partitioning
- The training data are split into sub-populations through a sequence of binary (dichotomous) splits on the independent variables
- Criterion for making the binary splits: classification error rate (vs. RSS in a regression tree)
- Minimize the classification error rate: the fraction of training observations in a region that do not belong to the most common class in that region
- E = 1 - \max_k(\hat{p}_{mk}), where \hat{p}_{mk} is the proportion of training observations in the mth region that are from the kth class
- In practice: use measures that are more sensitive to node purity, e.g., the Gini index or cross-entropy (cf. James et al. 2013, 312); both are illustrated in the NumPy sketch below
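The three node-impurity measures are easy to compare directly. Below is a minimal NumPy sketch, assuming a single region with three classes and made-up proportions \hat{p}_{mk}, that computes the classification error rate, the Gini index, and the cross-entropy side by side.

```python
import numpy as np

# Assumed class proportions \hat{p}_{mk} for one region m (hypothetical values)
p_mk = np.array([0.7, 0.2, 0.1])

# Classification error rate: E = 1 - max_k(p_mk)
error_rate = 1 - p_mk.max()

# Gini index: G = sum_k p_mk * (1 - p_mk) -- small when the node is nearly pure
gini = np.sum(p_mk * (1 - p_mk))

# Cross-entropy: D = -sum_k p_mk * log(p_mk) -- also small for pure nodes
cross_entropy = -np.sum(p_mk * np.log(p_mk))

print(error_rate, gini, cross_entropy)  # 0.3, 0.46, ~0.80
```

Because the Gini index and cross-entropy react to changes in all class proportions rather than only the largest one, they are preferred when growing the tree; the classification error rate is typically reserved for pruning (James et al. 2013, 312).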
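To tie the pieces together, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the tiny re-offending data set (age and number of previous offences) is invented purely for illustration. The tree is grown by recursive binary splitting, and the unseen observation is predicted as the modal class of the region (leaf) it falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented training data: columns = [age, previous_offences]; outcome 1 = re-offended
X_train = np.array([[19, 3], [22, 4], [24, 3], [40, 0], [35, 1],
                    [55, 0], [21, 5], [23, 3], [45, 2], [60, 1]])
y_train = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

# Recursive binary splitting: each internal node splits the data in two
# on a single predictor; the leaves are the final regions
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=["age", "previous_offences"]))

# Prediction for an unseen (test) observation: the modal class of the
# training observations in its leaf
x_test = np.array([[24, 3]])  # young person with 3 previous offences
print(tree.predict(x_test))   # -> [1], i.e. predicted to re-offend
```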
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.