12.8 Variable Importance Measures

• Variable importance: Which are the most important predictors?
• Single decision tree: Interpretation is easy; just look at the splits in the tree diagram
• e.g., Figure 8.6, lower right: Thallium stress test (Thal) is the most important predictor
• Bagging a large number of trees: Can’t just visualize a single tree, and it is no longer clear which variables are most relevant for the splits
• Bagging improves prediction accuracy at the expense of interpretability
• Remedy: compute an overall summary of the importance of each predictor
• Use the Gini index (a measure of node purity) for bagged classification trees, or the RSS for bagged regression trees
• Classification trees: Add up the total amount by which the Gini index is decreased (i.e., node purity is increased) by splits over a given predictor, averaged over all $$B$$ trees; a large value indicates an important predictor
• Gini index: a small value indicates that a node contains predominantly observations from a single class
• See Figure 8.9 for a graphical representation of importance: the mean decrease in Gini index for each variable, expressed relative to the largest
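
The procedure in the bullets above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the book's implementation: the function names (`gini`, `gini_decrease`, `mean_decrease_gini`, `relative_importance`), the toy class labels, and the choice to weight each child node by its share of the parent's observations are all assumptions made here. The predictor names `Thal` and `Ca` are borrowed from the Heart data purely as labels; the numbers are invented.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini index of a node: sum over classes of p_k * (1 - p_k).

    A value near 0 means the node holds mostly one class (high purity).
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Drop in Gini index when `parent` is split into `left` and `right`.

    Assumption of this sketch: each child is weighted by its share of
    the parent's observations.
    """
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - children


def mean_decrease_gini(trees):
    """Sum the Gini decreases per predictor, then average over all B trees.

    `trees` is a list of trees; each tree is a list of splits given as
    (predictor, parent_labels, left_labels, right_labels).
    """
    totals = defaultdict(float)
    for splits in trees:
        for predictor, parent, left, right in splits:
            totals[predictor] += gini_decrease(parent, left, right)
    n_trees = len(trees)
    return {p: total / n_trees for p, total in totals.items()}


def relative_importance(importance):
    """Rescale so the most important predictor scores 100."""
    largest = max(importance.values())
    return {p: 100 * v / largest for p, v in importance.items()}


# Toy example (invented data): B = 2 bagged trees with one split each.
root = ["Yes"] * 4 + ["No"] * 4
tree_1 = [("Thal", root, ["Yes"] * 4, ["No"] * 4)]  # perfectly pure children
tree_2 = [("Ca", root, ["Yes"] * 3 + ["No"], ["Yes"] + ["No"] * 3)]

importance = mean_decrease_gini([tree_1, tree_2])
relative = relative_importance(importance)
print(importance)  # {'Thal': 0.25, 'Ca': 0.0625}
print(relative)    # {'Thal': 100.0, 'Ca': 25.0}
```

In this toy setup the split on `Thal` produces pure child nodes, so it yields the largest mean decrease and is scaled to 100, which mirrors how a plot of relative importance is read.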

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.