12.8 Variable Importance Measures

  • Variable importance: Which are the most important predictors?
  • Single decision tree: Interpretation is easy; just look at the splits in the tree diagram
  • Bagging a large number of trees: We can no longer visualize a single tree, and it is no longer clear which variables are most relevant for the splits
  • Overall summary of importance of each predictor
    • Use the Gini index (a measure of node purity) for bagged classification trees, or the RSS for bagged regression trees
      • Classification trees: Add up the total amount by which the Gini index is decreased (i.e., node purity is increased) by splits on a given predictor, then average over all \(B\) trees
        • Gini index: a small value indicates that a node contains predominantly observations from a single class
      • See Figure 8.9 (James et al. 2013, 313) for a graphical representation of importance: the mean decrease in Gini index for each variable, expressed relative to the largest
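
The importance summary described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any particular package. The split records, the predictor names `x1` and `x2`, and the helper names `gini`, `gini_decrease`, `mean_gini_decrease`, and `relative_to_largest` are all hypothetical. Each decrease is computed within its own node only; real implementations may additionally weight a split by the share of observations that reach the node.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini index of a node: sum_k p_k * (1 - p_k) = 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Decrease in Gini index from splitting `parent` into `left` and `right`."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - children


def mean_gini_decrease(trees):
    """Total Gini decrease per predictor, averaged over all B trees."""
    totals = defaultdict(float)
    for splits in trees:
        for predictor, parent, left, right in splits:
            totals[predictor] += gini_decrease(parent, left, right)
    n_trees = len(trees)  # B in the notes
    return {predictor: total / n_trees for predictor, total in totals.items()}


def relative_to_largest(importance):
    """Rescale so that the most important predictor scores 100."""
    largest = max(importance.values())
    return {predictor: 100.0 * value / largest for predictor, value in importance.items()}


# Hypothetical split records for B = 2 bagged trees.
# Each record: (predictor, labels in parent node, labels in left child, labels in right child)
trees = [
    [("x1", ["a", "a", "b", "b"], ["a", "a"], ["b", "b"])],
    [
        ("x1", ["a", "a", "b", "b"], ["a", "a"], ["b", "b"]),
        ("x2", ["a", "a", "a", "b"], ["a", "a"], ["a", "b"]),
    ],
]

importance = mean_gini_decrease(trees)
relative = relative_to_largest(importance)
print(importance)
print(relative)
```

In this toy input, `x1` separates the classes perfectly in both trees, so it receives the largest mean decrease and a relative score of 100. `x2` appears in only one tree and yields a small decrease there, so its relative score is 12.5. Plotting `relative` as a bar chart would give the same kind of display as the variable importance figure cited above.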

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.