Chapter 12 M12: Trees
The final module in our journey takes us (as many adventures do) to a mystical forest. Specifically, we’re looking at tree-based methods, both for regression (quantitative response) and classification (categorical response). This also involves some thinking about what it means to classify well, using measures like the Gini index. Finally, we get an introduction to ensemble methods: the idea of building many different models and combining the predictions from each one into an overall prediction.
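Since the Gini index will come up when we grow classification trees, here is a minimal sketch of how a node's impurity might be computed. This is plain Python written for illustration, not code from the textbook, and the function name `gini` is just a label for this sketch:

```python
from collections import Counter

def gini(labels):
    """Gini index of a node's class labels: 1 minus the sum of squared
    class proportions. 0 means the node is pure (one class only);
    larger values mean the classes are more mixed."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has Gini 0; an even two-class split has Gini 0.5.
print(gini(["yes", "yes", "yes"]))       # 0.0
print(gini(["yes", "no", "yes", "no"]))  # 0.5
```

A tree-growing algorithm would evaluate candidate splits by comparing the (weighted) Gini of the child nodes to that of the parent, preferring splits that make the children purer.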
This module’s reading is all in the textbook! You can see notes about this reading in the pre-class assignments for Days 32-34 on Moodle. Relevant sections include:
- Chapter 8 introduction “Tree-Based Methods”
- Section 8.1 “The Basics of Decision Trees”
- We originally read Subsections 8.1.3 “Trees Versus Linear Models” and 8.1.4 “Advantages and Disadvantages of Trees” first, then Subsection 8.1.2 “Classification Trees” for the next day.
- Section 8.2 “Bagging, Random Forests, Boosting, and Bayesian Additive Regression Trees”
- Feel free to skim or skip subsection 8.2.4 “Bayesian Additive Regression Trees” if you like.
- Do read subsection 8.2.5 “Summary of Tree Ensemble Methods,” even if you skip 8.2.4.