Chapter 11 M11: Classification

In this module, we look at several techniques for handling datasets with a categorical response. We’ve previously used logistic regression for this, but basic logistic regression can only handle binary responses (those with exactly two levels). The methods in this module can be used for responses with any number of levels! We think of these levels as classes: our job is to work out how likely a point is to be in each class, and to give an overall prediction for which class it came from.
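To make the "overall prediction" step concrete, here is a minimal sketch in Python. It assumes we already have an estimated probability for each class and that we predict the class with the largest one. The function name `predict_class`, the class labels, and the probability values are all hypothetical, chosen only for illustration; they do not come from the textbook or from any particular method.

```python
def predict_class(class_probs):
    """Return the class label with the largest estimated probability.

    class_probs: dict mapping each class label to its estimated probability.
    """
    total = sum(class_probs.values())
    # Sanity check: the estimated probabilities should sum to 1.
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class probabilities should sum to 1")
    # Pick the label whose probability is largest.
    return max(class_probs, key=class_probs.get)


# Hypothetical estimated probabilities for one observation, three classes.
probs = {"A": 0.2, "B": 0.5, "C": 0.3}
prediction = predict_class(probs)
print(prediction)
```

The methods in this module differ in how they estimate the probabilities; this last step, picking the most likely class, is the same idea regardless of how many classes there are.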

Another interesting thing about this module is that we look at both parametric methods (which rely on quite a few distributional assumptions) and a nonparametric approach (which makes very few assumptions). Most classical stats methods are parametric, but there are nonparametric methods for many different kinds of problems, so it’s nice to see an example here!

This module’s reading is all in the textbook! You can see notes about this reading in the pre-class assignments for Days 29-31 on Moodle. Relevant sections include:

  • Subsection 2.1.2 “How Do We Estimate f?” from Section 2.1 “What Is Statistical Learning?”
  • Subsection 2.1.5 “Regression Versus Classification Problems”
  • Subsection 2.2.3 “The Classification Setting”
  • Section 4.4 introduction “Generative Models for Classification”
  • Subsection 4.4.1 “Linear Discriminant Analysis for p=1”
  • Subsection 4.4.2 “Linear Discriminant Analysis for p>1”
  • Subsection 4.4.3 “Quadratic Discriminant Analysis”