7.15 Overview of Classification

  • Classification problems occur often, perhaps even more so than regression problems, e.g.:
    1. A person arrives at the emergency room with a set of symptoms that could possibly be attributed to one of three medical conditions. Which of the three conditions does the individual have?
    2. An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent, on the basis of the user’s IP address, past transaction history, and so forth.
    3. On the basis of DNA sequence data for a number of patients with and without a given disease, a biologist would like to figure out which DNA mutations are deleterious (disease-causing) and which are not.
  • If we have a set of training observations \((x_{1},y_{1}),\ldots,(x_{n},y_{n})\), we can use them to build a classifier that predicts the qualitative response \(y\) for new observations
  • Why not linear regression?
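    • There is no natural way to convert a qualitative response variable with more than two levels into a quantitative response for a linear model (LM): any numeric coding imposes an ordering and spacing on the levels. (For a binary outcome, a linear probability model is possible, but its predictions can fall outside the \([0,1]\) interval.)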
  • Source: James et al. (2013, chaps. 4.1, 4.2)
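
Both problems with linear regression can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the class labels "A", "B", "C", and the helper names `fit_line` and `nearest_label` are hypothetical and do not come from James et al. (2013). The first part fits a linear probability model to a 0/1 outcome and shows fitted values below 0 and above 1. The second part fits a line to two different integer codings of the same three-level response and shows that the predicted classes change with the coding.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Part 1: linear probability model on a binary outcome (toy data, assumed).
x_bin = list(range(10))
y_bin = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
b0, b1 = fit_line(x_bin, y_bin)
lpm_preds = [b0 + b1 * x for x in x_bin]
# Fitted "probabilities" at the extremes leave the [0, 1] interval.
print("LPM fitted values at x=0 and x=9:", lpm_preds[0], lpm_preds[-1])


# Part 2: a three-level response under two arbitrary codings (toy data, assumed).
x_multi = [1, 2, 3, 4, 5, 6]
labels = ["A", "A", "B", "B", "C", "C"]


def nearest_label(value, coding):
    """Map a fitted value back to the label whose code is closest."""
    return min(coding, key=lambda label: abs(value - coding[label]))


def predict_labels(coding):
    """Fit a line to the coded response and decode the fitted values."""
    ys = [coding[label] for label in labels]
    a, b = fit_line(x_multi, ys)
    return [nearest_label(a + b * x, coding) for x in x_multi]


labels_ordered = predict_labels({"A": 1, "B": 2, "C": 3})
labels_swapped = predict_labels({"A": 1, "B": 3, "C": 2})
print("coding A=1, B=2, C=3:", labels_ordered)
print("coding A=1, B=3, C=2:", labels_swapped)
```

In this toy example the two codings are equally arbitrary, yet they lead to different predicted classes for the same inputs, which is why a method designed for qualitative responses is preferable.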


James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.