Chapter 10 Support Vector Machines

These notes rely on James et al. (2013), Hastie, Tibshirani, and Friedman (2017), Kuhn and Johnson (2016), the PSU STAT 508 course notes, and the e1071 SVM vignette.

The support vector machine (SVM) is a classification model¹ that maps observations as points in space so that the classes are separated by as wide a gap as possible. New observations are then mapped into the same space and classified according to which side of the gap they fall on. The SVM finds the optimal separating hyperplane, using a nonlinear mapping to a sufficiently high-dimensional space when the classes are not linearly separable in the original feature space. The hyperplane is defined by the observations that lie on or within a margin whose width is controlled by a cost hyperparameter; these observations are called the support vectors.

The SVM is an extension of the support vector classifier, which in turn is a generalization of the simple and intuitive maximal margin classifier. The best way to understand the SVM is to start with the maximal margin classifier and work up from there.
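To make the roles of the hyperplane, the margin, and the cost hyperparameter concrete, here is a minimal sketch in Python using scikit-learn (the chapter's own examples use R's e1071 package, so this is an illustrative stand-in; the simulated blob data is an assumption, not from the text):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Simulate two well-separated classes in two dimensions.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# C is the cost hyperparameter: a large C penalizes margin violations
# heavily (narrow margin, fewer support vectors), while a small C
# tolerates violations (wide margin, more support vectors).
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The fitted hyperplane is determined entirely by the support vectors.
print("number of support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, y))
```

Refitting with a much smaller `C` (e.g. `C=0.01`) typically increases the number of support vectors, since more observations fall on or inside the wider margin.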

References

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2017. The Elements of Statistical Learning. 2nd ed. New York, NY: Springer. https://web.stanford.edu/~hastie/ElemStatLearn/.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. 1st ed. New York, NY: Springer. http://faculty.marshall.usc.edu/gareth-james/ISL/book.html.

Kuhn, Max, and Kjell Johnson. 2016. Applied Predictive Modeling. 1st ed. New York, NY: Springer. http://appliedpredictivemodeling.com/.


  1. Other classification models include LDA, logistic regression, and tree-based models such as bagging, random forests, and gradient boosting.↩︎