Chapter 25 SVM

Let us now proceed to our second algorithm, support vector machines (SVM). SVM is an algorithm that is especially well-suited to binary outcomes. It classifies binary variables by plotting the data points in an n-dimensional space, where n is the number of features (in this case, at least a thousand words), and then identifying a “hyperplane” that divides the observations into two groups. SVM is considered a “large margin classifier” because it looks for the hyperplane with the largest margin between the two classes in this space. Learn more about large margin classifiers here.
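To make the hyperplane idea concrete, here is a minimal toy sketch on simulated two-dimensional data (the seed, cluster means, and object names are invented for illustration and are not part of this chapter’s data). The fitted weight vector defines the separating hyperplane.

library(LiblineaR)

set.seed(42)
# Two well-separated clusters of 50 points each, one per class
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2))
y <- factor(rep(c(0, 1), each = 50))

# Fit a linear SVM; type = 1 is an L2-regularized L2-loss classifier
toy_svm <- LiblineaR(data = x, target = y, type = 1, cost = 1)

# The fitted weights define the hyperplane w1*x1 + w2*x2 + bias = 0
toy_svm$W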

SVM is very popular in supervised machine learning because it is well-equipped for high-dimensional data (i.e., when you have a lot of features, as in natural language processing) and for handling “close” cases (data points that could plausibly be classified as either 1 or 0). To handle these close cases, we tune the algorithm using two hyperparameters: cost, which controls the penalty for misclassification and thus helps account for overfitting, and Loss, which specifies the loss function used to penalize points that end up on the wrong side of the margin. Learn more about cost here and (hinge) Loss here.
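To see what the Loss hyperparameter is penalizing, here is a small toy computation (the function names and example margins are invented for illustration). A point’s margin is at least 1 when it sits safely on the correct side of the hyperplane and incurs zero loss; anything less is penalized, linearly for the L1 hinge and quadratically for the L2 hinge.

# Toy hinge-loss functions (illustrative only, not from the book)
hinge_l1 <- function(margin) pmax(0, 1 - margin)
hinge_l2 <- function(margin) pmax(0, 1 - margin)^2

margins <- c(-1, 0, 0.5, 1, 2)  # five hypothetical points
data.frame(margin  = margins,
           L1_loss = hinge_l1(margins),
           L2_loss = hinge_l2(margins))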

In R, a couple of different packages have an svm() function. We’ll use the one from the LiblineaR package.

As with the kNN algorithm, we will use train() in caret to construct our model using (1) the data (x and y), (2) the algorithm (method = "svmLinear3"), and (3) the hyperparameters (for this algorithm, cost and Loss). Let’s apply the algorithm now. Note that not much changes, aside from the method and the tuneGrid arguments.

svm_model <- caret::train(x = tw_to_train,
                          y = as.factor(conservative_code),
                          method = "svmLinear3",
                          trControl = trctrl,
                          tuneGrid = data.frame(cost = 1,  # penalty for misclassification; guards against over-fitting
                                                Loss = 2)) # loss function used to penalize misclassified points

print(svm_model)
## L2 Regularized Support Vector Machine (dual) with Linear Kernel 
## 
##   92 samples
## 1035 predictors
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 92, 92, 92, 92, 92, 92, ... 
## Resampling results:
## 
##   Accuracy   Kappa      
##   0.4761856  -0.02138603
## 
## Tuning parameter 'cost' was held constant at a value of 1
## Tuning
##  parameter 'Loss' was held constant at a value of 2
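Note that we held cost and Loss constant at single values, which is a choice rather than a requirement. If you would rather let caret search for the best combination, a sketch like the following should work (the grid values are arbitrary, and svm_tune is an invented name):

svm_tune <- caret::train(x = tw_to_train,
                         y = as.factor(conservative_code),
                         method = "svmLinear3",
                         trControl = trctrl,
                         tuneGrid = expand.grid(cost = c(0.25, 0.5, 1, 2),
                                                Loss = c(1, 2)))
svm_tune$bestTune  # the cost/Loss pair with the best resampled accuracy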

Now let’s apply this to the test data.

25.1 Testing the Model

Once again, we’ll use predict() to apply the model.

svm_predict <- predict(svm_model, newdata = tw_to_test)

And next, we’ll use confusionMatrix().

svm_confusion_matrix <- caret::confusionMatrix(svm_predict,
                                               conservative_data$conservative[-trainIndex],
                                               mode = "prec_recall")
svm_confusion_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0  9  7
##          1  8 14
##                                           
##                Accuracy : 0.6053          
##                  95% CI : (0.4339, 0.7596)
##     No Information Rate : 0.5526          
##     P-Value [Acc > NIR] : 0.3142          
##                                           
##                   Kappa : 0.1972          
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##               Precision : 0.5625          
##                  Recall : 0.5294          
##                      F1 : 0.5455          
##              Prevalence : 0.4474          
##          Detection Rate : 0.2368          
##    Detection Prevalence : 0.4211          
##       Balanced Accuracy : 0.5980          
##                                           
##        'Positive' Class : 0               
## 

Although the accuracy of this algorithm is pretty similar to the kNN model’s, the F1 score (0.55) is better than the kNN model’s, because this model’s recall (0.53) is substantially higher than the recall in the previous model (even though the precision is about the same).
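If you would rather make this comparison programmatically than read the printouts side by side, the relevant statistics can be pulled directly out of each confusionMatrix object (knn_confusion_matrix below is assumed to be the corresponding object from the kNN chapter):

# Accuracy and Kappa live in $overall; precision, recall, and F1 in $byClass
svm_confusion_matrix$overall["Accuracy"]
svm_confusion_matrix$byClass[c("Precision", "Recall", "F1")]

# Side-by-side comparison with the kNN model
rbind(kNN = knn_confusion_matrix$byClass[c("Precision", "Recall", "F1")],
      SVM = svm_confusion_matrix$byClass[c("Precision", "Recall", "F1")])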

Want to learn more about SVM? Check out these guides:
* Towards Data Science analysis of SVM and kNN (among others)
* MonkeyLearn explanation of hyperplanes
* StatQuest video on SVM
* SVM in e1071, another package with an svm() function (see the sketch below)
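Since the last guide covers e1071, here is a minimal sketch of its interface, assuming the same tw_to_train features and conservative_code labels used in this chapter (e1071_svm is an invented name):

library(e1071)

# e1071's svm() expects a numeric matrix, so coerce the feature object
# if necessary
e1071_svm <- svm(x = as.matrix(tw_to_train),
                 y = as.factor(conservative_code),
                 kernel = "linear",
                 cost = 1)
summary(e1071_svm)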