Chapter 25 SVM
Let us now proceed to our second algorithm, support vector machines (SVM). SVM is an algorithm that is especially well suited to binary classification. It classifies binary outcomes by plotting the data points in an n-dimensional space, where n is the number of features (in this case, at least a thousand words), and then identifying a "hyperplane" that divides the observations into two spaces. SVM is considered a "large margin classifier" because it tries to identify the hyperplane with the largest margin between the two classes in that space. Learn more about large margin classifiers here.
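To make the geometry concrete, here is a hedged toy sketch (not this chapter's tweet data): with only two features, the "hyperplane" is simply a line separating the two classes. The sketch assumes the e1071 package, another SVM implementation linked at the end of the chapter.

library(e1071)

# Two linearly separable classes from the built-in iris data, two features only
toy <- droplevels(subset(iris, Species != "virginica"))
toy_svm <- svm(Species ~ Petal.Length + Petal.Width, data = toy,
               kernel = "linear", cost = 1)

# Plot the points, the fitted decision boundary, and the support vectors
plot(toy_svm, toy, Petal.Length ~ Petal.Width)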
SVM is very popular in supervised machine learning because it is well equipped for high-dimensional data (i.e., when you have a lot of features, as in natural language processing) and for handling "close" cases (data points that could plausibly be classified as either 1 or 0). To handle these close cases, we adjust the algorithm using two hyperparameters: cost, which is used to guard against overfitting, and Loss, which penalizes values that would be misclassified. Learn more about cost here and (hinge) Loss here.
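To get a feel for what the (hinge) Loss is measuring, here is a minimal numeric sketch (not part of this chapter's pipeline): points that are correctly classified and outside the margin incur no penalty, while points inside the margin or on the wrong side of the hyperplane are penalized in proportion to how far off they are.

# Hinge loss: y is the true label coded +1/-1, f is the signed distance
# of a point from the hyperplane (positive values fall on the +1 side)
hinge <- function(y, f) pmax(0, 1 - y * f)
hinge(y = c(1, 1, -1), f = c(2.0, 0.3, 0.5))

## [1] 0.0 0.7 1.5

The first point is correctly classified and outside the margin (no penalty), the second is inside the margin (a small penalty), and the third is on the wrong side of the hyperplane (a larger penalty).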
In R, a couple of different packages can fit an SVM (e1071, for example, has an svm() function). We'll use the implementation in the LiblineaR package, which caret will call for us.
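For reference, here is a hedged sketch of what calling LiblineaR directly (without caret) might look like, assuming the same tw_to_train matrix and conservative_code labels used below; type = 1 requests an L2-regularized L2-loss SVM solved in the dual.

library(LiblineaR)

# Direct call to the underlying fitting function (caret will do this for us)
direct_svm <- LiblineaR(data = as.matrix(tw_to_train),
                        target = as.factor(conservative_code),
                        type = 1, # L2-regularized L2-loss support vector classification (dual)
                        cost = 1)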
Like the kNN algorithm, we will use train() in caret to construct our model from (1) the data (x and y), (2) the algorithm (method = "svmLinear3"), and (3) the hyperparameters (for this algorithm, cost and Loss). Let's apply the algorithm now. Note that not much changes aside from the method and tuneGrid arguments.
svm_model <- caret::train(x = tw_to_train,
                          y = as.factor(conservative_code),
                          method = "svmLinear3",
                          trControl = trctrl,
                          tuneGrid = data.frame(cost = 1, #accounts for over-fitting
                                                Loss = 2)) #accounts for misclassifications
print(svm_model)
## L2 Regularized Support Vector Machine (dual) with Linear Kernel
##
## 92 samples
## 1035 predictors
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 92, 92, 92, 92, 92, 92, ...
## Resampling results:
##
## Accuracy Kappa
## 0.4761856 -0.02138603
##
## Tuning parameter 'cost' was held constant at a value of 1
## Tuning
## parameter 'Loss' was held constant at a value of 2
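The printout notes that cost and Loss were each held constant because we supplied a single value for each. If you wanted caret to compare several candidate values instead, you could pass a grid; here is a hedged sketch, reusing the same objects as above:

svm_grid <- expand.grid(cost = c(0.5, 1, 2), # how heavily misclassification is penalized
                        Loss = c(1, 2)) # candidate loss settings (the chapter uses 2)

svm_model_tuned <- caret::train(x = tw_to_train,
                                y = as.factor(conservative_code),
                                method = "svmLinear3",
                                trControl = trctrl,
                                tuneGrid = svm_grid)

For the rest of the chapter we stick with the single-setting svm_model fit above.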
Now let’s apply this to the test data.
25.1 Testing the Model
Once again, we’ll use predict()
to apply the model.
svm_predict <- predict(svm_model, newdata = tw_to_test)
And next, we’ll use confusionMatrix()
.
svm_confusion_matrix <- caret::confusionMatrix(svm_predict,
                                               conservative_data$conservative[-trainIndex],
                                               mode = "prec_recall")
svm_confusion_matrix
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 9 7
## 1 8 14
##
## Accuracy : 0.6053
## 95% CI : (0.4339, 0.7596)
## No Information Rate : 0.5526
## P-Value [Acc > NIR] : 0.3142
##
## Kappa : 0.1972
##
## Mcnemar's Test P-Value : 1.0000
##
## Precision : 0.5625
## Recall : 0.5294
## F1 : 0.5455
## Prevalence : 0.4474
## Detection Rate : 0.2368
## Detection Prevalence : 0.4211
## Balanced Accuracy : 0.5980
##
## 'Positive' Class : 0
##
Although the accuracy of this algorithm is pretty similar to the kNN model's, the F1 score (0.55) is better, because the recall of this model is substantially higher than the recall of the previous model (even though the precision is about the same).
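To see where the precision, recall, and F1 figures come from, here is the arithmetic done by hand from the confusion matrix above (remembering that caret treats 0 as the 'positive' class here):

tp <- 9 # predicted 0, actually 0
fp <- 7 # predicted 0, actually 1
fn <- 8 # predicted 1, actually 0

precision <- tp / (tp + fp) # 9/16 = 0.5625
recall <- tp / (tp + fn) # 9/17 = 0.5294
f1 <- 2 * precision * recall / (precision + recall) # 0.5455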
Want to learn more about SVM? Check out these guides:
* Towards Data Science analysis of SVM and KNN (among others)
* MonkeyLearn Explanation of hyperplanes.
* StatQuest Video on SVM.
* SVM in e1071, another package with an svm() function