Chapter 10 KNN

K-nearest neighbors (KNN) is a prediction method that predicts a new observation from the K training observations closest to it. It works when Y is categorical (classification) or numeric (regression), but the R code differs slightly depending on the type of Y.

In the code below, we perform KNN with Euclidean distance, but other distance metrics can be used. Because Euclidean distance is computed from differences between feature values, all features must be numeric.
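To make the distance idea concrete, here is a minimal from-scratch sketch in base R. It computes Euclidean distance (the square root of the summed squared differences between two feature vectors) and uses it to classify one new point by majority vote among its k nearest training rows. The helper names `euclidean` and `knn_classify_one` and the toy data are hypothetical, for illustration only; this is not how `class::knn` is implemented internally.

```r
# Euclidean distance between two numeric feature vectors a and b
euclidean <- function(a, b) {
  sqrt(sum((a - b)^2))
}

# Hypothetical helper: classify one new point by majority vote
# among its k nearest training rows (Euclidean distance).
knn_classify_one <- function(train_X, train_Y, new_x, k) {
  # Distance from new_x to every training row
  d <- apply(train_X, 1, euclidean, b = new_x)
  # Indices of the k smallest distances
  nearest <- order(d)[1:k]
  # Count labels among the neighbors; which.max picks the first
  # maximum, so vote ties go to the alphabetically first label here
  # (class::knn instead breaks vote ties at random)
  votes <- table(train_Y[nearest])
  names(votes)[which.max(votes)]
}

# Toy training data: two features, two classes
train_X <- matrix(c(1, 1,
                    1, 2,
                    5, 5,
                    6, 5), ncol = 2, byrow = TRUE)
train_Y <- c("a", "a", "b", "b")

euclidean(c(0, 0), c(3, 4))                             # 5
knn_classify_one(train_X, train_Y, c(1.5, 1.5), k = 3)  # "a"
```

With k = 3, the neighbors of (1.5, 1.5) are the two "a" rows and the nearer "b" row, so "a" wins the vote 2 to 1.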

10.1 KNN: Categorical Y

# The library we need
library(class)
# Classify each test row by majority vote among its k nearest training rows
results <- knn(train = , test = , cl = , k = )

Inputs

  • train =: This is the training data set. Make sure to only include the columns you want to use as features.
  • test =: This is the test data set. Make sure to only include the columns you want to use as features.
  • cl =: This is the response variable from the training data set (the true class of each training row), ideally stored as a factor. Make sure to only include the response variable column.
  • k =: This is the number of neighbors, K, that you choose.
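A minimal worked example of the `knn()` template, assuming the built-in `iris` data set with a random 100/50 train/test split. The split, `k = 5`, and the object names `train_X`, `test_X`, `train_Y`, and `test_Y` are illustrative choices, not part of the original. `knn()` returns a factor of predicted classes, one per test row.

```r
library(class)

set.seed(123)                    # make the random split reproducible
idx <- sample(nrow(iris), 100)   # 100 training rows, 50 test rows

# Feature columns only (the four numeric measurements)
train_X <- iris[idx, 1:4]
test_X  <- iris[-idx, 1:4]

# Response column only
train_Y <- iris$Species[idx]
test_Y  <- iris$Species[-idx]

results <- knn(train = train_X, test = test_X, cl = train_Y, k = 5)

# Compare predicted and actual classes on the test set
table(Predicted = results, Actual = test_Y)
mean(results == test_Y)          # test-set accuracy
```

Because `knn()` breaks vote ties at random, setting a seed before the call also keeps the predictions reproducible.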

10.2 KNN: Numeric Y

# The library we need
library(caret)
# Fit KNN regression on the training data
results <- knnreg(train, Y, k = )
# Predict Y for new data
knnPred <- predict(results, newdata = )

Inputs

  • train: Replace this with your training data. Make sure to only include the columns you want to use as features.
  • Y: Replace this with the response variable (a numeric vector) from your training data.
  • k =: This is the number of neighbors, K, that you choose.
  • newdata: This is the data you want to make predictions on (usually the test data). Make sure it contains only the same feature columns used in train.
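`knnreg()` predicts a numeric Y by averaging the responses of the K nearest training rows. The sketch below reproduces that averaging idea in base R on a tiny made-up data set so the arithmetic can be checked by hand. `knn_reg_predict` is a hypothetical helper, not a caret function, and the data are invented for illustration.

```r
# Hypothetical helper (not part of caret): predict a numeric Y for each
# row of new_X as the mean response of its k nearest training rows.
knn_reg_predict <- function(train_X, train_Y, new_X, k) {
  train_X <- as.matrix(train_X)
  new_X   <- as.matrix(new_X)
  apply(new_X, 1, function(x) {
    # Euclidean distance from x to every training row
    d <- sqrt(colSums((t(train_X) - x)^2))
    # Average the responses of the k closest rows
    mean(train_Y[order(d)[1:k]])
  })
}

# Toy data: one feature x, numeric response
train_X <- data.frame(x = c(1, 2, 3, 10, 11))
train_Y <- c(10, 12, 14, 50, 52)
new_X   <- data.frame(x = c(2.2, 10.4))

knn_reg_predict(train_X, train_Y, new_X, k = 2)  # 13 51
```

For 2.2 the two nearest training values are 2 and 3, so the prediction is (12 + 14) / 2 = 13; for 10.4 they are 10 and 11, giving (50 + 52) / 2 = 51. The analogous caret calls would be `knnreg(train_X, train_Y, k = 2)` followed by `predict(..., newdata = new_X)`.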