## 11.3 Tuning a Model’s Parameters

Tuning model parameters is a parameter-optimization problem (Dalpiaz 2021). The adjustable parameters can be completely different depending on the model. For example, the decision tree has two adjustable parameters: the `complexity parameter (CP)` and the `tune length (TL)`. `CP` tells the algorithm to stop when the measure (generally accuracy) does not improve by at least this factor. `TL` tells the algorithm how many candidate parameter values to evaluate. SVM models, as another example, also have two adjustable parameters, `cost` and `gamma`. The `cost` parameter controls the trade-off between classifying the training points correctly and keeping a smooth decision boundary; it influences how many data points the model chooses as support vectors. If the value of `cost` is large, the model chooses more data points as support vectors and we get higher variance and lower bias, which may lead to overfitting; if the value of `cost` is small, the model chooses fewer data points as support vectors and gets lower variance and higher bias. `Gamma` defines how far the influence of a single training example reaches. If the value of `gamma` is high, the decision boundary depends only on the points close to it, and nearer points carry more weight than far-away points, so the decision boundary becomes more wiggly. If the value of `gamma` is low, the far-away points carry more weight relative to the nearer points, and the decision boundary becomes more like a straight line.
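
As a brief sketch of how `cost` and `gamma` are tuned in practice, the `tune.svm()` function from the `e1071` package searches over both parameters jointly with cross-validation. This example is an illustration only, using the built-in `iris` data as a stand-in rather than the book's dataset:

```r
# Sketch: tuning an SVM's cost and gamma with e1071's tune.svm();
# iris is a stand-in dataset, not the data used in this chapter.
library(e1071)

set.seed(1)
tuned <- tune.svm(Species ~ ., data = iris,
                  cost  = 10^(0:2),    # candidate cost values
                  gamma = 10^(-3:0))   # candidate gamma values

tuned$best.parameters  # cost/gamma pair with the lowest CV error
```

Large `cost` or `gamma` candidates in this search correspond to the wiggly, high-variance boundaries described above; small values give the smoother, high-bias ones.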

We will continue to use the *RF model* as an example to demonstrate the parameter-tuning process. RF has many parameters that can be adjusted, but the two main tuning parameters are `mtry` and `ntree`.

- `mtry`: the number of variables randomly selected as candidates at each split of the decision trees. The default value is `sqrt(p)`, the square root of the number of predictors. Increasing `mtry` generally improves the performance of the model, as each node has a higher number of options to consider. However, this is not guaranteed, since it also decreases the diversity of the individual trees, and it decreases the training speed. Hence, one needs to strike the right balance.
- `ntree`: the number of trees to grow. The default value is 500. A higher number of trees gives better performance but makes your code slower. You should choose as high a value as your processor can handle, because this makes your predictions stronger and more stable.
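
To make the `mtry` default concrete: for classification, the `randomForest` package takes the floor of the square root of the number of predictors, which can be computed directly in base R:

```r
# Default mtry for classification: floor of the square root of the
# number of predictors (here p = 9, as in the model used below).
p <- 9
default_mtry <- floor(sqrt(p))
default_mtry
```

With 9 predictors this gives `mtry = 3`, which is a useful reference point for the search results later in this section.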

In the rest of the section, we demonstrate the process of using CV to fine-tune the RF model’s parameters `mtry` and `ntree`. In general, different optimization strategies can be used to find a model’s optimal parameters. The two most commonly used methods for RF are **Random search** and **Grid search**.

*Random Search*. Define a search space as a bounded domain of parameter values and randomly sample points in that domain.

*Grid Search*. Define a search space as a grid of parameter values and evaluate every position in the grid.

Let us try them one at a time.
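
Before handing the search over to caret, the difference between the two strategies can be sketched in base R over an `mtry` range of 1 to 15 (the candidate set used for illustration here matches the searches below, but is otherwise an assumption):

```r
# Grid search enumerates every candidate value of mtry...
grid_candidates <- 1:15

# ...while random search samples a bounded number of points from the
# same domain.
set.seed(42)
random_candidates <- sample(1:15, size = 5)

length(grid_candidates)    # 15 evaluations
length(random_candidates)  # only 5 evaluations
```

Grid search is exhaustive but expensive; random search trades completeness for fewer model fits, which matters more as the number of tuned parameters grows.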

### Random Search

Random search, as provided by the package `caret` with the method “`rf`” (*random forest*) in the function `train`, can only tune the parameter `mtry`^{2}.

Let us continue using what we have found in the previous sections, that is:

- model `rf.8` with 9 predictors, and *CV* with `3-folds` repeated `10 times`.

Let us also fix “`ntree = 500`” and “`tuneLength = 15`”, and use `random` search to find `mtry`.

```
#library(caret)
#library(doSNOW)
# Random search
set.seed(2222)
# # Use the best sampling setting found earlier, that is K = 3 and T = 10
# cv.3.folds <- createMultiFolds(rf.label, k = 3, times = 10)
#
# # Set up caret's trainControl object
# ctrl.1 <- trainControl(method = "repeatedcv", number = 3, repeats = 10, index = cv.3.folds, search = "random")
#
# # Set up a cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
# # Set seed for reproducibility and train
# set.seed(34324)
#
# # Use rf.train.8 with 9 predictors
# #RF_Random <- train(x = rf.train.8, y = rf.label, method = "rf", tuneLength = 15, ntree = 500, trControl = ctrl.1)
# #save(RF_Random, file = "./data/RF_Random_search.rda")
#
# # Shut down the cluster
# stopCluster(cl)
# Check out the results
load("./data/RF_Random_search.rda")
print(RF_Random)
```

```
## Random Forest
##
## 891 samples
## 9 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 10 times)
## Summary of sample sizes: 594, 594, 594, 594, 594, 594, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.8435466 0.6643066
## 3 0.8453423 0.6690529
## 4 0.8437710 0.6665398
## 5 0.8419753 0.6630091
## 6 0.8397306 0.6586318
## 7 0.8383838 0.6556425
## 8 0.8379349 0.6544327
## 9 0.8353535 0.6495571
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.
```

We can see that the random search for `mtry` has found that the best value is 3. With `mtry = 3`, the model achieves an accuracy of 84.53%.
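
These figures can also be read directly off the fitted object rather than from the printout, assuming `RF_Random` is the caret `train` object loaded above:

```r
# Best tuning parameter chosen by caret, and its resampled accuracy.
RF_Random$bestTune               # a one-row data frame: mtry = 3
max(RF_Random$results$Accuracy)  # the accuracy at the chosen mtry
```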

### Grid Search

Grid search generally searches over more than one parameter. Each axis of the grid is a parameter, and points in the grid are specific combinations of parameter values. Because caret’s `train` can only tune `mtry` here, the grid search reduces to a linear search through a vector of candidate `mtry` values.

```
# ctrl.2 <- trainControl(method = "repeatedcv", number = 3, repeats = 10, index = cv.3.folds, search = "grid")
#
# set.seed(3333)
# # Set up a grid search with a vector from 1 to 15
# tunegrid <- expand.grid(.mtry = c(1:15))
#
# # Set up a cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
# #RF_grid_search <- train(y = rf.label, x = rf.train.8, method = "rf", metric = "Accuracy", trControl = ctrl.2, tuneGrid = tunegrid, tuneLength = 15, ntree = 500)
#
# # Shut down the cluster
# stopCluster(cl)
# #save(RF_grid_search, file = "./data/RF_grid_search.rda")
load("./data/RF_grid_search.rda")
print(RF_grid_search)
```

```
## Random Forest
##
## 891 samples
## 9 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 10 times)
## Summary of sample sizes: 594, 594, 594, 594, 594, 594, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 1 0.8232323 0.6140400
## 2 0.8439955 0.6652153
## 3 0.8452301 0.6691079
## 4 0.8443322 0.6675864
## 5 0.8428732 0.6645467
## 6 0.8398429 0.6584647
## 7 0.8379349 0.6548634
## 8 0.8390572 0.6571467
## 9 0.8370370 0.6529631
## 10 0.8365881 0.6519263
## 11 0.8359147 0.6504591
## 12 0.8370370 0.6525838
## 13 0.8365881 0.6520535
## 14 0.8356902 0.6502470
## 15 0.8354658 0.6494413
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.
```

The grid search method also identified 3 as the best value for `mtry`. When `mtry = 3`, the model’s estimated accuracy reaches 84.52%.

We can see that both search methods arrive at the same `mtry` suggestion.

### Manual Search

Let us consider another parameter of the `RF model`, `ntree`. Since the `train` method from `caret` cannot tune `ntree`, we have to write our own function to search for the best value of this parameter. This method is also called **Manual Search**. The idea is to write a loop that repeats the same model-fitting process a certain number of times. Within each iteration of the loop, a different value of the parameter to be tuned is used and the model’s results are accumulated; finally, a manual comparison is made to figure out the best value of the tuned parameter.
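
The loop-and-compare idea can be sketched independently of caret. Here `fit_and_score()` is a hypothetical stand-in for a real model-fitting call such as `train()`, returning a dummy score so the skeleton is self-contained:

```r
# Manual search skeleton: try each candidate, record a score, pick the best.
# fit_and_score() is a dummy stand-in for an actual model fit; in the real
# code below it is replaced by caret's train().
fit_and_score <- function(n_tree) 0.84 + 0.001 * log10(n_tree)

candidates <- c(100, 500, 1000, 1500)
scores <- sapply(candidates, fit_and_score)
names(scores) <- candidates

best_ntree <- candidates[which.max(scores)]
best_ntree
```

The real version below stores whole fitted models instead of bare scores, so the comparison can use `resamples()` afterwards.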

To tune the RF model’s parameter `ntree`, we fix `mtry = 3` from the above section, try a list of 4 values (100, 500, 1000, 1500)^{3}, and find which one produces the best result.

```
# Manual search; we reuse the grid-search control settings with mtry fixed
model_list <- list()
tunegrid <- expand.grid(.mtry = 3)
control <- trainControl(method = "repeatedcv", number = 3, repeats = 10, search = "grid")
# # The following code has been commented out to produce the markdown file,
# # so building the book does not have to wait for a long run
# # Set up a cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
# # Loop through the different settings
# for (n_tree in c(100, 500, 1000, 1500)) {
#   set.seed(3333)
#   fit <- train(y = rf.label, x = rf.train.8, method = "rf", metric = "Accuracy", tuneGrid = tunegrid, trControl = control, ntree = n_tree)
#
#   key <- toString(n_tree)
#   model_list[[key]] <- fit
# }
#
# # Shut down the cluster
# stopCluster(cl)
# save(model_list, file = "./data/RF_manual_search.rda")
load("./data/RF_manual_search.rda")
# Compare the results
results <- resamples(model_list)
summary(results)
```

```
##
## Call:
## summary.resamples(object = results)
##
## Models: 100, 500, 1000, 1500
## Number of resamples: 30
##
## Accuracy
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 100 0.7979798 0.8249158 0.8367003 0.8383838 0.8535354 0.8855219 0
## 500 0.8013468 0.8324916 0.8451178 0.8418631 0.8518519 0.8821549 0
## 1000 0.8013468 0.8282828 0.8434343 0.8415264 0.8518519 0.8787879 0
## 1500 0.8013468 0.8324916 0.8451178 0.8430976 0.8518519 0.8855219 0
##
## Kappa
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 100 0.5686275 0.6277878 0.6498418 0.6540654 0.6778695 0.7539114 0
## 500 0.5751073 0.6439474 0.6681725 0.6614719 0.6854823 0.7462468 0
## 1000 0.5751073 0.6327199 0.6676526 0.6608493 0.6823330 0.7394356 0
## 1500 0.5751073 0.6409731 0.6714760 0.6640467 0.6857492 0.7539114 0
```

We can see that with `mtry = 3` (the value found above), the best *ntree* value is 1500, with a mean accuracy of 84.31%.
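
Assuming `model_list` still holds the four fitted models loaded above, the winning model can be pulled out by its `ntree` key for later use:

```r
# Retrieve the best model (ntree = 1500) from the manual-search results.
best_rf <- model_list[["1500"]]
best_rf$results$Accuracy  # resampled accuracy at mtry = 3, ntree = 1500
```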

### References

Dalpiaz, David. 2021. *Tune Machine Learning Algorithms in R*. otexts. https://machinelearningmastery.com/tune-machine-learning-algorithms-in-r/.

Not all machine learning algorithms are available in caret for tuning. The choice of tunable parameters was decided by the developers of the package, and only those parameters that have a large effect are available for tuning in caret. For the `RF` method, only the `mtry` parameter is available in caret for tuning. The reason is its effect on the final accuracy and that it must be found empirically for a dataset.↩︎

These are commonly used `ntree` values. For demonstration purposes we only chose these values; you can try more different values.↩︎