# 13 Machine Learning for Imagery Classification

## 13.1 Introduction

Using machine learning algorithms is one approach to imagery classification, whether that classification is object-based or pixel-based. We’re going to focus on a pixel-based approach to identify land cover from imagery or other continuous raster variables (such as elevation or elevation-derived rasters such as slope, curvature, and roughness), employing samples of those variables called training samples in a “supervised” imagery classification.

It’s useful to realize that the modeling methods we use for this type of imagery classification are really no different at the core from methods we’d use to work with continuous variables that might not even be spatial. For example, if we leave off the classification process for a moment, a machine learning algorithm might be used to predict a response result from a series of predictor variables, like predicting temperature from elevation and latitude, or acceleration might be predicted by some force acting on a body. So the first model might be used to predict the temperature of any location (within the boundary of the study area) given an elevation and latitude; or the second model might predict an acceleration given the magnitude of a force applied to the body (maybe of a given mass). A classification model varies on this by predicting a nominal variable like type of land cover; some other types of responses might be counts (using a Poisson model) or probabilities (using a logistic model).

The imagery classification approach adds to this model an input preparation process and an output prediction process:

• A training set of points and polygons are created that represent areas of known classification such as land cover like forest or wetland, used to identify values of the predictor variables (e.g. imagery bands)
• A predicted raster is created using the model applied to the original rasters.

This chapter delves into methods in the subdiscipline of remote sensing which students are encouraged to learn more about in courses such as GEOG 610 and 611 at San Francisco State University (). Some key things to learn more about are the nature of the electromagnetic spectrum and especially bands of that spectrum that are informative for land cover and especially vegetation detection, and the process of image classification.

### 13.1.1 Validation, creating a training set and the “overfit model” problem

An important part of the imagery classification process is validation, where we look at how well the model works. The way this is done is pretty easy to understand, and requires have testing data in addition to the training data mentioned above. Testing data can be created in a variety of ways – we’ll randomly pull out a percentage of our training data and not use it for training, only for testing. Then we’ll employ a confusion matrix and various measures of accuracy to see how well the model predicts the testing data.

#### 13.1.1.1 The “overfit model” problem

It’s important to realize that the accuracy we determine is only based on the training and testing data. The accuracy of the prediction of the classification elsewhere will likely be somewhat less than this, and if this is substantial our model is “overfit.” We don’t actually know how overfit a model truly is because that depends on how likely the conditions seen in our training and testing data also occur throughout the rest of the image; if those conditions are common, just not sampled, then the model might actually be pretty well fit.

In thinking about the concept of overfit models and selecting out training (and testing) sets, it’s useful to consider the purpose of our classification and how important it is that our predictions are absolutely reliable. In choosing training sets, accuracy is also important, so we will want to make sure that they are good representatives of the land cover type (assuming that’s our response variable). While some land covers are pretty clear (like streets or buildings), there’s a lot of fuzziness in the world: you might be trying to identify wetland conditions based on the type of vegetation growing, but in a meadow you can commonly find wetland species mixed in with more mesic species – to pick a reliable wetland sample we might want to only pick areas with only wetland species (and this can get tricky since there are many challenges of “obligate wetland” or “facultative wetland” species.) It might actually be more appropriate to use a probability model to do this, but we can also take a spatial approach to prediction and accept that some areas will tend more toward one vs. another.

## 13.2 Imagery Setup and Visualization Methods

Since machine learning algorithms are very black boxy, getting them working right mostly requires getting good data inputs and tweaking the variety of parameters that control how the model training proceeds. Fortunately there are a lot of good online resources such as:

• – the source of my original inspiration and basic methods used in this section. This paper also describes clipping the data and some ideas on web scraping Sentinel-2 imagery from Copernicus.
• various sources from Google searches such as

As with everything else in R, the specific coding methods used vary with the coder and what they’re used to using. If a method works, it’s certainly ok to keep using it, but where it can be clarified with clearer code such as from the tidyverse or sf, it may make it easier to understand and improve on.

This code accesses Sentinel-2 imagery downloaded from covering Red Clover Valley, where we are doing research on biogeochemical cycles and carbon sequestration from hydrologic restoration with The Sierra Fund (), and then cropped to that site in a previous script, and saved to folders named in the form cropYYYYMMDD, such as "crop20210708". The following code chunk specifies the date, with multiple cropped images stored in igisci.

imgDate <- "20210708"
# other dates stored:  "20200802"
library(rgdal)         # [readOGR]
library(gdalUtils)
#library(raster)  # [writeRaster OKterra, raster::brick, raster::beginCluster, unique OKterra, projecton(notused), rasterToPoints(notused)]
library(sf)
library(sp)            # [spTransform]
library(RStoolbox)     # Remote Sensing toolbox.  [normImage]
library(getSpatialData)# for reading Sentinel, Landsat, MODIS, SRTM [set_aoi, view_aoi]
# library(rasterVis)   # terra required, creates error in module
# "spat" -- function 'Rcpp_precious_remove' not provided by package 'Rcpp'
library(mapview)       # [viewRGB, mapview]
library(RColorBrewer)
library(plotly)        # [plot_ly]
library(grDevices)

library(caret)         # [createDataPartition, createFolds, trainControl, train, confusionMatrix]

library(data.table)    # [setDT, setnames]
library(dplyr)
library(stringr)
library(doParallel)    # [registerDoParallel]
library(snow)          # [makeCluster, stopCluster]
library(parallel)      # [makeCluster, stopCluster, detectCores]

# set the temporary folder for raster package operations
unlink("./cache", recursive = T) # Deletes crop folder if it already exists
dir.create(file.path(".", "cache")) # will create warning if it already exists
dir.create(file.path("./cache", "temp"))
raster::rasterOptions(tmpdir = "./cache/temp")

### 13.2.1 Read all 10m & 20m bands into a list of 10 bands

As documented at , Sentinel-2 imagery is collected at three resolutions, with the most bands at the coarsest (60m) resolution. The bands added at that coarsest resolution are not critical for our work as they relate to oceanographic and atmospheric research, and our focus will be on land cover and vegetation in a terrestrial area. So we’ll work with four bands at 10m and an additional six bands at 20m resolution:

10 m bands

• B02 - Blue 0.490 $$\mu$$m
• B03 - Green 0.560 $$\mu$$m
• B04 - Red 0.665 $$\mu$$m
• B08 - NIR 0.842 $$\mu$$m

20 m bands

• B05 - Red Edge 0.705 $$\mu$$m
• B06 - Red Edge 0.740 $$\mu$$m
• B07 - Red Edge 0.783 $$\mu$$m
• B11 - SWIR 1.610 $$\mu$$m
• B12 - SWIR 2.190 $$\mu$$m
• B8A - NIR 0.865 $$\mu$$m
dtaPath <- system.file("extdata", "RCVimagery", package="igisci")
imgList <- list.files(paste0(dtaPath,"/crop", imgDate), pattern = "*.tif", full.names = TRUE)
rst_lst <- lapply(imgList, FUN = raster::raster) # Apply the terra() function over the list of 10 bands
names(rst_lst) <- str_extract(sapply(rst_lst, names), "B.{2}")
# sapply here returns a vector using the names() function,
# then str_extract() just gets the part of file name with B followed by any 2 characters 

Note that the order will be automatically detected, and the 10m B08 will follow the 20m B07:

str_extract(sapply(rst_lst, names), "B.{2}") # shows the order of bands, with B08 (NIR) as the 7th
##  [1] "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B11" "B12" "B8A"

### 13.2.2 Visualize the original imagery

Displaying the imagery 3 bands at a time (displayed as RGB on our computer screen) is always a good place to start, and two especially useful band sets are RGB itself – so looking like a conventional color aerial photography – and “false color” that includes a band normally invisible to our eyes, such as near infrared that reflects chlorophyll in healthy plants.

A RasterBrick is commonly used to store multiband imagery, and we’ll create a brick from the imagery currently stored in a list.

raster::plotRGB(raster::brick(rst_lst[1:3]), r = 3, g = 2, b = 1)  # [mapview::viewRGB, raster::brick]

#### 13.2.2.1 Visualize the image in false color (IR-R-G as R-G-B).

In standard “false color” surface-reflected near infrared is displayed as red, reflected red is displayed as green and green is displaed as blue. Some advantages of this false color display include: - Blue is not included at all, which helps reduce haze - Water bodies absorb NIR, so appear very dark in the image - Chlorophyll strongly reflects NIR, so healthy vegetation is bright red

raster::plotRGB(raster::brick(rst_lst[c(2,3,7)]), r = 3, g = 2, b = 1)  # [mapview, raster]

### 13.2.3 Set up prediction raster brick

Needed to build a brick raster object to be used for prediction result $see raster::brick, etc.$

rst_for_prediction <- vector(mode = "list", length = length(rst_lst))
names(rst_for_prediction) <- names(rst_lst)

### 13.2.4 Resample 20 m bands to 10 m

In addition to the 4 bands at 10 m resolution (B02, B03, B04, B08), there are 5 bands at 20 m resolution (B05, B06, B07, B8A, B11, B12) which provide red edge and SWIR useful for vegetation and moisture indices beyond NDVI. To be useful in R as a composite (brick), we need to get these all to the same extent and cell size, so we’ll resample the 20 m bands to 10, and end up with a brick (composite) of all 10 rasters.

# method from Stefan (2019)
for (b in c("B05", "B06", "B07", "B8A", "B11", "B12")){
raster::beginCluster(n = round(3/4 * detectCores()))  # [raster]
try(
rst_for_prediction[[b]] <- raster::resample(x = rst_lst[[b]],
y = rst_lst$B02) ) # y is raster object with parameters that x should be resampled to raster::endCluster() } b_10m <- c("B02", "B03", "B04", "B08") rst_for_prediction[b_10m] <- rst_lst[b_10m] brick_for_prediction <- raster::brick(rst_for_prediction) # [raster] ### 13.2.5 Principle Component Analysis (PCA) & Visualization PCA creates new uncorrelated variables from correlated variables (bands in this case), with the purpose of identifying the key dimensions that provide the most information. rcv_PCA <- rasterPCA(brick_for_prediction, nSamples=NULL, nComp = raster::nlayers(brick_for_prediction), spca = FALSE) rcv_PCA_stack <- raster::stack(rcv_PCA$map)
raster::plotRGB(rcv_PCA_stack, r=1, g=2, b=3, stretch="lin")
#mapview(rcv_PCA_stack)
nir <- brick_for_prediction$B07 ndvi <- (nir - red)/(nir + red) ndviPos <- ndvi * (ndvi > 0) library(tmap) tm_shape(ndviPos) + tm_raster(palette="Greens") ### 13.2.7 Normalize (center & scale) raster images: Subtract the mean and divide by the standard deviation for each variable/feature/band. While the data don’t need normalization to convert to normal distribution, the transformation is needed for the ML algorithms, especially for the neural networks. brick_for_prediction_norm <- normImage(brick_for_prediction) # [RStoolbox] names(brick_for_prediction_norm) <- names(brick_for_prediction) ### 13.2.8 Create the same AOI that was used to crop the image, for display purposes aoi <- matrix(data = c(-120.470, 39.975, # top left -120.396, 39.975, # top right -120.396, 39.918, # bottom right -120.470, 39.918, # bottom left -120.470, 39.975), # close polygon ncol = 2, byrow = TRUE) set_aoi(aoi) # [getSpatialData] view_aoi() # [getSpatialData] ### 13.2.9 Training polygons We’ll derive points from these polygons (digitized in ArcGIS, but could be created with other GIS programs). Later in the code, we’ll write out the resulting predicted class counts percentages which we can compare with training point class percentages; if these differ significantly we might want to go in and edit the polygons, adding additional ones in classes under-represented by the training set. Read the training polygons in shapefiles with ‘class’ field digitized from multispectral drone imagery, and assign colors using hex codes. The purpose of the classification we’ll use is on vegetation that provides a signal for meadow restoration where the goal is to sequester carbon. Sedge (Carex) species such as C. nebrascensis and C. utriculatata as well as are some rush (Juncus and Eleocharis) species. For these herbaceous species where there’s quite a mix of plants in particular areas that could be mapped, we try to map to the California Native Plant Society () alliances of associated species. These alliances are often named for a dominant species like Carex nebrascensis and we’ve labeled these with 6-character names such as “CARNEB,” ranging from driest to wettest: • ARTTRI : Artemisia tridentata, Big sagebrush (typical xeric) • mesic : many possible alliances of grasses and forbs • ARTCAN : Artemisia cana, Silver sagebrush wet shrubland • CARNEB : Carex nebrascensis, Nebraska sedge meadows • CARUTR : Carex (utriculata, vesicaria), Beaked sedge and blister sedge meadows Other categories are two trees, willow (SALIX) and Jeffrey Pine (PINJEF), and various other land covers such as bare, pavement, water, and rocky. poly <- read_sf(file.path(dtaPath, "train_polys.shp")) %>% mutate(id = as.integer(factor(class)))# %>% # creates a numeric id useful for rasterization #filter(!st_is_empty(geometry)) # one polygon was empty, remove it #setDT(poly@data) # Prepare colors for each class. cls_dt <- poly %>% st_drop_geometry() %>% distinct(class, id) %>% # [raster] arrange(id) %>% # sorting required for the next step to work mutate(hex = c(ARTCAN = "#e9ffbe", ARTTRI = "#ff7f7f", bare = "#cdaa66", CARNEB = "#00ffc5", CARUTR = "#00a884", mesic = "#ffff66", pavement = "#b2b2b2", PINJEF = "#007700", rocky = "#222222", SALIX = "#66ff33", water = "#0033AA")) #view_aoi() + #mapview.Options(basemaps = c("OpenTopoMap", "Esri.WorldImagery")) mapview(poly, zcol = "class", col.regions = cls_dt$hex)
poly_utm <- st_transform(poly, crs = st_crs(rst_lst[[1]]))

### 13.2.10 Extract training values from polygons at 10 m resolution

1. Convert the vector polygons to raster using a template based on the image raster (10 m cells)
2. Convert the raster to points in a data.table format
3. Use these points to extract values from the Sentinel bands. (Method from Stefan 2019 ()
# convert the training polygons to points on the raster grid
rasterpolys <- raster::rasterize(poly_utm, rst_lst$B02, field = "id") points <- raster::rasterize(poly_utm, rst_lst$B02, field = "id") %>%
stars::st_as_stars() %>%
st_as_sf(as_points = TRUE) %>%
left_join(cls_dt, by = c("layer" = "id")) # reattach class names

# Extract band values to points
points_rastervals <- brick_for_prediction_norm %>%
raster::extract(y = points) %>%
as.data.frame()

# Attach band values to points AND drop spatial data
points_with_rastervals <- bind_cols(points, points_rastervals) %>%
st_drop_geometry()

### 13.2.11 Histograms of predictors

points_with_rastervals %>%
dplyr::select(starts_with("B")) %>%
tidyr::pivot_longer(everything(), names_to = "band", values_to = "value") %>%
ggplot() +
geom_histogram(aes(value)) +
geom_vline(xintercept = 0, color = "gray70") +
facet_wrap(facets = vars(band), ncol = 3)

### 13.2.12 Split into training and testing subsets

The training polygons needs to be split into two subsets, one for training, the other for validation (testing). The training subset will further be optimized using cross validation and grid search. Then the final optimized models are validated using the test subset to build confusion matrices and accuracy statistics.

See the package for documentation on the various steps and settings.

dt_all <- points_with_rastervals %>%
dplyr::select(class, starts_with("B")) %>%
mutate(class = factor(class))

set.seed(321)
# A stratified random split of the data
idx_train <- createDataPartition(dt_all$class, # [caret] p = 0.7, # percentage of data used for training list = FALSE) dt_train <- dt_all[idx_train,] dt_test <- dt_all[-idx_train,] table(dt_train$class)
##
##   ARTCAN   ARTTRI     bare   CARNEB   CARUTR    mesic pavement   PINJEF
##        5      158        9       28       24      149        5      578
##    rocky    SALIX    water
##       12       29       31
table(dt_test$class) ## ## ARTCAN ARTTRI bare CARNEB CARUTR mesic pavement PINJEF ## 2 67 3 12 10 63 1 247 ## rocky SALIX water ## 4 12 13 ## 13.3 Setting Up and Fitting Models The next step is setting up and fitting models – we’ll look at: - Support Vector Machine (SVM) - Random Forest - Neural Network Note that the first part of this is the same for the other models. The training dataset is used for to cross-validate and model tuning. Once the optimal/best parameters were found a final model is fit to the entire training dataset using those findings. Then we validate. Details are provided in the intro vignette of caret package https://cran.r-project.org/web/packages/caret/vignettes/caret.html Cross validation (CV) is used to compare models for optimization, using a number of folds which must be set for each model. Also see help(trainControl). # create cross-validation folds (splits the data into n random groups) n_folds <- 10 set.seed(321) folds <- createFolds(1:nrow(dt_train), k = n_folds) # [caret] # Set the seed at each resampling iteration. Useful when running CV in parallel. seeds <- vector(mode = "list", length = n_folds + 1) # +1 for the final model for(i in 1:n_folds) seeds[[i]] <- sample.int(1000, n_folds) seeds[n_folds + 1] <- sample.int(1000, 1) # seed for the final model ### 13.3.1 Set up model training controls (for all models) While the various model types have slightly different requirements, the following will work reasonably well for all: ctrl <- trainControl(summaryFunction = defaultSummary, # [caret] # was multiClassSummary -- causes rf and nn to fail method = "cv", number = n_folds, search = "grid", classProbs = TRUE, # not implemented for SVM; will just get a warning savePredictions = TRUE, index = folds, seeds = seeds) ### 13.3.2 Support Vector Machine (SVM) model The Support Vector Machine model is one of the simplest machine learning algorithms. “L2 Regularized Support Vector Machine (dual) with Linear Kernel.” To try other SVM options see SVM tags. The trainControl parameter setting importance = TRUE is not applicable for SVM. Same for class probabilities classProbs = TRUE defined in ctrl above. However, this doesn’t create a problem, so it was easier to just leave it the same, and it will be useful for random forests. #### 13.3.2.1 Train SVM model $After reinstalling caret, parallel tools makeCluster or detectCores create errors, but parallel processing isn't essential for this size dataset$ # Grid of tuning parameters svm_grid <- expand.grid(cost = c(0.2, 0.5, 1), Loss = c("L1", "L2")) model_svm <- caret::train(class ~ . , method = "svmLinear3", data = dt_train, allowParallel = FALSE, # since parallel now fails tuneGrid = svm_grid, trControl = ctrl) registerDoSEQ() # [foreach] saveRDS(model_svm, file = "./cache/model_svm.rds") # FIX LATER #### 13.3.2.2 SVM model summary & confusion matrix The confusion matrix provides the number of matches and “confusions” that exist between the test data and what the model predicts. The diagonal shows the number of matches and everywhere else is a mismatch. Displayed below the matrix are overall statistics such as accuracy and Kappa statistic. Then a table of statistics by class is displayed; sensitivity is the “true positive rate” where the model is correct in detecting the class, while specificity is the “true negative rate” where the model correctly rejects the class. model_svm$times$everything # total computation time ## user system elapsed ## 3.08 0.08 3.20 ## user system elapsed ## 1.44 0.31 16.55 plot(model_svm) # tuning results # The confusion matrix using the test dataset cm_svm <- confusionMatrix(data = predict(model_svm, newdata = dt_test), # [caret] dt_test$class)
cm_svm
## Confusion Matrix and Statistics
##
##           Reference
## Prediction ARTCAN ARTTRI bare CARNEB CARUTR mesic pavement PINJEF rocky SALIX
##   ARTCAN        2      0    0      0      0     0        0      0     0     0
##   ARTTRI        0     67    0      0      0     0        0      1     0     0
##   bare          0      0    3      0      0     0        0      0     0     0
##   CARNEB        0      0    0     11      2     0        0      0     0     0
##   CARUTR        0      0    0      0      6     0        0      0     0     0
##   mesic         0      0    0      0      0    63        0      0     0     0
##   pavement      0      0    0      0      0     0        1      0     0     0
##   PINJEF        0      0    0      1      2     0        0    246     0     0
##   rocky         0      0    0      0      0     0        0      0     4     0
##   SALIX         0      0    0      0      0     0        0      0     0    12
##   water         0      0    0      0      0     0        0      0     0     0
##           Reference
## Prediction water
##   ARTCAN       0
##   ARTTRI       0
##   bare         0
##   CARNEB       0
##   CARUTR       0
##   mesic        0
##   pavement     0
##   PINJEF       1
##   rocky        0
##   SALIX        0
##   water       12
##
## Overall Statistics
##
##                Accuracy : 0.9839
##                  95% CI : (0.9671, 0.9935)
##     No Information Rate : 0.5691
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 0.9742
##
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: ARTCAN Class: ARTTRI Class: bare Class: CARNEB
## Sensitivity               1.000000        1.0000    1.000000       0.91667
## Specificity               1.000000        0.9973    1.000000       0.99526
## Pos Pred Value            1.000000        0.9853    1.000000       0.84615
## Neg Pred Value            1.000000        1.0000    1.000000       0.99762
## Prevalence                0.004608        0.1544    0.006912       0.02765
## Detection Rate            0.004608        0.1544    0.006912       0.02535
## Detection Prevalence      0.004608        0.1567    0.006912       0.02995
## Balanced Accuracy         1.000000        0.9986    1.000000       0.95596
##                      Class: CARUTR Class: mesic Class: pavement Class: PINJEF
## Sensitivity                0.60000       1.0000        1.000000        0.9960
## Specificity                1.00000       1.0000        1.000000        0.9786
## Pos Pred Value             1.00000       1.0000        1.000000        0.9840
## Neg Pred Value             0.99065       1.0000        1.000000        0.9946
## Prevalence                 0.02304       0.1452        0.002304        0.5691
## Detection Rate             0.01382       0.1452        0.002304        0.5668
## Detection Prevalence       0.01382       0.1452        0.002304        0.5760
## Balanced Accuracy          0.80000       1.0000        1.000000        0.9873
##                      Class: rocky Class: SALIX Class: water
## Sensitivity              1.000000      1.00000      0.92308
## Specificity              1.000000      1.00000      1.00000
## Pos Pred Value           1.000000      1.00000      1.00000
## Neg Pred Value           1.000000      1.00000      0.99763
## Prevalence               0.009217      0.02765      0.02995
## Detection Rate           0.009217      0.02765      0.02765
## Detection Prevalence     0.009217      0.02765      0.02765
## Balanced Accuracy        1.000000      1.00000      0.96154

#### 13.3.2.3 SVM variable importance

Importance values indicate how much drop in performance occurs when the predictor is not used.

caret::varImp(model_svm)$importance %>% as.matrix %>% plot_ly(x = colnames(.), y = rownames(.), z = ., type = "heatmap", width = 350, height = 300) Figure 13.7: SVM importance #### 13.3.2.4 SVM prediction map predict_svm <- raster::predict(object = brick_for_prediction_norm, model = model_svm, type = 'raw') mapView(predict_svm, col.regions = cls_dt$hex)
raster::writeRaster(predict_svm, paste0(dtaPath, "/predict_svm", imgDate,".tif"), datatype = "INT2S", TFW = YES, overwrite=TRUE)

#### 13.3.2.5 Compare training vs SVM predicted % cover

… to see if training allocation needs adjusting to be more comparable.

classFactor <- factor(points_with_rastervals$class) training <- bind_cols(as.data.frame(summary(classFactor),make.names=T),as.data.frame(raster::freq(predict_svm))) %>% mutate(pct_train = summary(classFactor)/sum(summary(classFactor)), pct_predict = count/sum(count)) training ## summary(classFactor) value count pct_train pct_predict ## ARTCAN 7 1 8588 0.004787962 0.01992612 ## ARTTRI 225 2 77347 0.153898769 0.17946273 ## bare 12 3 47349 0.008207934 0.10986051 ## CARNEB 40 4 18709 0.027359781 0.04340916 ## CARUTR 34 5 5536 0.023255814 0.01284479 ## mesic 212 6 71217 0.145006840 0.16523973 ## pavement 6 7 5722 0.004103967 0.01327635 ## PINJEF 825 8 165410 0.564295486 0.38378903 ## rocky 16 9 10390 0.010943912 0.02410718 ## SALIX 41 10 10477 0.028043776 0.02430904 ## water 44 11 10247 0.030095759 0.02377538 ### 13.3.3 Random Forest model_rf <- caret::train(class ~ . , method = "rf", data = dt_train, importance = TRUE, # passed to randomForest() allowParallel = FALSE, tuneGrid = data.frame(mtry = c(2, 3, 4, 5, 8)), trControl = ctrl) registerDoSEQ() saveRDS(model_rf, file = "./cache/model_rf.rds") #### 13.3.3.1 RF confusion matrix model_rf$times$everything # total computation time ## user system elapsed ## 1.56 0.03 1.64 plot(model_rf) # tuning results cm_rf <- confusionMatrix(data = predict(model_rf, newdata = dt_test), dt_test$class)
cm_rf
## Confusion Matrix and Statistics
##
##           Reference
## Prediction ARTCAN ARTTRI bare CARNEB CARUTR mesic pavement PINJEF rocky SALIX
##   ARTCAN        2      0    0      0      0     0        0      1     0     0
##   ARTTRI        0     67    0      0      0     0        0      0     0     0
##   bare          0      0    3      0      0     0        0      0     0     0
##   CARNEB        0      0    0     12      1     0        0      0     0     0
##   CARUTR        0      0    0      0      8     0        0      0     0     0
##   mesic         0      0    0      0      0    63        0      0     0     0
##   pavement      0      0    0      0      0     0        1      0     0     0
##   PINJEF        0      0    0      0      0     0        0    246     0     0
##   rocky         0      0    0      0      0     0        0      0     4     0
##   SALIX         0      0    0      0      1     0        0      0     0    12
##   water         0      0    0      0      0     0        0      0     0     0
##           Reference
## Prediction water
##   ARTCAN       0
##   ARTTRI       0
##   bare         0
##   CARNEB       0
##   CARUTR       0
##   mesic        0
##   pavement     0
##   PINJEF       0
##   rocky        0
##   SALIX        0
##   water       13
##
## Overall Statistics
##
##                Accuracy : 0.9931
##                  95% CI : (0.9799, 0.9986)
##     No Information Rate : 0.5691
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 0.989
##
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: ARTCAN Class: ARTTRI Class: bare Class: CARNEB
## Sensitivity               1.000000        1.0000    1.000000       1.00000
## Specificity               0.997685        1.0000    1.000000       0.99763
## Pos Pred Value            0.666667        1.0000    1.000000       0.92308
## Neg Pred Value            1.000000        1.0000    1.000000       1.00000
## Prevalence                0.004608        0.1544    0.006912       0.02765
## Detection Rate            0.004608        0.1544    0.006912       0.02765
## Detection Prevalence      0.006912        0.1544    0.006912       0.02995
## Balanced Accuracy         0.998843        1.0000    1.000000       0.99882
##                      Class: CARUTR Class: mesic Class: pavement Class: PINJEF
## Sensitivity                0.80000       1.0000        1.000000        0.9960
## Specificity                1.00000       1.0000        1.000000        1.0000
## Pos Pred Value             1.00000       1.0000        1.000000        1.0000
## Neg Pred Value             0.99531       1.0000        1.000000        0.9947
## Prevalence                 0.02304       0.1452        0.002304        0.5691
## Detection Rate             0.01843       0.1452        0.002304        0.5668
## Detection Prevalence       0.01843       0.1452        0.002304        0.5668
## Balanced Accuracy          0.90000       1.0000        1.000000        0.9980
##                      Class: rocky Class: SALIX Class: water
## Sensitivity              1.000000      1.00000      1.00000
## Specificity              1.000000      0.99763      1.00000
## Pos Pred Value           1.000000      0.92308      1.00000
## Neg Pred Value           1.000000      1.00000      1.00000
## Prevalence               0.009217      0.02765      0.02995
## Detection Rate           0.009217      0.02765      0.02995
## Detection Prevalence     0.009217      0.02995      0.02995
## Balanced Accuracy        1.000000      0.99882      1.00000
model_rf$finalModel ## ## Call: ## randomForest(x = x, y = y, mtry = min(param$mtry, ncol(x)), importance = TRUE,      allowParallel = FALSE)
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 5
##
##         OOB estimate of  error rate: 1.95%
## Confusion matrix:
##          ARTCAN ARTTRI bare CARNEB CARUTR mesic pavement PINJEF rocky SALIX
## ARTCAN        5      0    0      0      0     0        0      0     0     0
## ARTTRI        0    157    0      0      0     0        1      0     0     0
## bare          0      0    7      0      0     2        0      0     0     0
## CARNEB        0      0    0     27      1     0        0      0     0     0
## CARUTR        0      0    0      4     18     0        0      0     0     2
## mesic         0      1    0      0      0   148        0      0     0     0
## pavement      0      1    0      0      0     0        4      0     0     0
## PINJEF        1      0    0      0      0     0        0    577     0     0
## rocky         0      0    0      0      0     0        0      0    12     0
## SALIX         0      0    0      0      4     0        0      1     0    24
## water         0      0    0      0      0     0        0      1     1     0
##          water class.error
## ARTCAN       0 0.000000000
## ARTTRI       0 0.006329114
## bare         0 0.222222222
## CARNEB       0 0.035714286
## CARUTR       0 0.250000000
## mesic        0 0.006711409
## pavement     0 0.200000000
## PINJEF       0 0.001730104
## rocky        0 0.000000000
## SALIX        0 0.172413793
## water       29 0.064516129

#### 13.3.3.2 RF predictor importance

caret::varImp(model_rf)$importance %>% as.matrix %>% plot_ly(x = colnames(.), y = rownames(.), z = ., type = "heatmap", width = 350, height = 300) Figure 13.9: rf variable importance This method does the same thing, specific to the random forest model: randomForest::importance(model_rf$finalModel) %>%
.[, - which(colnames(.) %in% c("MeanDecreaseAccuracy", "MeanDecreaseGini"))] %>%
plot_ly(x = colnames(.), y = rownames(.), z = ., type = "heatmap",
width = 350, height = 300)
randomForest::varImpPlot(model_rf$finalModel) predict_rf <- raster::predict(object = brick_for_prediction_norm, model = model_rf, type = 'raw') mapView(predict_rf, col.regions = cls_dt$hex)

### 13.3.4 Neural Network

# Grid of tuning parameters
nnet_grid <- expand.grid(size = c(5, 10, 15),
decay = c(0.001, 0.01, 0.1))

#cl <- makeCluster(3/4 * detectCores())
#registerDoParallel(cl)
model_nnet <- train(class ~ ., method = 'nnet', data = dt_train,
importance = TRUE,
maxit = 1000, # set high enough so to be sure that it converges
allowParallel = TRUE,
tuneGrid = nnet_grid,
entropy = TRUE,
trControl = ctrl)
#stopCluster(cl); remove(cl)
registerDoSEQ()
#saveRDS(model_nnet, file = "./cache/model_nnet.rds")
model_nnet$times$everything # total computation time
saveRDS(model_nnet, file = "./cache/model_nn.rds")
plot(model_nnet) # tuning results
# The confusion matrix using the test dataset
cm_nnet <- confusionMatrix(data = predict(model_nnet, newdata = dt_test),
dt_test$class) cm_nnet ## Confusion Matrix and Statistics ## ## Reference ## Prediction ARTCAN ARTTRI bare CARNEB CARUTR mesic pavement PINJEF rocky SALIX ## ARTCAN 2 0 0 0 0 0 0 0 0 0 ## ARTTRI 0 67 0 0 0 0 0 0 0 0 ## bare 0 0 3 0 0 0 0 0 0 0 ## CARNEB 0 0 0 12 0 0 0 0 0 0 ## CARUTR 0 0 0 0 10 0 0 0 0 0 ## mesic 0 0 0 0 0 63 0 0 0 0 ## pavement 0 0 0 0 0 0 1 0 0 0 ## PINJEF 0 0 0 0 0 0 0 246 0 0 ## rocky 0 0 0 0 0 0 0 0 4 0 ## SALIX 0 0 0 0 0 0 0 1 0 12 ## water 0 0 0 0 0 0 0 0 0 0 ## Reference ## Prediction water ## ARTCAN 0 ## ARTTRI 0 ## bare 0 ## CARNEB 0 ## CARUTR 0 ## mesic 0 ## pavement 0 ## PINJEF 0 ## rocky 0 ## SALIX 0 ## water 13 ## ## Overall Statistics ## ## Accuracy : 0.9977 ## 95% CI : (0.9872, 0.9999) ## No Information Rate : 0.5691 ## P-Value [Acc > NIR] : < 2.2e-16 ## ## Kappa : 0.9963 ## ## Mcnemar's Test P-Value : NA ## ## Statistics by Class: ## ## Class: ARTCAN Class: ARTTRI Class: bare Class: CARNEB ## Sensitivity 1.000000 1.0000 1.000000 1.00000 ## Specificity 1.000000 1.0000 1.000000 1.00000 ## Pos Pred Value 1.000000 1.0000 1.000000 1.00000 ## Neg Pred Value 1.000000 1.0000 1.000000 1.00000 ## Prevalence 0.004608 0.1544 0.006912 0.02765 ## Detection Rate 0.004608 0.1544 0.006912 0.02765 ## Detection Prevalence 0.004608 0.1544 0.006912 0.02765 ## Balanced Accuracy 1.000000 1.0000 1.000000 1.00000 ## Class: CARUTR Class: mesic Class: pavement Class: PINJEF ## Sensitivity 1.00000 1.0000 1.000000 0.9960 ## Specificity 1.00000 1.0000 1.000000 1.0000 ## Pos Pred Value 1.00000 1.0000 1.000000 1.0000 ## Neg Pred Value 1.00000 1.0000 1.000000 0.9947 ## Prevalence 0.02304 0.1452 0.002304 0.5691 ## Detection Rate 0.02304 0.1452 0.002304 0.5668 ## Detection Prevalence 0.02304 0.1452 0.002304 0.5668 ## Balanced Accuracy 1.00000 1.0000 1.000000 0.9980 ## Class: rocky Class: SALIX Class: water ## Sensitivity 1.000000 1.00000 1.00000 ## Specificity 1.000000 0.99763 1.00000 ## Pos Pred Value 1.000000 0.92308 1.00000 ## Neg Pred Value 1.000000 1.00000 1.00000 ## Prevalence 0.009217 0.02765 0.02995 ## Detection Rate 0.009217 0.02765 0.02995 ## Detection Prevalence 0.009217 0.02995 0.02995 ## Balanced Accuracy 1.000000 0.99882 1.00000 predict_nnet <- raster::predict(object = brick_for_prediction_norm, model = model_nnet, type = 'raw') mapView(predict_nnet, col.regions = cls_dt$hex)

### References

A Manual of California Vegetation Online, California Native Plant Society. n.d. https://vegetation.cnps.org/.
Clover Valley Ranch Restoration, the Sierra Fund. n.d. https://sierrafund.org/clover-valley-ranch/.
Copernicus Open Access Hub. n.d. https://scihub.copernicus.eu/.
Department of Geography & Environment, San Francisco State University. n.d. https://geog.sfsu.edu/.
Machine Learning with Caret. n.d. http://www.wvview.org/spatial_analytics/caret/_site.
R - Using Random Forests, Support Vector Machines and Neural Networks for a Pixel Based Supervised Classification of Sentinel-2 Multispectral Images. n.d. https://valentinitnelav.github.io/satellite-image-classification-r/.
The Caret Package. n.d. https://topepo.github.io/caret.
Tuning Machine Learning Algorithms in r. n.d. https://machinelearningmastery.com/tune-machine-learning-algorithms-in-r/.