Chapter 8 R Lab 7 - 26/05/2023
In this lecture we will learn how to implement a feedforward neural network. To run the following code you need to have Python installed on your computer (you can download the Anaconda Python distribution at https://www.anaconda.com/download). I'm sorry but I'm not able to show the keras output here in the html file, because it gives me an error when I compile the html with RMarkdown (and I haven't found a solution yet).
The following packages are required: tidyverse and keras (the latter has to be installed first).
library(tidyverse)
library(keras)
If you have problems/errors with keras, it is very likely that you have to close RStudio and run the following code (this is needed only once):
install.packages("tensorflow")
library(tensorflow)
install_tensorflow()
Then try again to load the keras library:
library(keras)
8.1 Tensors
A tensor is the generalization of vectors and matrices to more than 2 dimensions. A vector in R
is a 1D tensor and element selection is performed by using only one index:
set.seed(1)
x = rnorm(24) %>% round(1) #24 elements
x #vector x
## [1] -0.6 0.2 -0.8 1.6 0.3 -0.8 0.5 0.7 0.6 -0.3 1.5 0.4 -0.6 -2.2 1.1 0.0 0.0
## [18] 0.9 0.8 0.6 0.9 0.8 0.1 -2.0
x[1]
## [1] -0.6
A matrix is a 2D-tensor. It has two axes given by rows and columns:
xmat = matrix(x, nrow=6) #automatically we have 4 columns
xmat
## [,1] [,2] [,3] [,4]
## [1,] -0.6 0.5 -0.6 0.8
## [2,] 0.2 0.7 -2.2 0.6
## [3,] -0.8 0.6 1.1 0.9
## [4,] 1.6 -0.3 0.0 0.8
## [5,] 0.3 1.5 0.0 0.1
## [6,] -0.8 0.4 0.9 -2.0
dim(xmat)
## [1] 6 4
xmat[1,1] #element in first row, first column
## [1] -0.6
A 3D tensor is like a cube of numbers. It is characterized by 3 dimensions (or axes) and is created using the array
function:
xarray = array(x, dim=c(2,3,4))
dim(xarray)
## [1] 2 3 4
xarray[1,,]
## [,1] [,2] [,3] [,4]
## [1,] -0.6 0.5 -0.6 0.8
## [2,] -0.8 0.6 1.1 0.9
## [3,] 0.3 1.5 0.0 0.1
Note the 2 commas in xarray[1,,]: this is a selection along the first axis and returns a 3 by 4 matrix. It's important to note that generally the first axis refers to the observations you have in your data.
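Analogous selections can be made along the other axes; a minimal sketch using xarray from above:
xarray[, 1, ] #fix the second axis: a 2 by 4 matrix
xarray[, , 1] #fix the third axis: a 2 by 3 matrix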
8.2 Load the mnist data
The MNIST dataset is a classic in the machine-learning community. We will deal with a multiclass classification problem: we want to classify grayscale images of handwritten digits (each image is composed of 28 by 28 pixels) into their 10 categories (from 0 to 9). We have 60000 training images and 10000 test images. Each pixel contains a value between 0 (black) and 255 (white). The data are provided with the keras library and can be loaded as follows:
mnist = dataset_mnist()
The object mnist is a list with 2 elements (train and test), each of which contains the images (x) and the labels (y):
names(mnist$train)
We now create 4 different objects (regressors and response variable, both for the training and the test data). We start with the training regressors, i.e. the 28 by 28 gray color values for each of the 60000 pictures, contained in a 3D tensor:
train_images = mnist$train$x
dim(train_images)
We can also extract for example the second training picture and plot it:
train_images[2, , ]
plot(as.raster(train_images[2, , ], max=255))
We then extract the response variable (labels) for the training pictures. This will be a vector containing the digit shown in each picture:
train_labels = mnist$train$y
head(train_labels)
We do the same for the test pictures:
test_images = mnist$test$x
dim(test_images)
test_labels = mnist$test$y
head(test_labels)
Before training the neural network, we reshape the features moving from a 3D tensor of dimension (60000, 28, 28) to a 2D tensor with dimension (60000, 28*28):
train_images = array_reshape(train_images,
                             dim = c(60000, 28*28))
test_images = array_reshape(test_images,
                            dim = c(10000, 28*28))
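A side note: array_reshape() fills in row-major (C-style) order, which matches how Python/TensorFlow store arrays, whereas R's own matrix() and dim<- are column-major. A minimal sketch on a toy array (the object a is hypothetical, for illustration only):
a = array(1:6, dim = c(1, 2, 3))
array_reshape(a, dim = c(1, 6)) #row-major: 1 3 5 2 4 6
matrix(a, nrow = 1)             #column-major: 1 2 3 4 5 6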
Moreover, we rescale the values so that instead of lying in \([0,255]\) they lie in the interval \([0,1]\):
train_images = train_images / 255
test_images = test_images / 255
Finally we transform the two vectors (1D tensors) containing the response variable into 2D tensors using the one-hot encoding approach:
head(train_labels) #1D
train_labels = to_categorical(train_labels)
head(train_labels) #2D now, number of columns given by the number of categories
test_labels = to_categorical(test_labels)
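To see what to_categorical() computes, here is a minimal base-R sketch of one-hot encoding (the objects y and onehot are hypothetical, for illustration only):
y = c(5, 0, 4)                #three example labels
onehot = matrix(0, nrow = 3, ncol = 10)
onehot[cbind(1:3, y + 1)] = 1 #digit j gets a 1 in column j+1
onehot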
8.3 Set the network
We will implement a feedforward neural network with one hidden layer (with 512 units and the Rectified Linear Unit (ReLU) activation function). Given that we have a multiclass problem, the output layer will be composed of 10 units and we will use the softmax activation function:
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network
Note that the layers are created using layer_dense and only for the first one we have to specify the dimensions of the input for each observation (in this case the 28 by 28 values).
Then we have to specify something more for the network: the loss function used by the gradient descent algorithm for updating the weights (cross entropy in this case) and the metric that will be used for evaluating the final performance (accuracy in this case). Note that in the following code we don't have to write network = network %>% ...; the object network is updated automatically:
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
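Note that no optimizer is specified above, so keras falls back to its default (rmsprop in recent versions; this is version dependent). A minimal sketch making that choice explicit, assuming you want the usual default:
network %>% compile(
  optimizer = optimizer_rmsprop(), #assumption: rmsprop, the common default
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)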
If you type network you will see how many parameters have to be estimated.
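As a check, the count can also be derived by hand: a dense layer has (inputs × units) weights plus one bias per unit:
(28*28)*512 + 512 #hidden layer: 401920 parameters
512*10 + 10       #output layer: 5130 parameters
401920 + 5130     #total: 407050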
8.4 Fitting the neural network
It's now time to train the neural network. We decide in particular to use 10 epochs and, for each epoch, batches of size \(2^7=128\). The function for training the network is named fit and requires the specification of the training data:
output = network %>%
  fit(train_images,
      train_labels,
      epochs = 10,
      batch_size = 2^7)
During the training you will see in a dynamic graph the evolution of the TRAINING loss and accuracy across the epochs.
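Since that dynamic graph is not visible in the html file, the same curves can be drawn afterwards from the saved history object (plot() has a method for it):
plot(output) #training loss and accuracy across the epochs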
8.5 Predictions
After the training has completed we can compute the predictions (i.e. the predicted values) for the observations in the test set. Let's consider for example the first 3 test pictures. The following code returns
network %>% predict(test_images[1:3,])
the predicted probabilities for each of the 10 classes (the output will be a 3 by 10 matrix). We are interested in the most likely category, which can be obtained automatically with
network %>% predict(test_images[1:3,]) %>% k_argmax()
The final vector of predictions, for all the test pictures, is given by
pred = network %>%
  predict(test_images) %>%
  k_argmax()
length(pred)
class(pred)
and the confusion matrix can be obtained as usual:
table(pred = as.numeric(pred), obs = mnist$test$y)
The function as.numeric is used to transform the keras object pred into a simple numerical vector. The sum of the values on the diagonal gives us the total number of correctly predicted pictures. This final evaluation on the TEST data can also be computed using the evaluate function, which returns the test accuracy and loss values:
network %>%
  evaluate(test_images, test_labels)
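As a sanity check, the accuracy returned by evaluate should coincide with the proportion computed from the diagonal of the confusion matrix; a minimal sketch (the object tab is introduced here just for this check):
tab = table(pred = as.numeric(pred), obs = mnist$test$y)
sum(diag(tab)) / sum(tab) #proportion of correctly classified test pictures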
8.6 Using a validation set
We know that we should tune the number of epochs using a validation set. To this aim we use the first 20000 training images as validation images. We thus prepare the validation data and the new training data with the remaining 40000 pictures:
valindex = 1:20000 #not a random sampling
val_images = train_images[valindex, ]
val_labels = train_labels[valindex, ]

newtrain_images = train_images[-valindex, ]
newtrain_labels = train_labels[-valindex, ]
We now have to reinitialize the network and redo the fitting, specifying the validation data with validation_data. We now use a larger number of epochs (20) to evaluate the training and validation loss/accuracy:
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
output = network %>%
  fit(newtrain_images,
      newtrain_labels,
      epochs = 20,
      batch_size = 2^7,
      validation_data = list(val_images, val_labels)) #this is the new part
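The values shown in the dynamic graph are also stored in the history object, so the best epoch can be located programmatically; a minimal sketch (assuming the metric name val_loss used by recent keras versions):
which.min(output$metrics$val_loss) #epoch with the lowest validation loss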
We expect to see the famous "U-shape" in the validation loss and, to avoid overfitting, we should choose the number of epochs corresponding to the lowest value of the validation loss. Here we choose 5 epochs, so we go back to the original fitting using 5 as the tuned number of epochs.
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
output = network %>%
  fit(train_images, #the full training data
      train_labels,
      epochs = 5, #tuned value of epochs
      batch_size = 2^7)
The final performance is given by
network %>%
  evaluate(test_images, test_labels)