Chapter 8 R Lab 7 - 26/05/2023
In this lecture we will learn how to implement a feedforward neural network. To run the following code you need to have Python installed on your computer (you can download the Anaconda Python distribution at https://www.anaconda.com/download). I'm sorry but I'm not able to show the keras output here in the html file, because it gives me an error when I compile the html with RMarkdown (and I haven't found a solution yet).
The following packages are required: tidyverse and keras (the latter has to be installed first).
library(tidyverse)
library(keras)
If you have problems/errors with keras, it is very likely that you have to close RStudio and run the following code (this is needed only once):
install.packages("tensorflow")
library(tensorflow)
install_tensorflow()
Then try again to load the keras library:
library(keras)
8.1 Tensors
A tensor is the generalization of vectors and matrices to more than 2 dimensions. A vector in R
is a 1D tensor and element selection is performed by using only one index:
set.seed(1)
x = rnorm(24) %>% round(1) #24 elements
x #vector x
## [1] -0.6 0.2 -0.8 1.6 0.3 -0.8 0.5 0.7 0.6 -0.3 1.5 0.4 -0.6 -2.2 1.1 0.0 0.0
## [18] 0.9 0.8 0.6 0.9 0.8 0.1 -2.0
x[1]
## [1] -0.6
A matrix is a 2D-tensor. It has two axes given by rows and columns:
xmat = matrix(x, nrow=6) #automatically we have 4 columns
xmat
## [,1] [,2] [,3] [,4]
## [1,] -0.6 0.5 -0.6 0.8
## [2,] 0.2 0.7 -2.2 0.6
## [3,] -0.8 0.6 1.1 0.9
## [4,] 1.6 -0.3 0.0 0.8
## [5,] 0.3 1.5 0.0 0.1
## [6,] -0.8 0.4 0.9 -2.0
dim(xmat)
## [1] 6 4
xmat[1,1] #element in first row, first column
## [1] -0.6
A 3D tensor is like a cube of numbers. It is characterized by 3 dimensions (or axes) and is created using the array
function:
xarray = array(x, dim=c(2,3,4))
dim(xarray)
## [1] 2 3 4
xarray[1,,]
## [,1] [,2] [,3] [,4]
## [1,] -0.6 0.5 -0.6 0.8
## [2,] -0.8 0.6 1.1 0.9
## [3,] 0.3 1.5 0.0 0.1
Note the 2 commas in xarray[1,,]: this is a selection along the first axis and returns a 3 by 4 matrix. It's important to note that generally the first axis refers to the observations you have in your data.
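Analogous selections can be made along the other axes; a minimal sketch using xarray from above:
xarray[, 1, ] #fix the second axis: a 2 by 4 matrix
xarray[, , 1] #fix the third axis: a 2 by 3 matrix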
8.2 Load the mnist data
The MNIST dataset is a classic in the machine-learning community. We will deal with a multiclass classification problem: we want to classify grayscale images of handwritten digits (each image is composed of 28 by 28 pixels) into their 10 categories (from 0 to 9). We have 60000 training images and 10000 test images. Each pixel contains a value between 0 (black) and 255 (white). The data are provided with the keras library and can be loaded as follows:
mnist = dataset_mnist()
The object mnist is a list with 2 elements (train and test), each of which contains the images (x) and the labels (y):
names(mnist$train)
We now create 4 different objects (regressors and response variable, both for the training and the test data). We start with the training regressors, i.e. the 28 by 28 gray color values for each of the 60000 pictures, contained in a 3D tensor:
train_images = mnist$train$x
dim(train_images)
We can also extract for example the second training picture and plot it:
train_images[2, , ]
plot(as.raster(train_images[2, , ], max=255))
We then extract the response variable (labels) for the training pictures. This will be a vector containing the digit shown in each picture:
train_labels = mnist$train$y
head(train_labels)
We do the same for the test pictures:
test_images = mnist$test$x
dim(test_images)
test_labels = mnist$test$y
head(test_labels)
Before training the neural network, we reshape the features moving from a 3D tensor of dimension (60000, 28, 28) to a 2D tensor with dimension (60000, 28*28):
train_images = array_reshape(train_images,
                             dim = c(60000, 28*28))
test_images = array_reshape(test_images,
                            dim = c(10000, 28*28))
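A side note: array_reshape() fills in row-major (C-style) order, which matches how Python/TensorFlow store arrays, whereas R's own matrix() and dim<- are column-major. A minimal sketch on a toy array (the object a is hypothetical, for illustration only):
a = array(1:6, dim = c(1, 2, 3))
array_reshape(a, dim = c(1, 6)) #row-major: 1 3 5 2 4 6
matrix(a, nrow = 1)             #column-major: 1 2 3 4 5 6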
Moreover, we rescale the values so that instead of lying in \([0,255]\) they lie in the interval \([0,1]\):
train_images = train_images / 255
test_images = test_images / 255
Finally we transform the two vectors (1D tensors) containing the response variable into 2D tensors using the one-hot encoding approach:
head(train_labels) #1D
train_labels = to_categorical(train_labels)
head(train_labels) #2D now, number of columns given by the number of categories
test_labels = to_categorical(test_labels)
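To see what to_categorical() computes, here is a minimal base-R sketch of one-hot encoding (the objects y and onehot are hypothetical, for illustration only):
y = c(5, 0, 4)                #three example labels
onehot = matrix(0, nrow = 3, ncol = 10)
onehot[cbind(1:3, y + 1)] = 1 #digit j gets a 1 in column j+1
onehot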
8.3 Set the network
We will implement a feedforward neural network with one hidden layer (with 512 units and the Rectified Linear Unit (ReLU) activation function). Given that we have a multiclass problem, the output layer will be composed of 10 units and we will use the softmax activation function:
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network
Note that the layers are created using layer_dense and only for the first one we have to specify the dimensions of the input for each observation (in this case the 28 by 28 values).
Then we have to specify something more for the network: the loss function used by the gradient descent algorithm for updating the weights (cross entropy in this case) and the metric that will be used for evaluating the final performance (accuracy in this case). Note that in the following code we don't have to write network = network %>% ...; the object network is updated automatically:
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
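Note that no optimizer is specified above, so keras falls back to its default (rmsprop in recent versions; this is version dependent). A minimal sketch making that choice explicit, assuming you want the usual default:
network %>% compile(
  optimizer = optimizer_rmsprop(), #assumption: rmsprop, the common default
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)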
If you type network you will see how many parameters have to be estimated.
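As a check, the count can also be derived by hand: a dense layer has (inputs × units) weights plus one bias per unit:
(28*28)*512 + 512 #hidden layer: 401920 parameters
512*10 + 10       #output layer: 5130 parameters
401920 + 5130     #total: 407050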
8.4 Fitting the neural network
It's now time to train the neural network. We decide in particular to use 10 epochs and, for each epoch, batches of size \(2^7=128\). The function for training the network is named fit and requires the specification of the training data:
output = network %>%
  fit(train_images,
      train_labels,
      epochs = 10,
      batch_size = 2^7)
During the training you will see in a dynamic graph the evolution of the TRAINING loss and accuracy across the epochs.
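Since that dynamic graph is not visible in the html file, the same curves can be drawn afterwards from the saved history object (plot() has a method for it):
plot(output) #training loss and accuracy across the epochs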
8.5 Predictions
After the training has completed we can compute the predictions (i.e. the predicted values) for the observations in the test set. Let's consider for example the first 3 test pictures. The following code returns
network %>% predict(test_images[1:3,])
the predicted probabilities for each of the 10 classes (the output will be a 3 by 10 matrix). We are interested in the most likely category, which can be obtained automatically with
network %>% predict(test_images[1:3,]) %>% k_argmax()
The final vector of predictions, for all the test pictures, is given by
pred = network %>%
  predict(test_images) %>%
  k_argmax()
length(pred)
class(pred)
and the confusion matrix can be obtained as usual:
table(pred = as.numeric(pred), obs = mnist$test$y)
The function as.numeric is used to transform the keras object pred into a simple numerical vector. The sum of the values on the diagonal gives us the total number of correctly predicted pictures. This final evaluation on the TEST data can also be computed using the evaluate function, which returns the test accuracy and loss values:
network %>%
  evaluate(test_images, test_labels)
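As a sanity check, the accuracy returned by evaluate should coincide with the proportion computed from the diagonal of the confusion matrix; a minimal sketch (the object tab is introduced here just for this check):
tab = table(pred = as.numeric(pred), obs = mnist$test$y)
sum(diag(tab)) / sum(tab) #proportion of correctly classified test pictures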
8.6 Using a validation set
We know that we should tune the number of epochs using a validation set. To this aim we use the first 20000 training images as validation images. We thus prepare the validation data and the new training data with the remaining 40000 pictures:
valindex = 1:20000 #not a random sampling
val_images = train_images[valindex, ]
val_labels = train_labels[valindex, ]

newtrain_images = train_images[-valindex, ]
newtrain_labels = train_labels[-valindex, ]
We now have to reinitialize the network and redo the fitting, specifying the validation data with validation_data. We now use a larger number of epochs (20) to evaluate the training and validation loss/accuracy:
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
output = network %>%
  fit(newtrain_images,
      newtrain_labels,
      epochs = 20,
      batch_size = 2^7,
      validation_data = list(val_images, val_labels)) #this is the new part
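The values shown in the dynamic graph are also stored in the history object, so the best epoch can be located programmatically; a minimal sketch (assuming the metric name val_loss used by recent keras versions):
which.min(output$metrics$val_loss) #epoch with the lowest validation loss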
We expect to see the famous "U-shape" in the validation loss and, to avoid overfitting, we should choose the number of epochs corresponding to the lowest value of the validation loss. Here we choose 5 epochs, so we go back to the original fitting using 5 as the tuned number of epochs.
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
output = network %>%
  fit(train_images, #the full training data
      train_labels,
      epochs = 5, #tuned value of epochs
      batch_size = 2^7)
The final performance is given by
network %>%
  evaluate(test_images, test_labels)