# Chapter 8 R Lab 7 - 26/05/2023

In this lecture we will learn how to implement a feedforward neural network. To run the following code you need to have Python installed on your computer (you can download the Anaconda Python distribution at https://www.anaconda.com/download). I’m sorry but I’m not able to show you the `keras` output here in the html file, because it gives me an error when I compile the html with RMarkdown (and I haven’t found a solution yet).

The following packages are required: `tidyverse` and `keras` (the latter has to be installed first).

`library(tidyverse)`

`library(keras)`

If you have problems/errors with `keras`, it is very likely that you have to close RStudio and run the following code (this is needed only once):

```
install.packages("tensorflow") #the package name must be quoted
library(tensorflow)
install_tensorflow()
```

Then try again to load the `keras` library:

`library(keras)`

## 8.1 Tensors

A tensor is the generalization of vectors and matrices to more than 2 dimensions. A vector in `R` is a 1D tensor and element selection is performed by using only one index:

```
set.seed(1)
x = rnorm(24) %>% round(1) #24 elements
x #vector x
```

```
## [1] -0.6 0.2 -0.8 1.6 0.3 -0.8 0.5 0.7 0.6 -0.3 1.5 0.4 -0.6 -2.2 1.1 0.0 0.0
## [18] 0.9 0.8 0.6 0.9 0.8 0.1 -2.0
```

`x[1]`

`## [1] -0.6`

A matrix is a 2D-tensor. It has two axes given by rows and columns:

```
xmat = matrix(x, nrow=6) #automatically we have 4 columns
xmat
```

```
## [,1] [,2] [,3] [,4]
## [1,] -0.6 0.5 -0.6 0.8
## [2,] 0.2 0.7 -2.2 0.6
## [3,] -0.8 0.6 1.1 0.9
## [4,] 1.6 -0.3 0.0 0.8
## [5,] 0.3 1.5 0.0 0.1
## [6,] -0.8 0.4 0.9 -2.0
```

`dim(xmat)`

`## [1] 6 4`

`xmat[1,1] #element in first row, first column`

`## [1] -0.6`

A 3D tensor is like a cube of numbers. It is characterized by 3 dimensions (or axes) and is created using the `array` function:

```
xarray = array(x, dim=c(2,3,4))
dim(xarray)
```

`## [1] 2 3 4`

`xarray[1,,]`

```
## [,1] [,2] [,3] [,4]
## [1,] -0.6 0.5 -0.6 0.8
## [2,] -0.8 0.6 1.1 0.9
## [3,] 0.3 1.5 0.0 0.1
```

Note the 2 commas in `xarray[1,,]`: this is a selection on the first axis and returns a 3 by 4 matrix. It’s important to note that generally the first axis refers to the observations you have in your data.
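Selections on the other axes work in the same way. The following lines are a small base-`R` sketch; the array is built from `1:24` instead of the random `x`, an assumption made here only so that the values are easy to follow:

```r
# Illustrative 3D tensor with known values (not the random x used above)
a = array(1:24, dim = c(2, 3, 4))

a[, 2, ]        # selection on the second axis: a 2 by 4 matrix
a[, , 4]        # selection on the third axis: a 2 by 3 matrix
a[1, 2, 4]      # a single element needs three indices

dim(a[, 2, ])   # 2 4
dim(a[, , 4])   # 2 3
```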

## 8.2 Load the mnist data

The MNIST dataset is a classic in the machine-learning community. We will deal with a multiclass classification problem: we want to classify grayscale images of handwritten digits (each image is composed of 28 by 28 pixels) into their 10 categories (from 0 to 9). We have 60000 training images and 10000 test images. Each pixel contains a value between 0 (black) and 255 (white). The data are provided with the `keras` library and can be loaded as follows:

`mnist = dataset_mnist()`

The object `mnist` is a list with 2 objects (`train` and `test`), each of which is in turn a list with 2 elements:

`names(mnist$train)`

We now create 4 different objects (regressors and response variable for both the training and the test data). We start with the training regressors, i.e. the 28 by 28 gray color values for each of the 60000 pictures, contained in a 3D tensor:

```
train_images = mnist$train$x
dim(train_images)
```

We can also extract for example the second training picture and plot it:

```
train_images[2, , ]
plot(as.raster(train_images[2, , ], max=255))
```

We then extract the response variable (labels) for the training pictures. This will be a vector containing the digit shown in each picture:

```
train_labels = mnist$train$y
head(train_labels)
```

We do the same for the test pictures:

```
test_images = mnist$test$x
dim(test_images)
test_labels = mnist$test$y
head(test_labels)
```

Before training the neural network, we reshape the features moving from a 3D tensor of dimension (60000, 28, 28) to a 2D tensor with dimension (60000, 28*28):

```
train_images = array_reshape(train_images,
                             dim = c(60000, 28*28))
test_images = array_reshape(test_images,
                            dim = c(10000, 28*28))
```
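`array_reshape` fills the new array in row-major order, so that each row of the 2D tensor contains one picture unrolled row by row. As a rough base-`R` sketch of the same operation on a tiny made-up tensor (2 "pictures" of 2 by 3 pixels; the helper `flatten_rowmajor` is a hypothetical name introduced here, not a `keras` function):

```r
# Tiny stand-in for the image tensor: 2 pictures, each 2 by 3 pixels
a = array(1:12, dim = c(2, 2, 3))

# Hypothetical helper: unroll every picture row by row (row-major order)
flatten_rowmajor = function(arr) {
  t(apply(arr, 1, function(m) as.vector(t(m))))
}

flat = flatten_rowmajor(a)
dim(flat)                                   # 2 6
all(flat[1, ] == as.vector(t(a[1, , ])))    # TRUE
```

Changing the dimensions with `dim(a) = c(2, 6)` would instead unroll each picture column by column, which is why the two approaches should not be mixed.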

Moreover, we rescale the values so that instead of lying in \([0,255]\) they lie in the interval \([0,1]\):

```
train_images = train_images / 255
test_images = test_images / 255
```

Finally we transform the two vectors (1D tensors) containing the response variable into 2D tensors using the **one-hot encoding approach**:

```
head(train_labels) #1D
train_labels = to_categorical(train_labels)
head(train_labels) #2D now, number of columns given by the number of categories
test_labels = to_categorical(test_labels)
```
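To see what one-hot encoding does without running `keras`, here is a minimal base-`R` sketch. The three labels are hypothetical and chosen only for illustration; each one becomes a row of 10 values with a single 1 in the column of the corresponding digit:

```r
# Hypothetical labels, chosen only for illustration
labels = c(5, 0, 4)

# Compare every label with the 10 possible digits 0, ..., 9
onehot = outer(labels, 0:9, "==") * 1

onehot
rowSums(onehot)   # each row contains exactly one 1
```

The first column corresponds to the digit 0, so the label 5 has its 1 in the sixth column.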

## 8.3 Set the network

We will implement a feedforward neural network with one hidden layer (with 512 units and the Rectified Linear Unit (ReLU) activation function). Given that we have a multiclass problem, the output layer will be composed of 10 units and we will use the softmax activation function:

```
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network
```

Note that the layers are created using `layer_dense` and only for the first one we have to specify the dimension of the input for each observation (in this case the 28 by 28 values).
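To make the role of the two layers more concrete, the computation they perform can be sketched in base `R`. This is only an illustration under stated assumptions: the weights are random (untrained), the biases are set to zero, the 3 input observations are made up, and the helper functions `relu` and `softmax` are written here by hand rather than taken from `keras`:

```r
set.seed(1)

relu = function(z) pmax(z, 0)
softmax = function(z) {
  e = exp(z - apply(z, 1, max))   # subtract the row maximum for numerical stability
  e / rowSums(e)
}

# 3 made-up observations with 784 inputs each, values in [0, 1]
x = matrix(runif(3 * 784), nrow = 3)

# Random (untrained) weights and zero biases, for illustration only
W1 = matrix(rnorm(784 * 512, sd = 0.01), nrow = 784)
b1 = rep(0, 512)
W2 = matrix(rnorm(512 * 10, sd = 0.01), nrow = 512)
b2 = rep(0, 10)

h = relu(sweep(x %*% W1, 2, b1, "+"))      # hidden layer: 3 by 512
p = softmax(sweep(h %*% W2, 2, b2, "+"))   # output layer: 3 by 10

dim(p)
rowSums(p)   # each row sums to 1
```

Because of the softmax function, each row of `p` can be read as a vector of 10 class probabilities.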

Then we have to specify something more for the network: the loss function used by the gradient descent algorithm for updating the weights (cross entropy in this case) and the metric that will be used for evaluating the final performance (accuracy in this case). Note that in the following code we don’t have to use `network = network %>% ...`; the object `network` will be automatically updated:

```
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
```
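The two quantities named in `compile` can be computed by hand on a toy example. The following base-`R` sketch uses made-up probabilities for 2 observations and 3 classes; it illustrates the definitions and is not the code that `keras` runs internally:

```r
# Made-up predicted probabilities for 2 observations and 3 classes
p = rbind(c(0.7, 0.2, 0.1),
          c(0.1, 0.8, 0.1))
# One-hot encoded true classes (class 1 and class 2)
y = rbind(c(1, 0, 0),
          c(0, 1, 0))

# Categorical cross entropy: average of -log(probability of the true class)
loss = -mean(rowSums(y * log(p)))

# Accuracy: share of observations whose most likely class is the true one
acc = mean(apply(p, 1, which.max) == apply(y, 1, which.max))

loss
acc
```

The loss gets smaller as the probability assigned to the true class gets closer to 1.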

If you type `network` you will see how many parameters have to be estimated.
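The number of parameters can also be checked by hand: each dense layer has one weight for every connection with the previous layer plus one bias for every unit. A short calculation (the variable names are introduced here only for illustration):

```r
n_input  = 28 * 28   # 784 input values per picture
n_hidden = 512
n_output = 10

# Each dense layer has one weight per connection plus one bias per unit
par_hidden = n_input * n_hidden + n_hidden     # 401920
par_output = n_hidden * n_output + n_output    # 5130
par_hidden + par_output                        # 407050
```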

## 8.4 Fitting the neural network

It’s now time to train the neural network. In particular, we decide to use 10 epochs and, for each epoch, batches of size \(2^7=128\). The function for training the network is named `fit` and requires the specification of the training data:

```
output = network %>%
  fit(train_images,
      train_labels,
      epochs = 10,
      batch_size = 2^7)
```

During the training you will see in a dynamic graph the evolution of the TRAINING loss and accuracy across the epochs.
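With 60000 training pictures and batches of size 128, each epoch consists of several weight updates. A quick calculation, assuming that the last (incomplete) batch is also used, which I believe is the default behaviour of `fit`:

```r
n_train    = 60000
batch_size = 2^7
epochs     = 10

updates_per_epoch = ceiling(n_train / batch_size)   # 469
updates_per_epoch * epochs                          # 4690
n_train %% batch_size                               # size of the last batch: 96
```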

## 8.5 Predictions

After the training has completed we can compute the predictions (i.e. the predicted values) for the observations in the test set. Let’s consider for example the first 3 test pictures. The following code

`network %>% predict(test_images[1:3,])`

returns the predicted probabilities for each of the 10 classes (the output will be a 3 by 10 matrix). We are interested in the most likely category, which can be obtained automatically with

`network %>% predict(test_images[1:3,]) %>% k_argmax()`
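`k_argmax` returns the position of the largest probability counted from 0, so the result can be read directly as the predicted digit. The same idea in base `R`, on made-up probabilities for 2 pictures (since `which.max` counts from 1, we have to subtract 1):

```r
# Made-up probabilities for 2 pictures (each row sums to 1)
probs = rbind(c(0.01, 0.02, 0.05, 0.02, 0.10, 0.05, 0.05, 0.60, 0.05, 0.05),
              c(0.02, 0.03, 0.70, 0.05, 0.05, 0.05, 0.02, 0.03, 0.03, 0.02))

# which.max returns positions 1, ..., 10; subtract 1 to get digits 0, ..., 9
pred_digit = apply(probs, 1, which.max) - 1
pred_digit   # 7 2
```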

The final vector of predictions, for all the test pictures, is given by

```
pred = network %>%
  predict(test_images) %>%
  k_argmax()
length(pred)
class(pred)
```

and the confusion matrix can be obtained as usual:

`table(pred = as.numeric(pred), obs = mnist$test$y)`

The function `as.numeric` is used to transform the keras object `pred` into a simple numerical vector. The sum of the values on the diagonal gives the total number of pictures correctly predicted by the neural network. This final evaluation on the TEST data can also be computed using the `evaluate` function, which returns the test loss and accuracy values:

```
network %>%
  evaluate(test_images, test_labels)
```
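The link between the confusion matrix and the accuracy can be checked on a small example. The following sketch uses hypothetical predictions and observed classes for 6 pictures and 3 classes; dividing the sum of the diagonal by the total number of pictures gives the accuracy:

```r
# Hypothetical predictions and observed classes for 6 pictures and 3 classes
pred_toy = c(0, 1, 2, 2, 1, 0)
obs_toy  = c(0, 1, 2, 1, 1, 0)

# Using factors with the same levels keeps the table square
cm = table(pred = factor(pred_toy, levels = 0:2),
           obs  = factor(obs_toy,  levels = 0:2))
cm

acc_toy = sum(diag(cm)) / sum(cm)   # share of correct predictions
acc_toy
mean(pred_toy == obs_toy)           # same value
```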

## 8.6 Use also a validation set

We know that the number of epochs should be tuned using a validation set. For this purpose we use the first 20000 training images as validation images. We thus prepare the validation data and the new training data with 40000 pictures:

```
valindex = 1:20000 #not a random sampling
val_images = train_images[valindex, ]
val_labels = train_labels[valindex, ]

newtrain_images = train_images[-valindex, ]
newtrain_labels = train_labels[-valindex, ]
```
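As noted in the comment, taking the first 20000 rows is not a random sampling. If a random validation set is preferred, the indexes could be drawn with `sample`. This is a possible alternative, not what is done in this lab, and `valindex_random` is a hypothetical name:

```r
set.seed(1)
n = 60000                             # number of training pictures
valindex_random = sample(n, 20000)    # 20000 row indexes drawn without replacement

length(valindex_random)
anyDuplicated(valindex_random)        # 0: no index is drawn twice
```

The vector `valindex_random` could then be used in place of `valindex` when creating the validation and the new training data.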

We now have to reinitialize the network and redo the fitting, specifying the validation data with `validation_data`. We now use a larger number of epochs (20) to evaluate the training and validation loss/accuracy:

```
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
output = network %>%
  fit(newtrain_images,
      newtrain_labels,
      epochs = 20,
      batch_size = 2^7,
      validation_data = list(val_images, val_labels)) #this is the new part
```
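The validation loss recorded during this fit is what the number of epochs is tuned on. As a minimal sketch with invented loss values (in a real run I assume they are stored in `output$metrics$val_loss`), the epoch with the lowest validation loss can be found with `which.min`:

```r
# Invented validation losses for 8 epochs, only to illustrate the idea
val_loss = c(0.30, 0.20, 0.15, 0.12, 0.10, 0.11, 0.13, 0.16)

best_epoch = which.min(val_loss)   # position of the smallest validation loss
best_epoch
```

These numbers are not the output of the network fitted above; they only show how the selection works.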

We expect to see the famous “U-shape” in the validation loss and, to avoid overfitting, we should choose the number of epochs corresponding to the lowest value of the validation loss. We choose 5 epochs. So we go back to the original fitting using 5 epochs as the tuned value.

```
network = keras_model_sequential() %>%
  layer_dense(units = 512,
              activation = "relu",
              input_shape = 28*28) %>%
  layer_dense(units = 10,
              activation = "softmax")
network %>% compile(
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
output = network %>%
  fit(train_images, #the full training data
      train_labels,
      epochs = 5, #tuned value of epochs
      batch_size = 2^7)
```

The final performance is given by

```
network %>%
  evaluate(test_images, test_labels)
```