Chapter 1 Procedure followed

The work was carried out in the following phases.

1.- Preprocessing: carried out in Part I: Statistical tools. The database was left with no missing values and with only the variables relevant to the regression.

2.- Training control: Training was controlled by 5-fold cross-validation repeated 3 times, so that the hyperparameters of each model could be optimized while ensuring a minimum level of robustness in the results. An iterative process of two grid searches (GridSearch) was also followed: a first, broader grid to detect the order of magnitude in which the locally optimal parameter values lie, and a second, finer grid of hyperparameters around those values to refine the optimization. The metric selected for optimization was the root mean squared error (RMSE), a very common metric in regression problems.

3.- Visualizations and analysis: In all cases, the RMSE was computed on both the training set and the test set. For every model, the following are also shown: the hyperparameter optimization carried out, the distribution of actual versus predicted values on the training set, and the differences between the test-set observations and the model predictions. For illustrative purposes, some variable importance plots are also included, which further help to analyze the results obtained.
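
Phases 2 and 3 can be illustrated with a minimal sketch. It is not the code used in this work: the text does not state which library or models were employed, so the example relies only on the Python standard library, on synthetic data, and on a toy one-feature ridge regression whose penalty `lam` stands in for the hyperparameters of the real models. All function and variable names (`fit_ridge`, `repeated_kfold`, `cv_rmse`, the grids) are assumptions made for the illustration.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_ridge(xs, ys, lam):
    """Toy model (assumption): one-feature ridge regression without intercept.

    Returns the fitted slope for penalty `lam`.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def repeated_kfold(n, k=5, repeats=3, seed=0):
    """Yield (train_idx, val_idx) pairs: k folds, reshuffled on each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, folds[i]


def cv_rmse(xs, ys, lam):
    """Mean validation RMSE over the 5 x 3 = 15 cross-validation splits."""
    scores = []
    for train, val in repeated_kfold(len(xs)):
        slope = fit_ridge([xs[j] for j in train], [ys[j] for j in train], lam)
        scores.append(rmse([ys[j] for j in val], [slope * xs[j] for j in val]))
    return sum(scores) / len(scores)


# Synthetic data (purely illustrative), split 80/20 into training and test sets.
rng = random.Random(42)
xs = [rng.uniform(-3, 3) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 1.0) for x in xs]
x_train, y_train, x_test, y_test = xs[:160], ys[:160], xs[160:], ys[160:]

# First pass: a coarse grid spanning several orders of magnitude.
coarse_grid = [10.0 ** e for e in range(-3, 4)]
best_coarse = min(coarse_grid, key=lambda lam: cv_rmse(x_train, y_train, lam))

# Second pass: a finer grid within one decade either side of the coarse optimum.
fine_grid = [best_coarse * 10.0 ** (e / 4) for e in range(-4, 5)]
best_fine = min(fine_grid, key=lambda lam: cv_rmse(x_train, y_train, lam))

# Refit on the whole training set, then compare training and test RMSE.
slope = fit_ridge(x_train, y_train, best_fine)
rmse_train = rmse(y_train, [slope * x for x in x_train])
rmse_test = rmse(y_test, [slope * x for x in x_test])

# Differences between test-set observations and model predictions.
residuals_test = [y - slope * x for x, y in zip(x_test, y_test)]

print(f"coarse optimum: {best_coarse:g}, refined optimum: {best_fine:g}")
print(f"RMSE train: {rmse_train:.3f}, RMSE test: {rmse_test:.3f}")
```

Because the finer grid contains the coarse optimum and both passes reuse the same splits, the second pass can only match or improve the cross-validated RMSE of the first. A test RMSE far above the training RMSE would point to overfitting.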

Finally, for the ensemble model, prediction intervals were computed using the Monte Carlo method, and the final conclusions compare the results obtained across all the models.
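
The text does not detail how the Monte Carlo intervals were built, so the following is only one possible sketch under stated assumptions: residuals from held-out data are resampled at random and added to a point prediction, and the interval is read from the empirical percentiles of the simulated values. The function name `monte_carlo_interval`, the residual values, and the point prediction are all hypothetical.

```python
import random


def monte_carlo_interval(point_prediction, residuals, n_sim=10000, alpha=0.05, seed=0):
    """Approximate a (1 - alpha) prediction interval by simulation.

    Assumption: future errors behave like the observed residuals, so each
    simulated outcome is the point prediction plus a randomly drawn residual.
    """
    rng = random.Random(seed)
    sims = sorted(point_prediction + rng.choice(residuals) for _ in range(n_sim))
    lower = sims[round((alpha / 2) * n_sim)]
    upper = sims[round((1 - alpha / 2) * n_sim) - 1]
    return lower, upper


# Hypothetical held-out residuals (observed minus predicted); synthetic values.
rng = random.Random(1)
residuals = [rng.gauss(0, 2.0) for _ in range(500)]

point_prediction = 25.0  # hypothetical model output for one new case
lower, upper = monte_carlo_interval(point_prediction, residuals)
print(f"95% prediction interval: [{lower:.2f}, {upper:.2f}]")
```

Resampling residuals avoids assuming a particular error distribution, but it does assume that the error spread is roughly the same for all predictions.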