Chapter 1 Introduction

This work was completed as part of our course “Project in Data Analytics for Decision Making.” We carried out a detailed analysis of the German credit database. Following the CRISP-DM methodology, our goal was to predict whether or not a customer represents a risk for the bank.

  • Business Understanding: A company that lends money takes on risk, and assessing that risk has traditionally been done by humans. Reliable decision-making tools are therefore important: if a large number of clients fail to repay their loans, the consequences for the bank could be severe. We will therefore build models to support and improve the decisions of bank employees.

  • Data Understanding: In this study, we will use the German credit database. In the exploratory data analysis (EDA), we will analyse the available variables and correct possible errors. We will then explore the data to find patterns that may explain why some clients are more at risk than others. This step will guide the construction of our models.

  • Data Preparation: We will clean the data during the EDA and create new variables during the modeling phase to consolidate our analysis.

  • Modeling: In this phase, we will try different models covered during the course, all built with the caret package. We will also try different tuning parameters to improve the models and, in turn, their accuracy.

  • Evaluation: The models will be assessed in two steps. First, we will use a more classical approach based on accuracy and ROC metrics. Then, we will use the DALEX package, which will allow us to compare the models and analyse how the most important variables affect the predictions.

  • Deployment: Finally, we will discuss how our results could be used in a bank.