6.3 Classification as a Specific Prediction

Classification is one of the simplest forms of predictive analysis. In classification, the model is called a classifier. It is built through a process called training, which uses a training dataset in which every record carries its target value, also called the class label. Training builds a model that maps the input data to these class labels, and the resulting classifier is then used to predict class labels for new data. Before a classifier can be used, it needs to go through a process called testing or verification. Testing uses a separate test dataset whose labels are withheld from the classifier: the classifier predicts a label for each test record, and comparing these predictions with the true labels shows how accurate the classifier is. The most commonly used classifiers include decision trees, random forests, regression models such as logistic regression, and Gaussian Naive Bayes.
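To make the training and testing steps concrete, here is a minimal Python sketch under stated assumptions. It uses a deliberately tiny "one-rule" classifier, a swapped-in technique rather than one of the classifiers listed above. The classifier learns the most common class label for each value of a single feature from a labeled training set, and its accuracy is then measured on a held-out test set. The function names `train_one_rule`, `predict` and `accuracy` are hypothetical, and the records are invented toy data, not the real Titanic dataset.

```python
from collections import Counter, defaultdict


def train_one_rule(rows):
    """Training: for each feature value, learn the most common class label."""
    counts = defaultdict(Counter)
    for feature, label in rows:
        counts[feature][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict(model, feature, default):
    """Predict a label; fall back to `default` for values unseen in training."""
    return model.get(feature, default)


def accuracy(model, rows, default):
    """Testing: fraction of records whose predicted label matches the true one."""
    correct = sum(predict(model, f, default) == y for f, y in rows)
    return correct / len(rows)


# Hypothetical toy records (feature = sex, label = outcome), not real data.
train = [("female", "survived"), ("female", "survived"), ("female", "perished"),
         ("male", "perished"), ("male", "perished"), ("male", "survived")]
test = [("female", "survived"), ("male", "perished"), ("male", "survived")]

model = train_one_rule(train)
print(model)  # {'female': 'survived', 'male': 'perished'}
print(accuracy(model, test, default="perished"))  # 2 of 3 test records correct
```

The test labels are never used to build `model`; they are only compared against its predictions, which is what keeps the accuracy figure an honest estimate for new data.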

We will use these classifiers to solve our Titanic problem: predicting whether a passenger died or survived, based on the dataset we have. The Titanic problem is an even simpler classification problem because it has only two possible outcomes: each passenger belongs either to the survived class or to the perished class. This kind of classification is called binary classification.
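Because a binary classifier has only two classes, every test prediction falls into one of four outcomes. The sketch below assumes survived is encoded as 1 and perished as 0 (an encoding chosen here for illustration). It counts true positives, false positives, true negatives and false negatives with a hypothetical helper `confusion_counts`. The labels and predictions are invented, not results from a real model.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem (1 = survived, 0 = perished)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted survived, actually survived
            else:
                fp += 1  # predicted survived, actually perished
        else:
            if t == positive:
                fn += 1  # predicted perished, actually survived
            else:
                tn += 1  # predicted perished, actually perished
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical true labels and predictions for six passengers (not real data).
y_true = [1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}

# Accuracy = correct predictions / all predictions.
acc = (counts["TP"] + counts["TN"]) / len(y_true)
```

Here accuracy is (TP + TN) divided by the number of passengers, that is 4 of 6.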