7.2 Attributes Analysis

To assess the attributes’ predictive power and the correlations among them, some basic analytic tasks need to be performed. These tasks include correlation analysis, principal component analysis (PCA), and possibly factor analysis (FA).

• Correlation analysis. Analyze the correlations among the attributes, and rank the attributes by their correlation with the dependent attribute. Then select an appropriate number of attributes, working from the highest correlation value towards the lowest.

• Principal component analysis (PCA). PCA is a dimension reduction method that projects each data point onto only the first few principal components, yielding lower-dimensional data while preserving as much of the data’s variation as possible. PCA only works on numerical variables.

• Factor analysis (FA), where appropriate. Factor analysis is a statistical method used to describe the variability among observed, correlated variables in terms of a potentially smaller number of unobserved variables called factors. For example, it is possible that the variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables.
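The correlation step described in the list above can be sketched as follows. This is a minimal illustration, not the chapter's own implementation: the toy values are invented rather than taken from the Titanic data, the helper name `rank_by_correlation` is hypothetical, and ranking by the absolute value of Pearson's r is an assumption (a strong negative correlation is treated as just as informative as a strong positive one).

```python
import numpy as np

# Hypothetical toy data: rows are passengers, columns are numeric attributes.
# The values are made up for illustration; they are NOT the real Titanic data.
names = ["Pclass", "Age", "Fare"]
X = np.array([
    [3, 22.0, 7.25],
    [1, 38.0, 71.28],
    [3, 26.0, 7.93],
    [1, 35.0, 53.10],
    [3, 35.0, 8.05],
    [2, 27.0, 13.00],
    [1, 54.0, 51.86],
    [3, 2.0, 21.08],
])
y = np.array([0, 1, 1, 1, 0, 1, 0, 0])  # dependent attribute (e.g. Survived)

# Correlation among the attributes themselves (columns are the variables).
attr_corr = np.corrcoef(X, rowvar=False)


def rank_by_correlation(X, y, names):
    """Return (name, r) pairs sorted by |Pearson r| with y, strongest first.

    Hypothetical helper; sorting on the absolute value is an assumption.
    """
    scores = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        scores.append((name, float(r)))
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


ranked = rank_by_correlation(X, y, names)

# Keep the k attributes most strongly correlated with the dependent attribute.
k = 2
selected = [name for name, _ in ranked[:k]]
print(ranked)
print(selected)
```

The matrix `attr_corr` is also worth inspecting: two attributes that are highly correlated with each other carry overlapping information, so keeping both may add little.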

There are other similar tasks, such as MCA, FAMD, CA, and MFA. MCA stands for multiple correspondence analysis; it applies only to categorical variables. FAMD stands for factor analysis of mixed data; it applies to both numerical and categorical variables. CA is correspondence analysis; it works on only two categorical variables (a contingency table). MFA is multiple factor analysis; it is needed only when the variables are organized into groups. All of these methods are closely related to PCA.
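Since PCA is the reference point for the methods above, its projection step can be made concrete with a short sketch. The code below is one common way to compute PCA (a singular value decomposition of the standardized data), offered under stated assumptions: the function name `pca_project` is hypothetical, the toy values are invented, and standardizing each column first is a choice made here because the columns are on different scales.

```python
import numpy as np


def pca_project(X, n_components):
    """Project rows of X onto the first n_components principal components.

    Hypothetical helper for illustration. Columns are standardized first
    (an assumption), so no column may be constant.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # center and scale each column

    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]

    scores = Z @ components.T                 # lower-dimensional data
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained_ratio[:n_components]


# Made-up numeric data: 6 observations of 3 numerical variables.
X_toy = np.array([
    [22.0, 7.25, 1.0],
    [38.0, 71.28, 1.0],
    [26.0, 7.93, 0.0],
    [35.0, 53.10, 1.0],
    [54.0, 51.86, 0.0],
    [2.0, 21.08, 4.0],
])

scores, components, explained_ratio = pca_project(X_toy, n_components=2)
print(scores.shape)
print(explained_ratio)
```

The values in `explained_ratio` give the share of total variation captured by each retained component, which is the usual basis for deciding how many components to keep.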

In this chapter, we will demonstrate basic correlation analysis and principal component analysis to understand the relationships among the attributes and between the predictors and the dependent variable. We will continue to use the Titanic example.