Chapter 12 Discriminant Analysis

12.1 Introduction to Discriminant Analysis

Discriminant Analysis (DA) is a set of statistical and machine-learning techniques used for classifying observations into predefined categories based on predictor variables. These methods find linear or non-linear combinations of features that best separate groups or classes.

In this chapter, we explore two commonly used DA methods: Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis (PLS-DA). We demonstrate both on the Iris dataset, covering model fitting, confusion matrices, accuracy, and a visualization for each method.

Discriminant analysis aims to:

  1. Predict group membership: Classify observations into categories (e.g., Iris species).

  2. Understand class separation: Identify features or dimensions that discriminate between groups.

12.2 Linear Discriminant Analysis (LDA)

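LDA is a linear classification method that projects the data onto a small number of axes (the linear discriminants) chosen to maximize between-class variance relative to within-class variance. With K classes there are at most K - 1 discriminants, so the three Iris species yield two (LD1 and LD2).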
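To make the projection idea concrete, the following is a minimal sketch (not the internals of any particular package) that builds the within-class and between-class scatter matrices by hand and extracts the directions that maximize their ratio. The names `S_W`, `S_B`, and `fisher_share` are illustrative and do not appear elsewhere in this chapter.

```r
# Fisher's criterion by hand: find directions that maximize
# between-class scatter relative to within-class scatter.
X <- as.matrix(iris[, 1:4])
y <- iris$Species
grand_mean <- colMeans(X)

S_W <- matrix(0, ncol(X), ncol(X))  # within-class scatter
S_B <- matrix(0, ncol(X), ncol(X))  # between-class scatter
for (k in levels(y)) {
  Xk   <- X[y == k, , drop = FALSE]
  mu_k <- colMeans(Xk)
  S_W  <- S_W + crossprod(sweep(Xk, 2, mu_k))
  S_B  <- S_B + nrow(Xk) * tcrossprod(mu_k - grand_mean)
}

# Eigen-decomposition of S_W^{-1} S_B; with three classes only
# K - 1 = 2 eigenvalues are meaningfully non-zero.
eig <- eigen(solve(S_W, S_B))
top_values <- Re(eig$values[1:2])
fisher_share <- top_values / sum(top_values)
print(round(fisher_share, 4))
```

If the sketch is set up correctly, `fisher_share` should be close to the "Proportion of trace" that `lda()` reports in the next subsection, since both describe how much of the between-class separation each discriminant carries.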

12.2.1 Building the LDA Model

library(MASS)  

# Fit the LDA model  
lda_model <- lda(Species ~ ., data = iris)  
print(lda_model)  
## Call:
## lda(Species ~ ., data = iris)
## 
## Prior probabilities of groups:
##     setosa versicolor  virginica 
##  0.3333333  0.3333333  0.3333333 
## 
## Group means:
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa            5.006       3.428        1.462       0.246
## versicolor        5.936       2.770        4.260       1.326
## virginica         6.588       2.974        5.552       2.026
## 
## Coefficients of linear discriminants:
##                     LD1         LD2
## Sepal.Length  0.8293776  0.02410215
## Sepal.Width   1.5344731  2.16452123
## Petal.Length -2.2012117 -0.93192121
## Petal.Width  -2.8104603  2.83918785
## 
## Proportion of trace:
##    LD1    LD2 
## 0.9912 0.0088
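The coefficient table above can be read as a recipe for the discriminant scores. As a hedged illustration, the sketch below centers the predictors at the prior-weighted average of the group means and multiplies by the coefficient matrix (`lda_model$scaling`), then checks the result against `predict()`. The names `center`, `manual_scores`, and `scores_match` are introduced here for illustration only.

```r
library(MASS)

lda_model <- lda(Species ~ ., data = iris)
X <- as.matrix(iris[, 1:4])

# Center at the prior-weighted average of the group means, then apply the
# matrix printed under "Coefficients of linear discriminants".
center <- colSums(lda_model$prior * lda_model$means)
manual_scores <- scale(X, center = center, scale = FALSE) %*% lda_model$scaling

# Compare with the scores computed by predict()
scores_match <- isTRUE(all.equal(manual_scores, predict(lda_model)$x,
                                 check.attributes = FALSE))
print(scores_match)
```

Seeing the scores as a plain matrix product makes the signs and magnitudes of the LD1 and LD2 coefficients easier to interpret: each score is a weighted sum of the centered measurements.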

12.2.2 Predicting and Evaluating LDA

# Predict the classes  
lda_predictions <- predict(lda_model, iris)$class  

# Confusion matrix  
table(Predicted = lda_predictions, Actual = iris$Species)  
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         48         1
##   virginica       0          2        49
# Accuracy on the training data (optimistic: the model is scored on the data it was fit to)  
lda_accuracy <- mean(lda_predictions == iris$Species)  
print(paste("LDA Accuracy:", round(lda_accuracy * 100, 2), "%"))  
## [1] "LDA Accuracy: 98 %"
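The 98% above is measured on the same 150 flowers used to fit the model. One way to get a less optimistic estimate, sketched here under the assumption that leave-one-out cross-validation is acceptable for this small dataset, is the `CV = TRUE` argument of `lda()`. The names `lda_loo`, `loo_table`, and `loo_accuracy` are illustrative.

```r
library(MASS)

# With CV = TRUE, lda() returns leave-one-out predictions
# (components $class and $posterior) instead of a fitted model object.
lda_loo <- lda(Species ~ ., data = iris, CV = TRUE)

loo_table <- table(Predicted = lda_loo$class, Actual = iris$Species)
loo_accuracy <- mean(lda_loo$class == iris$Species)

print(loo_table)
print(paste("LDA leave-one-out accuracy:", round(loo_accuracy * 100, 2), "%"))
```

Each flower is classified by a model fitted to the other 149, so the resulting accuracy is a fairer indication of how the classifier might behave on new observations.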

12.2.3 LDA Plot

library(ggplot2)  

lda_values <- predict(lda_model)$x  
lda_plot_data <- data.frame(lda_values, Species = iris$Species)  

ggplot(lda_plot_data, aes(LD1, LD2, color = Species)) +  
  geom_point(size = 3) +  
  ggtitle("LDA: Linear Discriminant Analysis") +  
  theme_minimal(base_size = 14) +  
  theme(plot.title = element_text(size = 16, hjust = 0.5, color = "darkblue"))  

12.3 Partial Least Squares Discriminant Analysis (PLS-DA)

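PLS-DA adapts partial least squares regression to classification. The class labels are encoded as an indicator (dummy) matrix, and the method extracts a small number of latent components that maximize the covariance between the predictors and that matrix. Each observation is then assigned to the class with the largest predicted response.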
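The idea can be sketched directly with the pls package, without going through caret. This is an illustration under stated assumptions rather than the procedure used in the rest of this section: the choice `ncomp = 2` is arbitrary, and the names `Y`, `pls_fit`, `pls_pred`, and `pls_class` are introduced only for this example.

```r
library(pls)

X <- as.matrix(iris[, 1:4])
# One indicator column per species (a 150 x 3 matrix of 0/1 values)
Y <- model.matrix(~ Species - 1, data = iris)

# PLS regression of the indicator matrix on the predictors;
# ncomp = 2 is an arbitrary illustrative choice
pls_fit <- plsr(Y ~ X, ncomp = 2)

# Fitted responses: an observations x classes x 1 array, reduced to a matrix
pls_pred <- predict(pls_fit, ncomp = 2)[, , 1]

# Assign each observation to the class with the largest predicted response
pls_class <- factor(levels(iris$Species)[apply(pls_pred, 1, which.max)],
                    levels = levels(iris$Species))
print(table(Predicted = pls_class, Actual = iris$Species))
```

Taking the largest predicted response is the simplest decision rule; other implementations may convert the responses to class probabilities first.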

12.3.1 Building the PLS-DA Model

library(caret)  
## Loading required package: lattice
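# Fit PLS-DA model. With trainControl(method = "none") and no tuneGrid, caret fits a  
# single tuning value, which for "pls" is expected to be ncomp = 1 (one latent component).  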
plsda_model <- train(Species ~ ., data = iris, method = "pls", trControl = trainControl(method = "none"))  
print(plsda_model)  
## Partial Least Squares 
## 
## 150 samples
##   4 predictor
##   3 classes: 'setosa', 'versicolor', 'virginica' 
## 
## No pre-processing
## Resampling: None

12.3.2 Predicting and Evaluating PLS-DA

plsda_predictions <- predict(plsda_model, iris)  

# Confusion matrix  
table(Predicted = plsda_predictions, Actual = iris$Species)  
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         50          8         0
##   versicolor      0          0         0
##   virginica       0         42        50
# Accuracy on the training data; note that versicolor is never predicted in the table above  
plsda_accuracy <- mean(plsda_predictions == iris$Species)  
print(paste("PLS-DA Accuracy:", round(plsda_accuracy * 100, 2), "%"))  
## [1] "PLS-DA Accuracy: 66.67 %"
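The model above was fitted without resampling. Assuming caret's usual behaviour of evaluating only a single tuning value in that case (one component for `"pls"`), a natural follow-up is to let cross-validation choose the number of components. The sketch below is one way to do that; the seed, the 10 folds, and the grid `1:3` are arbitrary choices for illustration, and `plsda_cv` is a name introduced here.

```r
library(caret)

set.seed(123)  # arbitrary seed so the fold assignment is reproducible
plsda_cv <- train(
  Species ~ ., data = iris,
  method = "pls",
  tuneGrid = data.frame(ncomp = 1:3),                   # candidate component counts
  trControl = trainControl(method = "cv", number = 10)  # 10-fold cross-validation
)

# Cross-validated accuracy for each candidate, and the selected value
print(plsda_cv$results[, c("ncomp", "Accuracy")])
print(plsda_cv$bestTune)
```

Comparing the cross-validated accuracies across `ncomp` shows whether the single-component fit is the limiting factor, and `predict(plsda_cv, iris)` can then be used in place of the earlier predictions.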

12.3.3 PLS-DA Plot

library(ggplot2)  

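# type = "raw" returns the predicted class labels, not latent scores  
plsda_pred_class <- predict(plsda_model, iris, type = "raw")  

# Plot the petal measurements, colored by the class the model predicts  
ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width, color = plsda_pred_class)) +  
  geom_point(size = 3) +  
  labs(color = "Predicted class") +  
  ggtitle("PLS-DA: Predicted Classes by Petal Dimensions") +  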
  theme_minimal(base_size = 14) +  
  theme(plot.title = element_text(size = 16, hjust = 0.5, color = "darkred"))  
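The plot above shows predicted classes on two of the original measurements. To view the latent components themselves, one option is to fit the PLS model directly with the pls package and plot its X-scores. This is a sketch under assumptions: it refits the model with `ncomp = 2` purely so that a two-dimensional plot is possible, and the names `pls_fit`, `score_data`, and `score_plot` are illustrative.

```r
library(pls)
library(ggplot2)

X <- as.matrix(iris[, 1:4])
Y <- model.matrix(~ Species - 1, data = iris)  # indicator matrix for the classes
pls_fit <- plsr(Y ~ X, ncomp = 2)              # two components, chosen for plotting

# X-scores: coordinates of each flower on the latent components
score_data <- data.frame(
  Comp1 = pls_fit$scores[, 1],
  Comp2 = pls_fit$scores[, 2],
  Species = iris$Species
)

score_plot <- ggplot(score_data, aes(Comp1, Comp2, color = Species)) +
  geom_point(size = 3) +
  ggtitle("PLS-DA: Latent Component Scores") +
  theme_minimal(base_size = 14)
print(score_plot)
```

This view is the PLS-DA counterpart of the LD1 versus LD2 plot in Section 12.2.3, with points colored by the true species rather than by the predicted class.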

12.4 Summary

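This chapter introduced two discriminant analysis methods, LDA and PLS-DA, using the Iris dataset. Each was evaluated with a confusion matrix and visualized. On the training data, LDA classified 98% of the flowers correctly, while the PLS-DA model as fitted here reached 66.67% and never predicted versicolor. Because both figures are computed on the data used for fitting, they are optimistic; held-out or cross-validated estimates give a fairer comparison. Used with that caveat, these methods are valuable for understanding class separation and predicting categorical outcomes in real-world datasets.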