Chapter 12 Discriminant Analysis
12.1 Introduction to Discriminant Analysis
Discriminant Analysis (DA) is a set of statistical and machine-learning techniques used for classifying observations into predefined categories based on predictor variables. These methods find linear or non-linear combinations of features that best separate groups or classes.
In this chapter, we explore two commonly used DA methods: Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis (PLS-DA). We demonstrate both on the Iris dataset, covering model fitting, confusion matrices, and a visualization for each method.
Discriminant analysis aims to:
Predict group membership: Classify observations into categories (e.g., Iris species).
Understand class separation: Identify features or dimensions that discriminate between groups.
12.2 Linear Discriminant Analysis (LDA)
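LDA is a linear classification method that projects the data onto a small number of axes (at most one fewer than the number of classes) chosen to maximize the variance between classes relative to the variance within classes. It assumes that all classes share a common covariance matrix.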
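The idea of "maximal separation" can be made concrete with Fisher's criterion. The following is a minimal base-R sketch, not the internal algorithm of any particular package: it builds the within-class and between-class scatter matrices for iris and takes the leading eigenvectors of their ratio as discriminant directions. The variable names (`S_w`, `S_b`, `separation_share`) are illustrative choices.

```r
# Minimal sketch of Fisher's criterion on iris, using base R only.
X <- as.matrix(iris[, 1:4])
y <- iris$Species
grand_mean <- colMeans(X)

p <- ncol(X)
S_w <- matrix(0, p, p)  # within-class scatter
S_b <- matrix(0, p, p)  # between-class scatter
for (k in levels(y)) {
  X_k <- X[y == k, , drop = FALSE]
  mu_k <- colMeans(X_k)
  centered <- sweep(X_k, 2, mu_k)           # subtract the class mean
  S_w <- S_w + crossprod(centered)
  d <- mu_k - grand_mean
  S_b <- S_b + nrow(X_k) * tcrossprod(d)
}

# Discriminant directions: leading eigenvectors of S_w^{-1} S_b.
# Re() drops tiny imaginary parts that can appear for the near-zero eigenvalues.
eig <- eigen(solve(S_w, S_b))
eigenvalues <- Re(eig$values)
directions <- Re(eig$vectors[, 1:2])        # 3 classes give at most 2 directions
rownames(directions) <- colnames(X)

# Share of the between-class separation captured by each direction
separation_share <- eigenvalues[1:2] / sum(eigenvalues[1:2])
print(round(separation_share, 4))
```

The directions are only defined up to sign and scale, so they need not match the coefficients reported by a fitted model, but the separation shares should be comparable to the "Proportion of trace" in the `lda()` output.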
12.2.1 Building the LDA Model
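library(MASS)
# Fit the LDA model using all four measurements as predictors
lda_model <- lda(Species ~ ., data = iris)
lda_model
## Call: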
## lda(Species ~ ., data = iris)
##
## Prior probabilities of groups:
##     setosa versicolor  virginica
##  0.3333333  0.3333333  0.3333333
##
## Group means:
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa            5.006       3.428        1.462       0.246
## versicolor        5.936       2.770        4.260       1.326
## virginica         6.588       2.974        5.552       2.026
##
## Coefficients of linear discriminants:
##                     LD1         LD2
## Sepal.Length  0.8293776  0.02410215
## Sepal.Width   1.5344731  2.16452123
## Petal.Length -2.2012117 -0.93192121
## Petal.Width  -2.8104603  2.83918785
##
## Proportion of trace:
##    LD1    LD2
## 0.9912 0.0088
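To make the printed output easier to interpret, the sketch below recomputes two of its pieces from a fitted model. It refits the model under the illustrative name `fit` so that the block is self-contained. It relies on the `svd`, `scaling`, `prior`, and `means` components of a MASS `lda` object. The score reconstruction reflects our understanding of how `predict()` centers the data, so treat it as a check rather than a specification.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)

# "Proportion of trace": squared singular values, normalized to sum to 1.
trace_share <- fit$svd^2 / sum(fit$svd^2)
names(trace_share) <- colnames(fit$scaling)
print(round(trace_share, 4))

# Discriminant scores: center the predictors at the prior-weighted mean of the
# group means, then multiply by the coefficients of linear discriminants.
center <- colSums(fit$prior * fit$means)
manual_scores <- scale(as.matrix(iris[, 1:4]), center = center, scale = FALSE) %*%
  fit$scaling

# Largest absolute difference from the scores returned by predict()
score_gap <- max(abs(manual_scores - predict(fit)$x))
print(score_gap)
```

Read this way, LD1 carries almost all of the between-class separation, which is why a plot of LD1 against LD2 separates the species mostly along the horizontal axis.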
12.2.2 Predicting and Evaluating LDA
# Predict the classes
lda_predictions <- predict(lda_model, iris)$class
# Confusion matrix
table(Predicted = lda_predictions, Actual = iris$Species)
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         48         1
##   virginica       0          2        49
# Accuracy
lda_accuracy <- mean(lda_predictions == iris$Species)
print(paste("LDA Accuracy:", round(lda_accuracy * 100, 2), "%"))
## [1] "LDA Accuracy: 98 %"
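The accuracy above is computed on the same observations used to fit the model, so it is likely to be optimistic. One hedged alternative is leave-one-out cross-validation, which `lda()` in MASS supports through its `CV = TRUE` argument. The names `loo` and `loo_accuracy` are illustrative.

```r
library(MASS)

# With CV = TRUE, lda() returns leave-one-out predictions (class and posterior)
# rather than a fitted model object.
loo <- lda(Species ~ ., data = iris, CV = TRUE)

# Confusion matrix and accuracy based on held-out predictions
print(table(Predicted = loo$class, Actual = iris$Species))
loo_accuracy <- mean(loo$class == iris$Species)
print(paste("LDA leave-one-out accuracy:", round(loo_accuracy * 100, 2), "%"))
```

Each observation is predicted by a model fitted without it, so the resulting accuracy is a fairer estimate of performance on new flowers.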
12.2.3 LDA Plot
library(ggplot2)
lda_values <- predict(lda_model)$x
lda_plot_data <- data.frame(lda_values, Species = iris$Species)
ggplot(lda_plot_data, aes(LD1, LD2, color = Species)) +
  geom_point(size = 3) +
  ggtitle("LDA: Linear Discriminant Analysis") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(size = 16, hjust = 0.5, color = "darkblue"))
12.3 Partial Least Squares Discriminant Analysis (PLS-DA)
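PLS-DA applies partial least squares regression to a set of class indicator variables. It reduces dimensionality by finding a few latent variables (components) that have high covariance with class membership, and then classifies observations from those components.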
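The mechanics can be sketched directly with the pls package. This is an illustration under stated assumptions, not the code used elsewhere in this chapter: the choice of two components, the indicator matrix `Y`, and the rule "assign the class with the largest fitted value" are all assumptions made for the example.

```r
library(pls)

X <- as.matrix(iris[, 1:4])

# One indicator (0/1) column per species
Y <- model.matrix(~ Species - 1, data = iris)
colnames(Y) <- levels(iris$Species)

# PLS regression of the indicator matrix on the predictors (2 latent variables)
pls_fit <- plsr(Y ~ X, ncomp = 2)

# Fitted indicator values from the 2-component model: a 150 x 3 matrix
fitted_y <- predict(pls_fit, ncomp = 2)[, , 1]

# Assign each flower to the class with the largest fitted value
pred <- factor(levels(iris$Species)[apply(fitted_y, 1, which.max)],
               levels = levels(iris$Species))
print(table(Predicted = pred, Actual = iris$Species))
```

The number of components is the main tuning parameter. With too few, the latent space may not be able to separate all classes.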
12.3.1 Building the PLS-DA Model
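library(caret)
## Loading required package: lattice
# Fit PLS-DA model without resampling. Because no tuneGrid is supplied,
# caret fits a single model with its default of one component (ncomp = 1).
plsda_model <- train(Species ~ ., data = iris, method = "pls", trControl = trainControl(method = "none"))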
print(plsda_model)
## Partial Least Squares
##
## 150 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: None
12.3.2 Predicting and Evaluating PLS-DA
plsda_predictions <- predict(plsda_model, iris)
# Confusion matrix
table(Predicted = plsda_predictions, Actual = iris$Species)
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         50          8         0
##   versicolor      0          0         0
##   virginica       0         42        50
# Accuracy
plsda_accuracy <- mean(plsda_predictions == iris$Species)
print(paste("PLS-DA Accuracy:", round(plsda_accuracy * 100, 2), "%"))
## [1] "PLS-DA Accuracy: 66.67 %"
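No observation is predicted as versicolor, which suggests the model is too constrained rather than that PLS-DA is unsuited to these data. As far as we can tell, `trainControl(method = "none")` without a `tuneGrid` leaves caret fitting a single-component model. The sketch below instead lets cross-validation choose the number of components. The seed, the 10-fold setup, the centering and scaling step, and the grid `ncomp = 1:3` are assumptions chosen for illustration.

```r
library(caret)

set.seed(123)  # arbitrary seed, used only to make the folds reproducible

plsda_cv <- train(
  Species ~ ., data = iris,
  method = "pls",
  preProcess = c("center", "scale"),           # put predictors on a common scale
  tuneGrid = data.frame(ncomp = 1:3),          # candidate numbers of components
  trControl = trainControl(method = "cv", number = 10)
)

# Cross-validated accuracy for each candidate, and the selected value
print(plsda_cv$results[, c("ncomp", "Accuracy")])
print(plsda_cv$bestTune)
```

Comparing the accuracy across rows of `plsda_cv$results` shows how much of the gap to LDA is due to the single-component restriction.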
12.3.3 PLS-DA Plot
library(ggplot2)
# type = "raw" returns the predicted class labels, not latent-variable scores
plsda_pred_class <- predict(plsda_model, iris, type = "raw")
# Plot the petal measurements, colored by the class PLS-DA predicts
ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width, color = plsda_pred_class)) +
  geom_point(size = 3) +
  ggtitle("PLS-DA: Separation by Petal Dimensions") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(size = 16, hjust = 0.5, color = "darkred"))
12.4 Summary
This chapter provided an overview of two discriminant analysis methods, LDA and PLS-DA, using the Iris dataset. Each method was evaluated with a confusion matrix and visualized to show how well it separates the three species. On these data, LDA classified 98% of observations correctly, while the PLS-DA model as fitted here reached 66.67% and never predicted versicolor. Both methods are useful for understanding class separation and predicting categorical outcomes in real-world datasets.