4 Linear Discriminant Analysis

4.1 Introduction

Once again we focus on \(\Pr (Y=k | X = x)\) to classify an individual (or other unit) in one of the categories of \(Y\). Using the Bayes theorem,

\(\Pr (Y=k | X = x) = \frac{f_k(x) \Pr (Y=k) }{f(x)}\),

where \(f_k(x)\) is the density for \(X\mid Y=k\).

Thus, finding the category \(k\) that has the highest probability \(\Pr (Y=k | X = x)\) is the same as finding the category \(k\) with higher value for \(\frac{f_k(x)\Pr(Y=k) }{f(x)}\).

Now, if we assume that the density of \(X\) (represented above in a slightly abuse of notation as Pr(X=x)) is \(N(\mu, \sigma^2)\), it is possible to show that maximising the right side of the equation is equivalent to maximise

\(\underbrace{x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log\big(\Pr(Y=k)\big)}_{\text{discriminant function}}\)

So, if we get estimates for the parameters in the discriminant function, we can calculate the category \(k\) that has the highest discriminant value and thus the highest \(\Pr (Y=k | X = x)\).

4.2 Readings

Read the following chapters of An introduction to statistical learning:

4.3 Practical session

TASK 1 - Classification with the linear discriminant function

With the bmd.csv dataset, let’s use the variable bmd to predict fracture using linear discriminant analysis.

The discriminant function is given by

\(age \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log\big(\Pr(Y=k)\big)\),

where \(\mu_k\) is the mean bmd for the group \(k=\) “fracture” or \(k=\) “no fracture”, \(\sigma\) is the standard deviation for bmd, and \(\Pr(Y=k)\) is the (marginal) probability of each category of the outcome.

We can easily get estimates for these parameters.

#libraries that we will need
library(MASS) #lda function
set.seed(1974) #fix the random generator seed 

#read the data
bmd.data     <- 
           stringsAsFactors = TRUE)
#mean bmd for fracture
mean.f   <- with(bmd.data, mean(bmd[fracture=="fracture"]))  
#mean bmd for no fracture
mean.nf  <- with(bmd.data, mean(bmd[fracture=="no fracture"])) 
#estimate of sigma (see page 141)
sigma.bmd <- sqrt(with(bmd.data,
                       (sum((bmd[fracture=="fracture"] - mean.f)^2) +
                       sum((bmd[fracture=="no fracture"]- mean.nf)^2))/
#probability of fracture/no fracture
pr.fracture <- prop.table(table(bmd.data$fracture))
print(c(mean.f, mean.nf, sigma.bmd))
## [1] 0.6233080 0.8502454 0.1305394
##    fracture no fracture 
##    0.295858    0.704142

So, now we can compute the value of the discriminant function for a particular bmd. For example, for bmd=0.54

#for fracture
0.54*mean.f/sigma.bmd^2 - mean.f^2/(2*sigma.bmd^2) + log(pr.fracture[1])      
## fracture 
## 7.134559
#for no fracture
0.54*mean.nf/sigma.bmd^2 - mean.nf^2/(2*sigma.bmd^2) + log(pr.fracture[2])      
## no fracture 
##    5.381084

Thus, for bmd=0.54, the classification would “fracture” given that this category has the highest value for the discriminant function.

The linear discriminat analysis is implemented in the lda() function from the library(MASS)

lda.model <- lda(fracture~bmd, data=bmd.data)

#to classify someone with bmd=-.54 
predict(lda.model, newdata=data.frame(bmd=0.54))$class
## [1] fracture
## Levels: fracture no fracture


  1. Compute the confusion matrix for LDA using bmd to predict fracture.
See the solution code
pred.dataset <- predict(lda.model)$class
#confusion matrix
table(bmd.data$fracture, pred.dataset)

  1. Additionally to bmd, use age, weight_kg and height_cm, to predict fracture using LDA, and compute the confusion matrix. Compare the kappa statistic for this result with the one obtained in 1).
See the solution code
lda.model2 <- lda(fracture~bmd+age+weight_kg+height_cm, 
pred.dataset2 <- predict(lda.model2)$class
#confusion matrix
table(bmd.data$fracture, pred.dataset2)

library(irr) #for the kappa statistics
## Loading required package: lpSolve
kappa2(cbind(bmd.data$fracture, pred.dataset))    #model in 1)
kappa2(cbind(bmd.data$fracture, pred.dataset2))   #current model

TASK 2 - Classification with the quadratic discriminant function

The linear discriminant function assumes that the variance is the same for all the categories of the outcome. The quadratic discriminant analysis (QDA) relaxes this assumption.

Let’s repeat the classification of fracture with bmd, using a QDA

#qda() is a function from the MASS
#library that fits QDA
qda.model <- qda(fracture ~ bmd, 

We can now predict fracture for the individuals in the dataset and compare it with the observed values (confusion matrix)

pred.qda <- predict(qda.model)$class
#confusion matrix
table(bmd.data$fracture, pred.qda)
##              pred.qda
##               fracture no fracture
##   fracture          34          16
##   no fracture       10         109


  1. Additionally to bmd, use age, weight_kg and height_cm, to predict fracture using QDA, and compute the confusion matrix.
See the solution code
qda.model2 <- qda(fracture~bmd+age+weight_kg+height_cm, 
pred.qda2 <- predict(qda.model2)$class
#confusion matrix
table(bmd.data$fracture, pred.qda2)

4.4 Exercises

  1. The dataset bdiag.csv, included several imaging details from patients that had a biopsy to test for breast cancer.
    The variable Diagnosis classifies the biopsied tissue as M = malignant or B = benign.

    1. Use LDA to predict Diagnosis using texture_mean and radius_mean.

    2. Build the confusion matrix for the model above

    3. Compare the results with a logistic regession

    4. Plot the scatter plot for texture_mean and radius_mean and draw the border line for the prediction of Diagnosis based on the model in a)

    5. Use radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, symmetry_mean, fractal_dimension_mean to classify diagnosis with LDA and QDA. Check the distribution of the predictors.

  2. Exercise 5 from the book (page 169)