4 Linear Discriminant Analysis
4.1 Introduction
Once again we focus on \(\Pr (Y=k | X = x)\) to classify an individual (or other unit) in one of the categories of \(Y\). Using the Bayes theorem,
\(\Pr (Y=k | X = x) = \frac{f_k(x) \Pr (Y=k) }{f(x)}\),
where \(f_k(x)\) is the density for \(X\mid Y=k\).
Thus, finding the category \(k\) that has the highest probability \(\Pr (Y=k | X = x)\) is the same as finding the category \(k\) with higher value for \(\frac{f_k(x)\Pr(Y=k) }{f(x)}\).
Now, if we assume that the density of \(X\) (represented above in a slight abuse of notation as Pr(X=x)) is \(N(\mu, \sigma^2)\), it is possible to show that maximising the right side of the equation is equivalent to maximising
\(\underbrace{x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log\big(\Pr(Y=k)\big)}_{\text{discriminant function}}\)
So, if we get estimates for the parameters in the discriminant function, we can calculate the category \(k\) that has the highest discriminant value and thus the highest \(\Pr (Y=k | X = x)\).
4.2 Readings
Read the following chapters of An introduction to statistical learning:
- 4.4 Linear Discriminant Analysis
- 4.5 A Comparison of Classification Methods
4.3 Practical session
Task 1 - Classification with the linear discriminant function
With the bmd.csv dataset, let’s use the variable bmd to predict fracture using linear discriminant analysis.
The discriminant function is given by
\(age \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log\big(\Pr(Y=k)\big)\),
where \(\mu_k\) is the mean bmd for the group \(k=\) “fracture” or \(k=\) “no fracture”, \(\sigma\) is the standard deviation for bmd, and \(\Pr(Y=k)\) is the (marginal) probability of each category of the outcome.
We can easily get estimates for these parameters.
#libraries that we will need
library(MASS) #lda function
set.seed(1974) #fix the random generator seed
#read the data
bmd.data <-
read.csv("https://www.dropbox.com/s/c6mhgatkotuze8o/bmd.csv?dl=1",
stringsAsFactors = TRUE)
#mean bmd for fracture
mean.f <- with(bmd.data, mean(bmd[fracture=="fracture"]))
#mean bmd for no fracture
mean.nf <- with(bmd.data, mean(bmd[fracture=="no fracture"]))
#estimate of sigma (see page 141)
sigma.bmd <- sqrt(with(bmd.data,
(sum((bmd[fracture=="fracture"] - mean.f)^2) +
sum((bmd[fracture=="no fracture"]- mean.nf)^2))/
(length(bmd)-2)
)
)
#probability of fracture/no fracture
pr.fracture <- prop.table(table(bmd.data$fracture))
print(c(mean.f, mean.nf, sigma.bmd))
## [1] 0.6233080 0.8502454 0.1305394
##
## fracture no fracture
## 0.295858 0.704142
So, now we can compute the value of the discriminant function for a particular bmd. For example, for bmd=0.54
## fracture
## 7.134559
## no fracture
## 5.381084
Thus, for bmd=0.54, the classification would “fracture” given that this category has the highest value for the discriminant function.
The linear discriminat analysis is implemented in the lda()
function
from the library(MASS)
library(MASS)
lda.model <- lda(fracture~bmd, data=bmd.data)
#to classify someone with bmd=-.54
#
predict(lda.model, newdata=data.frame(bmd=0.54))$class
## [1] fracture
## Levels: fracture no fracture
TRY IT YOURSELF:
- Compute the confusion matrix for LDA using bmd to predict fracture.
See the solution code
- Additionally to bmd, use age, weight_kg and height_cm, to predict fracture using LDA, and compute the confusion matrix. Compare the kappa statistic for this result with the one obtained in 1).
Task 2 - Classification with the quadratic discriminant function
The linear discriminant function assumes that the variance is the same for all the categories of the outcome. The quadratic discriminant analysis (QDA) relaxes this assumption.
Let’s repeat the classification of fracture with bmd, using a QDA
#qda() is a function from the MASS
#library that fits QDA
qda.model <- qda(fracture ~ bmd,
data=bmd.data)
We can now predict fracture for the individuals in the dataset and compare it with the observed values (confusion matrix)
#predictions
pred.qda <- predict(qda.model)$class
#confusion matrix
table(bmd.data$fracture, pred.qda)
## pred.qda
## fracture no fracture
## fracture 34 16
## no fracture 10 109
TRY IT YOURSELF:
- Additionally to bmd, use age, weight_kg and height_cm, to predict fracture using QDA, and compute the confusion matrix.
4.4 Exercises
The dataset bdiag.csv, included several imaging details from patients that had a biopsy to test for breast cancer.
The variable Diagnosis classifies the biopsied tissue as M = malignant or B = benign.Use LDA to predict Diagnosis using texture_mean and radius_mean.
Build the confusion matrix for the model above
Compare the results with a logistic regession
Plot the scatter plot for texture_mean and radius_mean and draw the border line for the prediction of Diagnosis based on the model in a)
Use radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, symmetry_mean, fractal_dimension_mean to classify diagnosis with LDA and QDA. Check the distribution of the predictors.
Exercise 5 from the book (page 191)