4 Linear Discriminant Analysis
4.1 Introduction
Once again we focus on \(\Pr (Y=k | X = x)\) to classify an individual (or other unit) in one of the categories of \(Y\). Using the Bayes theorem,
\(\Pr (Y=k | X = x) = \frac{f_k(x) \Pr (Y=k) }{f(x)}\),
where \(f_k(x)\) is the density for \(X\mid Y=k\).
Thus, finding the category \(k\) that has the highest probability \(\Pr (Y=k | X = x)\) is the same as finding the category \(k\) with higher value for \(\frac{f_k(x)\Pr(Y=k) }{f(x)}\).
Now, if we assume that the density of \(X\) (represented above in a slightly abuse of notation as Pr(X=x)) is \(N(\mu, \sigma^2)\), it is possible to show that maximising the right side of the equation is equivalent to maximise
\(\underbrace{x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log\big(\Pr(Y=k)\big)}_{\text{discriminant function}}\)
So, if we get estimates for the parameters in the discriminant function, we can calculate the category \(k\) that has the highest discriminant value and thus the highest \(\Pr (Y=k | X = x)\).
4.2 Readings
Read the following chapters of An introduction to statistical learning:
4.3 Practical session
TASK 1 - Classification with the linear discriminant function
With the bmd.csv dataset, let’s use the variable bmd to predict fracture using linear discriminant analysis.
The discriminant function is given by
\(age \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log\big(\Pr(Y=k)\big)\),
where \(\mu_k\) is the mean bmd for the group \(k=\) “fracture” or \(k=\) “no fracture”, \(\sigma\) is the standard deviation for bmd, and \(\Pr(Y=k)\) is the (marginal) probability of each category of the outcome.
We can easily get estimates for these parameters.
#libraries that we will need
library(MASS) #lda function
set.seed(1974) #fix the random generator seed
#read the data
<-
bmd.data read.csv("https://www.dropbox.com/s/c6mhgatkotuze8o/bmd.csv?dl=1",
stringsAsFactors = TRUE)
#mean bmd for fracture
<- with(bmd.data, mean(bmd[fracture=="fracture"]))
mean.f #mean bmd for no fracture
<- with(bmd.data, mean(bmd[fracture=="no fracture"]))
mean.nf #estimate of sigma (see page 141)
<- sqrt(with(bmd.data,
sigma.bmd sum((bmd[fracture=="fracture"] - mean.f)^2) +
(sum((bmd[fracture=="no fracture"]- mean.nf)^2))/
length(bmd)-2)
(
)
)#probability of fracture/no fracture
<- prop.table(table(bmd.data$fracture))
pr.fracture
print(c(mean.f, mean.nf, sigma.bmd))
## [1] 0.6233080 0.8502454 0.1305394
print(pr.fracture)
##
## fracture no fracture
## 0.295858 0.704142
So, now we can compute the value of the discriminant function for a particular bmd. For example, for bmd=0.54
#for fracture
0.54*mean.f/sigma.bmd^2 - mean.f^2/(2*sigma.bmd^2) + log(pr.fracture[1])
## fracture
## 7.134559
#for no fracture
0.54*mean.nf/sigma.bmd^2 - mean.nf^2/(2*sigma.bmd^2) + log(pr.fracture[2])
## no fracture
## 5.381084
Thus, for bmd=0.54, the classification would “fracture” given that this category has the highest value for the discriminant function.
The linear discriminat analysis is implemented in the lda()
function
from the library(MASS)
library(MASS)
<- lda(fracture~bmd, data=bmd.data)
lda.model
#to classify someone with bmd=-.54
#
predict(lda.model, newdata=data.frame(bmd=0.54))$class
## [1] fracture
## Levels: fracture no fracture
TRY IT YOURSELF:
- Compute the confusion matrix for LDA using bmd to predict fracture.
See the solution code
#predictions
<- predict(lda.model)$class
pred.dataset #confusion matrix
table(bmd.data$fracture, pred.dataset)
- Additionally to bmd, use age, weight_kg and height_cm, to predict fracture using LDA, and compute the confusion matrix. Compare the kappa statistic for this result with the one obtained in 1).
See the solution code
<- lda(fracture~bmd+age+weight_kg+height_cm,
lda.model2 data=bmd.data)
#predictions
<- predict(lda.model2)$class
pred.dataset2 #confusion matrix
table(bmd.data$fracture, pred.dataset2)
library(irr) #for the kappa statistics
## Loading required package: lpSolve
kappa2(cbind(bmd.data$fracture, pred.dataset)) #model in 1)
kappa2(cbind(bmd.data$fracture, pred.dataset2)) #current model
TASK 2 - Classification with the quadratic discriminant function
The linear discriminant function assumes that the variance is the same for all the categories of the outcome. The quadratic discriminant analysis (QDA) relaxes this assumption.
Let’s repeat the classification of fracture with bmd, using a QDA
#qda() is a function from the MASS
#library that fits QDA
<- qda(fracture ~ bmd,
qda.model data=bmd.data)
We can now predict fracture for the individuals in the dataset and compare it with the observed values (confusion matrix)
#predictions
<- predict(qda.model)$class
pred.qda #confusion matrix
table(bmd.data$fracture, pred.qda)
## pred.qda
## fracture no fracture
## fracture 34 16
## no fracture 10 109
TRY IT YOURSELF:
- Additionally to bmd, use age, weight_kg and height_cm, to predict fracture using QDA, and compute the confusion matrix.
See the solution code
<- qda(fracture~bmd+age+weight_kg+height_cm,
qda.model2 data=bmd.data)
#predictions
<- predict(qda.model2)$class
pred.qda2 #confusion matrix
table(bmd.data$fracture, pred.qda2)
4.4 Exercises
The dataset bdiag.csv, included several imaging details from patients that had a biopsy to test for breast cancer.
The variable Diagnosis classifies the biopsied tissue as M = malignant or B = benign.Use LDA to predict Diagnosis using texture_mean and radius_mean.
Build the confusion matrix for the model above
Compare the results with a logistic regession
Plot the scatter plot for texture_mean and radius_mean and draw the border line for the prediction of Diagnosis based on the model in a)
Use radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, symmetry_mean, fractal_dimension_mean to classify diagnosis with LDA and QDA. Check the distribution of the predictors.