We have seen how to fit Bayesian regression models for predicting numerical variables. Now we will introduce a Bayesian classification method for predicting categorical variables. The goal is to “classify” observations according to which category (class) they belong to.
There exist multiple penguin species throughout Antarctica, including the Adelie (A), Chinstrap (C), and Gentoo (G). Our goal will be to classify penguins according to their species given characteristics like weight or bill length. (Throughout “penguin” refers to an Antarctic penguin of one of these three species.)
Example 26.1 Suppose first that we have no information about a randomly selected penguin (other than it’s an Antarctic penguin from one of these three species).
How might we formulate a prior distribution for the species of the penguin? Would we necessarily assume equal prior probability for the three species?
Suppose that among Antarctic penguins of these three species, 44.2% are Adelie, 19.8% are Chinstrap, and 36.0% are Gentoo. Start to set up a Bayes table.
Example 26.2 Now suppose we know that a randomly selected penguin is below average weight (that is, below 4200g).
What is the next column of the Bayes table to fill in? What information would we need to do so?
Suppose that 83.4% of Adelie penguins are below average weight, 89.7% of Chinstrap penguins are below average weight, and 4.9% of Gentoo penguins are below average weight. Among these three species, the likelihood of being below average weight is greatest for Chinstrap. Does that mean we should necessarily classify the penguin to be Chinstrap? Why?
Complete the Bayes table and find the posterior probability of each species given that the penguin is below average weight.
If the penguin is below average weight, what species would you classify it as? Why?
Now suppose the penguin is not below average weight. Compute the posterior probability of each species given that the penguin is not below average weight. What species would you classify it as? Why?
Suppose that you classify any randomly selected penguin based on whether or not it is below average weight. What is the posterior probability that you classify the penguin correctly?
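The overall accuracy of this weight-based classifier can be computed directly from the stated priors and likelihoods. A quick numeric check (Python is used here just as a calculator; all values are from the text):

```python
# Quick check of classification accuracy; priors and likelihoods from the text
prior = {"A": 0.442, "C": 0.198, "G": 0.360}
p_below = {"A": 0.834, "C": 0.897, "G": 0.049}  # P(below avg weight | species)

# Joint probabilities P(species and below) and P(species and not below)
joint_below = {s: prior[s] * p_below[s] for s in prior}
joint_above = {s: prior[s] * (1 - p_below[s]) for s in prior}

# Classify each weight status as the species with the largest joint probability
best_below = max(joint_below, key=joint_below.get)  # Adelie
best_above = max(joint_above, key=joint_above.get)  # Gentoo

# P(correct) = P(below and classify correctly) + P(not below and classify correctly)
p_correct = joint_below[best_below] + joint_above[best_above]
print(best_below, best_above, round(p_correct, 3))
```

The classifier predicts Adelie for below-average-weight penguins and Gentoo otherwise, and is correct with posterior probability about 0.711.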
Example 26.3 Now suppose we know that a randomly selected penguin has a bill length of 50mm. We’ll start by assuming we don’t know yet whether or not it is below average weight.
Start to create a Bayes table. What is the prior? What is the next column of the Bayes table to fill in? What information would we need to do so?
Suppose that bill lengths (mm) follow a N(38.8, 2.66) distribution for Adelie, a N(48.8, 3.34) distribution for Chinstrap, and a N(47.5, 3.08) distribution for Gentoo. How would you fill in the likelihood column?
Among these three species, the likelihood of having a 50mm bill is greatest for Chinstrap. Does that mean we should necessarily classify the penguin to be Chinstrap? Why?
Complete the Bayes table and find the posterior probability of each species given that the penguin has a 50mm bill.
If the penguin has a 50mm bill, what species would you classify it as? Why?
Now suppose before measuring bill length we had already known that the penguin was below average weight. How would our Bayes table have changed? If we don’t change the likelihood column, what are we assuming?
Compute the posterior probability of each species given that the penguin is below average weight with a 50mm bill, assuming conditional independence between weight and bill length given species. Given a penguin with these characteristics, what species would you classify it as? Why?
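Under conditional independence, updating sequentially (on weight, then on bill length) gives the same posterior as a single update using the product of the two likelihoods. A small Python check of this, with values from the text (`dnorm` here is a hand-rolled Normal density analogous to R's):

```python
from math import exp, pi, sqrt

def dnorm(x, mean, sd):
    """Normal density, analogous to R's dnorm."""
    return exp(-(x - mean) ** 2 / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

def normalize(v):
    t = sum(v)
    return [x / t for x in v]

prior = [0.442, 0.198, 0.360]       # A, C, G
lik_weight = [0.834, 0.897, 0.049]  # P(below avg weight | species)
lik_bill = [dnorm(50, 38.8, 2.66), dnorm(50, 48.8, 3.34), dnorm(50, 47.5, 3.08)]

# Sequential: update on weight, then use that posterior as the prior for bill length
step1 = normalize([p * l for p, l in zip(prior, lik_weight)])
sequential = normalize([p * l for p, l in zip(step1, lik_bill)])

# Joint: one update with the product of likelihoods (conditional independence)
joint = normalize([p * lw * lb for p, lw, lb in zip(prior, lik_weight, lik_bill)])

print([round(x, 4) for x in sequential])
print([round(x, 4) for x in joint])
```

Both routes give the same posterior distribution, with almost all of the probability on Chinstrap.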
Naive Bayes classification uses Bayes' rule to predict the class of a categorical response variable \(y\) given predictor variables \(x_1, \ldots, x_p\).

- Predictor variables can be categorical or numerical.
- Naive Bayes classification assumes that predictors are conditionally independent given each class.
- Naive Bayes classification typically assumes that any numerical predictor variable is conditionally Normal.

Given observed data:

- Prior probabilities for the response variable are estimated to be the observed sample proportion of each class.
- For a categorical predictor \(x\), the likelihood of value \(\tilde{x}\) is estimated to be the observed sample proportion of \(\tilde{x}\) within each class.
- For a numerical predictor \(x\), the likelihood is computed assuming that, within each class, values of the predictor follow a Normal distribution with mean and SD estimated to be the sample mean and SD within that class.
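The estimation recipe above can be sketched in a few lines of code. This is a minimal illustration of Gaussian naive Bayes on a tiny made-up dataset (the data values and class labels below are hypothetical, not the penguin data):

```python
from math import exp, pi, sqrt
from statistics import mean, stdev

# Toy training data: (class, numerical predictor). Made up for illustration.
data = [("A", 39.0), ("A", 38.5), ("A", 40.2),
        ("C", 49.1), ("C", 48.0), ("C", 50.3)]

classes = sorted({c for c, _ in data})
n = len(data)

# Priors: observed sample proportion of each class
prior = {c: sum(1 for cl, _ in data if cl == c) / n for c in classes}

# Likelihood model: within-class sample mean and SD of the numerical predictor
params = {}
for c in classes:
    xs = [x for cl, x in data if cl == c]
    params[c] = (mean(xs), stdev(xs))

def dnorm(x, m, s):
    """Normal density, analogous to R's dnorm."""
    return exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * sqrt(2 * pi))

def classify(x):
    # Posterior is proportional to prior * Normal likelihood; pick the max
    scores = {c: prior[c] * dnorm(x, *params[c]) for c in classes}
    return max(scores, key=scores.get)

print(classify(39.5), classify(49.5))
```

A new value near a class's sample mean gets essentially all of its posterior probability from that class, so 39.5 is classified as "A" and 49.5 as "C".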
Example 26.4 Now suppose we know that a penguin has a bill length of 50mm and a flipper length of 195mm. (We’ll ignore whether or not it is below average weight for this example.)
Starting with our original prior, what is the “evidence” that the likelihood is based on? What information do we need to fill in the likelihood column?
Suppose that flipper lengths (mm) follow a N(190, 6.54) distribution for Adelie, a N(196, 7.13) distribution for Chinstrap, and a N(217, 6.48) distribution for Gentoo. (Recall that bill lengths (mm) follow a N(38.8, 2.66) distribution for Adelie, a N(48.8, 3.34) distribution for Chinstrap, and a N(47.5, 3.08) distribution for Gentoo.) How would you fill in the likelihood column? What are we assuming?
Complete the Bayes table and find the posterior probability of each species given that the penguin has a 50mm bill and a 195mm flipper.
If the penguin has a 50mm bill and a 195mm flipper, what species would you classify it as? Why?
26.1 Notes
26.1.1 Given below average weight
library(janitor)     # adorn_totals
library(kableExtra)  # kbl, kable_styling

class = c("A", "C", "G")

prior = c(0.442, 0.198, 0.360)

# likelihood of below average weight (evidence) given each species (class)
likelihood = c(0.834, 0.897, 0.049)

product = prior * likelihood

posterior = product / sum(product)

posterior_given_below_average_weight = posterior

bayes_table = data.frame(class, prior, likelihood, product, posterior)

bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |>
  kable_styling()
class    prior    likelihood    product    posterior
A        0.442    0.834         0.3686     0.6537
C        0.198    0.897         0.1776     0.3150
G        0.360    0.049         0.0176     0.0313
Total    1.000    1.780         0.5639     1.0000
26.1.2 Given not below average weight
class = c("A", "C", "G")

prior = c(0.442, 0.198, 0.360)

# likelihood of not below average weight (evidence) given each species (class)
likelihood = c(1 - 0.834, 1 - 0.897, 1 - 0.049)

product = prior * likelihood

posterior = product / sum(product)

bayes_table = data.frame(class, prior, likelihood, product, posterior)

bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |>
  kable_styling()
class    prior    likelihood    product    posterior
A        0.442    0.166         0.0734     0.1682
C        0.198    0.103         0.0204     0.0468
G        0.360    0.951         0.3424     0.7850
Total    1.000    1.220         0.4361     1.0000
26.1.3 Given 50mm bill
class = c("A", "C", "G")

prior = c(0.442, 0.198, 0.360)

# likelihood of 50mm bill (evidence) given each species (class)
likelihood = c(dnorm(50, 38.8, 2.66),
               dnorm(50, 48.8, 3.34),
               dnorm(50, 47.5, 3.08))

product = prior * likelihood

posterior = product / sum(product)

bayes_table = data.frame(class, prior, likelihood, product, posterior)

bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |>
  kable_styling()
class    prior    likelihood    product    posterior
A        0.442    0.0000        0.0000     0.0002
C        0.198    0.1120        0.0222     0.3979
G        0.360    0.0932        0.0335     0.6019
Total    1.000    0.2052        0.0557     1.0000
26.1.4 Given 50mm bill and below average weight
class = c("A", "C", "G")

prior = posterior_given_below_average_weight

# likelihood of 50mm bill (evidence) given each species (class)
likelihood = c(dnorm(50, 38.8, 2.66),
               dnorm(50, 48.8, 3.34),
               dnorm(50, 47.5, 3.08))

product = prior * likelihood

posterior = product / sum(product)

bayes_table = data.frame(class, prior, likelihood, product, posterior)

bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |>
  kable_styling()
class    prior     likelihood    product    posterior
A        0.6537    0.0000        0.0000     0.0004
C        0.3150    0.1120        0.0353     0.9233
G        0.0313    0.0932        0.0029     0.0763
Total    1.0000    0.2052        0.0382     1.0000
26.1.5 Given 50mm bill and 195mm flipper
class = c("A", "C", "G")

prior = c(0.442, 0.198, 0.360)

# likelihood of 50mm bill (evidence) given each species (class)
likelihood_bill = c(dnorm(50, 38.8, 2.66),
                    dnorm(50, 48.8, 3.34),
                    dnorm(50, 47.5, 3.08))

# likelihood of 195mm flipper (evidence) given each species (class)
likelihood_flipper = c(dnorm(195, 190, 6.54),
                       dnorm(195, 196, 7.13),
                       dnorm(195, 217, 6.48))

# assume conditional independence of bill and flipper given species (class)
likelihood = likelihood_bill * likelihood_flipper

product = prior * likelihood

posterior = product / sum(product)

bayes_table = data.frame(class, prior, likelihood_bill, likelihood_flipper,
                         likelihood, product, posterior)

bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 6) |>
  kable_styling()
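The same two-predictor computation can be cross-checked outside R. A Python sketch, using the distributions stated in the text (`dnorm` is a hand-rolled Normal density analogous to R's):

```python
from math import exp, pi, sqrt

def dnorm(x, mean, sd):
    """Normal density, analogous to R's dnorm."""
    return exp(-(x - mean) ** 2 / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

prior = [0.442, 0.198, 0.360]  # A, C, G
lik_bill = [dnorm(50, 38.8, 2.66), dnorm(50, 48.8, 3.34), dnorm(50, 47.5, 3.08)]
lik_flipper = [dnorm(195, 190, 6.54), dnorm(195, 196, 7.13), dnorm(195, 217, 6.48)]

# Conditional independence of bill and flipper given species: multiply likelihoods
product = [p * b * f for p, b, f in zip(prior, lik_bill, lik_flipper)]
posterior = [x / sum(product) for x in product]
print([round(x, 4) for x in posterior])
```

Chinstrap receives nearly all of the posterior probability (about 0.994), so we would classify the penguin as Chinstrap.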