Chapter 9 Bayesian Inference

9.1 Using Bayes’ rule: Covid-19

For our first taste of Bayesian inference, we are going to use covid-19 quicktest data to use and understand Bayes’ rule and how to use it. As you might know, there is a lot of debate in the media about the efficacy of the antigen based test (quicktest) in identifying and preventing the spread of the virus. First let us gather all the information we have on the quicktest - we will use this paper as the source of the information (https://www.medrxiv.org/content/10.1101/2021.01.22.21250042v1) - note that we do not have the whole data.

First consider that \(C\) is the event that a patient has covid-19 (C=1) or not (C=0), and \(T\) is the event that the test is positive (T=1) or negative (T=0). From the paper, we know that the sensitivity of the test, \(P(T=1 | C=1) = 0.697\), while the specificity of the test, \(P(T=0 | C=0) = 0.995\), and finally the prevalence of covid-19, \(P(C=1) = 0.046\). Using this information, we can compute the positive and negative predictive values of the test, which is \(P(C=1 | T=1)\) and \(P(C=0 | T=0)\).

Remember Bayes’ rule \[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

Use this to compute the two values asked for above. How does the test perform? Do you think it is useful as a standalone tracker of covid in the population?

9.2 First foray into Bayesian inference

We are going to delve into Bayesian inference by using a simple dice problem. We have three identical coins, but they are not identical in terms of their probability of tossing a head. So let us call these three coins A, B and C, and their heads probabilities are \[A: P(A~gives~heads) = 0.5 \\B: P(B~gives~heads) = 0.6 \\ C: P(C~gives~heads) = 0.9\].

We are going to design a Bayesian inference method to help us identify which coin we have using the results of a coin toss experiment. You pick a coin blindfolded, and then you toss the coins a number of times, and record how many heads and tails you have. We will use this as input data for our model.

So A is a fair coin, while B and C are biased. And a priori there is no way for us to know which coin you have for your toss experiments.

Now let us start by specifying the parts of our Bayesian inference framework. Again, recall the Bayesian framework - we want to compute the posterior probabilities; in this case, that would be the probability that the coin we chose is A, B or C given the number of heads and tails observed. Let us denote the number of heads as \(H\) and number of tails as \(T\). So our data \(D\) consists of just the number of heads and the total number of tosses \(N = H+T\). What kind of model would you choose for this data? How would you write the likelihood of this data given the probability of heads, \(p\) (Remeber the Binomial distribution).

The second step: time for us to choose a prior - What prior would you choose? You have no information for the three coins - so a uniform prior on all three coins sounds like a good idea! What would a uniform prior look like?

Putting it all together

We are ready to apply Bayesian inference to identify which coin we have. Let us start by simulating some data. Choose a coin of your liking - I am choosing the one with the heads probability of 0.9.

Exercise 1. With the coin with 0.9 probability of heads, simulate \(N=10\) coin tosses. Note the number of heads as \(H\). Use this as input data. Now use Bayes’ rule together with the Uniform prior on the 3 coins to get the posterior on which coin was chosen? What does the posterior probability look like? Are you convinced about which coin was chosen?

Exercise 2. Same as exercise 1, but with \(N=100\) and \(N=1000\) coin tosses. How does the posterior change? Did your “confidence” in which coin you chose increase, and how much?

Exercise 3. Now repeat for \(N=10, 100, 1000\) coin tosses, but use the coin with heads probability of 0.6. How did your results change? Was it easier or more difficult to separate the three coin cases?

9.3 Conjugate prior, Rejection sampling and MCMC*

In this exercise, we will try and solve problem 10.1 from the Edge book. But we will not use the rejection sampler that comes with the book, we will write our own. First, let us set up the problem. We have \(n\) independent observations \(x_1, x_2, \dots, x_n\) drawn from a Normal(\(\theta, \sigma^2\)), and we assume that the variance \(\sigma^2\) is known. We will also assume the conjugate prior for the mean \(\theta\). So the prior for \(\theta\) is Normal(\(\gamma\), \(\tau^2\)). Choose whatever values of \(\theta\), \(\sigma^2\) \(\gamma\), \(\tau^2\) and \(n\) you want. Just note that if the prior is very far from the data (i.e. \(\theta\) is very different than \(\gamma\)), your rejection sampler will be very inefficient and slow. In our example, as in the book, we will use \(\theta = 2\), \(\gamma=0\), \(\sigma^2 = \tau^2 = 1\), and \(n=20\).

Exercise 4. First generate your data using rnorm using the \(\theta\) and the \(\sigma^2\) given.

Exercise 5. We know that the setup we have will lead to a posterior that has the same form as the prior (Normal). And in class and in the book, we know what the expectation and variance of this posterior is. Compute these values and use them to sample 10000 points from the posterior using rnorm. Plot the histogram of these points.

Exercise 6. In R, first install and load the library MCMCpack. Then using the function MCnormalnormal(), draw 10000 samples from the posterior distribution. Compute the mean and variance of this sample set, and plot the distribution using a histogram.

Exercise 7. Rejection sampling. We know our prior: Normal(\(\gamma\), \(\tau^2\)), and our model (likelihood): Normal(\(\theta\), \(\sigma^2\)). So we can use the recipe for a rejection sampler to write a function for rejection samplling to sample from the posterior. Keep track of your accpetance rate (what proportion of your samples were not rejected). Use your own function to draw 10000 samples from the posterior. Use these samples to compute the mean and variance, and plot their density using a histogram.

Exercise 8. Change your \(\gamma\) to 4 and run your rejection sampler again. What happened to the acceptance rate? How much slower did your sampler get? Is there any way to make this faster?

Exercise X. (BONUS) Can you write a rejection sampler for a coin toss experiment. Assume that the probability of heads of a coin is \(p\). Assume a \(Beta(\alpha=5, \beta=5)\) prior on this probability. For generating your data, use rbinom with a success probability of 0.7, and draw 1000 samples. Use both the conjugate prior and rejection sampling method to estimate the parameters of the Beta distribution that will be the posterior in this case.

9.4 Bayesian point estimates and credible intervals

In the last section, we used sampling from the posterior to understand the distribution and compute statistics such as the mean and variance of the posterior distribution. We will use the results from Exercise 5-7, to compute the point estimates and the credible interval.

9.4.1 Point estimates

Remember that there are three point estimates in a Bayesian inference.

Posterior mean: \(\theta_{mean} = E[\theta|D]\)

Posterior median: \(\theta_{med}; P(\theta < \theta_{med} | D) = 0.5\)

Posterior mode: \(\theta_{mode} = argmax~f(\theta | D)\)

Exercise 9. Using the samples from exercises 5-7, compute the three point estimates for your problem? How well did these three perform? Why are they all the same in our case? (Note: You can compute these for problem 5, above using that you would expect them to be for a Normal distribution given the parameters of the posterior Normal distribution.) What happened to the point estimates when you used a prior of exercise 8.

Exercise X. (BONUS) If you did the last bonus exercise, then compute the point estimates for the coin toss problem.

9.4.2 Credible intervals

Recall that the \((1-\alpha)\) credible interval for a parameter is any interval \((a, b)\) such that \(P(a < \theta < b | D) = 1-\alpha\).

Exercise 10. Using the samples from exercise 7 and 8, compute the 95% credible interval for our posterior. How many such intervals exist? Can you give a couple of examples of 95% credible intervals? How would you choose one?

Exercise X. (BONUS) If you did this for the beta and binomial setup in the BONUS problems, then compute the 95% credible interval for \(p\). Is the quantile based credible interval the highest posterior density interval?