19 From Hypothesis Testing to Parameter Estimation

In Chapter 15, we learned how to turn a parameter estimation problem into a hypothesis test. In this chapter, we’re going to do the opposite: by looking at a virtually continuous range of possible hypotheses, we can use the Bayes factor and posterior odds (a hypothesis test) as a form of parameter estimation! This approach allows us to evaluate more than just two hypotheses and provides us with a simple framework for estimating any parameter.

19.1 Is the Carnival Game Really Fair?

N: 100, Success: 24 H1: p = 0.5 H2: p = 0.05

To get our Bayes factor, we need to compute P(D | H) for each hypothesis:

$\begin{matrix} (19.1) & \frac{P (D ∣ H_{2}) = (0.05)^{24} \times (1 - 0.05)^{76}}{P (D ∣ H_{1}) = (0.5)^{24} \times (1 - 0.5)^{76}} \end{matrix}$

Listing 19.1: Compute Bayes Factor for Duck Game

(0.5^24 * 0.5^76) / (0.05^24 * 0.95^76)

#> [1] 652.7191

Our Bayes factor tells us that $H_{1}$ , the attendant’s hypothesis, explains the data 653 times as well as $H_{2}$ .

This seems strange, because we got only 24 prices in 100 draws, which definitely is not near to 0.5.

Lets check with the pbinom() function (introduced in Chapter 13) to calculate the binomial distribution:

Listing 19.2: Compute probability of seeing 24 or fewer prizes, assuming that the probability of getting a prize is really 0.5

pbinom(24, 100, 0.5)

#> [1] 9.050013e-08

This is an extremely low value!

In the past, we’ve often found that the prior probability usually matters a lot when the Bayes factor alone doesn’t give us an answer that makes sense. … [But] there must be some problem here other than the prior.

19.1.1 Considering Multiple Hypotheses

One obvious problem is that, while it seems intuitively clear that the attendant is wrong in his hypothesis, the customer’s alternative hypothesis is just too extreme to be right, either, so we have two wrong hypotheses. What if the customer thought the probability of winning was 0.2, rather than 0.05? We’ll call this hypothesis $H_{3}$ . Testing $H_{3}$ against the attendant’s hypothesis radically changes the results of our likelihood ratio:

$\begin{matrix} (19.2) & \frac{P (D ∣ H_{3}) = (0.2)^{24} \times (1 - 0.2)^{76}}{P (D ∣ H_{1}) = (0.5)^{24} \times (1 - 0.5)^{76}} \end{matrix}$

Listing 19.3: Compute Bayes Factor for Duck Game

(0.2^24 * 0.8^76) / (0.5^24 * 0.5^76)

#> [1] 917399.4

The trouble we had in our first hypothesis test was that the customer’s belief was a far worse description of the event than the attendant’s belief.

Of course, we haven’t really solved our problem. What if there’s an even better hypothesis out there?

19.1.2 Searching for More Hypotheses with R

We’ll consider every increment of 0.01 between 0 and 1 as a possible hypothesis.

Listing 19.4: Search with increment of 0.01 for als hypothesises between 0 and 1

bayes.factor <- function(h_top, h_bottom) {
    ((h_top) ^ 24 * (1 - h_top) ^ 76) / 
        ((h_bottom) ^ 24 * (1 - h_bottom) ^ 76)
}

dx <- 0.01
hypotheses <- seq(0, 1, by = dx)
bfs <- bayes.factor(hypotheses, 0.5)
plot(hypotheses, bfs, type = 'l')

Figure 19.1: Plotting the Bayes factor for each of our hypotheses

1.478776^{6} is the largest Bayes factor in our vector bfs. 0.24 is the highest likelihood ratio, telling us which hypothesis we should believe in the most.

Caution

I want to try this calculation with appropriate R packages. Maybe {BayesFactor} could be helpful?

19.1.3 Adding Priors to Our Likelihood Ratios

“I used to make games like these, and I can tell you that for some strange industry reason, the people who design these duck games never put the prize rate between 0.2 and 0.3. I’d bet you the odds are 1,000 to 1 that the real prize rate is not in this range. Other than that, I have no clue.”

Listing 19.5: Visualize the prior that the duck games never put a prize rate between 0.2 and 0.3

priors <- ifelse(hypotheses >= 0.2 & hypotheses <= 0.3, 1/1000,1)
plot(hypotheses, priors, type = 'l')

Figure 19.2: Visualizing our prior odds ratios

Listing 19.6: Plot new posterior distribution inlcuding the prior

posteriors <- priors * bfs
plot(hypotheses, posteriors, type = 'l')

Figure 19.3: Plotting our distribution of Bayes factors

19.2 Building a Probability Distribution

The posterior odds for our hypotheses is sum(posteriors) = 3.1406875^{6}, e.g. it does not sum to 1. So we need to normalize it by dividing each value in our posteriors vector by the sum of all the values: p.posteriors <- posteriors/sum(posteriors) = . Now sum(p.posteriors) adds to 1.

Listing 19.7: Plot the normalized posterior odds

plot(hypotheses, p.posteriors, type = 'l')

Figure 19.4: Our normalized posterior odds (note the scale on the y-axis)

We can also use our p.posteriors to answer some common questions we might have about our data. For example, we can now calculate the probability that the true rate of getting a prize is less than what the attendant claims. We just add up all the probabilities for values less than 0.5:

sum(p.posteriors[which(hypotheses < 0.5)]) = 0.9999995

we can be almost certain that the attendant is overstating the true prize rate.

We can also calculate the expectation of our distribution and use this result as our estimate for the true probability. Recall that the expectation is just the sum of the estimates weighted by their value:

sum(p.posteriors * hypotheses) = 0.2402704

Of course, we can see our distribution is a bit atypical, with a big gap in the middle, so we might want to simply choose the most likely estimate, as follows:

hypotheses[which.max(p.posteriors)] = 0.19

Now we’ve used the Bayes factor to come up with a range of probabilistic estimates for the true possible rate of winning a prize in the duck game. This means that we’ve used the Bayes factor as a form of parameter estimation!

19.3 From the Bayes Factor to Parameter Estimation

As we’ve discussed many times since Chapter 5, if we want to estimate the rate of some event, we can always use the beta distribution.

Beta distribution with mode at 0.24 — Figure 19.5: The beta distribution with an alpha of 24 and a beta of 76

Except for the scale of the y-axis, the plot looks nearly identical to the original plot of our likelihood ratios! In fact, if we do a few simple tricks, we can get these two plots to line up perfectly. If we scale our beta distribution by the size of our dx and normalize our bfs, we can see that these two distributions get quite close:

Line of beta distribution with mode at 0.24 and with the normalized likelihood overlaid by points that are almost a match, but slightly to the right of the beta distribution — Figure 19.6: Our initial distribution of likelihood ratios maps pretty closely to Beta(24,76)

There seems to be only a slight difference now. We can fix it by using the weakest prior that indicates that getting a prize and not getting a prize are equally likely—that is, by adding 1 to both the alpha and beta parameters

Line of beta distribution with mode at 0.24 and with the normalized likelihood overlaid by points, this time with an exact match. — Figure 19.7: Our likelihood ratios map perfectly to a Beta(24+1,76+1) distribution

by using the Bayes factor, we’ve been able to empirically re-create a modified version of it that assumes a prior of Beta(1,1). And we did it without any fancy mathematics! All we had to do was:

Define the probability of the evidence given a hypothesis.

Consider all possible hypotheses.

Normalize these values to create a probability distribution.

Not only is the Bayes factor a great tool for setting up hypothesis tests, but, as it turns out, it’s also all we need to create any probability distribution we might want to use to solve our problem, whether that’s hypothesis testing or parameter estimation. We just need to be able to define the basic comparison between two hypotheses, and we’re on our way.

19.4 Wrapping Up

From the basic rules of probability, we can derive Bayes’ theorem, which lets us convert evidence into a statement expressing the strength of our beliefs. From Bayes’ theorem, we can derive the Bayes factor, a tool for comparing how well two hypotheses explain the data we’ve observed. By iterating through possible hypotheses and normalizing the results, we can use the Bayes factor to create a parameter estimate for an unknown value. This, in turn, allows us to perform countless other hypothesis tests by comparing our estimates. And all we need to do to unlock all this power is use the basic rules of probability to define our likelihood, $P (D ∣ H)$ !

19.5 Exercises

Try answering the following questions to see how well you understand using the Bayes factor and posterior odds to do parameter estimation. The solutions can be found at https://nostarch.com/learnbayes/.

19.5.1 Exercise 19-1

Our Bayes factor assumed that we were looking at $H_{1} : P (p r i z e) = 0.5$ . This allowed us to derive a version of the beta distribution with an alpha of 1 and a beta of 1. Would it matter if we chose a different probability for H1? Assume $H_{1} : P (p r i z e) = 0.24$ , then see if the resulting distribution, once normalized to sum to 1, is any different than the original hypothesis.

19.5.2 Exercise 19-2

Write a prior for the distribution in which each hypothesis is 1.05 times more likely than the previous hypothesis (assume our dx remains the same).

19.5.3 Exercise 19-3

Suppose you observed another duck game that included 34 ducks with prizes and 66 ducks without prizes. How would you set up a test to answer “What is the probability that you have a better chance of winning a prize in this game than in the game we used in our example?” Implementing this requires a bit more sophistication than the R used in this book, but see if you can learn this on your own to kick off your adventures in more advanced Bayesian statistics!

19.6 Experiments

# From Hypothesis Testing to Parameter Estimation {#sec-chap-19} In @sec-chap-15, we learned how to turn a parameter estimation problem into a hypothesis test. In this chapter, we’re going to do the opposite: by looking at a virtually continuous range of possible hypotheses, we can use the Bayes factor and posterior odds (a hypothesis test) as a form of parameter estimation! This approach allows us to evaluate more than just two hypotheses and provides us with a simple framework for estimating any parameter. ## Is the Carnival Game Really Fair? N: 100, Success: 24 H1: p = 0.5 H2: p = 0.05 > To get our Bayes factor, we need to compute P(D | H) for each hypothesis: $$ \frac{P(D \mid H_{2}) = (0.05)^{24} \times (1 - 0.05)^{76}}{P(D \mid H_{1}) = (0.5)^{24} \times (1 - 0.5)^{76}} $$ {#eq-bf-duck1} ```{r} #| label: comp-BF-duck1 #| attr-source: '#lst-comp-BF-duck1 lst-cap="Compute Bayes Factor for Duck Game"' (0.5^24 * 0.5^76) / (0.05^24 * 0.95^76) ``` > Our Bayes factor tells us that $H_{1}$, the attendant’s hypothesis, explains the data 653 times as well as $H_{2}$. This seems strange, because we got only 24 prices in 100 draws, which definitely is not near to 0.5. Lets check with the `pbinom()` function (introduced in @sec-chap-13) to calculate the binomial distribution: ```{r} #| label: pbinom-duck #| attr-source: '#lst-pbinom-duck lst-cap="Compute probability of seeing 24 or fewer prizes, assuming that the probability of getting a prize is really 0.5"' pbinom(24, 100, 0.5) ``` This is an extremely low value! > In the past, we’ve often found that the prior probability usually matters a lot when the Bayes factor alone doesn’t give us an answer that makes sense. … [But] there must be some problem here other than the prior. ### Considering Multiple Hypotheses > One obvious problem is that, while it seems intuitively clear that the attendant is wrong in his hypothesis, the customer’s alternative hypothesis is just too extreme to be right, either, so we have two wrong hypotheses. What if the customer thought the probability of winning was 0.2, rather than 0.05? We’ll call this hypothesis $H_{3}$. Testing $H_{3}$ against the attendant’s hypothesis radically changes the results of our likelihood ratio: $$ \frac{P(D \mid H_{3}) = (0.2)^{24} \times (1 - 0.2)^{76}}{P(D \mid H_{1}) = (0.5)^{24} \times (1 - 0.5)^{76}} $$ {#eq-bf-duck2} ```{r} #| label: comp-BF-duck2 #| attr-source: '#lst-comp-BF-duck2 lst-cap="Compute Bayes Factor for Duck Game"' (0.2^24 * 0.8^76) / (0.5^24 * 0.5^76) ``` > The trouble we had in our first hypothesis test was that the customer’s belief was a far worse description of the event than the attendant’s belief. > Of course, we haven’t really solved our problem. What if there’s an even better hypothesis out there? ### Searching for More Hypotheses with R > We’ll consider every increment of 0.01 between 0 and 1 as a possible hypothesis. ```{r} #| label: fig-19-1 #| fig-cap: "Plotting the Bayes factor for each of our hypotheses" #| attr-source: '#lst-fig-19-1 lst-cap="Search with increment of 0.01 for als hypothesises between 0 and 1"' bayes.factor <- function(h_top, h_bottom) { ((h_top) ^ 24 * (1 - h_top) ^ 76) / ((h_bottom) ^ 24 * (1 - h_bottom) ^ 76) } dx <- 0.01 hypotheses <- seq(0, 1, by = dx) bfs <- bayes.factor(hypotheses, 0.5) plot(hypotheses, bfs, type = 'l') ``` `r max(bfs)` is the largest Bayes factor in our vector `bfs`. `r hypotheses[which.max(bfs)]` is the highest likelihood ratio, telling us which hypothesis we should believe in the most. ::: {.callout-caution} I want to try this calculation with appropriate R packages. Maybe [{**BayesFactor**}](https://richarddmorey.github.io/BayesFactor/) could be helpful? ::: ### Adding Priors to Our Likelihood Ratios > “I used to make games like these, and I can tell you that for some strange industry reason, the people who design these duck games never put the prize rate between 0.2 and 0.3. I’d bet you the odds are 1,000 to 1 that the real prize rate is not in this range. Other than that, I have no clue.” ```{r} #| label: fig-19-2 #| fig-cap: "Visualizing our prior odds ratios" #| attr-source: '#lst-fig-19-2 lst-cap="Visualize the prior that the duck games never put a prize rate between 0.2 and 0.3"' #| out.width: 70% #| fig.align: center priors <- ifelse(hypotheses >= 0.2 & hypotheses <= 0.3, 1/1000,1) plot(hypotheses, priors, type = 'l') ``` ```{r} #| label: fig-19-3 #| fig-cap: "Plotting our distribution of Bayes factors" #| attr-source: '#lst-fig-19-3 lst-cap="Plot new posterior distribution inlcuding the prior"' #| out.width: 70% #| fig.align: center posteriors <- priors * bfs plot(hypotheses, posteriors, type = 'l') ``` ## Building a Probability Distribution The posterior odds for our hypotheses is `sum(posteriors)` = `r sum(posteriors)`, e.g. it does not sum to 1. So we need to normalize it by dividing each value in our `posteriors` vector by the sum of all the values: `p.posteriors <- posteriors/sum(posteriors)` = `r p.posteriors <- posteriors/sum(posteriors)`. Now `sum(p.posteriors)` adds to `r sum(p.posteriors)`. ```{r} #| label: fig-19-4 #| fig-cap: "Our normalized posterior odds (note the scale on the y-axis)" #| attr-source: '#lst-fig-19-4 lst-cap="Plot the normalized posterior odds"' #| out.width: 70% #| fig.align: center plot(hypotheses, p.posteriors, type = 'l') ``` > We can also use our p.posteriors to answer some common questions we might have about our data. For example, we can now calculate the probability that the true rate of getting a prize is less than what the attendant claims. We just add up all the probabilities for values less than 0.5: `sum(p.posteriors[which(hypotheses < 0.5)])` = `r sum(p.posteriors[which(hypotheses < 0.5)])` > we can be almost certain that the attendant is overstating the true prize rate. > We can also calculate the expectation of our distribution and use this result as our estimate for the true probability. Recall that the expectation is just the sum of the estimates weighted by their value: `sum(p.posteriors * hypotheses)` = `r sum(p.posteriors * hypotheses)` > Of course, we can see our distribution is a bit atypical, with a big gap in the middle, so we might want to simply choose the most likely estimate, as follows: `hypotheses[which.max(p.posteriors)]` = `r hypotheses[which.max(p.posteriors)]` > Now we’ve used the Bayes factor to come up with a range of probabilistic estimates for the true possible rate of winning a prize in the duck game. This means that we’ve used the Bayes factor as a form of parameter estimation! ## From the Bayes Factor to Parameter Estimation > As we’ve discussed many times since @sec-chap-05, if we want to estimate the rate of some event, we can always use the beta distribution. ![The beta distribution with an alpha of 24 and a beta of 76](img/19fig05.jpg){#fig-19-05 fig-alt="Beta distribution with mode at 0.24" fig-align="center" width="70%"} > Except for the scale of the y-axis, the plot looks nearly identical to the original plot of our likelihood ratios! In fact, if we do a few simple tricks, we can get these two plots to line up perfectly. If we scale our beta distribution by the size of our `dx` and normalize our `bfs`, we can see that these two distributions get quite close: ![Our initial distribution of likelihood ratios maps pretty closely to Beta(24,76)](img/19fig06.jpg){#fig-19-06 fig-alt="Line of beta distribution with mode at 0.24 and with the normalized likelihood overlaid by points that are almost a match, but slightly to the right of the beta distribution" fig-align="center" width="70%"} > There seems to be only a slight difference now. We can fix it by using the weakest prior that indicates that getting a prize and not getting a prize are equally likely—that is, by adding 1 to both the alpha and beta parameters ![Our likelihood ratios map perfectly to a Beta(24+1,76+1) distribution](img/19fig07.jpg){#fig-19-07 fig-alt="Line of beta distribution with mode at 0.24 and with the normalized likelihood overlaid by points, this time with an exact match." fig-align="center" width="70%"} > by using the Bayes factor, we’ve been able to empirically re-create a modified version of it that assumes a prior of Beta(1,1). And we did it without any fancy mathematics! All we had to do was: > > - Define the probability of the evidence given a hypothesis. > - Consider all possible hypotheses. > - Normalize these values to create a probability distribution. > Not only is the Bayes factor a great tool for setting up hypothesis tests, but, as it turns out, it’s also all we need to create any probability distribution we might want to use to solve our problem, whether that’s hypothesis testing or parameter estimation. We just need to be able to define the basic comparison between two hypotheses, and we’re on our way. ## Wrapping Up > From the basic rules of probability, we can derive Bayes’ theorem, which lets us convert evidence into a statement expressing the strength of our beliefs. From Bayes’ theorem, we can derive the Bayes factor, a tool for comparing how well two hypotheses explain the data we’ve observed. By iterating through possible hypotheses and normalizing the results, we can use the Bayes factor to create a parameter estimate for an unknown value. This, in turn, allows us to perform countless other hypothesis tests by comparing our estimates. And all we need to do to unlock all this power is use the basic rules of probability to define our likelihood, $P(D \mid H)$! ## Exercises Try answering the following questions to see how well you understand using the Bayes factor and posterior odds to do parameter estimation. The solutions can be found at https://nostarch.com/learnbayes/. ### Exercise 19-1 Our Bayes factor assumed that we were looking at $H_{1}: P(prize) = 0.5$. This allowed us to derive a version of the beta distribution with an alpha of 1 and a beta of 1. Would it matter if we chose a different probability for H1? Assume $H_{1}: P(prize) = 0.24$, then see if the resulting distribution, once normalized to sum to 1, is any different than the original hypothesis. ### Exercise 19-2 Write a prior for the distribution in which each hypothesis is 1.05 times more likely than the previous hypothesis (assume our `dx` remains the same). ### Exercise 19-3 Suppose you observed another duck game that included 34 ducks with prizes and 66 ducks without prizes. How would you set up a test to answer “What is the probability that you have a better chance of winning a prize in this game than in the game we used in our example?” Implementing this requires a bit more sophistication than the R used in this book, but see if you can learn this on your own to kick off your adventures in more advanced Bayesian statistics! ## Experiments