2.5 Negative-Binomial

If \(X\) is the count of failures occurring before the \(r\)th success in a sequence of independent Bernoulli trials, each with success probability \(p\), then \(X\) is a random variable with a negative-binomial distribution, \(X \sim NB(r, p)\). The probability of \(X = x\) failures prior to \(r\) successes is

\[f(x;r, p) = {{x + r - 1} \choose {r - 1}} p^r (1-p)^{x},\]

with \(E(X) = r (1 - p) / p\) and \(Var(X) = r (1-p) / p^2\).
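
As a rough numerical check of the formula and its moments, the sketch below writes the probability mass function out by hand and compares it with R's built-in `dnbinom()`. The helper name `nb_pmf`, the illustrative values \(r = 3\) and \(p = 0.20\), and the truncation of the support at 500 are assumptions made for this example only.

```r
# Hypothetical helper: the pmf written out directly from the formula above.
nb_pmf <- function(x, r, p) {
  choose(x + r - 1, r - 1) * p^r * (1 - p)^x
}

r <- 3
p <- 0.20
x <- 0:500  # truncated support; the probability beyond 500 is negligible here

probs   <- nb_pmf(x, r, p)
total   <- sum(probs)                    # should be very close to 1
mean_nb <- sum(x * probs)                # compare with r * (1 - p) / p
var_nb  <- sum((x - mean_nb)^2 * probs)  # compare with r * (1 - p) / p^2

# TRUE if the hand-written pmf agrees with dnbinom() up to numerical tolerance
all.equal(probs, dnbinom(x, size = r, prob = p))
```

Summing over a truncated support is only an approximation, but for these values the omitted tail is far too small to affect the comparison.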

When count data are overdispersed, that is, when the variance exceeds the mean, model them with the negative-binomial distribution instead of the Poisson.
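
One way to see why the negative binomial suits such data is to divide its variance by its mean, using the two moments stated above:

\[\frac{Var(X)}{E(X)} = \frac{r (1 - p) / p^2}{r (1 - p) / p} = \frac{1}{p} > 1 \quad \text{for } 0 < p < 1.\]

The variance therefore always exceeds the mean, by a factor of \(1 / p\), whereas a Poisson random variable has variance equal to its mean.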

Examples

An oil company has a \(p = 0.20\) chance of striking oil when drilling a well. What is the probability the company drills exactly \(x + r = 7\) wells to strike oil \(r = 3\) times? Note that the question is formulated in terms of total trials, \(x + r = 7\), so translate it to the number of failures, \(x = 7 - 3 = 4\).

\[f(4; 3, 0.20) = {{4 + 3 - 1} \choose {3 - 1}} (0.20)^3 (1 - 0.20)^4 = 0.049.\]

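Function dnbinom() calculates the negative-binomial probability. Argument x is the number of failures (total trials minus successes, \(7 - 3 = 4\)), size is the target number of successes \(r\), and prob is the success probability \(p\).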

dnbinom(x = 4, size = 3, prob = 0.2)

## [1] 0.049
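
The translation from total wells to failures, and the agreement between the formula and dnbinom(), can be sketched as follows. The variable names (`n_wells`, `manual`, `builtin`) are hypothetical and chosen only for this illustration.

```r
n_wells <- 7     # total wells drilled (trials), as posed in the question
r <- 3           # number of oil strikes (successes)
p <- 0.20        # probability of striking oil on any one well

x <- n_wells - r # number of dry wells (failures)

# Probability computed directly from the pmf formula
manual <- choose(x + r - 1, r - 1) * p^r * (1 - p)^x

# Same probability from the built-in density function
builtin <- dnbinom(x = x, size = r, prob = p)

round(c(manual = manual, builtin = builtin), 3)
```

Both values round to 0.049, matching the hand calculation.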

The expected number of failures prior to 3 successes is \(E(X) = 3 (1 - 0.20) / 0.20 = 12\) with variance \(Var(X) = 3 (1 - 0.20) / 0.20^2 = 60\). Confirm this with a simulation of n = 10,000 random draws using rnbinom(). The sample mean and variance should be close to, but not exactly equal to, these values.

my_dat <- rnbinom(n = 10000, size = 3, prob = 0.20)
mean(my_dat)

## [1] 12

var(my_dat)

## [1] 59
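
For a check that does not rely on rnbinom(), the sketch below simulates the underlying Bernoulli trials directly and counts the failures before the third success. The helper `count_failures` and the seed are assumptions for this illustration; it is meant as a slow but transparent restatement of the definition, not an efficient sampler.

```r
set.seed(1)  # arbitrary seed, used only so the run is reproducible

# Hypothetical helper: run Bernoulli trials until r successes occur,
# then return the number of failures seen along the way.
count_failures <- function(r, p) {
  successes <- 0
  failures <- 0
  while (successes < r) {
    if (runif(1) < p) {
      successes <- successes + 1
    } else {
      failures <- failures + 1
    }
  }
  failures
}

sims <- replicate(10000, count_failures(r = 3, p = 0.20))
c(mean = mean(sims), var = var(sims))
```

The resulting mean and variance should land near 12 and 60, with the usual sampling variation from run to run.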