2.5 Exponential
Exponential distributions are usually used to model the elapsed time between events in a Poisson process.
If \(X\) is the time until the next event in a Poisson process where the average rate of events is \(\lambda\), then \(X\) is a random variable with an exponential distribution, \(X \sim \mathrm{Exp}(\lambda)\), whose probability density function is
\[ f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases} \]
with \(E(X)=1/\lambda\) and \(Var(X) = 1/\lambda^2\).
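As a quick numerical check of the mean and variance, the sketch below integrates against the density with base R's integrate(). The rate `lambda <- 1/15` is only an illustrative value chosen for this sketch; any positive rate should behave the same way.

```r
# Illustrative rate (an assumption for this sketch)
lambda <- 1 / 15

# E(X): integrate x * f(x) over [0, Inf)
mean_x <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

# E(X^2), used to get Var(X) = E(X^2) - E(X)^2
mean_x2 <- integrate(function(x) x^2 * dexp(x, rate = lambda), 0, Inf)$value
var_x <- mean_x2 - mean_x^2

# Compare the numerical results with 1/lambda and 1/lambda^2
c(numeric_mean = mean_x, theory_mean = 1 / lambda)
c(numeric_var = var_x, theory_var = 1 / lambda^2)
```

The numerical values should agree with \(1/\lambda\) and \(1/\lambda^2\) up to the integration tolerance.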
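This blog by Aerin Kim demonstrates how the exponential distribution is related to the Poisson distribution. Let \(N\) be the number of events in 1 time period, so \(N \sim \mathrm{Poisson}(\lambda)\). The key is that the probability of no events in 1 time period is \(P(N = 0; \lambda) = e^{-\lambda}\frac{\lambda^0}{0!} = e^{-\lambda}\). Because the \(t\) periods are independent, the probability of no events in \(t\) time periods is \((e^{-\lambda})^t = e^{-\lambda t}\), and this is exactly the probability that the wait for the next event exceeds \(t\): \(P(X>t) = e^{-\lambda t}\). The CDF is therefore \(P(X \le t) = 1 - e^{-\lambda t}\), and the PDF is its derivative, \(f(t; \lambda) = \lambda e^{-\lambda t}\). (Because \(X\) is continuous, \(P(X = t)\) itself is zero; the density is not a probability.)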
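The Poisson-to-exponential argument can be checked numerically with base R's distribution functions. This is a minimal sketch; `lambda` and `elapsed` are illustrative values chosen here, not taken from the blog.

```r
lambda  <- 1 / 15   # illustrative rate per time period (assumption)
elapsed <- 10       # illustrative number of time periods (assumption)

# P(no events in `elapsed` periods) from the Poisson pmf with mean lambda * elapsed
p_none <- dpois(0, lambda * elapsed)

# The same quantity as the exponential survival function P(X > elapsed)
p_surv <- pexp(elapsed, rate = lambda, lower.tail = FALSE)

all.equal(p_none, exp(-lambda * elapsed))
all.equal(p_none, p_surv)

# The density is the derivative of the CDF: compare a central difference
# of pexp() with dexp()
h <- 1e-5
num_deriv <- (pexp(elapsed + h, lambda) - pexp(elapsed - h, lambda)) / (2 * h)
all.equal(num_deriv, dexp(elapsed, lambda), tolerance = 1e-6)
```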
Kim also shows that the exponential distribution is “memoryless,” meaning \(P(X > x_1 \mid X > x_0) = P(X > x_1 - x_0)\) for \(x_1 \ge x_0 \ge 0\). E.g., the probability that a machine that has already run for 9 years survives past 12 years is the same as the probability that a new machine survives past 3 years. If that seems like a bad model, turn to a distribution that allows an increasing hazard rate, like the Weibull. But oftentimes it is a good assumption, as with the time until a car accident (this is the context in which you might refer to \(\lambda\) as the hazard rate).
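The memoryless property, and the contrast with a Weibull model, can be illustrated with a few lines of base R. The failure rate and the Weibull shape and scale below are hypothetical values picked only for this sketch.

```r
rate <- 0.1  # hypothetical failure rate per year (assumption)

# P(X > 12 | X > 9) = P(X > 12) / P(X > 9)
p_cond <- pexp(12, rate, lower.tail = FALSE) / pexp(9, rate, lower.tail = FALSE)

# P(X > 3) for a brand-new machine
p_new <- pexp(3, rate, lower.tail = FALSE)

all.equal(p_cond, p_new)  # memoryless: the two probabilities match

# A Weibull with shape > 1 has an increasing hazard, so age matters
shape <- 2   # hypothetical
scale <- 10  # hypothetical
w_cond <- pweibull(12, shape, scale, lower.tail = FALSE) /
  pweibull(9, shape, scale, lower.tail = FALSE)
w_new <- pweibull(3, shape, scale, lower.tail = FALSE)

c(aged_machine = w_cond, new_machine = w_new)
```

Under the Weibull assumption the aged machine is less likely to last three more years than a new one, which is the behavior the exponential model cannot express.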
Suppose buses arrive according to a Poisson process at an average rate of one per 15 minutes, so \(\lambda = 1/15\) per minute. The probability that no more than 10 minutes elapse between buses is \(P(X \le 10) = 1 - e^{-\frac{1}{15} \cdot 10} = .487.\)
library(dplyr)    # mutate() and the %>% pipe
library(ggplot2)

# CDF of the time between buses, with P(X <= 10) marked by dashed lines
data.frame(min = 0:60) %>%
  mutate(p = pexp(min, 1/15)) %>%
  ggplot(aes(x = min)) +
  geom_line(aes(y = p)) +
  geom_hline(yintercept = pexp(10, 1/15), linetype = 2) +
  geom_vline(xintercept = 10, linetype = 2) +
  scale_y_continuous(breaks = seq(0, 1, .1)) +
  scale_x_continuous(breaks = seq(0, 60, 5)) +
  theme(panel.grid.minor = element_blank()) +
  labs(title = "P(X<=10) = .487")
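The probability marked on the plot can also be computed directly. The short sketch below compares the closed-form CDF with pexp(); the variable names are introduced here for illustration.

```r
rate <- 1 / 15  # one bus per 15 minutes, expressed per minute

# Closed form: P(X <= 10) = 1 - exp(-rate * 10)
p_closed <- 1 - exp(-rate * 10)

# The same probability from the built-in exponential CDF
p_pexp <- pexp(10, rate)

round(c(closed_form = p_closed, pexp = p_pexp), 3)
```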
90% of the waits between consecutive buses are 34.5 minutes or shorter: \(P(X \le 34.5) = .90\).
data.frame(min = 0:60) %>%
  mutate(p = pexp(min, 1/15)) %>%
  ggplot(aes(x = min)) +
  geom_line(aes(y = p)) +
  geom_hline(yintercept = 0.90, linetype = 2) +
  geom_vline(xintercept = qexp(.90, 1/15), linetype = 2) +
  scale_y_continuous(breaks = seq(0, 1, .1)) +
  scale_x_continuous(breaks = seq(0, 60, 5)) +
  theme(panel.grid.minor = element_blank()) +
  labs(title = "P(X<=34.5) = .90")
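The 90th percentile used for the vertical line comes from inverting the CDF: solving \(1 - e^{-\lambda x} = p\) gives \(x = -\ln(1 - p)/\lambda\). A minimal sketch comparing that formula with qexp(), with variable names introduced here for illustration:

```r
rate <- 1 / 15  # one bus per 15 minutes, expressed per minute
p <- 0.90

# Invert the CDF by hand: x = -log(1 - p) / rate
q_closed <- -log(1 - p) / rate

# The same quantile from the built-in quantile function
q_qexp <- qexp(p, rate)

round(c(closed_form = q_closed, qexp = q_qexp), 1)
```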
The expected time until two buses have arrived is the sum of two expected waits, \(2 \cdot E(X) = 2 \cdot \frac{1}{1/15} = 30\) minutes.
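A small simulation can sanity-check the 30-minute figure. This sketch assumes the two waits are independent draws from the same exponential distribution, as they are for a Poisson process; the seed and sample size are arbitrary choices.

```r
set.seed(1)     # arbitrary seed, only for reproducibility
rate <- 1 / 15  # one bus per 15 minutes, expressed per minute
n <- 1e5        # number of simulated pairs of waits (arbitrary)

# Time until the second bus = first wait + second wait
two_bus_wait <- rexp(n, rate) + rexp(n, rate)

# The simulated average should be close to 2 / rate = 30 minutes
mean(two_bus_wait)
```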