6.4 Poisson distributions
Example 6.15 Let $X$ be the number of home runs hit (in total by both teams) in a randomly selected Major League Baseball game.
- In what ways is this like the Binomial situation? (What is a trial? What is “success”?)
- In what ways is this NOT like the Binomial situation?
Solution. to Example 6.15
- Each pitch is a trial, and on each trial either a home run is hit (“success”) or not. The random variable $X$ counts the number of home runs (successes) over all the trials.
- Even though $X$ is counting successes, this is not the Binomial situation.
- The number of trials is not fixed. The total number of pitches varies from game to game. (The average is around 300 pitches per game).
- The probability of success is not the same on each trial. Different batters have different probabilities of hitting home runs. Also, different pitch counts or game situations lead to different probabilities of home runs.
- The trials might not be independent, though this is a little more questionable. Make sure you distinguish independence from the previous assumption of unequal probabilities of success; you need to consider conditional probabilities to assess independence. Maybe if a pitcher gives up a home run on one pitch, then the pitcher is “rattled” so the probability that he also gives up a home run on the next pitch increases, or the pitcher gets pulled for a new pitcher which changes the probability of a home run on the next pitch.
Example 6.16 Let $X$ be the number of automobiles that get in accidents on Highway 101 in San Luis Obispo on a randomly selected day.
In what ways is this like the Binomial situation? (What is a trial? What is “success”?)
In what ways is this NOT like the Binomial situation?
Which of the following do you think it would be easier to estimate by collecting and analyzing relevant data?
- The total number of cars on the highway each day, and the probability that each driver on the highway has an accident.
- The average number of accidents per day that happen on the highway.
Solution. to Example 6.16
- Each automobile on the road in the day is a trial, and each automobile either gets in an accident (“success”) or not. The random variable $X$ counts the number of automobiles that get into accidents (successes). (Remember “success” is just a generic label for the event you’re interested in; “success” is not necessarily good.)
- Even though $X$ is counting successes, this is not the Binomial situation.
- The number of trials is not fixed. The total number of automobiles on the road varies from day to day.
- The probability of success is not the same on each trial. Different drivers have different probabilities of getting into accidents; some drivers are safer than others. Also, different conditions increase the probability of an accident, like driving at night.
- The trials are plausibly not independent. Make sure you distinguish independence from the previous assumption of unequal probabilities of success; you need to consider conditional probabilities to assess independence. If an automobile gets into an accident, then the probability of getting into an accident increases for the automobiles that are driving near it.
- It would be very difficult to estimate the probability that each individual driver gets into an accident. (Though you probably could measure the total number of cars.) It would be much easier to find data on total number of accidents that happen each day over some period of time, e.g., from police reports, and use it to estimate the average number of accidents per day.
The Binomial model has several restrictive assumptions that might not be satisfied in practice:
- The number of trials must be fixed (not random) and known.
- The probability of success must be the same for each trial (fixed, not random) and known.
- The trials must be independent.
Even when the trials are independent with the same probability of success, fitting a Binomial model to data requires estimation of both $n$ and $p$ individually, rather than just the mean $\mu = np$. When the only data available are success counts (e.g., number of accidents per day for a sample of days), $\mu$ can be estimated but $n$ and $p$ individually cannot.
Poisson models are more flexible models for counts. Poisson models are parameterized by a single parameter (the mean) and do not require all the assumptions of a Binomial model. Poisson distributions are often used to model the distribution of random variables that count the number of “relatively rare” events that occur over a certain interval of time in a certain region (e.g., number of accidents on a highway in a day, number of car insurance policies that have claims in a week, number of bank loans that go into default, number of mutations in a DNA sequence, number of earthquakes that occur in SoCal in an hour, etc.).
Definition 6.5 A discrete random variable $X$ has a Poisson distribution with parameter $\mu > 0$ [132] if its probability mass function $p_X$ satisfies
$$p_X(x) = \frac{e^{-\mu}\mu^x}{x!}, \qquad x = 0, 1, 2, \ldots$$
If $X$ has a Poisson($\mu$) distribution then
$$\text{E}(X) = \mu, \qquad \text{Var}(X) = \mu.$$
The shape of a Poisson pmf as a function of $x$ is given by $\mu^x / x!$. The constant $e^{-\mu}$ simply renormalizes the heights of the pmf so that the probabilities sum to 1. Recall the Taylor series expansion: $e^{\mu} = \sum_{x=0}^{\infty} \frac{\mu^x}{x!}$.
For a Poisson distribution, both the mean and variance are equal to $\mu$, but remember that the mean is measured in the count units (e.g., home runs) while the variance is measured in squared units (e.g., $(\text{home runs})^2$).
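The definition can be checked numerically. The following is a minimal sketch using only the Python standard library rather than Symbulate; the helper name `poisson_pmf`, the value $\mu = 2$, and the truncation point 60 are illustrative assumptions, not part of the text.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu): e^{-mu} * mu^x / x!."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

mu = 2.0        # illustrative parameter value (an assumption)
xs = range(60)  # truncate the infinite sum; the tail beyond 60 is negligible for mu = 2

total = sum(poisson_pmf(x, mu) for x in xs)                   # should be close to 1
mean = sum(x * poisson_pmf(x, mu) for x in xs)                # should be close to mu
var = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)   # should be close to mu

print(total, mean, var)
```

The truncated sums agree with 1, $\mu$, and $\mu$ up to floating point error, which is the Taylor series identity in action.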
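# assumes the usual setup for the Symbulate examples in this chapter
from symbulate import *
import matplotlib.pyplot as plt
Poisson(0.3).plot()
Poisson(1).plot()
Poisson(2).plot()
plt.legend(['Poisson(0.3)', 'Poisson(1)', 'Poisson(2)']);
plt.show()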
Example 6.17 Suppose that the number of typographical errors on a randomly selected page of a textbook has a Poisson distribution with parameter $\mu = 0.3$.
- Find the probability that a randomly selected page has no typographical errors.
- Find the probability that a randomly selected page has exactly one typographical error.
- Find the probability that a randomly selected page has exactly two typographical errors.
- Find the probability that a randomly selected page has at least three typographical errors.
- Provide a long run interpretation of the parameter $\mu = 0.3$.
- Suppose that each page in the book contains exactly 2000 characters and that the probability that any single character is a typo is 0.00015, independently of all other characters. Let $X$ be the number of characters on a randomly selected page that are typos. Identify the distribution of $X$ and its expected value and variance, and compare to a Poisson(0.3) distribution.
Solution. to Example 6.17
Let $X$ be the number of typos on a randomly selected page. Then the pmf of $X$ is
$$p_X(x) = \frac{e^{-0.3}\, 0.3^x}{x!}, \qquad x = 0, 1, 2, \ldots$$
Find the probability that a randomly selected page has no typographical errors.
$$\text{P}(X = 0) = p_X(0) = \frac{e^{-0.3}\, 0.3^0}{0!} = e^{-0.3} = 0.741$$
About 74.1% of pages have no typos.
Find the probability that a randomly selected page has exactly one typographical error.
$$\text{P}(X = 1) = p_X(1) = \frac{e^{-0.3}\, 0.3^1}{1!} = 0.222$$
About 22.2% of pages have exactly 1 typo.
Find the probability that a randomly selected page has exactly two typographical errors.
$$\text{P}(X = 2) = p_X(2) = \frac{e^{-0.3}\, 0.3^2}{2!} = 0.033$$
About 3.3% of pages have exactly 2 typos.
Find the probability that a randomly selected page has at least three typographical errors.
$$\text{P}(X \ge 3) = 1 - \text{P}(X \le 2) = 1 - (0.7408 + 0.2222 + 0.0333) = 0.0036$$
About 0.36% of pages have at least 3 typos.
Provide a long run interpretation of the parameter $\mu = 0.3$.
There are 0.3 typos per page on average.
Suppose that each page in the book contains exactly 2000 characters and that the probability that any single character is a typo is 0.00015, independently of all other characters. Let $X$ be the number of characters on a randomly selected page that are typos. Identify the distribution of $X$ and its expected value and variance, and compare to a Poisson(0.3) distribution.
In this case $X$ has a Binomial(2000, 0.00015) distribution with mean $2000(0.00015) = 0.3$ and variance $2000(0.00015)(1 - 0.00015) = 0.299955 \approx 0.3$. See below for a simulation; the Binomial(2000, 0.00015) distribution is very similar to the Poisson(0.3) distribution.
X = RV(Poisson(0.3))
x = X.sim(10000)
x.plot('impulse')
Poisson(0.3).plot()
plt.show()
x.tabulate(normalize = True)
Value | Relative Frequency |
---|---|
0 | 0.7388 |
1 | 0.2213 |
2 | 0.0347 |
3 | 0.0049 |
4 | 0.0003 |
Total | 0.9999999999999999 |
x.count_eq(0) / 10000, Poisson(0.3).pmf(0)
## (0.7388, 0.7408182206817179)
x.count_leq(2) / 10000, Poisson(0.3).cdf(2)
## (0.9948, 0.9964005068169105)
x.mean(), Poisson(0.3).mean()
## (0.3066, 0.3)
x.var(), Poisson(0.3).var()
## (0.31499644, 0.3)
RV(Binomial(2000, 0.00015)).sim(10000).plot('impulse')
Poisson(0.3).plot()
plt.show()
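The probabilities in Example 6.17 can also be computed exactly instead of simulated. This is a small sketch in plain Python (standard library only, not Symbulate); the helper names `poisson_pmf` and `binomial_pmf` are introduced here for illustration.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) pmf evaluated at x."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    """Binomial(n, p) pmf evaluated at x."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

mu = 0.3
p0, p1, p2 = (poisson_pmf(x, mu) for x in range(3))
p_at_least_3 = 1 - (p0 + p1 + p2)
print(round(p0, 4), round(p1, 4), round(p2, 4), round(p_at_least_3, 4))

# side-by-side comparison of Binomial(2000, 0.00015) and Poisson(0.3)
n, p = 2000, 0.00015
for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, mu), 5))

# largest pointwise gap between the two pmfs over x = 0, ..., 10
max_diff = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(11))
print(max_diff)
```

The two columns agree to several decimal places, consistent with the simulated plots.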
Example 6.18 Suppose $X_1$ and $X_2$ are independent, each having a Poisson(1) distribution, and let $X = X_1 + X_2$. Also suppose $Y$ has a Poisson(2) distribution. For example, suppose that $(X_1, X_2)$ represents the number of home runs hit by the (home, away) team in a baseball game, so $X$ is the total number of home runs hit by either team in the game, and $Y$ is the number of accidents that occur in a day on a particular stretch of highway.
- How could you use a spinner to simulate a value of $X$? Of $Y$? Are $X$ and $Y$ the same variable?
- Compute $\text{P}(X = 0)$. (Hint: what $(X_1, X_2)$ pairs yield $X = 0$?) Compare to $\text{P}(Y = 0)$.
- Compute $\text{P}(X = 1)$. (Hint: what $(X_1, X_2)$ pairs yield $X = 1$?) Compare to $\text{P}(Y = 1)$.
- Compute $\text{P}(X = 2)$. (Hint: what $(X_1, X_2)$ pairs yield $X = 2$?) Compare to $\text{P}(Y = 2)$.
- Are $X$ and $Y$ the same variable? Do $X$ and $Y$ have the same distribution?
Solution. to Example 6.18
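How could you use a spinner to simulate a value of $X$? Of $Y$? Are $X$ and $Y$ the same variable? To generate a value of $X$: construct a spinner corresponding to a Poisson(1) distribution (see Figure 6.5), spin it twice and add the values together. To generate a value of $Y$: construct a spinner corresponding to a Poisson(2) distribution and spin it once (see Figure 6.6). $X$ and $Y$ are not the same random variable; they are measuring different things. The sum of two spins of the Poisson(1) spinner does not have to be equal to the result of the spin of the Poisson(2) spinner.
Compute $\text{P}(X = 0)$. (Hint: what $(X_1, X_2)$ pairs yield $X = 0$?) Compare to $\text{P}(Y = 0)$. The only way $X$ can be 0 is if both $X_1$ and $X_2$ are 0.
$$\text{P}(X = 0) = \text{P}(X_1 = 0)\,\text{P}(X_2 = 0) = \left(\frac{e^{-1}1^0}{0!}\right)\left(\frac{e^{-1}1^0}{0!}\right) = e^{-2} = 0.135 = \frac{e^{-2}2^0}{0!} = \text{P}(Y = 0)$$
Compute $\text{P}(X = 1)$. (Hint: what $(X_1, X_2)$ pairs yield $X = 1$?) Compare to $\text{P}(Y = 1)$. The only way $X$ can be 1 is if $(X_1, X_2)$ is $(0, 1)$ or $(1, 0)$.
$$\text{P}(X = 1) = \left(\frac{e^{-1}1^0}{0!}\right)\left(\frac{e^{-1}1^1}{1!}\right) + \left(\frac{e^{-1}1^1}{1!}\right)\left(\frac{e^{-1}1^0}{0!}\right) = 2e^{-2} = 0.271 = \frac{e^{-2}2^1}{1!} = \text{P}(Y = 1)$$
Compute $\text{P}(X = 2)$. (Hint: what $(X_1, X_2)$ pairs yield $X = 2$?) Compare to $\text{P}(Y = 2)$. The only way $X$ can be 2 is if $(X_1, X_2)$ is $(0, 2)$ or $(1, 1)$ or $(2, 0)$.
$$\text{P}(X = 2) = e^{-1}\left(\frac{e^{-1}}{2!}\right) + e^{-1}e^{-1} + \left(\frac{e^{-1}}{2!}\right)e^{-1} = 2e^{-2} = 0.271 = \frac{e^{-2}2^2}{2!} = \text{P}(Y = 2)$$
Are $X$ and $Y$ the same variable? Do $X$ and $Y$ have the same distribution? We already said that $X$ and $Y$ are not the same random variable. But the above calculations suggest that $X$ and $Y$ do have the same distribution. See the simulation results below.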
Figure 6.5: Spinner corresponding to a Poisson(1) distribution.
Figure 6.6: Spinner corresponding to a Poisson(2) distribution.
X1, X2 = RV(Poisson(1) ** 2)
X = X1 + X2
X.sim(10000).plot()
Poisson(2).plot()
plt.show()
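Poisson aggregation. If $X$ and $Y$ are independent, $X$ has a Poisson($\mu_X$) distribution, and $Y$ has a Poisson($\mu_Y$) distribution, then $X + Y$ has a Poisson($\mu_X + \mu_Y$) distribution.
If $X$ has mean $\mu_X$ and $Y$ has mean $\mu_Y$ then linearity of expected value implies that $X + Y$ has mean $\mu_X + \mu_Y$. If $X$ has variance $\mu_X$ and $Y$ has variance $\mu_Y$ then independence of $X$ and $Y$ implies that $X + Y$ has variance $\mu_X + \mu_Y$. What Poisson aggregation says is that if component counts are independent and each has a Poisson distribution, then the total count also has a Poisson distribution.
Here’s one proof involving the law of total probability [133]: We want to show that $p_{X+Y}(z) = \frac{e^{-(\mu_X + \mu_Y)}(\mu_X + \mu_Y)^z}{z!}$ for $z = 0, 1, 2, \ldots$ Partition on the value of $X$, use independence, and then apply the binomial theorem:
$$\begin{aligned}
p_{X+Y}(z) & = \sum_{x=0}^{z} \text{P}(X = x, Y = z - x) = \sum_{x=0}^{z} \text{P}(X = x)\,\text{P}(Y = z - x)\\
& = \sum_{x=0}^{z} \left(\frac{e^{-\mu_X}\mu_X^x}{x!}\right)\left(\frac{e^{-\mu_Y}\mu_Y^{z-x}}{(z-x)!}\right)
= \frac{e^{-(\mu_X + \mu_Y)}}{z!}\sum_{x=0}^{z} \frac{z!}{x!(z-x)!}\,\mu_X^x\,\mu_Y^{z-x}\\
& = \frac{e^{-(\mu_X + \mu_Y)}(\mu_X + \mu_Y)^z}{z!}
\end{aligned}$$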
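The law of total probability sum for Poisson aggregation can be evaluated numerically. The sketch below uses plain Python with helper names (`poisson_pmf`, `pmf_of_sum`) chosen here for illustration; it compares the sum over $x$ with the Poisson($\mu_X + \mu_Y$) pmf for the Poisson(1) plus Poisson(1) case of Example 6.18.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) pmf evaluated at x."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def pmf_of_sum(z, mu_x, mu_y):
    """P(X + Y = z) for independent Poisson X and Y, by the law of total probability."""
    return sum(poisson_pmf(x, mu_x) * poisson_pmf(z - x, mu_y) for x in range(z + 1))

mu_x, mu_y = 1.0, 1.0  # the Poisson(1) + Poisson(1) case from Example 6.18
for z in range(6):
    # the last two columns should agree up to floating point error
    print(z, pmf_of_sum(z, mu_x, mu_y), poisson_pmf(z, mu_x + mu_y))
```

Changing `mu_x` and `mu_y` to unequal values gives the same agreement, which is what the general statement claims.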
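Example 6.19 Suppose $X$ and $Y$ are independent with $X \sim$ Poisson(1) and $Y \sim$ Poisson(2). For example, suppose $(X, Y)$ represents the number of goals scored by the (away, home) team in a soccer game.
- How could you use spinners to simulate the conditional distribution of $X$ given $\{X + Y = 2\}$?
- Are the random variables $X$ and $X + Y$ independent?
- Compute $\text{P}(X = 0 \mid X + Y = 2)$.
- Compute $\text{P}(X = 1 \mid X + Y = 2)$.
- Compute $\text{P}(X = x \mid X + Y = 2)$ for all other possible values of $x$.
- Identify the conditional distribution of $X$ given $\{X + Y = 2\}$.
- Compute $\text{E}(X \mid X + Y = 2)$.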
Solution. to Example 6.19
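How could you use spinners to simulate the conditional distribution of $X$ given $\{X + Y = 2\}$?
- Spin the Poisson(1) spinner once to generate $X$.
- Spin the Poisson(2) spinner once to generate $Y$.
- Compute $X + Y$. If $X + Y = 2$ keep and record $X$; otherwise discard the repetition.
- Repeat many times to simulate many values of $X$ given $\{X + Y = 2\}$. Summarize the simulated values of $X$ and their simulated relative frequencies to approximate the conditional distribution of $X$ given $\{X + Y = 2\}$.
Are the random variables $X$ and $X + Y$ independent? No. For example, $\text{P}(X = 3) > 0$ but $\text{P}(X = 3 \mid X + Y = 2) = 0$.
Compute $\text{P}(X = 0 \mid X + Y = 2)$. The key is to take advantage of the fact that while $X$ and $X + Y$ are not independent, $X$ and $Y$ are. Write events involving $X$ and $X + Y$ in terms of equivalent events involving $X$ and $Y$. For example, the event $\{X = 0, X + Y = 2\}$ is the same as the event $\{X = 0, Y = 2\}$. Also, remember that by Poisson aggregation $X + Y$ has a Poisson(3) distribution.
$$\text{P}(X = 0 \mid X + Y = 2) = \frac{\text{P}(X = 0, Y = 2)}{\text{P}(X + Y = 2)} = \frac{\text{P}(X = 0)\,\text{P}(Y = 2)}{\text{P}(X + Y = 2)} = \frac{\left(\frac{e^{-1}1^0}{0!}\right)\left(\frac{e^{-2}2^2}{2!}\right)}{\frac{e^{-3}3^2}{2!}} = \frac{4}{9} = 0.444$$
Compute $\text{P}(X = 1 \mid X + Y = 2)$.
$$\text{P}(X = 1 \mid X + Y = 2) = \frac{\text{P}(X = 1)\,\text{P}(Y = 1)}{\text{P}(X + Y = 2)} = \frac{\left(\frac{e^{-1}1^1}{1!}\right)\left(\frac{e^{-2}2^1}{1!}\right)}{\frac{e^{-3}3^2}{2!}} = \frac{4}{9} = 0.444$$
Compute $\text{P}(X = x \mid X + Y = 2)$ for all other possible values of $x$.
Given $X + Y = 2$, the only possible values of $X$ are 0, 1, 2. So we just need to find $\text{P}(X = 2 \mid X + Y = 2)$. We could just use the fact that the probabilities must sum to 1, but here is the long calculation to help you see the pattern.
$$\text{P}(X = 2 \mid X + Y = 2) = \frac{\text{P}(X = 2)\,\text{P}(Y = 0)}{\text{P}(X + Y = 2)} = \frac{\left(\frac{e^{-1}1^2}{2!}\right)\left(\frac{e^{-2}2^0}{0!}\right)}{\frac{e^{-3}3^2}{2!}} = \frac{2!}{2!\,0!}\left(\frac{1}{3}\right)^2\left(\frac{2}{3}\right)^0 = \frac{1}{9} = 0.111$$
Identify the conditional distribution of $X$ given $\{X + Y = 2\}$. The calculations above suggest that the conditional distribution of $X$ given $\{X + Y = 2\}$ is the Binomial distribution with $n = 2$ and $p = \frac{1}{1 + 2} = \frac{1}{3}$.
Compute $\text{E}(X \mid X + Y = 2)$. We could use the distribution and the definition: $0(4/9) + 1(4/9) + 2(1/9) = 2/3$. Also, the conditional distribution of $X$ given $\{X + Y = 2\}$ is Binomial(2, 1/3), which has mean 2(1/3), so $\text{E}(X \mid X + Y = 2) = 2/3$.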
X, Y = RV(Poisson(1) * Poisson(2))
x_given_Zeq2 = (X | (X + Y == 2)).sim(10000)
x_given_Zeq2.tabulate()
Value | Frequency |
---|---|
0 | 4478 |
1 | 4450 |
2 | 1072 |
Total | 10000 |
x_given_Zeq2.mean()
## 0.6594
x_given_Zeq2.plot()
Binomial(2, 1 / 3).plot()
plt.show()
Poisson disaggregation (a.k.a., splitting, a.k.a., thinning). If $X$ and $Y$ are independent, $X$ has a Poisson($\mu_X$) distribution, and $Y$ has a Poisson($\mu_Y$) distribution, then the conditional distribution of $X$ given $\{X + Y = n\}$ is Binomial$\left(n, \frac{\mu_X}{\mu_X + \mu_Y}\right)$.
The total count of occurrences $X + Y$ can be disaggregated into counts for occurrences of “type $X$” or occurrences of “type $Y$”. Given $n$ occurrences in total, each of the $n$ occurrences is classified as type $X$ with probability proportional to the mean number of occurrences of type $X$, $\frac{\mu_X}{\mu_X + \mu_Y}$, and occurrences are classified independently of each other.
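Poisson disaggregation can be verified directly from the pmfs. In this sketch (plain Python, helper names chosen here for illustration) the conditional pmf is computed as joint probability divided by the Poisson aggregation total, then compared with the Binomial pmf, using the values from Example 6.19.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) pmf evaluated at x."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    """Binomial(n, p) pmf evaluated at x."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def conditional_pmf(x, n, mu_x, mu_y):
    """P(X = x | X + Y = n) for independent X ~ Poisson(mu_x), Y ~ Poisson(mu_y)."""
    joint = poisson_pmf(x, mu_x) * poisson_pmf(n - x, mu_y)  # P(X = x, Y = n - x)
    total = poisson_pmf(n, mu_x + mu_y)                      # P(X + Y = n), by aggregation
    return joint / total

mu_x, mu_y, n = 1.0, 2.0, 2  # values from Example 6.19
p = mu_x / (mu_x + mu_y)
for x in range(n + 1):
    print(x, conditional_pmf(x, n, mu_x, mu_y), binomial_pmf(x, n, p))
```

Both columns should read 4/9, 4/9, 1/9 up to rounding, matching the simulated frequencies.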
6.4.1 Poisson approximation
Where do Poisson distributions come from? We saw in Example 6.17 that the Binomial(2000, 0.00015) distribution is approximately the Poisson(0.3) distribution. This is an example of the “Poisson approximation to the Binomial”. If $X$ counts the number of successes in a Binomial situation where the number of trials $n$ is large and the probability of success on any trial $p$ is small, then $X$ has an approximate Poisson distribution with parameter $np$.
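Let’s see why. We’ll reparametrize the Binomial(2000, 0.00015) pmf in terms of the mean $0.3 = 2000(0.00015)$, and apply some algebra and some approximations. Remember, the pmf is a distribution on values of the count $x$, for $x = 0, 1, 2, \ldots, 2000$, but the probabilities are negligible if $x$ is not small.
$$\begin{aligned}
\binom{2000}{x} 0.00015^x (1 - 0.00015)^{2000 - x}
& = \frac{2000!}{x!\,(2000 - x)!}\left(\frac{0.3}{2000}\right)^x\left(1 - \frac{0.3}{2000}\right)^{2000 - x}\\
& = \frac{0.3^x}{x!}\cdot\frac{2000(1999)\cdots(2000 - x + 1)}{2000^x}\cdot\left(1 - \frac{0.3}{2000}\right)^{2000}\cdot\left(1 - \frac{0.3}{2000}\right)^{-x}\\
& \approx \frac{0.3^x}{x!}\cdot 1\cdot e^{-0.3}\cdot 1
\end{aligned}$$
The approximations hold for small $x$: the ratio $\frac{2000(1999)\cdots(2000 - x + 1)}{2000^x}$ and the factor $\left(1 - \frac{0.3}{2000}\right)^{-x}$ are both close to 1, and $\left(1 - \frac{0.3}{2000}\right)^{2000} \approx e^{-0.3}$.
The above calculation shows that the Binomial(2000, 0.00015) pmf is approximately equal to the Poisson(0.3) pmf.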
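Now we’ll consider a general Binomial situation. Let $X$ count the number of successes in $n$ Bernoulli($p$) trials, so $X$ has a Binomial($n$, $p$) distribution. Suppose that $n$ is “large”, $p$ is “small” (so success is “rare”) and $np$ is “moderate”. Then $X$ has an approximate Poisson distribution with mean $np$. The following states this idea more formally. The limits in the following make precise the notions of “large” ($n \to \infty$), “small” ($p_n \to 0$), and “moderate” ($np_n \to \mu \in (0, \infty)$).
Poisson approximation to Binomial. Consider $n$ Bernoulli trials with probability of success on each trial [134] equal to $p_n$. Suppose that $n \to \infty$ while $p_n \to 0$ and $np_n \to \mu$, where $0 < \mu < \infty$. Then for $x = 0, 1, 2, \ldots$
$$\lim_{n \to \infty} \binom{n}{x} p_n^x (1 - p_n)^{n - x} = \frac{e^{-\mu}\mu^x}{x!}$$
The proof relies on the same ideas we used in the Binomial(2000, 0.00015) approximation above. Fix $x = 0, 1, 2, \ldots$ (Since we are letting $n \to \infty$ we can assume that $n > x$.) Some algebra and rearranging yields
$$\begin{aligned}
\binom{n}{x} p_n^x (1 - p_n)^{n - x}
& = \frac{1}{x!}\cdot\frac{n(n-1)\cdots(n - x + 1)}{n^x}\cdot(np_n)^x\cdot\left(1 - \frac{np_n}{n}\right)^{n}\cdot(1 - p_n)^{-x}\\
& \to \frac{1}{x!}\cdot 1\cdot\mu^x\cdot e^{-\mu}\cdot 1 = \frac{e^{-\mu}\mu^x}{x!}
\end{aligned}$$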
Poisson approximation of Binomial is one way that Poisson distributions arise, but it is far from the only way. Part of the usefulness of Poisson models is that they do not require the strict assumptions of the Binomial situation.
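The limit in the Poisson approximation to the Binomial can be watched numerically. The sketch below (plain Python; the function name `max_abs_diff` and the choices $\mu = 0.3$ and $x \le 10$ are illustrative assumptions) sets $p_n = \mu / n$ and reports the largest pointwise gap between the Binomial($n$, $p_n$) and Poisson($\mu$) pmfs as $n$ grows.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) pmf evaluated at x."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    """Binomial(n, p) pmf evaluated at x."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def max_abs_diff(n, mu, x_max=10):
    """Largest gap between Binomial(n, mu / n) and Poisson(mu) pmfs over x = 0, ..., x_max."""
    p = mu / n  # success probability shrinks as n grows, so that n * p stays equal to mu
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(x_max + 1))

mu = 0.3
for n in (10, 100, 1000, 10000):
    print(n, max_abs_diff(n, mu))
```

The gap shrinks roughly in proportion to $1/n$ in this setup, which matches the idea that "large $n$, small $p$, moderate $np$" is what drives the approximation.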
Example 6.20 Recall the matching problem in Example 5.1 with a general $n$: there are $n$ rocks that are shuffled and placed uniformly at random in $n$ spots with one rock per spot. Let $Y$ be the number of matches. We have seen:
- The exact distribution of $Y$ for one specific small value of $n$, via enumerating outcomes in the sample space (Example 5.1).
- The approximate distribution of $Y$ for any $n$, via simulation (Section 2.14).
- $\text{E}(Y) = 1$ for any value of $n$, via linearity of expected value (Example 5.30).
Now we’ll consider the distribution of $Y$ for general $n$.
- Use simulation to approximate the distribution of $Y$ for different values of $n$. How does the approximate distribution of $Y$ change with $n$?
- Does $Y$ have a Binomial distribution? Consider: What is a trial? What is success? Is the number of trials fixed? Is the probability of success the same on each trial? Are the trials independent?
- If $Y$ has an approximate Poisson distribution, what would the parameter have to be? Compare this Poisson distribution with the simulation results; does it seem like a reasonable approximation?
- For a general $n$, approximate $\text{P}(Y = y)$ for $y = 0, 1, 2, \ldots$
- For a general value of $n$, approximate the probability that there is at least one match. How does this depend on $n$?
Solution. to Example 6.20
- Simulation results for $n = 10$ are displayed below. Clicking on the link to the Colab notebook will take you to an interactive plot where you can change the value of $n$. We see that unless $n$ is really small (5 or less) the distribution of $Y$ essentially does not depend on $n$. That’s amazing!
- Each rock is a trial, and success occurs if it is put in the correct spot. There are $n$ trials, fixed. The unconditional probability of success is the same on each trial, $1/n$. However, the trials are not strictly independent. For example, if the heaviest rock is placed in the correct spot, the conditional probability that the next heaviest rock is placed in the correct spot is $1/(n-1)$; if all rocks except for the lightest rock are placed in the correct spots, then the conditional probability that the lightest rock is placed in the correct spot is 1. So $Y$ does not have a Binomial distribution.
- We have already seen $\text{E}(Y) = 1$ (exactly) for all $n$, so if $Y$ has an approximate Poisson distribution the parameter has to be 1. Yes, it does seem from the simulation results that the Poisson(1) distribution approximates the distribution of $Y$ pretty well, for any $n$ (unless $n$ is really small).
- Just use the Poisson(1) pmf; see the spinner in Figure 6.5. Since $\mu = 1$, $\text{P}(Y = y)$ is approximately proportional to $1/y!$: 1 is as likely as 0, 2 is 1/2 as likely as 1, 3 is 1/3 as likely as 2, 4 is 1/4 as likely as 3, and so on.
- Since $\text{P}(Y = 0) \approx e^{-1} \approx 0.368$, the approximate probability that there is at least one match is $1 - e^{-1} \approx 0.632$, for any $n$ (unless $n$ is really small). Amazing!
n = 10
labels = list(range(n)) # list of labels [0, ..., n-1]
# define a function which counts number of matches
def count_matches(x):
    count = 0
    for i in range(0, n, 1):
        if x[i] == labels[i]:
            count += 1
    return count
P = BoxModel(labels, size = n, replace = False)
Y = RV(P, count_matches)
y = Y.sim(10000)
y.plot()
Poisson(1).plot()
plt.show()
y.tabulate(normalize = True)
Value | Relative Frequency |
---|---|
0 | 0.3661 |
1 | 0.3731 |
2 | 0.1795 |
3 | 0.0617 |
4 | 0.0157 |
5 | 0.0034 |
6 | 0.0005 |
Total | 0.9999999999999999 |
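For small $n$ the distribution of the number of matches can be found exactly by enumerating every placement, which gives a check on the simulation that does not depend on random draws. This is a sketch in plain Python; the function name `matching_distribution` and the choice $n = 7$ are illustrative assumptions.

```python
import itertools
import math
from collections import Counter

def matching_distribution(n):
    """Exact pmf of the number of matches, by enumerating all n! equally likely placements."""
    counts = Counter(
        sum(1 for spot, rock in enumerate(perm) if rock == spot)  # matches in this placement
        for perm in itertools.permutations(range(n))
    )
    total = math.factorial(n)
    return {y: counts[y] / total for y in range(n + 1)}

n = 7  # small enough to enumerate: 7! = 5040 placements
exact = matching_distribution(n)
for y in range(5):
    # exact probability next to the Poisson(1) pmf e^{-1} / y!
    print(y, round(exact[y], 4), round(math.exp(-1) / math.factorial(y), 4))
```

Note that exactly $n - 1$ matches is impossible (if all but one rock match, the last one must too), so the exact pmf is 0 there while the Poisson(1) pmf is small but positive.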
Poisson models often provide good approximations to Binomial models. More importantly, Poisson models often provide good approximations for “count data” when the restrictive assumptions of Binomial models are not satisfied.
Some advantages for using a Poisson model rather than a Binomial model
- In a Poisson model, the number of trials doesn’t need to be specified; it can be unknown or random (e.g., the number of automobiles on a highway varies from day to day). The number of trials just has to be “large” (though what constitutes large depends on the situation; $n$ didn’t have to be very large in the matching problem for the Poisson approximation to kick in).
- In a Binomial model, the number of trials must be fixed and known.
- In a Poisson model, the probability of success does not need to be the same for all trials, and the probability of success for individual trials does not need to be known or estimated. The only requirement is that the probability of success is “comparably small” for all trials.
- In a Binomial model, the probability of success must be the same for all trials and must be fixed and known.
- Fitting a Poisson model to data only requires data on total counts, so that the average number of successes can be estimated.
- Fitting a Binomial model to data requires results from individual trials so that the probability of success can be estimated. (For example, you would need to know both the total number of automobiles on the road and the number that got into accidents.)
- In a Poisson model, the trials are not required to be strictly independent as long as the trials are “not too dependent”.
- In a Binomial model, the trials must be independent.
Example 6.21 Recall the birthday problem from Example 3.2: in a group of $n$ people what is the probability that at least two have the same birthday? (Ignore multiple births and February 29 and assume that the other 365 days are all equally likely.) We will investigate this problem using Poisson approximation. Imagine that we have a trial for each possible pair of people in the group, and let “success” indicate that the pair shares a birthday. Consider both a general $n$ and $n = 35$.
- How many trials are there?
- Do the trials have the same probability of success? If so, what is it?
- Are any two trials independent? To answer this question, suppose that three people in the group are Ki-taek, Chung-sook, and Ki-jung and consider any two of the trials that involve these three people.
- Are any three trials independent? Consider the three trials that involve Ki-taek, Chung-sook, and Ki-jung.
- Let $X$ be the number of pairs that share a birthday. Does $X$ have a Binomial distribution?
- In what way are the trials “not too dependent”?
- Use simulation to approximate the distribution of $X$. How does the distribution change with $n$?
- If $X$ has an approximate Poisson distribution, what would the parameter have to be? Compare this Poisson distribution with the simulation results; does it seem like a reasonable approximation?
- Approximate the probability that at least two people share the same birthday. Compare to the theoretical values from Example 3.2.
- Using the approximation from the previous part, how large does $n$ need to be for the approximate probability to be at least 0.5?
Solution. to Example 6.21
- Each pair is a trial so there are $\binom{n}{2} = \frac{n(n-1)}{2}$ trials. If $n = 23$ there are $\binom{23}{2} = 253$ pairs; if $n = 35$ there are $\binom{35}{2} = 595$ pairs.
- The probability of success on any trial is 1/365. For any pair, the probability that the pair shares a birthday is 1/365. For any two people, there are $365^2$ possible pairs of birthdays ((Jan 1, Jan 1), (Jan 1, Jan 2), etc.), of which there are 365 possibilities in which the two share a birthday ((Jan 1, Jan 1), (Jan 2, Jan 2), etc.), so the probability is $365/365^2 = 1/365$.
- Yes, any two trials are independent. Let $A$ be the event that Ki-taek and Chung-sook share a birthday, and let $B$ be the event that Ki-taek and Ki-jung share a birthday. Then $\text{P}(A) = 1/365$ and $\text{P}(B) = 1/365$. The event $A \cap B$ is the event that all three share a birthday. There are $365^3$ possible triples of birthdays for the three people ((Jan 1 for Ki-taek, Jan 1 for Ki-jung, Jan 2 for Chung-sook), etc.) of which there are 365 possibilities in which all three share a birthday (e.g., (Jan 1, Jan 1, Jan 1), etc.). Therefore $\text{P}(A \cap B) = 365/365^3 = (1/365)^2 = \text{P}(A)\text{P}(B)$, so $A$ and $B$ are independent.
- No, not every set of three trials is independent. Let $A$ be the event that Ki-taek and Chung-sook share a birthday, let $B$ be the event that Ki-taek and Ki-jung share a birthday, and let $C$ be the event that Chung-sook and Ki-jung share a birthday. Then $\text{P}(A) = \text{P}(B) = \text{P}(C) = 1/365$. The event $A \cap B \cap C$ is the event that all three people share the same birthday, which has probability $(1/365)^2$ as in the previous part. Therefore, $\text{P}(A \cap B \cap C) = (1/365)^2 \neq (1/365)^3 = \text{P}(A)\text{P}(B)\text{P}(C)$. So these three trials are not independent. Alternatively, if $A$ and $B$ are both true, then $C$ must also be true so $\text{P}(C \mid A \cap B) = 1 \neq 1/365 = \text{P}(C)$. However, there are many sets of three trials that are independent. In particular, any three trials involving six distinct people are independent.
- Since the trials are not independent, $X$ does not have a Binomial distribution.
- Any two trials are independent. Many sets of three trials are independent. Many sets of four trials are independent (e.g., any set involving 8 distinct people), etc. So generally, information on multiple events is required to change the conditional probabilities of other events. In this way, the trials are “not too dependent”.
- See the simulation results for $n = 35$ below, and click on the link to a Colab notebook with an interactive simulation. We see that the distribution does depend on $n$; as $n$ increases the distribution of $X$ places more probability on larger values.
- There are $\binom{n}{2}$ trials and the probability of success on each trial is 1/365, so $\text{E}(X) = \binom{n}{2}\frac{1}{365}$. Remember, the “number of trials $\times$ probability of success” formula works regardless of whether the trials are independent (as long as the probability of success is the same for all trials). For $n = 23$, $\text{E}(X) = 253/365 = 0.693$; for $n = 35$, $\text{E}(X) = 595/365 = 1.63$. Therefore, if $X$ has an approximate Poisson distribution, then it is the Poisson distribution with parameter $\mu = \binom{n}{2}\frac{1}{365}$. The Poisson approximation seems to fit the simulation results fairly well.
- The probability that at least two people share the same birthday is $\text{P}(X \ge 1) = 1 - \text{P}(X = 0)$. Using the Poisson approximation, $\text{P}(X \ge 1) \approx 1 - \exp\left(-\binom{n}{2}\frac{1}{365}\right)$. For $n = 35$ the approximate probability is $1 - e^{-1.63} = 0.804$; the theoretical probability is 0.814. Figure 6.7 plots the theoretical probability and the approximate probability for different values of $n$. The approximation seems to work pretty well.
- The smallest value of $n$ for which the approximate probability is at least 0.5 is $n = 23$, in which case the approximate probability is $1 - e^{-253/365} = 0.500$. The theoretical probability for $n = 23$ is 0.507.
import itertools
import scipy.special
# count_matching_pairs takes as an input a list of birthdays
# returns as output the number of pairs that share a birthday
# Note the 2 in itertools.combinations is for pairs
def count_matching_pairs(outcome):
    return sum([1 for i in itertools.combinations(outcome, 2) if len(set(i)) == 1])
# 2 pairs have a match in the following: (0, 1), (2, 3)
count_matching_pairs((3, 3, 4, 4, 6))
## 2
# 6 pairs have a match in the following:
# (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)
count_matching_pairs((3, 3, 3, 3, 4))
## 6
n = 35
P = BoxModel(list(range(365)), size = n, replace = True)
X = RV(P, count_matching_pairs)
X.sim(10000).plot()
# Poisson parameter: (number of pairs) * (probability a pair matches)
mu = scipy.special.binom(n, 2) / 365
Poisson(mu).plot()
plt.show()
Figure 6.7: Probability of at least one birthday match as a function of the number of people in the room, along with the Poisson approximation. For 23 people, the probability of at least one birthday match is 0.507.
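The comparison behind Figure 6.7 can be reproduced with a few lines of plain Python. In this sketch the function names are chosen here for illustration, and the exact probability uses the standard complement calculation (no two of the $n$ birthdays coincide) under the same 365 equally likely days assumption.

```python
import math

def exact_match_probability(n):
    """P(at least two of n people share a birthday), 365 equally likely days."""
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (365 - i) / 365  # person i avoids the i birthdays already taken
    return 1 - p_no_match

def poisson_match_probability(n):
    """Poisson approximation: 1 - exp(-C(n, 2) / 365)."""
    return 1 - math.exp(-math.comb(n, 2) / 365)

for n in (10, 23, 35, 50):
    print(n, round(exact_match_probability(n), 3), round(poisson_match_probability(n), 3))

# smallest group size for which the approximate probability reaches 0.5
smallest_n = next(n for n in range(2, 366) if poisson_match_probability(n) >= 0.5)
print(smallest_n)
```

The approximation slightly understates the exact probability at these group sizes but tracks it closely.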
Poisson paradigm. Let $A_1, A_2, \ldots, A_n$ be a collection of $n$ events. Suppose event $A_i$ occurs with marginal probability $p_i$. Let $N$ be the random variable which counts the number of the events in the collection which occur. Suppose
- $n$ is “large”,
- $p_1, \ldots, p_n$ are “comparably small”, and
- the events $A_1, \ldots, A_n$ are “not too dependent”.
Then $N$ has an approximate Poisson distribution with parameter $\text{E}(N) = \sum_{i=1}^{n} p_i$.
We are leaving the terms “large”, “comparably small”, and “not too dependent” undefined. There are many different versions of Poisson approximations which make these ideas more precise. We only remark that Poisson approximation holds in a wide variety of situations.
The individual event probabilities $p_i$ can be different, but they must be “comparably small”. If one $p_i$ is much greater than the others, then the count random variable $N$ is dominated by whether event $A_i$ occurs or not. Also, as long as $\text{E}(N)$ is available, it is not necessary to know the individual $p_i$.
Even though $n$ is a constant in the above statement of the Poisson paradigm, there are other versions in which the number of events is random and unknown.
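To see the Poisson paradigm at work with unequal probabilities, the sketch below computes the exact pmf of the number of events that occur and compares it with the Poisson pmf with the same mean. It assumes the events are fully independent (the simplest case of "not too dependent"), and the 500 probabilities between 0.001 and 0.005 are hypothetical values chosen only for illustration.

```python
import math

def count_pmf(probs):
    """Exact pmf of the number of independent events that occur.

    Events are added one at a time; pmf[k] is P(exactly k of the events so far occur).
    """
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)  # this event does not occur
            new[k + 1] += mass * p    # this event occurs
        pmf = new
    return pmf

# hypothetical, unequal but comparably small event probabilities
probs = [0.001 * (1 + i % 5) for i in range(500)]
mu = sum(probs)  # Poisson parameter: the sum of the marginal probabilities
exact = count_pmf(probs)

for k in range(6):
    print(k, round(exact[k], 4), round(math.exp(-mu) * mu ** k / math.factorial(k), 4))

# largest pointwise gap over the first 20 values of the count
max_diff = max(abs(exact[k] - math.exp(-mu) * mu ** k / math.factorial(k)) for k in range(20))
print(max_diff)
```

Only the sum of the probabilities enters the Poisson side of the comparison, which illustrates why the individual $p_i$ need not be known.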
Example 6.22 Use Poisson approximation to approximate the probability that at least three people in a group of $n$ people share a birthday. How large does $n$ need to be for the probability to be greater than 0.5?
Solution. to Example 6.22
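There are $\binom{n}{3}$ triples of people. We showed in Example 6.21 that the probability that any three people share a birthday is $(1/365)^2$. If $X$ is the number of triples that share a birthday, then $\text{E}(X) = \binom{n}{3}\left(\frac{1}{365}\right)^2$. The number of trials is large and the probability of success on any trial is small, so we assume $X$ has an approximate Poisson distribution. Therefore, the probability that at least three people share a birthday is
$$\text{P}(X \ge 1) = 1 - \text{P}(X = 0) \approx 1 - \exp\left(-\binom{n}{3}\left(\frac{1}{365}\right)^2\right)$$
The smallest $n$ for which this approximate probability is greater than 0.5 is $n = 84$. For $n = 124$, the approximate probability is at least 0.9.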
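The search for the smallest group size can be sketched in plain Python. The function names below are introduced for illustration, and the probabilities are the Poisson approximations described in the solution, not exact values.

```python
import math

def approx_triple_probability(n):
    """Poisson approximation to P(at least three of n people share a birthday)."""
    mu = math.comb(n, 3) / 365 ** 2  # expected number of triples that share a birthday
    return 1 - math.exp(-mu)

def smallest_n(threshold):
    """Smallest group size whose approximate probability exceeds the threshold."""
    return next(n for n in range(3, 1000) if approx_triple_probability(n) > threshold)

print(smallest_n(0.5), smallest_n(0.9))
```

Because the approximate probability is increasing in $n$, a simple linear scan is enough here.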
[132] The parameter for a Poisson distribution is often denoted $\lambda$. However, we use $\mu$ to denote the parameter of a Poisson distribution, and reserve $\lambda$ to denote the rate parameter of a Poisson process (which has mean $\lambda t$ at time $t$).
[133] There are easier proofs, e.g., using moment generating functions.
[134] When there are $n$ trials, the probability of success on each of the $n$ trials is $p_n$. The subscript $n$ indicates that this value can change as $n$ changes (e.g., 1/10 when $n = 10$, 1/100 when $n = 100$), so that when $n$ is large $p_n$ is small enough to maintain relative “rarity”.