4.3 Using the Normal distribution to calculate probabilities
Another very useful property of the Normal distribution (and of any continuous distribution, in fact) is that it can help us calculate probabilities. Considering again the Normal distribution as depicted below, it turns out that the total area under the curve is always equal to 1. This is the case no matter what the values of \(\mu\) and \(\sigma\) are, and no matter what type of continuous distribution we are considering (Normal or otherwise). This is because area under the curve represents probability and, as we know, the probability of the whole sample space is \(P(\Omega) = 1\).
Knowing that the total area under the curve is equal to 1, we can also find the area under the curve within certain ranges. This tells us the probability of \(X\) (or \(Z\) in the case of the standard normal distribution) taking a certain range of values.
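As a rough numerical check of the two ideas above, the sketch below approximates the area under a Normal curve by adding up many thin rectangles. It uses `NormalDist` from Python's standard library; the helper `area_under_pdf` and the choice of 20,000 rectangles are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist

def area_under_pdf(dist, lower, upper, steps=20_000):
    """Approximate the area under dist's density between lower and upper
    using the midpoint rule (a sum of thin rectangles)."""
    width = (upper - lower) / steps
    return sum(dist.pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

# Two Normal distributions with very different means and standard deviations.
heights = NormalDist(mu=172.38, sigma=9.85)   # student heights in cm
standard = NormalDist(mu=0, sigma=1)          # standard normal

for dist in (heights, standard):
    # Ten standard deviations either side of the mean covers essentially all of the area.
    lo = dist.mean - 10 * dist.stdev
    hi = dist.mean + 10 * dist.stdev
    print(f"total area: {area_under_pdf(dist, lo, hi):.6f}")

# The area over a narrower range is the probability of landing in that range,
# here within one standard deviation of the mean.
within_one_sd = area_under_pdf(heights, 172.38 - 9.85, 172.38 + 9.85)
print(f"area within one standard deviation: {within_one_sd:.4f}")
```

Both total areas come out at 1 (to the precision shown), whatever the values of \(\mu\) and \(\sigma\), while the narrower range gives a probability strictly between 0 and 1.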
Let us again consider the continuous random variable \(X\) that denotes the height in cm of university students and is normally distributed such that \(X \sim N(172.38,9.85^2 ).\) We may wish to ask questions like:
- What is the probability a university student's height is 172.38cm or shorter?
- What is the probability a university student's height is 182.23cm or shorter?
- What is the probability a university student's height is 182.23cm or taller?
- What is the probability a university student's height is between 152.68cm and 192.08cm?
Using probability notation, we can write down these questions as:
- \(P(X \leq 172.38)\)
- \(P(X \leq 182.23)\)
- \(P(X \geq 182.23)\)
- \(P(152.68 \leq X \leq 192.08)\)
Now let's see how these probabilities can be represented visually:
Three useful implications:
- Converting a value to its \(z\)-score does not change the probability: the probability for \(X\) is exactly the same as the equivalent probability for \(Z\) under the standard normal distribution. For example, recall that for \(x = 172.38\) (the mean), the corresponding \(z\)-score is 0. We then have that \(P(X \leq 172.38) = P(Z \leq 0) = 0.5\).
- We can use the complement rule to help work out probabilities. For example, we can see from above that \(P(X \leq 182.23) = 0.84\) (to two decimal places). Knowing the total area under the curve is equal to 1, it must therefore be the case that \(P(X \geq 182.23) = 1 - P(X \leq 182.23) = 1 - 0.84 = 0.16\).
- Symmetry is a very useful property we can make use of. For example:
  - We know that since the distribution is symmetric, the mean value of 172.38 is also the middle value (median). This means that, by symmetry, \(P(X \leq 172.38) = P(X \geq 172.38) = 0.5\).
  - Consider \(P(152.68 \leq X \leq 192.08) \approx 0.95\), and recall that the range (152.68, 192.08) is the mean plus or minus two standard deviations. Since the probability of \(X\) being within this range is approximately 0.95, the probability that \(X\) is not within this range is approximately 0.05. This area of 0.05 is represented in Plot D above by the white area in the upper and lower tails. By symmetry, the area in the right tail is equal to the area in the left tail, so each tail has an area of approximately 0.05 / 2 = 0.025. This means that \(P(X \leq 152.68) = P(X \geq 192.08) \approx 0.025.\)
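The three implications can be checked numerically. The following is a minimal sketch using `NormalDist` from Python's standard library; the variable names are illustrative. Note that software reports more decimal places than the rounded figures quoted in the text: the probability of being within two standard deviations of the mean is about 0.954 rather than exactly 0.95, so each tail is slightly smaller than 0.025.

```python
from statistics import NormalDist

mu, sigma = 172.38, 9.85
X = NormalDist(mu, sigma)   # heights in cm
Z = NormalDist(0, 1)        # standard normal

# 1. z-scores: the probability is the same on the original and standardised scales.
x = 182.23
z = (x - mu) / sigma        # one standard deviation above the mean, so z is about 1
p_x = X.cdf(x)              # P(X <= 182.23)
p_z = Z.cdf(z)              # P(Z <= z)
print(f"P(X <= {x}) = {p_x:.4f}, P(Z <= {z:.2f}) = {p_z:.4f}")

# 2. Complement rule: P(X >= x) = 1 - P(X <= x).
p_upper = 1 - p_x
print(f"P(X >= {x}) = {p_upper:.4f}")

# 3. Symmetry: the two tails beyond mean +/- 2 standard deviations have equal area.
lower, upper = mu - 2 * sigma, mu + 2 * sigma
p_left_tail = X.cdf(lower)          # P(X <= 152.68)
p_right_tail = 1 - X.cdf(upper)     # P(X >= 192.08)
print(f"left tail = {p_left_tail:.4f}, right tail = {p_right_tail:.4f}")
```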
We can use statistical software packages to help us calculate probabilities such as the ones shown above, and will be learning how to do so in this week's computer lab.
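As an illustration of what such a calculation can look like, here is a short sketch in Python that answers the four questions posed earlier. It assumes Python's standard-library `NormalDist`; the software used in the computer lab may differ, and the variable names are illustrative only.

```python
from statistics import NormalDist

# X ~ N(172.38, 9.85^2): heights of university students in cm.
heights = NormalDist(mu=172.38, sigma=9.85)

# cdf(x) returns P(X <= x), the area under the curve to the left of x.
p_a = heights.cdf(172.38)                         # P(X <= 172.38)
p_b = heights.cdf(182.23)                         # P(X <= 182.23)
p_c = 1 - heights.cdf(182.23)                     # P(X >= 182.23), by the complement rule
p_d = heights.cdf(192.08) - heights.cdf(152.68)   # P(152.68 <= X <= 192.08)

for label, p in [
    ("P(X <= 172.38)", p_a),
    ("P(X <= 182.23)", p_b),
    ("P(X >= 182.23)", p_c),
    ("P(152.68 <= X <= 192.08)", p_d),
]:
    print(f"{label} = {p:.4f}")
```

A probability over a range is found by subtracting one cumulative probability from another, which is why `p_d` is a difference of two `cdf` values.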
A technical note on inequality signs for a continuous distribution
For a continuous distribution, the probability that \(X\) is exactly equal to a given value \(x\) is zero. This is because the region under the curve at a single value is just a vertical line, which has zero area. The implication is that \(\leq\) and \(<\) are interchangeable, as are \(\geq\) and \(>\). That is, \(P(X \leq x) = P(X < x)\) and \(P(X \geq x) = P(X > x)\).
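One way to see this numerically is to compute the probability of an ever narrower range around a single value and watch it shrink towards zero. The sketch below does this for the heights example around \(x = 182.23\); the helper `prob_between` and the chosen widths are illustrative assumptions.

```python
from statistics import NormalDist

heights = NormalDist(mu=172.38, sigma=9.85)
x = 182.23

def prob_between(dist, a, b):
    """P(a <= X <= b) as a difference of cumulative probabilities."""
    return dist.cdf(b) - dist.cdf(a)

# Shrink the range around x: the probability shrinks with it.
half_widths = (1.0, 0.1, 0.01, 0.001)
probs = [prob_between(heights, x - h, x + h) for h in half_widths]

for h, p in zip(half_widths, probs):
    print(f"P({x - h:.3f} <= X <= {x + h:.3f}) = {p:.6f}")
```

As the range closes in on the single value \(x\), the probability approaches zero, which is why including or excluding the endpoint makes no difference.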