11  Normal Distributions

Example 11.1 The pdfs in the plot below represent the distribution of hypothetical test scores in three classes. The test scores in each class follow a Normal distribution. Identify the mean and standard deviation for each class.

Table 11.1: Selected percentiles for a Normal distribution
Percentile    SDs away from the mean
0.1%          3.09 SDs below the mean
0.5%          2.58 SDs below the mean
1%            2.33 SDs below the mean
2.5%          1.96 SDs below the mean
10%           1.28 SDs below the mean
15.9%         1 SD below the mean
25%           0.67 SDs below the mean
30.9%         0.5 SDs below the mean
50%           0 SDs above the mean
69.1%         0.5 SDs above the mean
75%           0.67 SDs above the mean
84.1%         1 SD above the mean
90%           1.28 SDs above the mean
97.5%         1.96 SDs above the mean
99%           2.33 SDs above the mean
99.5%         2.58 SDs above the mean
99.9%         3.09 SDs above the mean
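
The values in Table 11.1 can be recovered with `qnorm()`, which returns standardized values when the mean and SD are left at their defaults of 0 and 1. The following is a minimal sketch; the names `p`, `z`, and `percentile_table` are introduced here for illustration only.

```r
# Percentiles listed in Table 11.1, written as probabilities
p <- c(0.001, 0.005, 0.01, 0.025, 0.10, 0.159, 0.25, 0.309, 0.50,
       0.691, 0.75, 0.841, 0.90, 0.975, 0.99, 0.995, 0.999)

# With the default mean = 0 and sd = 1, qnorm() returns
# the number of SDs from the mean (negative = below the mean)
z <- qnorm(p)

percentile_table <- data.frame(
  percentile = paste0(100 * p, "%"),
  sds_from_mean = round(z, 2)
)

percentile_table
```

Negative entries correspond to "below the mean" in the table and positive entries to "above the mean".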

Figure 11.1: A standard Normal(0, 1) spinner. The same spinner is displayed on both sides, with different features highlighted on the left and right. Only selected rounded values are displayed, but in the idealized model the spinner is infinitely precise so that any real number is a possible outcome. Notice that the values on the axis are not evenly spaced.

Figure 11.2: A standard Normal(0, 1) spinner. The same spinner is displayed on both sides, with different features highlighted on the left and right. Only selected rounded values are displayed, but in the idealized model the spinner is infinitely precise so that any real number is a possible outcome. Notice that the values on the axis are not evenly spaced.

Example 11.2 The wrapper of a package of candy lists a weight of 47.9 grams. Naturally, the weights of individual packages vary somewhat. Suppose package weights have an approximate Normal distribution with a mean of 49.8 grams and a standard deviation of 1.3 grams.

  1. Sketch the distribution of package weights. Carefully label the variable axis. It is helpful to draw two axes: one in the measurement units of the variable, and one in standardized units.




  2. Why wouldn’t the company print the mean weight of 49.8 grams as the weight on the package?




  3. Estimate the probability that a package weighs less than the printed weight of 47.9 grams.




  4. Estimate the probability that a package weighs between 47.9 and 53.0 grams.




  5. Suppose that the company only wants 1% of packages to be underweight. Find the weight that must be printed on the packages.




  6. Find the 25th percentile (a.k.a., first (lower) quartile) of package weights.




  7. Find the 75th percentile (a.k.a., third (upper) quartile) of package weights. How can you use the work you did in the previous part?
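
For part 1, a hand sketch is what the exercise asks for, but the two-axis idea can also be illustrated in base R graphics. The following is only a rough sketch under that assumption; the names `z_ticks` and `weight_ticks` and the margin settings are choices made here, not part of the example.

```r
mu <- 49.8    # mean package weight (grams), from Example 11.2
sigma <- 1.3  # SD of package weights (grams), from Example 11.2

z_ticks <- -3:3                       # standardized units (SDs from the mean)
weight_ticks <- mu + z_ticks * sigma  # the same locations in grams

# Leave extra room below the plot for the second axis
old_par <- par(mar = c(7, 4, 1, 1))

# Normal density curve, drawn without the default horizontal axis
curve(dnorm(x, mean = mu, sd = sigma),
      from = mu - 3.5 * sigma, to = mu + 3.5 * sigma,
      xaxt = "n", xlab = "", ylab = "Density")

# First axis: measurement units (grams)
axis(1, at = weight_ticks, labels = format(weight_ticks, nsmall = 1))
mtext("Package weight (grams)", side = 1, line = 2)

# Second axis: standardized units, drawn further below the plot
axis(1, at = weight_ticks, labels = z_ticks, line = 3.5)
mtext("Standardized units (SDs from the mean)", side = 1, line = 5.7)

# Mark the printed weight of 47.9 grams
abline(v = 47.9, lty = 2)

par(old_par)
```

Each tick on the lower axis sits directly under the weight that is that many SDs from the mean, so the printed weight of 47.9 grams can be read off in either scale.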




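# Lines starting with #> show output. Simulated values vary from run to run.
library(kableExtra)  # provides kbl()

N_rep = 10000

# Simulate package weights (grams)
x = rnorm(N_rep, mean = 49.8, sd = 1.3)

head(x) |>
  kbl()
#> x
#> 49.14364
#> 50.99079
#> 49.90919
#> 48.84746
#> 51.00235
#> 50.64904

hist(x,
     freq = FALSE,
     main = "")

# Part 3: P(weight < 47.9), by simulation and then exactly
sum(x < 47.9) / N_rep
#> [1] 0.0731
pnorm(47.9, 49.8, 1.3)
#> [1] 0.07193386
# Same probability, computed from the standardized value
pnorm((47.9 - 49.8) / 1.3)
#> [1] 0.07193386

# Part 4: P(47.9 < weight < 53)
sum((x > 47.9) & (x < 53))  / N_rep
#> [1] 0.9202
pnorm(53, 49.8, 1.3) - pnorm(47.9, 49.8, 1.3)
#> [1] 0.921149

# Part 5: 1st percentile of package weights
quantile(x, 0.01)
#>       1% 
#> 46.84973 
qnorm(0.01, 49.8, 1.3)
#> [1] 46.77575
# Same percentile, from the standardized value
qnorm(0.01)
#> [1] -2.326348
49.8 + 1.3 * qnorm(0.01)
#> [1] 46.77575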
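
Parts 6 and 7 of Example 11.2 can be approached the same way. The sketch below, with variable names chosen here for illustration, computes both quartiles and checks that they lie the same distance from the mean, which is why the work for the 25th percentile can be reused for the 75th.

```r
mu <- 49.8    # mean package weight (grams), from Example 11.2
sigma <- 1.3  # SD of package weights (grams), from Example 11.2

# 25th percentile: about 0.67 SDs below the mean
q1 <- qnorm(0.25, mean = mu, sd = sigma)

# 75th percentile: by symmetry, the same distance above the mean
q3 <- qnorm(0.75, mean = mu, sd = sigma)

c(q1 = q1, q3 = q3)

# Symmetry check: both quartiles are equally far from the mean
all.equal(mu - q1, q3 - mu)
```

The quartiles come out to roughly 48.9 and 50.7 grams, consistent with the 0.67 SD entries in Table 11.1.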

Example 11.3 Daily high temperatures (degrees Fahrenheit) in San Luis Obispo in August follow (approximately) a Normal distribution with a mean of 76.9 degrees F. The temperature exceeds 100 degrees Fahrenheit on about 1.5% of August days.

  1. What is the standard deviation?




  2. Suppose the mean increases by 2 degrees Fahrenheit. On what percentage of August days will the daily high temperature exceed 100 degrees Fahrenheit? (Assume the standard deviation does not change.)




  3. A mean of 78.9 is about 1.03 times as large as a mean of 76.9. By what (multiplicative) factor has the percentage of 100-degree days increased? What do you notice?
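
One way to check work on Example 11.3 is sketched below. It assumes the Normal model holds exactly and treats 100 degrees as the 98.5th percentile; the variable names are chosen here for illustration.

```r
mu <- 76.9         # mean August daily high (degrees F), from the example
p_exceed <- 0.015  # proportion of days above 100 degrees F, from the example

# 100 degrees is the 98.5th percentile; find how many SDs above the mean that is
z <- qnorm(1 - p_exceed)

# Solve 100 = mu + z * sigma for sigma
sigma <- (100 - mu) / z

# Shift the mean up by 2 degrees, keeping sigma fixed
p_new <- 1 - pnorm(100, mean = mu + 2, sd = sigma)

# Multiplicative change in the percentage of 100-degree days
ratio <- p_new / p_exceed

c(sigma = sigma, p_new = p_new, ratio = ratio)
```

Comparing `ratio` with the ratio of the two means (78.9 / 76.9) shows how a small shift in the center of a Normal distribution can produce a much larger relative change in a tail probability.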




Example 11.4 In a large class, scores on midterm 1 follow (approximately) a Normal\((\mu_1, \sigma)\) distribution and scores on midterm 2 follow (approximately) a Normal\((\mu_2, \sigma)\) distribution. Note that the SD \(\sigma\) is the same on both exams. The 40th percentile of midterm 1 scores is equal to the 70th percentile of midterm 2 scores. Compute

\[ \frac{\mu_1-\mu_2}{\sigma} \]

(This is one statistical measure of effect size.)
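
Because both distributions share the same \(\sigma\), the condition in Example 11.4 can be written as \(\mu_1 + z_{0.40}\,\sigma = \mu_2 + z_{0.70}\,\sigma\), where \(z_p\) denotes the standard Normal value at percentile \(p\). The sketch below evaluates the resulting difference; the numeric values of `mu_2` and `sigma` in the sanity check are hypothetical and serve only to confirm the algebra.

```r
# Standardized positions of the two percentiles
z_40 <- qnorm(0.40)  # 40th percentile, in SDs from mu_1
z_70 <- qnorm(0.70)  # 70th percentile, in SDs from mu_2

# mu_1 + z_40 * sigma = mu_2 + z_70 * sigma
#   =>  (mu_1 - mu_2) / sigma = z_70 - z_40
effect_size <- z_70 - z_40
effect_size

# Sanity check with HYPOTHETICAL values (not from the example)
mu_2 <- 70
sigma <- 10
mu_1 <- mu_2 + effect_size * sigma
all.equal(qnorm(0.40, mu_1, sigma), qnorm(0.70, mu_2, sigma))
```

The result does not depend on the particular values of \(\mu_2\) or \(\sigma\), only on the two percentiles.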






Example 11.5 A bank uses an applicant’s score on some criteria to decide whether or not to approve their loan application. Based on past history, the bank has determined that

When someone applies for a loan the bank does not know whether the applicant will eventually repay the loan. How can the bank use the applicant’s score to decide whether or not to approve the loan?

  1. Draw sketches of these two Normal curves on the same axis.



  2. Suggest a decision rule, based on an applicant’s score, for deciding whether or not to approve the applicant’s loan.




  3. Describe the two kinds of classification errors that could be made in this situation.




  4. Determine the probability that we incorrectly reject the application of someone who would repay. Shade in the corresponding region under the Normal curve, and interpret this probability.




  5. Determine the probability that we incorrectly approve the application of someone who would NOT repay. Shade in the corresponding region under the Normal curve, and interpret this probability.




  6. In which direction — smaller or larger — would you need to change the decision rule’s cutoff value in order to decrease the probability that an applicant who would repay the loan is incorrectly rejected?




  7. Would the probability of the other kind of error — incorrectly approving a loan for an applicant who would not repay it — increase or decrease with this new cutoff value?




  8. Determine the cutoff value needed to decrease the probability that an applicant who would repay the loan is incorrectly rejected to 0.05.




  9. Determine the other error probability with this new cutoff rule.



  10. Now repeat the two previous parts with the goal of decreasing to 0.05 the probability of incorrectly approving an applicant who would not repay the loan.




  11. If you consider the two kinds of errors to be equally serious, how might you decide which of the three decision rules considered so far is the best?




  12. Which error do you think the bank would consider more serious? In which direction would that move the threshold?
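
The error-probability calculations in Example 11.5 can be organized as in the sketch below. The two score distributions are not stated in the text above, so every parameter value here is HYPOTHETICAL and should be replaced with the bank's actual values. The rule "approve when the score exceeds the cutoff" is likewise an assumption, as is the helper name `error_probs`.

```r
# HYPOTHETICAL parameters: the example's actual distributions are not given here
repay_mean <- 70;   repay_sd <- 10    # scores of applicants who would repay
default_mean <- 50; default_sd <- 10  # scores of applicants who would not repay

# Assumed decision rule: approve when score > cutoff
error_probs <- function(cutoff) {
  c(cutoff = cutoff,
    # would repay, but the score falls at or below the cutoff
    reject_repayer = pnorm(cutoff, repay_mean, repay_sd),
    # would not repay, but the score falls above the cutoff
    approve_defaulter = 1 - pnorm(cutoff, default_mean, default_sd))
}

# A cutoff halfway between the two hypothetical means
error_probs(60)

# Cutoff that makes P(reject | would repay) equal to 0.05
cutoff_a <- qnorm(0.05, repay_mean, repay_sd)
error_probs(cutoff_a)

# Cutoff that makes P(approve | would not repay) equal to 0.05
cutoff_b <- qnorm(0.95, default_mean, default_sd)
error_probs(cutoff_b)
```

Lowering the cutoff reduces the chance of rejecting a repayer but raises the chance of approving a non-repayer, and raising it does the reverse. Comparing the three outputs makes that trade-off concrete.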