22 Distributions and models

So far, you have learnt to ask a RQ, design a study, describe and summarise the data, and understand the decision-making process. In this chapter, you will learn to:

  • describe populations using normal distributions.
  • use \(z\)-scores to compute probabilities related to normal distributions.
  • work 'backwards' from probabilities for normal distributions.

22.1 Introduction

In the decision-making process (Sect. 20.3), an assumption is made about a parameter of the population. A sample is then taken, and the value of the sample statistic computed. Of course, many different samples could be drawn from this population, so the sample statistic varies from sample to sample.

Based on the assumption about the parameter, the values of the statistic that we would reasonably expect from all these possible samples can be described. The challenge is that only one of these countless possible samples is observed.

Remember: studying a sample leads to the following observations:

  • Every sample is likely to be different.
  • We observe just one of the many possible samples.
  • Every sample is likely to yield a different value for the sample statistic.
  • We observe just one of the many possible values for the statistic.

Since many values of the sample statistic are possible, the statistic varies from sample to sample (called sampling variation), and these possible values have a distribution (called a sampling distribution).

As seen in Chap. 21, sampling distributions are often bell-shaped. More formally, bell-shaped distributions are called normal distributions or normal models.

Many sampling distributions can be described by a normal distribution. That is, the normal model is often used to describe the sampling distribution: how a statistic varies from sample to sample. We now study normal distributions, as they appear in many places in research.

22.2 Normal distributions: examples

In Chap. 21, we saw that the proportion of red spins in \(15\) spins of a roulette wheel could vary; similarly, the mean spin from \(15\) spins could vary (Fig. 22.1). In both cases, these sampling distributions had roughly the shape of a normal distribution (a normal model). This was also true for larger numbers of spins (Figs. 21.1 and 21.2).


FIGURE 22.1: Sampling distributions for the proportion of red spins (left), and the mean of the numbers after \(15\) roulette wheel spins (right)

A model is a theoretical or ideal concept. A model skeleton isn't \(100\)% accurate, and is certainly not exactly like your skeleton; nonetheless, it suitably approximates reality. Probably none of us has a skeleton exactly like the model, but the model is still useful and helpful. Likewise, no real distribution has exactly a normal shape, but the normal model is still useful and helpful. The model describes a theoretical distribution in the population; it does not represent any particular sample of data.

The sampling distributions in Fig. 22.1 are not exactly normal distributions (only \(5\ 000\) sets of \(15\) spins are used), but are very close to a normal distribution, and certainly close enough for most purposes. Many sampling distributions have approximate normal distributions. In addition, many quantitative variables have approximate normal distributions too, such as heights of people.

Example 22.1 (Normal distributions of data) Many quantitative variables have approximate normal distributions too. Figure 22.2 (left panel) shows the diastolic blood pressure of \(398\) Americans (Willems et al. 1997; Schorling et al. 1997). Figure 22.2 (right panel) shows the weight of \(83\) male Leadbeater's possums (J. L. Williams et al. 2022).


FIGURE 22.2: Two normal distributions: Diastolic blood pressure of \(398\) Americans (left); the weight of \(83\) male Leadbeater's possums (right). The solid lines are the model.

22.3 About normal distributions

Sampling distributions represent theoretical distributions of unknown populations, not the distribution of sample data. When the sampling distribution is a normal distribution, the mean of the distribution is denoted by \(\mu\) and the standard deviation by \(\sigma\). (These values may be guided by sample values; e.g., suggesting a mean Leadbeater's possum weight of \(1000\) g based on Fig. 22.2 (right panel) would be silly.)

Normal distributions are symmetric about the mean, and have a bell shape. Half the values are greater than the mean, and half the values are less than the mean. The total probability represented by a normal distribution is one (or \(100\)%). For example, every American has a diastolic blood pressure and so is represented somewhere in Fig. 22.2 (left panel); every male Leadbeater's possum has a weight and so is represented somewhere in Fig. 22.2 (right panel).

In theory, no upper or lower limits exist for a variable modelled using a normal distribution. In practice, real variables almost always have limits (a blood pressure or a weight cannot be negative, for instance), but this rarely presents a problem. Consider the normal distributions in Fig. 22.2, for example. The normal distribution shown for diastolic blood pressure (left panel) is a useful model for all values of diastolic blood pressure seen in practice. Similarly, the normal distribution shown for the possum weights (right panel) is a useful model for all possum weights likely to be seen.

One of the most important properties of normal distributions is summarised in the \(68\)--\(95\)--\(99.7\) rule.

Definition 22.1 (The 68--95--99.7 rule) For any normal distribution:

  • approximately \(68\)% of values lie within \(1\) standard deviation of the mean;
  • approximately \(95\)% of values lie within \(2\) standard deviations of the mean; and
  • approximately \(99.7\)% of values lie within \(3\) standard deviations of the mean.

These properties are true for all normal distributions, whatever the variable, whatever the value of the mean \(\mu\), and whatever the value of the standard deviation \(\sigma\).

Example 22.2 (Heights of females) Suppose heights of Australian adult females have a mean of \(\mu = 162\) cm, and a standard deviation of \(\sigma = 7\) cm, and (approximately) follow a normal distribution (based on the Australian Health Survey).

Using the \(68\)--\(95\)--\(99.7\) rule, \(68\)% of Australian women will be between \(162 - 7 = 155\) cm and \(162 + 7 = 169\) cm tall. Similarly, \(95\)% of Australian women will be between \(162 - (2\times 7) = 148\) cm and \(162 + (2\times 7) = 176\) cm tall.
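The rule can be checked numerically. The sketch below is in Python and assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` class provides normal-distribution areas. The book itself uses tables, so this is an illustration rather than the book's method. It reproduces the height intervals above and computes the exact proportions that the rule rounds to \(68\)%, \(95\)% and \(99.7\)%.

```python
from statistics import NormalDist

# Model from Example 22.2: mean 162 cm, standard deviation 7 cm
mu, sigma = 162, 7

# Intervals mu - k*sigma to mu + k*sigma, for k = 1, 2, 3
intervals = {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

# Exact proportion within k standard deviations of the mean.
# This depends only on k, so the standard normal (mean 0, sd 1) is enough.
std_normal = NormalDist()
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k in (1, 2, 3):
    low, high = intervals[k]
    print(f"within {k} sd: {low} to {high} cm, exact proportion {within[k]:.4f}")
```

This prints the intervals \(155\) to \(169\) cm, \(148\) to \(176\) cm and \(141\) to \(183\) cm, with exact proportions of about \(0.6827\), \(0.9545\) and \(0.9973\).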

22.4 Standardising (\(z\)-scores)

Since the \(68\)--\(95\)--\(99.7\) rule (Def. 22.1) applies for all normal distributions, the percentages in the rule only depend on how many standard deviations (\(\sigma\)) a value (\(x\)) is from the mean (\(\mu\)). This information can be used to learn more about how values are distributed.

Example 22.3 (The 68--95--99.7 rule) Suppose heights of Australian adult females have a mean of \(\mu = 162\) cm, and a standard deviation of \(\sigma = 7\) cm, and (approximately) follow a normal distribution (Example 22.2).

Using this model, what proportion of Australian adult women are taller than \(169\) cm?

From a picture of the situation (Fig. 22.3, left panel), \(162 + 7 = 169\) cm is one standard deviation above the mean. Since \(68\)% of values are within one standard deviation of the mean, \(32\)% are outside that range, either smaller or larger. Because normal distributions are symmetric, half of this \(32\)% (that is, \(16\)%) is more than one standard deviation above the mean, so the answer is about \(16\)%. (The other \(16\)% are shorter than one standard deviation below the mean, or less than \(162 - 7 = 155\) cm in height.)

Again, the percentages only depend on how many standard deviations (\(\sigma\)) the value (\(x\)) is from the mean (\(\mu\)), and not the actual values of \(\mu\) and \(\sigma\).


FIGURE 22.3: Left: What proportion of Australian adult females are taller than \(169\) cm? Right: What proportion of Australian adult females are shorter than \(148\) cm?

Example 22.4 (The 68--95--99.7 rule) Consider again the heights of Australian adult females. Using this model, what proportion are shorter than \(148\) cm?

Again, drawing the situation is helpful (Fig. 22.3, right panel). Since \(162 - (2\times 7) = 148\), \(148\) cm is two standard deviations below the mean. Since \(95\)% of values are within two standard deviations of the mean, \(5\)% are outside that range (half smaller, half larger; see Fig. 22.3, right panel), so about \(2.5\)% are shorter than \(148\) cm. (Another \(2.5\)% are taller than \(162 + 14 = 176\) cm.)

Again, the percentages only depend on how many standard deviations (\(\sigma\)) the value (\(x\)) is from the mean (\(\mu\)). The number of standard deviations that an observation is from the mean is called a \(z\)-score. A \(z\)-score is computed using
\[ z = \frac{ x - \mu}{\sigma}, \] where \(\sigma\) is the standard deviation quantifying the variation in the \(x\)-values. Converting values to \(z\)-scores is called standardising.

Definition 22.2 (z-score) A \(z\)-score measures how many standard deviations a value \(x\) is from the mean. In symbols:
\[\begin{equation} z = \frac{x - \mu}{\sigma}, \tag{22.1} \end{equation}\] where \(\mu\) is the mean of the distribution, and \(\sigma\) is the standard deviation of the distribution (measuring the variation in the \(x\)-values).

The \(z\)-score is the number of standard deviations the observation is away from the mean, and is also called the standardised value or standard score. Note that:

  • \(z\)-scores are negative for observations below the mean.
  • \(z\)-scores are positive for observations above the mean.
  • \(z\)-scores have no units (that is, they are not measured in kg, cm, etc.).

Example 22.5 (z-scores) In Example 22.3, the \(z\)-score for a height of \(169\) cm is
\[ z = \frac{x-\mu}{\sigma} = \frac{169 - 162}{7} = 1, \] one standard deviation above the mean. In Example 22.4, the \(z\)-score for a height of \(148\) cm is
\[ z = \frac{x-\mu}{\sigma} = \frac{148 - 162}{7} = -2, \] two standard deviations below the mean.
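The \(z\)-score formula (Eq. (22.1)) is a one-line calculation. The following minimal Python sketch reproduces Example 22.5. `z_score` is a hypothetical helper name chosen for illustration, not something defined in the book.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu (Eq. 22.1)."""
    return (x - mu) / sigma

# Heights of Australian adult females: mu = 162 cm, sigma = 7 cm
print(z_score(169, 162, 7))  # 1.0  (one sd above the mean)
print(z_score(148, 162, 7))  # -2.0 (two sd below the mean)
```

A negative result means the value is below the mean, and the result has no units because the cm in the numerator and denominator cancel.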

Example 22.6 (The 68--95--99.7 rule) Consider the model for the heights of Australian adult females: a normal distribution with mean \(\mu = 162\) cm and standard deviation \(\sigma = 7\) cm (Fig. 22.4). Using this model:

  • A height of \(162\) cm is zero standard deviations from the mean: \(z = 0\).
  • \(155\) cm is one standard deviation below the mean: \(z = -1\).
  • \(169\) cm is one standard deviation above the mean: \(z = 1\).
  • \(148\) cm and \(176\) cm correspond to \(z = -2\) and \(z = 2\) respectively.
  • \(141\) cm and \(183\) cm correspond to \(z = -3\) and \(z = 3\) respectively.

FIGURE 22.4: The empirical rule and heights of Australian adult females

22.5 Approximating percentages using the \(68\)--\(95\)--\(99.7\) rule

As seen above, the \(68\)--\(95\)--\(99.7\) rule can be used to approximate percentages under normal distributions. The rule can even be used for values that do not exactly align with \(1\), \(2\) or \(3\) standard deviations from the mean.

Example 22.7 (Normal distribution areas) Suppose again that heights of Australian adult females have a mean of \(\mu = 162\) cm, and a standard deviation of \(\sigma = 7\) cm, and (approximately) follow a normal distribution (Fig. 22.4).

Find the proportion of women shorter than \(145\) cm.

First draw the situation (Fig. 22.5). Proceeding as before, we ask 'How many standard deviations from the mean is \(145\) cm?' Using Equation (22.1), \(145\) cm corresponds to a \(z\)-score of
\[\begin{equation} z = \frac{145 - 162}{7} = -2.4285... \tag{22.2} \end{equation}\] which is about \(2.43\) standard deviations below the mean.


FIGURE 22.5: What proportion of Australian adult females are shorter than \(145\) cm?

What percentage of observations are less than this \(z\)-score? This case is not covered by the \(68\)--\(95\)--\(99.7\) rule, though the rule can be used to make rough estimates.

About \(2.5\)% of observations are more than \(2\) standard deviations below the mean (Example 22.4); that is, about \(2.5\)% of women are shorter than \(148\) cm. Women shorter than \(145\) cm are even further into the tail of the distribution, so their percentage will be smaller than \(2.5\)%. The rule cannot give the exact percentage, only that it is less than \(2.5\)%.

Probabilities found this way are crude, but often serviceable. More accurate probabilities of 'percentages under the normal curve' are found using tables compiled for this very purpose (Appendix B.1). 'Percentages' under a normal curve are also called 'areas' under the normal curve. We now learn how to use these tables, using Example 22.7 (Sect. 22.5).

22.6 Exact areas from normal distributions

Areas under normal distributions can be found using online tables or hard-copy tables. The online tables are easier to use, and only they are explained in this online book (see the hard-copy version of the book for the hard-copy tables and instructions for using them). The tables (Appendix B.1) work with \(z\)-scores to two decimal places, so consider the \(z\)-score from Sect. 22.5 as \(z = -2.43\).

The online tables can be found in Appendix B.1. In the tables, enter the \(z\)-score in the box z.score:. The probability of finding a \(z\)-score less than (i.e., to the left of) this value is then shown. For \(z = -2.43\), the tables give the area as \(0.0075\).

The tables always give the area to the left of the \(z\)-score that is looked up.


Either the hard-copy or online tables give an answer of \(0.0075\), or \(0.75\)%. This is consistent with the rough answer using the \(68\)--\(95\)--\(99.7\) rule: a value less than \(2.5\)%.
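Software can stand in for the tables. This is a sketch that assumes Python 3.8+ and its standard-library `statistics.NormalDist`; the book itself uses the tables in Appendix B.1. `NormalDist().cdf(z)` gives the area to the left of \(z\) under the standard normal curve, which is the same quantity the tables report.

```python
from statistics import NormalDist

std_normal = NormalDist()          # standard normal: mean 0, sd 1
area_left = std_normal.cdf(-2.43)  # area to the left of z = -2.43
print(round(area_left, 4))         # 0.0075

# Alternative: work directly on the height scale (mean 162 cm, sd 7 cm),
# which avoids rounding z to two decimal places first
heights = NormalDist(mu=162, sigma=7)
print(round(heights.cdf(145), 4))
```

Working on the height scale without rounding \(z\) first gives an answer that differs from the table value only in the fourth decimal place.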

The general approach to computing probabilities from normal distributions is:

  • Draw a diagram, and mark on the value(s) of interest.
  • Shade the required region of interest.
  • Compute the \(z\)-score using Equation (22.1).
  • Use the tables in Appendix B.1 to compute corresponding areas (percentages).
  • Deduce the answer.
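The general approach can be sketched as a small function. It follows the same logic as the tables: every answer is built from areas to the left. A 'greater than' question uses the fact that the total area is one, and a 'between' question takes the difference of two left areas. `normal_prob` is a hypothetical helper, not from the book. It is written in Python with the standard-library `statistics.NormalDist` (Python 3.8+), which handles the standardising internally. Drawing the diagram is still what tells you which of `lower` and `upper` to supply.

```python
from statistics import NormalDist

def normal_prob(mu, sigma, lower=None, upper=None):
    """Probability that a value from a normal model lies between lower and upper.

    Leave lower=None for 'less than upper', and upper=None for
    'greater than lower'. Every case is built from areas to the left.
    """
    model = NormalDist(mu, sigma)
    left_of_upper = 1.0 if upper is None else model.cdf(upper)  # total area is one
    left_of_lower = 0.0 if lower is None else model.cdf(lower)
    return left_of_upper - left_of_lower
```

For example, `normal_prob(162, 7, upper=145)` gives about \(0.0076\) for women shorter than \(145\) cm, and `normal_prob(162, 7, lower=155, upper=169)` gives about \(0.6827\).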

Using this approach, more complicated questions can be asked too, as shown in the next section.

22.7 Examples using \(z\)-scores

Example 22.8 (Normal distributions) Aedo-Ortiz, Olsen, and Kellogg (1997) simulated mechanized forest harvesting systems (Devore and Berk 2007). The diameters of a specific type of trees were modelled using

  • a normal distribution; with
  • a mean of \(\mu = 8.8\) inches; and
  • a standard deviation of \(\sigma = 2.7\) inches.

Using this model, what is the probability that a randomly-chosen tree has a diameter greater than \(5\) inches?

Following the steps identified earlier:

  • Draw a normal curve, and mark on \(5\) inches (Fig. 22.6, left panel).
  • Shade the region 'greater than \(5\) inches' (Fig. 22.6, centre panel).
  • Compute the \(z\)-score using Eq. (22.1): \(\displaystyle z = (5 - 8.8)/2.7 = -1.41\) to two decimal places.
  • Use tables: the probability of a tree diameter less than \(5\) inches is \(0.0793\). (The tables always give the area less than the value of \(z\) that is looked up.)
  • Deduce the answer (Fig. 22.6, right panel): since the total area under the normal distribution is one, the probability of a tree diameter greater than \(5\) inches is \(1 - 0.0793 = 0.9207\), or about \(92\)%.
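The steps in Example 22.8 can be mirrored in a short Python sketch, assuming Python 3.8+ and `statistics.NormalDist`. It rounds \(z\) to two decimal places, as the tables require, and then subtracts the area to the left from one.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Tree diameters: mu = 8.8 inches, sigma = 2.7 inches
z = round((5 - 8.8) / 2.7, 2)   # z-score to two decimal places: -1.41
area_left = std_normal.cdf(z)   # what the tables give: area to the left
area_right = 1 - area_left      # total area under the curve is one
print(z, round(area_left, 4), round(area_right, 4))  # -1.41 0.0793 0.9207
```

Without the rounding step, `1 - NormalDist(8.8, 2.7).cdf(5)` gives about \(0.920\). The small difference comes only from rounding \(z\) to \(-1.41\).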

FIGURE 22.6: What proportion of tree diameters are greater than \(5\) inches?

The normal-distribution tables always provide area to the left of the \(z\)-score that is looked up. Drawing a picture of the situation is important: it helps visualise getting the answer from what the tables provide. Remember: the total area under the normal distribution is one (or \(100\)%).

Match the diagram in Fig. 22.7 with the meaning for the tree-diameter model (recall: \(\mu = 8.8\) inches):

  1. Tree diameters greater than \(11\) inches.
  2. Tree diameters between \(5\) and \(11\) inches.
  3. Tree diameters less than \(11\) inches.
  4. Tree diameters between \(3\) and \(5\) inches.

1: matches B; 2: matches C; 3: matches D; 4: matches A.


FIGURE 22.7: Match the diagram with the description

Example 22.9 (Normal distributions) Using the model for tree diameters in Example 22.8, what is the probability that a tree has a diameter between \(5\) and \(11\) inches?

First, draw the situation, and shade 'between \(5\) and \(11\) inches' (Fig. 22.7, Diagram C). Then, compute the \(z\)-scores for both tree diameters:

  • For \(5\) inches: \(\quad z = (5 - 8.8)/2.7 = -1.41\) (below the mean).
  • For \(11\) inches: \(\quad z = (11 - 8.8)/2.7 = 0.81\) (above the mean).

The tables (Appendix B.1) can then be used to find the area to the left of \(z = -1.41\), which is \(0.0793\), and the area to the left of \(z = 0.81\), which is \(0.7910\). However, neither of these gives the area between \(z = -1.41\) and \(z = 0.81\) (Fig. 22.8).


FIGURE 22.8: What proportion of tree diameters are between \(5\) and \(11\) inches? The two shaded areas are what we find using the tables with \(z = -1.41\) and \(z = 0.81\); neither give us the area we seek.

Comparing the areas from the tables with the area sought, the area between the two \(z\)-scores is the area to the left of \(z = 0.81\) minus the area to the left of \(z = -1.41\): \(0.7910 - 0.0793 = 0.7117\). The probability that a tree has a diameter between \(5\) and \(11\) inches is about \(0.7117\), or about \(71\)%.
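The 'between' calculation is the difference of two areas to the left. The following Python sketch uses the same assumptions as before (Python 3.8+, `statistics.NormalDist`) and again rounds each \(z\)-score to two decimal places.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Tree diameters: mu = 8.8 inches, sigma = 2.7 inches
z_low = round((5 - 8.8) / 2.7, 2)    # -1.41 for 5 inches
z_high = round((11 - 8.8) / 2.7, 2)  # 0.81 for 11 inches

# Area between = (area left of z_high) - (area left of z_low)
between = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(between, 4))
```

This gives about \(0.712\). The table-based \(0.7117\) differs only in the last digit, because each table entry is itself rounded to four decimal places.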

22.8 Unstandardising: working backwards

Using the model for tree diameters in Example 22.8 again, different types of questions can be asked too.

Example 22.10 (Normal distributions backwards) Consider again the trees study (Example 22.8). Identify the diameters of the smallest \(3\)% of trees.

This is a different type of problem from before; previously, the tree diameter was known, so a \(z\)-score could be computed, and hence a probability (Fig. 22.9). However, in Example 22.10, the probability is known, and a tree diameter is sought. That is, working 'backwards' is necessary (Fig. 22.9), so the \(z\)-tables need to be used 'backwards' too.


FIGURE 22.9: Working with \(z\)-scores. In the tables, the areas (probabilities) are in the body of the table, and the \(z\)-scores are in the margins of the table.

Drawing a rough diagram of the situation again is very helpful (Fig. 22.10). We can only mark the approximate location of the required score, but this is sufficient. Then, tables must be used to determine the necessary \(z\)-score.


FIGURE 22.10: Tree diameters: The smallest \(3\)%. The approximate location of the required \(z\)-score is drawn.

As before (Sect. 22.6), online tables or hard copy tables can be used (and again the online tables are easier to use). Only the online tables are explained in this online book (see the hard-copy version for the hard-copy tables, and instructions for their use).

The online tables can be found in Appendix B.1. In the tables, enter the area to the left of the required unknown value in the box Area.to.left: the \(z\)-score with this probability to the left is shown.

The online tables give a \(z\)-score of \(z = -1.881\). (The hard-copy tables are less precise, and give \(z = -1.88\).)

The tables always give the area to the left of the \(z\)-score that is looked up.

Using either the hard-copy or online tables, the appropriate \(z\)-score is about \(z = -1.88\); that is, about \(1.88\) standard deviations below the mean (Fig. 22.10). The \(z\)-score can be converted to an observation value \(x\) using the unstandardising formula:
\[ x = \mu + z\sigma. \] Using this unstandardising formula:
\[\begin{align*} x &= \mu + (z\times\sigma) \\ &= 8.8 + (-1.88 \times 2.7) = 3.724; \end{align*}\] that is, about \(3\)% of trees have diameters less than about \(3.72\) inches.
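Working backwards is the inverse of the area-to-the-left calculation. In Python's standard library (3.8+), `NormalDist().inv_cdf(p)` returns the \(z\)-score with area \(p\) to its left, which plays the role of the Area.to.left box. The sketch then applies the unstandardising formula.

```python
from statistics import NormalDist

# z-score with area 0.03 to its left (smallest 3%)
z = NormalDist().inv_cdf(0.03)

# Unstandardise: x = mu + z * sigma, with mu = 8.8 and sigma = 2.7 inches
x = 8.8 + z * 2.7
print(round(z, 3), round(x, 2))  # -1.881 3.72
```

Equivalently, `NormalDist(8.8, 2.7).inv_cdf(0.03)` returns the diameter directly, combining both steps.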

Definition 22.3 (Unstandardising formula) When the \(z\)-score is known, the corresponding value of the observation \(x\) is
\[\begin{equation} x = \mu + z\sigma. \tag{22.3} \end{equation}\] This is called the unstandardising formula.

Example 22.11 (Normal distributions backwards) Using the model for tree diameters in Example 22.8 again, suppose now the diameters of the largest \(25\)% of trees need to be identified. What are these diameters?

The tree diameters can be modelled with a normal distribution, with a mean of \(\mu = 8.8\) inches and a standard deviation of \(\sigma = 2.7\) inches. Since an area is given, we need to work 'backwards' (Fig. 22.11), so the \(z\)-tables need to be used 'backwards' too. The largest \(25\)% are the largest trees, so these diameters are larger than the mean.

Using a diagram is important (Fig. 22.11): the tables work with the area to the left of the value of interest, which is \(75\)%. Using either the hard-copy or online tables, the appropriate \(z\)-value is \(z = 0.674\). Then, the \(z\)-score can be converted to an observation value \(x\) using the unstandardising formula:
\[\begin{align*} x &= \mu + (z\times\sigma) \\ &= 8.8 + (0.674 \times 2.7) = 10.620. \end{align*}\] That is, about \(25\)% of trees have diameters larger than about \(10.6\) inches.
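`inv_cdf`, like the tables, works with the area to the left. So the largest \(25\)% must first be converted to the smallest \(75\)%, exactly as in Fig. 22.11. The following is a Python sketch, assuming Python 3.8+ and `statistics.NormalDist`.

```python
from statistics import NormalDist

trees = NormalDist(mu=8.8, sigma=2.7)

# Largest 25% = values above the point with 75% of the area to its left
cutoff = trees.inv_cdf(1 - 0.25)
print(round(cutoff, 2))  # 10.62
```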


FIGURE 22.11: Tree diameters: The largest \(25\)% is the same as the smallest \(75\)%

22.9 Example: methane production

A study of methane produced by animals (Huhtanen, Ramin, and Cabezas-Garcia 2016) modelled the retention time of food in sheep using a normal distribution, with the mean retention time as \(\mu = 42.5\) hrs, and the standard deviation of the retention time as \(\sigma = 3.68\) hrs. We can draw this normal distribution (Fig. 22.12), and then apply the \(68\)--\(95\)--\(99.7\) rule:

  • about \(68\)% of retention times are between \(38.82\) and \(46.18\) hrs;
  • about \(95\)% of retention times are between \(35.14\) and \(49.86\) hrs;
  • about \(99.7\)% of retention times are between \(31.46\) and \(53.54\) hrs.

FIGURE 22.12: Retention times of food in sheep

Example 22.12 (Working with the normal distribution) Using this model, what proportion of sheep have a retention time less than \(40\) hrs?

A retention time of \(40\) hrs corresponds to a \(z\)-score of (Fig. 22.13, top left panel):
\[ z = \frac{40 - 42.5}{3.68} = -0.68. \] This is a negative number, since \(40\) hrs is below the mean. Using the normal distribution tables (which give the area to the left of the \(z\)-score), the area to the left of \(z = -0.68\) is \(0.2483\), or about \(24.8\)%. About \(24.8\)% of sheep have a retention time less than \(40\) hrs.

Example 22.13 (Working with the normal distribution) What proportion of sheep have a retention time greater than \(48\) hrs (two days)?

A retention time of \(48\) hrs corresponds to a \(z\)-score of \(z = (48 - 42.5)/3.68 = 1.49\). Using the normal distribution tables, the area to the left of this \(z\)-score is \(0.9319\), so the area to the right of this \(z\)-score is \(1 - 0.9319 = 0.0681\) (Fig. 22.13, top right panel). About \(6.8\)% of sheep have a retention time greater than \(48\) hrs.

Example 22.14 (Working with the normal distribution) What proportion of sheep have a retention time between \(40\) and \(48\) hrs?


FIGURE 22.13: Plots for retention times

A retention time of \(40\) hrs corresponds to \(z = -0.68\), and, using the normal distribution tables, the area to the left of \(z = -0.68\) is \(0.2483\) (Fig. 22.13, bottom left panel; hatched area). But this is not the area that we seek... From earlier, the area to the left of \(z = 1.49\) is \(0.9319\) (Fig. 22.13, bottom left panel; coloured region). But this is not the area we seek either... From the two areas that we know, we can find the area that we seek:

  • \(48\) hrs corresponds to \(z = 1.49\); the area to the left of this \(z\)-score is \(0.9319\).
  • \(40\) hrs corresponds to \(z = -0.68\); the area to the left of this \(z\)-score is \(0.2483\).
  • The difference between these two areas is sought: \(0.9319 - 0.2483 = 0.6836\).

So the proportion is about \(0.684\) (or \(68.4\)%).
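Examples 22.12 to 22.14 can be reproduced in one short Python sketch, assuming Python 3.8+ and `statistics.NormalDist`. `z_2dp` is a hypothetical helper that rounds \(z\) to two decimal places to match the tables.

```python
from statistics import NormalDist

std_normal = NormalDist()
mu, sigma = 42.5, 3.68  # retention times of food in sheep (hrs)

def z_2dp(x):
    """z-score rounded to two decimal places, as used with the tables."""
    return round((x - mu) / sigma, 2)

less_40 = std_normal.cdf(z_2dp(40))                              # Example 22.12
more_48 = 1 - std_normal.cdf(z_2dp(48))                          # Example 22.13
between = std_normal.cdf(z_2dp(48)) - std_normal.cdf(z_2dp(40))  # Example 22.14
print(round(less_40, 4), round(more_48, 4), round(between, 4))   # 0.2483 0.0681 0.6836
```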

Example 22.15 (Working with the normal distribution) Consider the \(35\)% of sheep with the shortest retention times. What are these retention times?

The time we seek must be smaller than the mean if it defines the shortest \(35\)% of retention times. We don't know exactly where to draw the retention time that this corresponds to on the diagram; it's just somewhere to the left of the mean (Fig. 22.13, bottom right panel).

This time, we know the area to the left, but we do not know the value (or \(z\)-score). This is a 'backwards problem', and we need to find the \(z\)-score 'backwards' (Sect. 22.8). From the hard-copy tables, a \(z\)-score of \(z = -0.39\) has an area to the left of \(0.3483\), which is as close as we can get. (The online tables are more precise: \(z = -0.385\).)

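We know the \(z\)-score, so the retention time is found using the unstandardising formula: \(x = \mu + (z \times \sigma) = 42.5 + (-0.39 \times 3.68) = 41.06\) hrs (or \(41.08\) hrs using \(z = -0.385\)). The shortest \(35\)% of retention times are less than about \(41.1\) hrs.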
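The backwards calculation in Example 22.15 can be sketched in Python the same way, assuming Python 3.8+ and `statistics.NormalDist`. The sketch finds the \(z\)-score with area \(0.35\) to its left, then unstandardises.

```python
from statistics import NormalDist

# z-score with area 0.35 to its left (shortest 35% of retention times)
z = NormalDist().inv_cdf(0.35)

# Unstandardise: x = mu + z * sigma, with mu = 42.5 and sigma = 3.68 hrs
x = 42.5 + z * 3.68
print(round(z, 3), round(x, 1))  # -0.385 41.1
```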

22.10 Chapter summary

A model is a way of theoretically describing the distribution of some quantitative variable in a population. One common model is a normal model or normal distribution, which is a bell-shaped distribution with a theoretical mean \(\mu\) and a theoretical standard deviation \(\sigma\). Probabilities can be computed from normal distributions using \(z\)-scores.

22.11 Quick revision questions

Consider again the model for tree diameters in Example 22.8 (Aedo-Ortiz, Olsen, and Kellogg 1997): a normal distribution with \(\mu = 8.8\) inches, and \(\sigma = 2.7\) inches.

  1. A tree diameter of \(7.7\) inches corresponds to a \(z\)-score (to two decimal places) of:
  2. The probability that a tree has a diameter less than \(7.7\) inches is (as a decimal value):
  3. The probability that a tree has a diameter greater than \(7.7\) inches is (as a decimal value):
  4. A tree diameter of \(9.8\) inches corresponds to a \(z\)-score (to two decimal places) of:
  5. The probability that a tree has a diameter less than \(9.8\) inches is (as a decimal value):
  6. The probability that a tree has a diameter greater than \(9.8\) inches is (as a decimal value):

22.12 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 22.1 Are the following statements true or false?

  1. The unstandardising formula can be used to compute probabilities.
  2. About \(68\)% of observations are within two standard deviations of the mean.
  3. Positive \(z\)-scores correspond to values larger than the mean.
  4. A \(z\)-score tells us how many standard deviations a value is away from the mean.

Exercise 22.2 Are the following statements true or false?

  1. A \(z\)-score larger than \(4\) is impossible.
  2. A \(z\)-score of zero is located at the mean value.
  3. About 5% of observations are less than two standard deviations below the mean.
  4. A \(z\)-score of zero means a calculation error has been made.

Exercise 22.3 IQ scores are designed to have a mean of \(100\) and a standard deviation of \(15\). Match the diagram in Fig. 22.14 with the meaning.

  1. IQs greater than \(110\).
  2. IQs between \(90\) and \(115\).
  3. IQs less than \(110\).
  4. IQs greater than \(85\).

FIGURE 22.14: Match the diagram with the description

Exercise 22.4 IQ scores are designed to have a mean of \(100\) and a standard deviation of \(15\). Match the diagram in Fig. 22.15 with the meaning.

  1. The largest \(25\)% of IQ scores.
  2. The smallest \(10\)% of IQ scores.
  3. The largest \(70\)% of IQ scores.
  4. The smallest \(60\)% of IQ scores.

FIGURE 22.15: Match the diagram with the description

Exercise 22.5 Consider again the study by Aedo-Ortiz, Olsen, and Kellogg (1997), who studied the diameter of trees in certain forests. The tree diameters can be modelled as having a normal distribution, with a mean of \(\mu = 8.8\) inches, and a standard deviation of \(\sigma = 2.7\) inches. For these trees:

  1. What is the probability that a tree will have a diameter less than \(8\) inches?
  2. What is the probability that a tree will have a diameter greater than \(9\) inches?
  3. What is the probability that a tree will have a diameter between \(7\) and \(10\) inches?
  4. The largest \(15\)% of trees have what diameters?
  5. The smallest \(25\)% of trees have what diameters?

Exercise 22.6 In a simulation of methods to coat corn seeds (with fertilizer and crop protection chemicals, etc.), Pasha et al. (2016) modelled the seed diameter as having a normal distribution, with mean \(7.5\) mm and standard deviation of \(0.225\) mm.

  1. What is the probability that a seed has a diameter of more than \(8\) mm?
  2. What is the probability that a seed has a diameter less than \(7.1\) mm?
  3. What is the probability that a seed has a diameter between \(7.5\) and \(8\) mm?
  4. What is the diameter of the smallest \(30\)% of seeds?
  5. What is the diameter of the largest \(90\)% of the seeds?

Exercise 22.7 In a study to understand factors influencing preterm births (Snowden and Basso 2018), the gestation length of healthy babies was modelled with a normal distribution, having a mean of \(40\) weeks, and a standard deviation of \(1.64\) weeks. Using this model:

  1. What proportion of births are longer than \(39\) weeks (that is, nine months)?
  2. In Australia, a premature birth is defined as a birth occurring before \(37\) weeks. What proportion of births are expected to be premature?
  3. According to Health Direct, 'Babies born between \(32\) and \(37\) weeks may need care in a special care nursery'. What proportion of healthy births would be expected to be born between \(32\) and \(37\) weeks gestation?
  4. How long is the gestation length for the longest \(5\)% of pregnancies?
  5. How long is the gestation length for the shortest \(5\)% of pregnancies?

Exercise 22.8 IQ scores are designed to have a mean of \(100\) and a standard deviation of \(15\). Mensa is a society for people with a high IQ; specifically, it is open to people who have 'attained a score within the upper two percent of the general population' (Mensa webpage (https://www.mensa.org/)). What IQ score is needed to join Mensa?

Exercise 22.9 IQ scores are designed to have a mean of \(100\) and a standard deviation of \(15\). Zagorsky (2016) reports that the US Military must "reject all military recruits whose IQ is in the bottom \(10\)% of the population" (Zagorsky 2016, 403). What IQ scores lead to rejection from the US military?

Exercise 22.10 A study of the impact of charging electric vehicles (EVs) on electricity demands (Affonso and Kezunovic 2018) modelled the time at which people began charging their EVs at home. Based on a survey (US Department of Transportation 2011), they modelled the time at which EVs began charging as having a mean of \(5\):\(30\)pm, with a standard deviation of \(2.28\) hrs. For this model:

  1. What is the probability that an EV will begin charging after \(9\)pm?
  2. What is the probability that an EV will begin charging before \(5\)pm?
  3. What is the probability that an EV will begin charging between \(5\)pm and \(6\)pm?
  4. \(30\)% of the EVs begin charging after what time?
  5. The earliest \(15\)% of charging begins when?

Hint: This question is much easier if you convert times into 'minutes after midnight'.