17 Distributions and models

So far, you have learnt to ask a RQ, design a study, describe and summarise the data, understand the decision-making process and to work with probabilities. In this chapter, you will learn about distributions and models to describe the distribution of populations and samples. You will learn to:

  • describe distributions.
  • describe populations using normal distributions.
  • use \(z\)-scores to compute probabilities related to normal distributions.
  • work 'backwards' from probabilities for normal distributions.

17.1 Introduction

In the decision-making process used in statistics, an assumption is made about a parameter of the population. And, of course, many different samples could be drawn from this population, and so the sample statistic can vary.

Remember: Studying a sample leads to the following observations:

  • Each sample is likely to be different.
  • Our sample is just one of countless possible samples from the population.
  • Each sample is likely to produce a different value for the sample statistic.
  • Hence we only observe one of the many possible values for the sample statistic.

Since many values for the sample statistic are possible, the possible values of the sample statistic vary (called sampling variation) and have a distribution (called a sampling distribution).

Based on the assumption about the parameter, we can describe the values of the statistic that we would expect from all these possible samples. Of course, we only study one of these many possible samples.

The expectations about the sample statistic are based on how the statistic (such as a sample mean, or a sample proportion, or a sample odds ratio) is distributed: that is, what values it can take from different samples, and how often.

A model is used to describe this sampling distribution. For example, if I deal 15 cards, the statistic could be 'the proportion of red cards in a hand of 15'. The model would describe how often we would see a proportion of 0/15 red cards, 1/15 red cards, 2/15 red cards... up to 15/15 red cards (Sect. 15.4).
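As a rough sketch of such a model, the probability of each possible proportion can be computed directly. The code below assumes (the text does not say this) that the 15 cards are dealt without replacement from a standard, well-shuffled 52-card deck containing 26 red cards; the function name `prob_red` is made up for illustration.

```python
from math import comb

# Assumption: a standard 52-card deck with 26 red cards,
# dealt without replacement (not stated in the text).
DECK, RED, HAND = 52, 26, 15

def prob_red(k, hand=HAND, red=RED, deck=DECK):
    """Probability that exactly k of the dealt cards are red."""
    return comb(red, k) * comb(deck - red, hand - k) / comb(deck, hand)

# The model: how often each proportion 0/15, 1/15, ..., 15/15 is expected
for k in range(HAND + 1):
    print(f"{k:2d}/{HAND} red: {prob_red(k):.4f}")
```

Under these assumptions, the probabilities sum to one, and proportions near one-half are the most likely.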

Under certain circumstances, many sample statistics have a similar-shaped distribution: a bell-shaped (or normal) distribution (Sect. 13.5.1). We now study this distribution, as it often is the basis for describing what values the statistic can be expected to take, based on the initial assumption about the population.

17.2 Distributions: an example

Consider the heights of all Australian adult males (the population). Clearly, the heights of all Australian adult males are unknown: no-one has ever measured, or could ever realistically measure, the height of every Australian adult male. The Australian Bureau of Statistics (ABS), however, takes samples of Australians to find heights and other measurements.

A model could be assumed for the heights of the population of Australian adult males. A model is a theoretical idea that might be a useful description of the heights of Australian adult males in the population. Suppose a model for the heights of Australian adult males is adopted that describes the heights as:

  • having a bell-shaped (normal) distribution,
  • with a mean height of 175cm, and
  • a standard deviation of 7cm.

Then, the distribution of the heights of Australian adult males would look like Fig. 17.1 (left panel). That is, most Australian adult males are between about 168 and 182cm, and very few are taller than 196cm or shorter than 154cm.

FIGURE 17.1: A model for the heights of Australian adult males, showing 1, 2 and 3 standard deviations either side of the mean (left); the model for heights of Australian adult males, plus the histogram from one specific sample of size \(n = 100\) of Australian adult males (right)

Since we do not know the heights of all Australian men, this model represents an idealised, or assumed, picture of the histogram of the heights of all Australian adult males in the population. If this model is accurate, the distribution of heights in any sample may be shaped a bit like this... but sampling variation will exist, and every sample will be a bit different.

While any one sample will look a bit different from this model, the model captures the general feel of the histogram from many of these samples. The model is like the 'average' of many sample histograms: picture taking many samples of \(n = 100\) men, and drawing a histogram for each.
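Sampling variation of this kind can be sketched with a small simulation. The code below is only an illustration: it assumes the model above (normal, mean 175cm, standard deviation 7cm) is correct, and uses Python's standard library to draw simulated samples of \(n = 100\) heights. The names `draw_sample` and `sample_means`, and the choice of 1000 repetitions, are arbitrary.

```python
import random
from statistics import mean, stdev

random.seed(1)            # fixed seed, only so the sketch is repeatable
MU, SIGMA, N = 175, 7, 100

def draw_sample(n=N):
    """One simulated sample of n heights (cm) from the assumed model."""
    return [random.gauss(MU, SIGMA) for _ in range(n)]

# Three individual samples: each has a slightly different mean and sd
for _ in range(3):
    sample = draw_sample()
    print(f"mean = {mean(sample):.1f} cm, sd = {stdev(sample):.1f} cm")

# Averaging over many samples recovers something close to the model
sample_means = [mean(draw_sample()) for _ in range(1000)]
print(f"average of 1000 sample means = {mean(sample_means):.1f} cm")
```

Each simulated sample differs a little from the model, but taken together the samples resemble it closely.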

The model of heights has approximately a bell-shape: that is, most values are near the average height, but a small number of men are very tall or very short. A bell-shaped distribution is formally called a normal distribution or a normal model. A normal distribution is a way of modelling the population.

A model is a theoretical or ideal concept. A model skeleton isn't 100% accurate, and is certainly not exactly like your skeleton, yet it suitably approximates reality. Probably none of us has a skeleton exactly like the model, but the model is still useful and helpful.

Likewise, no variable has exactly a normal distribution, but the model is still useful and helpful. The model is a theoretical way of describing the distribution in the population; it does not represent any particular sample of data. The model can be thought of as an 'average' of the histograms of the data from many samples.

Indeed, if this model turns out to be poor at describing what appears in samples, the parameters of the model (the values of \(\mu\) and \(\sigma\)) can be adjusted so the model does describe the sample data well. In fact, evidence suggests that the average height of Australians has been increasing (Loesch, Stokes, and Huggins 2000) and so the mean of the model may need to be changed to remain a good model.

A model is like an educated guess of the unknown population, based on sample information. For the heights of Australian males, then, a suitable model may be an approximately normal shape, with a mean height of \(\mu = 175\)cm, and a standard deviation of \(\sigma = 7\) cm.

17.3 About normal distributions

All normal distributions have these properties:

  • Normal distributions are symmetric about the mean.

  • No upper limit or lower limit exists, in theory, for the variable. Of course, the chance that some of these values occur is essentially zero (such as a male taller than 350cm).

  • The 68--95--99.7 rule applies:

    • approximately 68% of values lie within 1 standard deviation of the mean;
    • approximately 95% of values lie within 2 standard deviations of the mean; and
    • approximately 99.7% of values lie within 3 standard deviations of the mean.

These properties are true for all normal distributions, whatever the mean \(\mu\) and whatever the standard deviation \(\sigma\).
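The rule can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `area_within` is made up for illustration. It computes the area within \(k\) standard deviations of the mean for a standard normal distribution, and again for the heights model, to show that the values of \(\mu\) and \(\sigma\) make no difference.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under a normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    standard = area_within(k)              # mean 0, standard deviation 1
    heights = area_within(k, 175, 7)       # the heights model
    print(f"within {k} sd: {standard:.4f} and {heights:.4f}")
```

The areas are approximately 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.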

17.4 Standardising (\(z\)-scores)

Since the 68--95--99.7 rule (Sect. 13.11) applies for all normal distributions, the percentages only depend on how many standard deviations (\(\sigma\)) a value (\(x\)) is from the mean (\(\mu\)). This information can be used to learn more about how values are distributed.

Example 17.1 (The 68--95--99.7 rule) Suppose heights of Australian adult males have a mean of \(\mu = 175\)cm, and a standard deviation of \(\sigma = 7\)cm, and (approximately) follow a normal distribution. Using this model, what proportion of Australian adult men are taller than 182cm?

From a picture of the situation (Fig. 17.2, left panel), \(175 + 7 = 182\)cm is one standard deviation above the mean. Since 68% of values are within one standard deviation of the mean, 32% are outside that range, smaller or larger. By symmetry, half of these (16%) are more than one standard deviation above the mean, so the answer is about 16%. (Another 16% are more than one standard deviation below the mean; that is, less than \(175 - 7 = 168\)cm in height.)

Again, the percentages only depend on how many standard deviations (\(\sigma\)) the value (\(x\)) is from the mean (\(\mu\)), and not the actual values of \(\mu\) and \(\sigma\).

FIGURE 17.2: Left: What proportion of Australian adult males are taller than 182cm? Right: What proportion of Australian adult males are shorter than 161cm?

Example 17.2 (The 68--95--99.7 rule) Consider again the heights of Australian adult males. Using this model, what proportion are shorter than 161cm?

Again, drawing the situation is helpful (Fig. 17.2, right panel). Since \(175 - (2\times 7) = 161\), then 161cm is two standard deviations below the mean. Since 95% of values are within two standard deviations of the mean, 5% are outside that range (half smaller, half larger; see Fig. 17.2, right panel), so that 2.5% are shorter than 161cm. (Another 2.5% are taller than \(175 + 14 = 189\)cm.)

Again, the percentages only depend on how many standard deviations (\(\sigma\)) the value (\(x\)) is from the mean (\(\mu\)). The number of standard deviations that an observation is from the mean is called a \(z\)-score. A \(z\)-score is computed using

\[ z = \frac{ x - \mu}{\sigma}, \] where \(\sigma\) is a measure of the variation in the \(x\)-values. Converting values to \(z\)-scores is called standardising.

Definition 17.1 (z-score) A \(z\)-score measures how many standard deviations a value is from the mean. In symbols:

\[\begin{equation} z = \frac{x - \mu}{\sigma}, \tag{17.1} \end{equation}\] where \(x\) is the value, \(\mu\) is the mean of the distribution, and \(\sigma\) is the standard deviation of the distribution (measuring the variation in the \(x\)-values).

The \(z\)-score is the number of standard deviations the observation is away from the mean, and is also called the standardised value or standard score, and is calculated using Equation (17.1). Note that:

  • \(z\)-scores are negative for observations below the mean.
  • \(z\)-scores are positive for observations above the mean.
  • \(z\)-scores have no units (that is, not measured in kg, or cm, etc.).

Example 17.3 (\(z\)-scores) In Example 17.1, the \(z\)-score for a height of 182cm is

\[ z = \frac{x-\mu}{\sigma} = \frac{182 - 175}{7} = 1, \] one standard deviation above the mean. In Example 17.2, the \(z\)-score for a height of 161cm is

\[ z = \frac{x-\mu}{\sigma} = \frac{161 - 175}{7} = -2, \] two standard deviations below the mean (a negative \(z\)-score means the value is below the mean).

Example 17.4 (The 68--95--99.7 rule) Consider the model for the heights of Australian adult males: a normal distribution, mean \(\mu = 175\), standard deviation \(\sigma = 7\) (Fig. 17.1).

Using this model:

  • A height of 175cm is zero standard deviations from the mean: \(z = 0\).
  • 168cm is one standard deviation below the mean: \(z = -1\).
  • 182cm is one standard deviation above the mean: \(z = 1\).
  • 161cm and 189cm are two standard deviations from the mean: \(z = -2\) and \(z = 2\) respectively.
  • 154cm and 196cm are three standard deviations from the mean: \(z = -3\) and \(z = 3\) respectively.
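The \(z\)-scores listed above follow directly from Equation (17.1). A minimal sketch (the function name `z_score` is just an illustrative choice):

```python
def z_score(x, mu, sigma):
    """Equation (17.1): number of standard deviations x is from the mean."""
    return (x - mu) / sigma

MU, SIGMA = 175, 7     # the model for heights of Australian adult males (cm)

for height in (154, 161, 168, 175, 182, 189, 196):
    print(f"{height}cm: z = {z_score(height, MU, SIGMA):+.0f}")
```

Heights below the mean give negative \(z\)-scores, and heights above the mean give positive \(z\)-scores.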

17.5 Approximating percentages using the 68--95--99.7 rule

As we have seen above, percentages under normal distributions can be approximated using the 68--95--99.7 rule.

Example 17.5 (Normal distribution areas) Suppose again that heights of Australian adult males have a mean of \(\mu = 175\)cm, and a standard deviation of \(\sigma = 7\)cm, and (approximately) follow a normal distribution (Fig. 17.3).

To find the proportion of men shorter than 160cm, first draw the situation (Fig. 17.4).

FIGURE 17.3: The empirical rule and heights of Australian adult males

FIGURE 17.4: What proportion of Australian adult males are shorter than 160cm?

Proceeding as before, we ask 'How many standard deviations below the mean is 160cm?' Using Equation (17.1) to compute the \(z\)-score, \(160\)cm corresponds to a \(z\)-score of

\[\begin{equation} z = \frac{160 - 175}{7} = -2.14; \tag{17.2} \end{equation}\] that is, \(2.14\) standard deviations below the mean.

What percentage of observations are less than this \(z\)-score? This case is not covered by the 68--95--99.7 rule, though we can use the 68--95--99.7 rule to make some rough estimates.

About 2.5% of observations are more than 2 standard deviations below the mean (Example 17.2); that is, about 2.5% of men are shorter than 161cm. So the percentage of males shorter than 160cm (that is, further into the tail of the distribution) will be less than 2.5%. While we don't know the probability exactly, we know it is a little smaller than 2.5%.

Estimates made in this way are crude, but often serviceable. However, better estimates of 'areas under the normal curve' are found using tables compiled for this very purpose. These tables are in Appendix B.2. 'Percentages' under a normal curve are also called 'areas' under the normal curve. The total area under a normal curve is one (or 100%), since it represents all possible values that could be observed: every height appears somewhere in Fig. 17.3.

We now learn how to use these tables for Example 17.5.

17.6 Exact areas from normal distributions

Areas under normal distributions can be found using online tables, or hard copy tables. The online tables are easier to use, and only the online tables are explained in this online book (see the hard-copy version for how to use the hard-copy tables).

The online tables can be found in Appendix B.2. In the tables, enter the \(z\)-score in the box z.score: then, the probability of finding a \(z\)-score less than (i.e., to the left of) this value is shown.

The tables always give the area to the left of the \(z\)-score that is looked up.

For Example 17.5 (where \(z = -2.14\)), the hard-copy or online tables give an answer of \(1.62\)%. This agrees with the approximate answer using the 68--95--99.7 rule: less than \(2.5\)%.

17.7 Computing areas (probabilities)

The general approach to computing probabilities from normal distributions is:

  • Draw a diagram, and mark on the value of interest (for Example 17.5: 160cm; Fig. 17.4).
  • Shade the required region of interest (for Example 17.5: 'less than 160cm tall'; Fig. 17.4).
  • Compute the \(z\)-score using Equation (17.1).
  • Use the \(z\) tables in Appendix B.2.
  • Compute the answer.

The number of standard deviations that 160cm is from the mean was computed above (Eq. (17.2)): \(z = -2.14\). That is, 160cm is \(2.14\) standard deviations below the mean, so use \(z = -2.14\) in the tables (remembering that the tables give probability (area) less than \(z = -2.14\); Fig. 17.4).

The probability of finding an Australian man less than 160cm tall is about \(1.6\)%. The 68--95--99.7 rule can be used to give approximate probabilities, as a check that the answer found using tables seems reasonable.
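Software can play the role of the tables. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) in place of the tables in Appendix B.2; this is a substitute for the tables, not the method the text describes. Like the tables, `cdf` gives the area to the left of the value supplied.

```python
from statistics import NormalDist

heights = NormalDist(mu=175, sigma=7)    # the assumed model for heights (cm)

z = round((160 - 175) / 7, 2)            # Equation (17.1), rounded as in the text
area_from_z = NormalDist().cdf(z)        # area to the left of z, like the tables
area_direct = heights.cdf(160)           # same area, without rounding z first

print(f"z = {z}")
print(f"area to the left, from the rounded z-score: {area_from_z:.4f}")
print(f"area to the left, without rounding:         {area_direct:.4f}")
```

Both areas are about \(1.6\)%, comfortably below the \(2.5\)% bound from the 68--95--99.7 rule. The small difference between them arises only because \(z\) is rounded to two decimal places when tables are used.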

More complicated questions can be asked too, as shown in the next section.

17.8 Examples using \(z\)-scores

Example 17.6 (Normal distributions) Aedo-Ortiz, Olsen, and Kellogg (1997) simulated mechanized forest harvesting systems (Devore and Berk 2007). In their study, they modelled the diameter of specific trees using

  • a normal distribution; with
  • a mean of \(\mu = 8.8\) inches; and
  • a standard deviation of \(\sigma = 2.7\) inches.

Using this model, what is the probability that a randomly-chosen tree has a diameter greater than 6 inches?

Follow the steps identified earlier:

  • Draw a normal curve, and mark on 6 inches (Fig. 17.5, left panel).
  • Shade the region corresponding to 'greater than 6 inches' (Fig. 17.5, right panel).
  • Compute the \(z\)-score using Eq. (17.1): \(\displaystyle z = (6 - 8.8)/2.7 = -2.8/2.7 = -1.04\) to two decimal places.
  • Use tables: The probability of a tree diameter less than 6 inches is \(0.1492\). (The tables always give the area less than the value of \(z\) that is looked up.)
  • Compute the answer: Since the total area under the normal distribution is one, the probability of a tree diameter greater than 6 inches is \(1 - 0.1492 = 0.8508\), or about \(85\)%.

FIGURE 17.5: What proportion of tree diameters are greater than 6 inches?

The normal-distribution tables always provide the area to the left of the \(z\)-score looked up. Drawing a picture of the situation is important: it helps visualise how to get the answer from what the tables give us. Remember: The total area under the normal distribution is one (or 100%).
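The steps of Example 17.6 can be sketched in code, with `NormalDist` from Python's standard `statistics` module standing in for the tables (an assumption of this sketch; the variable names are illustrative):

```python
from statistics import NormalDist

MU, SIGMA = 8.8, 2.7                     # tree-diameter model (inches)

z = round((6 - MU) / SIGMA, 2)           # Equation (17.1), to two decimal places
area_left = NormalDist().cdf(z)          # what the tables give: area below 6 inches
area_right = 1 - area_left               # what is wanted: area above 6 inches

print(f"z = {z}")
print(f"P(diameter < 6 inches) = {area_left:.4f}")
print(f"P(diameter > 6 inches) = {area_right:.4f}")
```

Subtracting from one works because the total area under the normal curve is one.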

Match the diagram in Fig. 17.6 with the meaning for the tree-diameter model (recall: \(\mu = 8.8\) inches):

  1. Tree diameters greater than 11 inches.
  2. Tree diameters between 6 and 11 inches.
  3. Tree diameters less than 11 inches.
  4. Tree diameters between 3 and 6 inches.

1: matches B; 2: matches C; 3: matches D; 4: matches A.

FIGURE 17.6: Match the diagram with the description

Example 17.7 (Normal distributions) Using the model for tree diameters in Example 17.6, what is the probability that a tree has a diameter between 6 and 11 inches?

First, draw the situation, and shade 'between 6 and 11 inches' (Fig. 17.6, Diagram C). Then, compute the \(z\)-scores for both tree diameters:

  • For 6 inches: \(\quad z = (6 - 8.8)/2.7 = -1.04\).
  • For 11 inches: \(\quad z = (11 - 8.8)/2.7 = 0.81\).

The tables in Appendix B.2 can then be used to find the area to the left of \(z = -1.04\), and also the area to the left of \(z = 0.81\). However, neither of these provides the area between \(z = -1.04\) and \(z = 0.81\) (Fig. 17.7).

FIGURE 17.7: What proportion of tree diameters are between 6 and 11 inches? The two shaded areas are what we find by using the tables with \(z = -1.04\) and \(z = 0.81\), but neither give us the area we are seeking.

Looking carefully at the areas from the tables and the area sought, the area between the two \(z\)-scores is the larger area minus the smaller area: \(0.7910 - 0.1492 = 0.6418\). The probability that a tree has a diameter between 6 and 11 inches is about \(0.6418\), or about \(64\)%.
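The subtraction in Example 17.7 can be written as a small helper. This is a sketch only: `area_between` is a made-up name, and `NormalDist` (from Python's standard `statistics` module) is used in place of the tables.

```python
from statistics import NormalDist

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    std = NormalDist()
    # Each cdf() value is an area to the left; the difference is the area between
    return std.cdf(z_high) - std.cdf(z_low)

MU, SIGMA = 8.8, 2.7                     # tree-diameter model (inches)
z_6 = round((6 - MU) / SIGMA, 2)         # z-score for 6 inches
z_11 = round((11 - MU) / SIGMA, 2)       # z-score for 11 inches

print(f"P(6 < diameter < 11) = {area_between(z_6, z_11):.4f}")
```

The result is about \(0.64\), matching the answer found from the tables up to rounding.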

17.9 Unstandardising: Working backwards

Using the model for tree diameters in Example 17.6 again, different types of questions can be asked too.

Example 17.8 (Normal distributions backwards) Consider again the trees study (Example 17.6). Identify the diameters of the smallest 3% of trees.

This is a different problem than before; previously, the tree diameter was known, so a \(z\)-score could be computed, and hence a probability (Fig. 17.8, top panel).

However, in Example 17.8, the probability is known, and a tree diameter is sought. That is, working 'backwards' is needed (Fig. 17.8, bottom panel), so the \(z\)-tables need to be used 'backwards' too.

FIGURE 17.8: Working with \(z\)-scores. In the tables, the areas (probabilities) are in the body of the table, and the \(z\)-scores are in the margins of the table.

Drawing a rough diagram of the situation again is very helpful (Fig. 17.9). We can only mark the approximate location of the required score, but this is sufficient. Then, tables must be used to determine the necessary \(z\)-score.

FIGURE 17.9: Tree diameters: The smallest 3%. The approximate location of the required \(z\)-score is drawn.

As before (Sect. 17.6), online tables, or hard copy tables can be used (and again the online tables are easier to use). Only the online tables are explained in this online book (see the hard-copy version for how to use the hard-copy tables).

The online tables can be found in Appendix B.2. In the tables, enter the area to the left of the required unknown value in the box Area.to.left: the \(z\)-score with this probability to the left is shown.

Using hard copy tables, the closest value in the body of the tables to \(3\)% (or \(0.030\)) is \(0.0301\). (Sometimes, the exact area can be found, but usually we take the value as close as possible.) This corresponds to a \(z\)-score of \(z = -1.88\). Using the online tables (and entering an Area.to.left of \(0.0300\)), the \(z\)-score is \(-1.881\) (a slightly more precise answer).

To identify the diameters of the smallest \(3\)% of trees, the \(z\)-score that has an area to the left of \(3\)% (or \(0.030\)) needs to be found (or, at least, as close as possible to \(0.03\)).

The tables always give the area to the left of the \(z\)-score that is looked up.

Using either the hard-copy or online tables, the appropriate value is \(1.88\) standard deviations below the mean; that is, \(z = -1.88\) (Fig. 17.9). The \(z\)-score can be converted to an observation value \(x\) using the unstandardising formula:
\[ x = \mu + z\sigma. \] Using this unstandardising formula:
\[\begin{align*} x &= \mu + (z\times\sigma) \\ &= 8.8 + (-1.88 \times 2.7) = 3.724; \end{align*}\] that is, about \(3\)% of trees have diameters less than about \(3.72\) inches.

Definition 17.2 (Unstandardising formula) When the \(z\)-score is known, the corresponding value of the observation \(x\) is \[\begin{equation} x = \mu + z\sigma. \tag{17.3} \end{equation}\] This is called the unstandardising formula.
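Working 'backwards' can also be sketched in code. Here `inv_cdf` from `NormalDist` (Python's standard `statistics` module, Python 3.8 or later) plays the role of using the tables backwards: it takes an area to the left and returns the matching \(z\)-score. The helper name `unstandardise` is an illustrative choice.

```python
from statistics import NormalDist

def unstandardise(z, mu, sigma):
    """Equation (17.3): convert a z-score back to an observation."""
    return mu + z * sigma

# Example 17.8: the smallest 3% of tree diameters
z = NormalDist().inv_cdf(0.03)           # z-score with area 0.03 to its left
x = unstandardise(z, mu=8.8, sigma=2.7)

print(f"z = {z:.3f}, diameter = {x:.2f} inches")
```

The diameter found this way (about \(3.72\) inches) agrees with the answer from the tables; any difference in later decimal places arises because software does not round \(z\) to two decimal places.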

Ball bearings labelled as "50mm bearings" actually have diameters that follow a normal distribution with mean 50mm and standard deviation \(0.1\)mm. The smallest \(15\)% of bearings are too small for sale. What size bearings cannot be sold?

The closest area from the tables is \(0.1492\), corresponding to \(z = -1.04\). Using the unstandardising formula, \(x = 50 + (-1.04\times 0.1) = 49.896\).

Bearings less than about \(49.90\) mm in diameter cannot be sold.

Example 17.9 (Normal distributions backwards) Using the model for tree diameters in Example 17.6 again, suppose now the diameters of the largest \(25\)% of trees need to be identified. What are these diameters?

The tree diameters can be modelled with a normal distribution, with a mean of \(\mu = 8.8\) inches and a standard deviation of \(\sigma = 2.7\) inches. Since an area is given, we need to work 'backwards' (Fig. 17.8, bottom panel), so the \(z\)-tables need to be used 'backwards' too. The largest \(25\)% implies large trees, so the diameter sought is larger than the mean.

Using a diagram is important (Fig. 17.10): the tables work with the area to the left of the value of interest, which is \(75\)%. Using either the hard-copy or online tables, the appropriate \(z\)-value is \(z = 0.674\). Then, the \(z\)-score can be converted to an observation value \(x\) using the unstandardising formula: \[\begin{align*} x &= \mu + (z\times\sigma) \\ &= 8.8 + (0.674 \times 2.7) = 10.620. \end{align*}\] That is, about \(25\)% of trees have diameters larger than about \(10.6\) inches.

FIGURE 17.10: Tree diameters: The largest 25% is the same as the smallest 75%
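The idea in Fig. 17.10 (the largest \(25\)% is the same as the smallest \(75\)%) is the only extra step needed when a question asks about the upper tail. A sketch, using `NormalDist` from Python's standard `statistics` module instead of the tables; `cutoff_for_largest` is a hypothetical helper name:

```python
from statistics import NormalDist

trees = NormalDist(mu=8.8, sigma=2.7)    # tree-diameter model (inches)

def cutoff_for_largest(fraction, dist):
    """Value above which the largest `fraction` of observations lie."""
    # inv_cdf() needs the area to the LEFT, so convert the upper-tail area
    return dist.inv_cdf(1 - fraction)

x = cutoff_for_largest(0.25, trees)
print(f"The largest 25% of trees have diameters above {x:.2f} inches")
```

Checking the answer forwards is a useful habit: `1 - trees.cdf(x)` should return the original \(0.25\).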

17.10 Example: methane production

A study of methane produced by animals (Huhtanen, Ramin, and Cabezas-Garcia 2016) modelled the retention time of food in sheep using a normal distribution, with the mean retention time as \(\mu = 42.5\) hours, and the standard deviation of the retention time as \(\sigma = 3.68\) hours. We can draw this normal distribution (Fig. 17.11), and then apply the 68--95--99.7 rule:

  • About 68% of retention times are between 38.82 and 46.18 hrs;
  • About 95% of retention times are between 35.14 and 49.86 hrs;
  • About 99.7% of retention times are between 31.46 and 53.54 hrs.

FIGURE 17.11: Retention times of food in sheep
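The three intervals from the 68--95--99.7 rule are simply \(\mu \pm \sigma\), \(\mu \pm 2\sigma\) and \(\mu \pm 3\sigma\). A short sketch of the arithmetic (the function name `rule_interval` is illustrative):

```python
MU, SIGMA = 42.5, 3.68                   # retention-time model (hours)

def rule_interval(k, mu=MU, sigma=SIGMA):
    """Interval within k standard deviations of the mean."""
    return mu - k * sigma, mu + k * sigma

for k, percent in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = rule_interval(k)
    print(f"About {percent} of retention times: {low:.2f} to {high:.2f} hrs")
```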

Example 17.10 (Working with the normal distribution) Using this model, what proportion of sheep have a retention time less than 40 hours?

A retention time of 40 hours corresponds to a \(z\)-score of:
\[ z = \frac{40 - 42.5}{3.68} = -0.68. \] This is a negative number, since 40 hours is below the mean. Using the normal distribution tables (that give the area to the left of the \(z\)-score), the area to the left of \(z = -0.68\) is \(0.2483\), or about \(24.8\)% (Fig. 17.12, top left panel). About \(24.8\)% of sheep have a retention time less than 40 hours.

Example 17.11 (Working with the normal distribution) What proportion of sheep have a retention time greater than 48 hours (two days)?

A retention time of 48 hours corresponds to a \(z\)-score of \((48 - 42.5)/3.68 = 1.49\). Using the normal distribution tables, the area to the left of this \(z\)-score is \(0.9319\), so the area to the right of this \(z\)-score is \(1 - 0.9319 = 0.0681\) (Fig. 17.12, top right panel). About \(6.8\)% of sheep have a retention time greater than 48 hours.

Example 17.12 (Working with the normal distribution) What proportion of sheep have a retention time between 40 and 48 hours?

FIGURE 17.12: Plots for retention times

A retention time of 40 hours corresponds to \(z = -0.68\), and, using the normal distribution tables, the area to the left of \(z = -0.68\) is \(0.2483\) (Fig. 17.12, bottom left panel; hatched area). But this is not the area that we are seeking... From earlier, the area to the left of \(z = 1.49\) is \(0.9319\) (Fig. 17.12, bottom left panel; coloured region). But this is not the area that we are seeking either...

From the two areas that we know, we can find the area that we are seeking:

  • 48 hours corresponds to \(z = 1.49\). The area to the left of this \(z\)-score is \(0.9319\).
  • 40 hours corresponds to \(z = -0.68\). The area to the left of this \(z\)-score is \(0.2483\).
  • The difference between these two areas is what we are seeking: \(0.9319 - 0.2483 = 0.6836\).

So the proportion is about \(0.684\) (or \(68.4\)%).

Example 17.13 (Working with the normal distribution) Consider the 35% of sheep with the shortest retention times. What are these retention times?

The time we seek must be smaller than the mean if it defines the shortest 35% of retention times. We don't know exactly where to draw this retention time on the diagram; it's just somewhere to the left of the mean (Fig. 17.12, bottom right panel).

This time, we know the area to the left, but we do not know the value (or \(z\)-score). Previously, we knew the retention value (and hence the \(z\)-score), but not the area. This is like a 'backwards problem', and we need to find the \(z\)-score 'backwards' (Sect. 17.9). From the hard copy tables, a \(z\)-score of \(z = -0.39\) has an area to the left of \(0.3483\)... which is as close as we can get. From the online tables, \(z = -0.385\).

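We know the \(z\)-score, so we can find the retention value, using the unstandardising formula: \(x = \mu + (z \times \sigma)\). Using \(z = -0.39\), the retention time is \(42.5 + (-0.39 \times 3.68) = 41.06\) hours (or \(41.08\) hours using \(z = -0.385\)). That is, about 35% of sheep have retention times shorter than about \(41.1\) hours.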
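Examples 17.10 to 17.13 can be checked together in a few lines. This sketch uses `NormalDist` from Python's standard `statistics` module rather than the tables, and works with the retention times directly instead of first rounding \(z\)-scores to two decimal places, so the answers differ slightly from the table-based answers.

```python
from statistics import NormalDist

retention = NormalDist(mu=42.5, sigma=3.68)   # retention-time model (hours)

p_less_40 = retention.cdf(40)                 # Example 17.10: area to the left
p_more_48 = 1 - retention.cdf(48)             # Example 17.11: area to the right
p_between = retention.cdf(48) - retention.cdf(40)   # Example 17.12: area between
shortest_35 = retention.inv_cdf(0.35)         # Example 17.13: working backwards

print(f"P(time < 40 hrs)      = {p_less_40:.3f}")
print(f"P(time > 48 hrs)      = {p_more_48:.3f}")
print(f"P(40 < time < 48 hrs) = {p_between:.3f}")
print(f"Shortest 35%: below {shortest_35:.2f} hrs")
```

The three areas fit together as expected: the area below 40 hours, the area between 40 and 48 hours, and the area above 48 hours add to one.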

17.11 Summary

A model is a way of theoretically describing the distribution of some quantitative variable in a population. One common model is a normal model or normal distribution, which is a bell-shaped distribution with a theoretical mean \(\mu\) and a theoretical standard deviation \(\sigma\). Probabilities can be computed from normal distributions using \(z\)-scores.

17.12 Quick revision questions

Consider again the model for tree diameters in Example 17.6 (Aedo-Ortiz, Olsen, and Kellogg 1997): a normal distribution with \(\mu = 8.8\) inches, and \(\sigma = 2.7\) inches.

  1. A tree diameter of 7.8 inches corresponds to a \(z\)-score (to two decimal places) of:
  2. The probability that a tree has a diameter less than 7.8 inches is (as a decimal value):
  3. The probability that a tree has a diameter greater than 7.8 inches is (as a decimal value):
  4. A tree diameter of 8.6 inches corresponds to a \(z\)-score (to two decimal places) of:
  5. The probability that a tree has a diameter less than 8.6 inches is (as a decimal value):
  6. The probability that a tree has a diameter greater than 8.6 inches is (as a decimal value):

17.13 Exercises

Selected answers are available in Sect. D.17.

Exercise 17.1 Are the following statements true or false?

  1. The unstandardising formula can be used to compute probabilities.
  2. About 68% of observations are within two standard deviations of the mean.
  3. Positive \(z\)-scores correspond to values larger than the mean.
  4. A \(z\)-score tells us how many standard deviations a value is away from the mean.
  5. A \(z\)-score larger than 4 is impossible.
  6. A \(z\)-score of zero is located at the mean value.
  7. About 5% of observations are less than two standard deviations below the mean.

Exercise 17.2 In a simulation of methods to coat corn seeds (with fertilizer and crop protection chemicals, etc.), Pasha et al. (2016) modelled the seed diameter as having a normal distribution, with mean 7.5mm and standard deviation of 0.225mm.

  1. What is the probability that a seed has a diameter of more than 8mm?
  2. What is the probability that a seed has a diameter less than 7.1mm?
  3. What is the probability that a seed has a diameter between 7.5 and 8mm?
  4. What is the diameter of the smallest 30% of seeds?
  5. What is the diameter of the largest 90% of the seeds?

Exercise 17.3 Consider again the study by Aedo-Ortiz, Olsen, and Kellogg (1997), who studied the diameter of trees in certain forests. The tree diameters can be modelled as having a normal distribution, with a mean of \(\mu = 8.8\) inches, and a standard deviation of \(\sigma = 2.7\) inches. For these trees:

  1. What is the probability that a tree will have a diameter less than 8 inches?
  2. What is the probability that a tree will have a diameter greater than 9 inches?
  3. What is the probability that a tree will have a diameter between 7 and 10 inches?
  4. The largest 15% of trees have what diameters?
  5. The smallest 25% of trees have what diameters?

Exercise 17.4 In a study (Snowden and Basso 2018) to understand factors influencing preterm births, the gestation length of healthy babies was modelled with a normal distribution, having a mean of 40 weeks, and a standard deviation of 1.64 weeks. Using this model:

  1. What proportion of births are longer than 39 weeks (that is, nine months)?
  2. In Australia, a premature birth is defined as a birth occurring before 37 weeks. What proportion of births are expected to be premature?
  3. According to Health Direct, 'Babies born between 32 and 37 weeks may need care in a special care nursery'. What proportion of healthy births would be expected to be born between 32 and 37 weeks gestation?
  4. How long is the gestation length for the longest 5% of pregnancies?
  5. How long is the gestation length for the shortest 5% of pregnancies?

Exercise 17.5 IQ scores are designed to have a mean of 100 and a standard deviation of 15. Mensa is a society for people with a high IQ; specifically, for people who have 'attained a score within the upper two percent of the general population' (Mensa webpage (https://www.mensa.org/)).

What IQ score is needed to join Mensa?

Exercise 17.6 IQ scores are designed to have a mean of 100 and a standard deviation of 15. Zagorsky (2016) reports that the US Military must "reject all military recruits whose IQ is in the bottom 10% of the population" (Zagorsky 2016, 403).

What IQ scores lead to a rejection from the US military?

Exercise 17.7 IQ scores are designed to have a mean of 100 and a standard deviation of 15. Match the diagram in Fig. 17.13 with the meaning.

  1. IQs greater than 110.
  2. IQs between 90 and 115.
  3. IQs less than 110.
  4. IQs greater than 85.

FIGURE 17.13: Match the diagram with the description

Exercise 17.8 IQ scores are designed to have a mean of 100 and a standard deviation of 15. Match the diagram in Fig. 17.14 with the meaning.

  1. The largest 25% of IQ scores.
  2. The smallest 10% of IQ scores.
  3. The largest 70% of IQ scores.
  4. The smallest 60% of IQ scores.

FIGURE 17.14: Match the diagram with the description

Exercise 17.9 A study of the impact of charging electric vehicles (EVs) on electricity demands (Affonso and Kezunovic 2018) modelled the time at which people began charging their EVs at home. Based on a survey (US Department of Transportation 2011), they modelled the time at which EVs began charging as having a mean of 5:30pm, with a standard deviation of 2.28 hrs. For this model:

  1. What is the probability that an EV will begin charging after 9pm?
  2. What is the probability that an EV will begin charging before 5pm?
  3. What is the probability that an EV will begin charging between 5pm and 6pm?
  4. 30% of the EVs begin charging after what time?
  5. The earliest 15% of charging begins when?

Hint: This question is much easier if you convert times into 'minutes after midnight'.