## A.8 Answer: TW 8 tutorial

### Answers for Sect. 8.2

This is just one box of matches (one *observation*), but the claim is about the *population mean*. Some boxes will have more than \(45\), and some fewer.

Jake is correct in one sense: you can't have \(0.9\) of a match. But the value is the *mean* number, and that *can* be a decimal. Suppose \(10\) boxes had \(49\) matches, and \(10\) boxes had \(50\) matches: is the mean \(49\), or is it \(50\)? Neither is correct; the mean is \(49.5\).

Jake is confusing the *sample* mean and the *population* mean. The claim is that the *population* mean is \(45\); the sample produced a mean of \(44.9\). Why should the means of two different things be the same? It's like expecting your height and your Mum's height to be the same: they are both heights, but of different people.

Of course, every sample will produce a different sample mean. This sample may just have an unusually low number of matches.
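To see that every sample gives a different sample mean, here is a minimal Python simulation. It is a sketch under assumptions that are *not* from the study: the manufacturer's claim is taken as true (population mean \(\mu = 45\)), the sample standard deviation \(0.124\) is borrowed as the population standard deviation, and the population is assumed normal. Each simulated sample of \(25\) boxes produces its own \(\bar{x}\).

```python
import random
import statistics

# Assumptions for illustration only (not from the study):
# MU = 45 (the manufacturer's claim), SIGMA = 0.124 (borrowed from the sample SD),
# and a normally distributed population.
MU = 45.0
SIGMA = 0.124
N = 25            # boxes per sample
N_SAMPLES = 1000  # number of simulated samples


def simulate_sample_means(mu, sigma, n, n_samples, seed=1):
    """Return the sample mean of each of n_samples simulated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means


means = simulate_sample_means(MU, SIGMA, N, N_SAMPLES)
print("First five sample means:", [round(m, 3) for m in means[:5]])
print("Mean of the sample means:", round(statistics.mean(means), 4))
# The SD of the sample means should be close to SIGMA / sqrt(N) = 0.0248
print("SD of the sample means:", round(statistics.stdev(means), 4))
```

The sample means scatter around \(45\), with a standard deviation close to \(0.124 \div \sqrt{25} = 0.0248\): the same standard error computed for this sample.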

Either (1) the manufacturer is lying; or (2) the manufacturer is *not* lying, and this sample just happens to have a smaller number of matches: (bad) luck.

A CI gives some indication of the sampling variation implied by the sample.

The standard error for the mean is \(0.124\div\sqrt{25} = 0.0248\). So the approximate \(95\)% CI is: \(44.9 \pm (2\times 0.0248)\), or \(44.9\pm 0.0496\), or from \(44.85\) to \(44.95\).
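The arithmetic for the approximate CI can be sketched in Python. This assumes the "mean \(\pm\) two standard errors" rule of thumb used above (the helper name `approx_ci_mean` is just for illustration), and also checks whether the claimed mean of \(45\) lies inside the interval.

```python
import math


def approx_ci_mean(xbar, s, n, multiplier=2.0):
    """Approximate 95% CI for a population mean: xbar +/- multiplier * s / sqrt(n).

    multiplier=2 follows the 'about two standard errors' rule of thumb.
    """
    se = s / math.sqrt(n)
    margin = multiplier * se
    return se, xbar - margin, xbar + margin


se, lower, upper = approx_ci_mean(44.9, 0.124, 25)
print(f"SE = {se:.4f}; approx 95% CI: {lower:.2f} to {upper:.2f}")
print("Claimed mean 45 inside CI?", lower <= 45 <= upper)
# prints:
# SE = 0.0248; approx 95% CI: 44.85 to 44.95
# Claimed mean 45 inside CI? False
```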

No. A \(95\)% CI may or may not contain the population mean. Of course, the manufacturer *may* indeed be lying... but we'd need to be cautious about making such a bold claim on just this evidence. Ideally, we would repeat this study a few times or take a larger sample. But it *is* looking suspicious...

If we had many, many sets of \(25\) matchboxes, \(95\)% of these sets of \(25\) would have a mean between \(44.85\) and \(44.95\).

\(\bar{x}\) is the mean of the sample, so \(\bar{x} = 44.9\).

\(\mu\) is the mean of the population: the true mean, if you like. \(\mu\) is *claimed* to be \(45\), but the value of \(\bar{x}\) will, of course, vary from sample to sample.

### Answers for Sect. 8.3

The two groups are completely different.

The **parameter** of interest is the *difference between the population mean lifetimes*, say \(\mu_R - \mu_F\).

The \(95\)% CI is the *bottom* one: from \(223.34\) to \(346.13\) days.

The best of these is Option (e)... but in practice, we usually think about CIs in terms of Option (d).

The CI explanation can be improved by (i) indicating *which* diet leads to larger average lifetimes; and (ii) providing sample summary information. Here is a better answer:

"The \(95\)% confidence interval for the difference between the population mean lifetimes of rats on the restricted diet (sample mean: \(968.8\) days; std dev: \(284.6\) days) and on the free-eating diet (sample mean: \(684.0\) days; std dev: \(134.1\) days) suggests that, on average, rats on a restricted diet live between \(223.34\) and \(346.13\) days longer."

Since the samples are large, we do not need the populations to be normally distributed, but we do need the two samples to be independent (which is reasonable). (**The figure is not needed.**)

The boxplots show the variation in the lifetimes of individual rats. The error bar chart displays the variation that the sample means would be expected to show from sample to sample.

### Answers for Sect. 8.4

Some answers embedded.

- See Table A.2.
- Use a side-by-side barchart, for example, if necessary.
- Odds of boys maturing late: \(352 \div (2\,864 - 352) = 0.1401\): boys are \(0.1401\) times as likely to mature late as not.
- Odds of girls maturing late: \(336 \div (2\,664 - 336) = 0.1443\): girls are \(0.1443\) times as likely to mature late as not.
- Hence, to compare boys to girls: \(0.1401 \div 0.1443 = 0.971\).
- The **parameter** of interest is the *population odds ratio* of late maturing, comparing boys to girls.
- From software: the OR is \(0.971\), and the \(95\)% CI is from \(0.828\) to \(1.139\).
- See Table A.3.
- The difference could be explained by sampling variation, or because of a real difference...

| | Matured late | Did not mature late | Total |
|---|---|---|---|
| Males | 352 | 2512 | 2864 |
| Females | 336 | 2328 | 2664 |
| Total | 688 | 4840 | 5528 |

| | Percentage maturing late | Odds maturing late | Sample size |
|---|---|---|---|
| Males | 12.3 | 0.1401 | 2864 |
| Females | 12.6 | 0.1443 | 2664 |
| Odds ratio | | 0.97088 | |
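The odds, the odds ratio and its CI can be reproduced from the two-way table with a short Python sketch. The text does not say which method the software used; this sketch *assumes* the common Wald (Woolf) interval on the log scale, \(\log(\text{OR}) \pm z\sqrt{1/a + 1/b + 1/c + 1/d}\), which, rounded to three decimals, agrees with the reported interval. The function name `odds_ratio_ci` is illustrative only.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio comparing group 1 to group 2, with a Wald (Woolf) CI.

    a, b: group 1 counts with / without the outcome
    c, d: group 2 counts with / without the outcome
    """
    odds1 = a / b
    odds2 = c / d
    or_hat = odds1 / odds2
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return odds1, odds2, or_hat, lower, upper


# Boys (group 1) vs girls (group 2): matured late / did not mature late
odds_boys, odds_girls, or_hat, lower, upper = odds_ratio_ci(352, 2512, 336, 2328)
print(f"Odds (boys) = {odds_boys:.4f}, odds (girls) = {odds_girls:.4f}")
print(f"OR = {or_hat:.3f}; 95% CI: {lower:.3f} to {upper:.3f}")
# prints:
# Odds (boys) = 0.1401, odds (girls) = 0.1443
# OR = 0.971; 95% CI: 0.828 to 1.139
```

The interval is computed on the log scale because the sampling distribution of \(\log(\text{OR})\) is closer to normal than that of the OR itself; exponentiating the limits gives an interval that is not symmetric about \(0.971\).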

### Answers for Sect. 8.5

\(n=(2\times 7.145\div 0.5)^2 = 816.8\), so (rounding up) use guesses from at least \(817\) students.
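The sample-size calculation can be checked with a small Python sketch. It uses the same formula as above, \(n = (2s/E)^2\), reading \(7.145\) as the standard deviation \(s\) and \(0.5\) as the desired margin of error \(E\); the function name is hypothetical. The answer is always rounded *up*, since rounding down would give a margin of error slightly larger than wanted.

```python
import math


def sample_size_for_mean(sd, margin, multiplier=2.0):
    """Smallest whole n with multiplier * sd / sqrt(n) <= margin (approx. 95% CI)."""
    n_exact = (multiplier * sd / margin) ** 2
    return n_exact, math.ceil(n_exact)  # round up to a whole number of students


n_exact, n_needed = sample_size_for_mean(sd=7.145, margin=0.5)
print(f"n = {n_exact:.1f}, so use {n_needed} students")
# prints: n = 816.8, so use 817 students
```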

### Answers for Sect. 8.6.1

Researchers (Nataraja et al. 1999) examined the strength of fibre-reinforced concrete by using a study design called an *experiment*. In batch 1, a *sample* of size \(30\) was used; the sample mean number of blows until the first crack appeared in the test cylinders was \(98\), and the amount of variation in the number of blows was measured using the *standard* deviation as \(54\). Because the data are a sample, the sample mean will estimate the population mean with some sampling *error*.

A type of study called an *experiment* compared the handwriting legibility of school children having cerebral palsy (Ryan et al. 2010) when using specialist school furniture with that when using standard school furniture (which acted as a *control*). They used a *random* sample of size \(30\) from children registered at their facility in Canada. The *sample* mean for the difference in legibility was \(-0.1\), and a \(95\)% *confidence* interval was from \(-0.8\) to \(0.6\). Using the standard equipment, the smallest value recorded for legibility was \(19\), and the largest was \(34\), so the *range* was \(15\).

### Answers for Sect. 8.6.2

- Observational.
- Relational.
- Two completely separate samples are compared.
- \(-0.774\) to \(-0.560\) inches (differences are Dominos less Eagle Boys).
- The \(95\)% CI for the difference in population mean pizza diameters between EB and DOM pizzas is from \(0.560\) to \(0.774\) inches, larger for EB.
- Since the sample sizes are large (both \(125\)), we *do not require that the populations have normal distributions*.
- The sample sizes are large (\(n = 125\) in each), so we don't need the populations to be normally distributed; **we don't need the histogram**.
- Probably yes.
- Amount of topping on the pizza? Which *tastes* better? Whether the samples were randomly selected or not?

### Answers for Sect. 8.6.3

- Odds HIE: \(263 \div 151 = 1.742\). An HIE is \(1.74\) times as likely to be for GM foods as against.
- Odds LIE: \(258 \div 222 = 1.162\). An LIE is \(1.16\) times as likely to be for GM foods as against.
- \(\text{Odds}(\text{HIE in favour}) \div \text{Odds}(\text{LIE in favour}) = 1.742 \div 1.162 = 1.5\) (\(1.499\) in table). The odds of an HIE being for GM foods are \(1.5\) times the odds of an LIE being for GM foods.
- From the sample, we estimate the OR in the population to be between \(1.145\) and \(1.961\). (Loosely, though technically incorrect: the true OR is likely to be between \(1.145\) and \(1.961\).) Importantly, this interval does not include \(1\).
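The OR and its CI for the GM-foods data can be sketched in Python from the counts used in the odds above (HIE: \(263\) for, \(151\) against; LIE: \(258\) for, \(222\) against). Using a Wald (Woolf) interval on the log scale with \(z = 1.96\) is an *assumption* about what the software did; rounded to three decimals it agrees with the interval quoted. The last line checks whether \(1\) lies inside the interval.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) with an approximate 95% Wald CI on the log scale."""
    or_hat = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


# HIE (group 1) vs LIE (group 2): counts for / against GM foods
or_hat, lower, upper = odds_ratio_wald_ci(263, 151, 258, 222)
print(f"OR = {or_hat:.3f}; 95% CI: {lower:.3f} to {upper:.3f}")
print("CI includes 1?", lower <= 1 <= upper)
# prints:
# OR = 1.499; 95% CI: 1.145 to 1.961
# CI includes 1? False
```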