## A.7 Answer: TW 7 tutorial

1. This is just one box (one observation), but the claim is about the population mean. Some boxes will have more than 45, and some fewer.

2. Jake is correct in one sense: you can't have 0.9 of a match.

But the value is the mean number, and a mean can be a decimal. Suppose 10 boxes had 49 matches, and 10 boxes had 50 matches... is the mean 49, or is it 50? Neither is correct; the mean is 49.5.
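As a quick check of this arithmetic, here is a minimal Python sketch using the hypothetical 20 boxes from the example (10 with 49 matches, 10 with 50):

```python
from statistics import mean

# Hypothetical counts from the example: 10 boxes of 49 matches, 10 boxes of 50
counts = [49] * 10 + [50] * 10

# The mean of whole-number counts need not be a whole number
box_mean = mean(counts)
print(box_mean)  # 49.5
```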

3. Jake is confusing the sample mean with the population mean. The claim is that the population mean is $$45$$. The sample produced a mean of $$44.9$$.

Why should the means of two different things be the same? It's like expecting your height and your Mum's height to be the same: both are heights, but of different people.

Of course, every sample will produce a different sample mean. This sample may just have an unusually low number of matches.

4. Either (1) the manufacturer is lying; or (2) the manufacturer is not lying, and this sample just happens to contain fewer matches through (bad) luck.

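5. A CI gives an indication of the sampling variation: how much the sample mean could reasonably vary from sample to sample, and hence which values of the population mean are consistent with the sample.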

6. The standard error for the mean is $$0.124\div\sqrt{25} = 0.0248$$. So the approximate 95% CI is: $$44.9 \pm (2\times 0.0248)$$, or $$44.9\pm 0.0496$$, or from $$44.85$$ to $$44.95$$.
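The same calculation as a short Python sketch. It assumes, as the answer does, that 0.124 is the sample standard deviation, and it uses the approximate multiplier of 2; the function name `approx_ci_95` is just illustrative:

```python
import math

def approx_ci_95(xbar, s, n):
    """Approximate 95% CI for a population mean: xbar +/- 2 * s / sqrt(n)."""
    se = s / math.sqrt(n)      # standard error of the sample mean
    margin = 2 * se            # approximate 95% margin of error
    return se, xbar - margin, xbar + margin

se, lower, upper = approx_ci_95(xbar=44.9, s=0.124, n=25)
print(f"SE = {se:.4f}; CI from {lower:.2f} to {upper:.2f}")
# SE = 0.0248; CI from 44.85 to 44.95
```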

7. No. A 95% CI may or may not contain the population mean. Of course, the manufacturer may indeed be lying... but we'd need to be cautious about making such a bold claim on just this evidence. Ideally, we would repeat this study a few times or take a larger sample. But it is looking suspicious...

If we took many, many samples of 25 boxes of matches, and computed a 95% CI from each sample, about 95% of these CIs would contain the population mean.
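This interpretation can be checked by simulation. The sketch below assumes a hypothetical population in which the number of matches per box is normally distributed with mean 45 and standard deviation 0.124 (both are assumptions for illustration only). It draws many samples of 25 boxes, builds an approximate CI from each, and counts how often the CI contains 45:

```python
import math
import random
import statistics

# Assumed (hypothetical) population and sample size
MU, SIGMA, N = 45.0, 0.124, 25

def ci_contains_mu(sample, mu):
    """True if the approximate 95% CI (xbar +/- 2 * s / sqrt(n)) contains mu."""
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return xbar - 2 * se <= mu <= xbar + 2 * se

def coverage(reps=5000, seed=1):
    """Proportion of simulated samples whose CI contains the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        hits += ci_contains_mu(sample, MU)
    return hits / reps

proportion = coverage()
print(f"Proportion of CIs containing mu: {proportion:.3f}")
```

The proportion comes out close to, but slightly below, 95%: the multiplier of 2 is an approximation, and with only 25 boxes and an estimated standard deviation the exact multiplier would be a little larger than 2.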

8. $$\bar{x}$$ is the mean of the sample, so $$\bar{x} = 44.9$$.

9. $$\mu$$ is the mean of the population; the true mean, if you like. $$\mu$$ is claimed to be 45, but the value of $$\bar{x}$$ will, of course, vary from sample to sample.

1. The two groups contain completely different rats, so the samples are independent.

2. The parameter of interest is the difference between the population mean lifetimes, say $$\mu_R - \mu_F$$.

3. The 95% CI is the bottom one: from $$223.34$$ to $$346.13$$ days.

4. The best of these is Option (e)... but in practice, we usually think about CIs in terms of Option (d).

5. The CI explanation can be improved by:

2. providing sample summary info.

"The 95% confidence interval for the difference between the population mean lifetimes of rats on the restricted diet (sample mean: 968.8 days; std dev: 284.6 days) and rats on the free-eating diet (sample mean: 684.0 days; std dev: 134.1 days) is from $$223.34$$ to $$346.13$$ days, with rats on the restricted diet living longer."

6. Since the samples are large, the populations do not need to be normally distributed; we only require that the two samples are independent (which seems reasonable). (The figure is not needed.)

7. The boxplots show the variation in the lifetimes of individual rats. The error bar chart displays the variation that the sample means would be expected to show from sample to sample.

1. See Table A.2.
2. Use a side-by-side barchart, for example, if necessary.
3. Odds of boys maturing late: $$352\div(2\,864-352) = 0.1401$$. Thus boys are 0.1401 times as likely to mature late as not.
4. Odds of girls maturing late: $$336\div(2\,664-336) = 0.1443$$. Thus girls are 0.1443 times as likely to mature late as not.
5. Hence, to compare boys to girls: $$0.1401\div 0.1443 = 0.971$$.
6. The parameter of interest is the population odds ratio of late maturing, comparing boys to girls.
7. From software: OR is 0.971, and 95% CI is from 0.828 to 1.139.
8. See Table A.3.
9. The difference could be explained by sampling variation, or by a real difference between boys and girls in the population...
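
Table A.2: Maturation and gender

|         | Matured late | Did not mature late | Total |
|---------|-------------:|--------------------:|------:|
| Males   | 352          | 2512                | 2864  |
| Females | 336          | 2328                | 2664  |
| Total   | 688          | 4840                | 5528  |

Table A.3: Maturation and gender: Numerical summary (Enter percentages to one decimal place. Enter odds and odds ratio to three decimal places)

|            | Percentage maturing late | Odds maturing late | Sample size |
|------------|-------------------------:|-------------------:|------------:|
| Males      | 12.3                     | 0.1401             | 2864        |
| Females    | 12.6                     | 0.1443             | 2664        |
| Odds ratio |                          | 0.97088            |             |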
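The percentages, odds, odds ratio and CI above can be reproduced with a short Python sketch. The answer does not say which method the software used for the CI; this sketch assumes the common Woolf (log odds ratio) interval, $$\log(\text{OR}) \pm 1.96\sqrt{1/a + 1/b + 1/c + 1/d}$$, which for these counts gives the same values to three decimal places. The function name `odds_ratio_ci` is illustrative:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) with an approximate 95% CI (Woolf's log method).

    a, b: group 1 counts with / without the outcome
    c, d: group 2 counts with / without the outcome
    """
    odds1, odds2 = a / b, c / d
    or_hat = odds1 / odds2
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return odds1, odds2, or_hat, lower, upper

# Counts from Table A.2: boys (352 late, 2512 not), girls (336 late, 2328 not)
pct_boys, pct_girls = 100 * 352 / 2864, 100 * 336 / 2664
odds_boys, odds_girls, or_hat, lo, hi = odds_ratio_ci(352, 2512, 336, 2328)
print(f"% late: boys {pct_boys:.1f}, girls {pct_girls:.1f}")
print(f"odds: boys {odds_boys:.4f}, girls {odds_girls:.4f}")
print(f"OR {or_hat:.3f}, 95% CI from {lo:.3f} to {hi:.3f}")
```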

$$n=(2\times 7.145\div 0.5)^2 = 816.8$$, so use guesses from (at least) 817 students; sample sizes are always rounded up.
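The same calculation in Python, as a sketch that assumes 7.145 is the guessed standard deviation and 0.5 the desired margin of error (the answer gives only the numbers). `math.ceil` rounds up, because rounding down would give a slightly wider interval than wanted; the function name is illustrative:

```python
import math

def sample_size_for_mean(sd_guess, margin):
    """Approximate n so an approximate 95% CI has half-width `margin`: (2 * sd / margin)^2."""
    n_exact = (2 * sd_guess / margin) ** 2
    return n_exact, math.ceil(n_exact)  # always round up

n_exact, n = sample_size_for_mean(sd_guess=7.145, margin=0.5)
print(f"{n_exact:.1f} -> {n}")  # 816.8 -> 817
```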

1. Researchers examined the strength of fibre-reinforced concrete using a study design called an experiment.

In batch 1, a sample of size 30 was used; the sample mean number of blows until the first crack appeared in the test cylinders was 98, and the variation in the number of blows, measured by the standard deviation, was 54.
Because the data are a sample, the sample mean will estimate the population mean with some sampling error.

2. A type of study called an experiment (Ryan et al., 2010) compared the handwriting legibility of school children with cerebral palsy when using specialty school furniture and when using standard school furniture (which acted as a control).

They used a random sample of 30 children registered at their facility in Canada.

The sample mean difference in legibility was $$-0.1$$, and a 95% confidence interval was from $$-0.8$$ to $$0.6$$.

Using the standard equipment, the smallest value recorded for legibility was 19, and the largest was 34, so the range was 15.

1. Observational.
2. Relational.
3. Two completely separate samples are compared.
4. $$-0.774$$ to $$-0.560$$ inches (differences are Dominos less Eagle Boys).
5. The 95% CI for the difference in population mean pizza diameters between EB and DOM pizzas is from $$0.560$$ to $$0.774$$ inches, larger for EB.
6. Since the sample sizes are large (both 125), we do not require that the populations have normal distributions.
7. The histogram is not needed: the sample sizes are large ($$n=125$$ in each), so the populations do not need to be normally distributed.
8. Probably yes. Amount of topping on the pizza? Which tastes better? Whether the samples were randomly selected or not?

1. $$H_0$$: No association between income level and opinion of GM foods in the population; $$H_1$$: An association between income level and opinion of GM foods in the population.
2. Odds HIE: $$263\div151 = 1.742$$. HIE are 1.74 times as likely to be for GM foods as against.
3. Odds LIE: $$258\div222 = 1.162$$. LIE are 1.16 times as likely to be for GM foods as against.
4. $$\text{Odds}(\text{HIE in favour})\div\text{Odds}(\text{LIE in favour}) = 1.742\div 1.162 = 1.5$$ ($$1.499$$ in table). The odds of an HIE being for GM foods are 1.5 times the odds of an LIE being for GM foods.
5. From the sample, we estimate the OR in the population to be between $$1.145$$ and $$1.961$$. (Loosely, though technically incorrect: the true OR is likely to be between 1.145 and 1.961.) Importantly, this interval does not include 1.
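A sketch of the same odds-ratio calculation for the GM-foods counts (HIE: 263 for, 151 against; LIE: 258 for, 222 against), checking whether the interval contains 1. The answer does not name a CI method; this sketch assumes Woolf's log method, which reproduces the interval quoted. The function name is illustrative:

```python
import math

def log_or_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) and an approximate 95% CI via the log odds ratio."""
    or_hat = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Group 1 = HIE (for, against); group 2 = LIE (for, against)
or_hat, lo, hi = log_or_ci(263, 151, 258, 222)
contains_one = lo <= 1 <= hi
print(f"OR {or_hat:.3f}, 95% CI from {lo:.3f} to {hi:.3f}; contains 1: {contains_one}")
```

If the interval excluded 1 only barely, the conclusion would be more fragile; here the lower limit is well above 1.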