## A.7 Answers to Teaching Week 7 tutorial

### Answers to Sect. 7.2

This is just one box (one *observation*), but the claim is about the *population mean*. Some boxes will have more than 45, and some fewer.

Jake is correct in one sense: You can't have 0.9 of a match.

But the value is the *mean* number, and that *can* be a decimal. Suppose 10 boxes had 49 matches, and 10 boxes had 50 matches... is the mean 49, or is it 50? Neither is correct; the mean is 49.5.

Jake is confusing the *sample* and *population* mean. The claim is that the *population* mean is \(45\). The sample produced a mean of \(44.9\).

Why should the mean of two different things be the same? It's like expecting your height and your Mum's height to be the same: they are both heights, but of different things. Why should they be the same?

Of course, every sample will produce a different sample mean. This sample may just have an unusually low number of matches.

Either (1) the manufacturer is lying; or (2) the manufacturer is *not* lying, and this sample just happens to have a smaller number of matches: (bad) luck.

A CI gives some indication of the variation implied by the sample.

The standard error for the mean is \(0.124\div\sqrt{25} = 0.0248\). So the approximate 95% CI is: \(44.9 \pm (2\times 0.0248)\), or \(44.9\pm 0.0496\), or from \(44.85\) to \(44.95\).
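
The arithmetic above can be checked with a short sketch. The function name `approx_ci_from_summary` is hypothetical, and the multiplier of 2 is the approximate 95% multiplier used in these answers rather than an exact \(t\)- or \(z\)-value.

```python
import math

def approx_ci_from_summary(mean, sd, n, multiplier=2.0):
    """Return (standard error, lower limit, upper limit) for an approximate CI."""
    se = sd / math.sqrt(n)       # standard error of the sample mean
    margin = multiplier * se     # approximate margin of error
    return se, mean - margin, mean + margin

# Values from the answer above: sample mean 44.9, s = 0.124, n = 25.
se, lower, upper = approx_ci_from_summary(44.9, 0.124, 25)
print(f"s.e. = {se:.4f}; CI from {lower:.2f} to {upper:.2f}")
```

Changing `multiplier` shows how the interval widens or narrows for other approximate confidence levels.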

No. A 95% CI may or may not contain the population mean. Of course, the manufacturer *may* indeed be lying... but we'd need to be cautious about making such a bold claim on just this evidence. Ideally, we would repeat this study a few times or take a larger sample. But it *is* looking suspicious...

If we had many, many sets of 25 match boxes, 95% of these sets of 25 would have a mean between 44.85 and 44.95.

\(\bar{x}\) is the mean of the sample, so \(\bar{x} = 44.9\).

\(\mu\) is the mean of the population; the true mean if you like. \(\mu\) is *claimed* to be 45, but the value of \(\bar{x}\) will, of course, vary.

### Answers to Sect. 7.3

\(\mu\) is the population mean diameter of *all* EB pizzas; \(\bar{x}\) is the mean diameter of the pizzas in the sample.

\(\bar{x} = 11.486\) inches. It's not sensible to quote the diameter to \(0.001\) of an inch; what is sensible?

We don't know the value of \(\mu\), and we never will. Our best *estimate* is the value of \(\bar{x}\).

\(s=0.24658\) inches. It's not sensible to quote the standard deviation to this many decimal places though.

\(\sigma\) is the standard deviation of the population. We don't know the value of \(\sigma\), and we never will.

\(\displaystyle\text{s.e.}(\bar{x}) = s/\sqrt{n} = 0.24658/\sqrt{125} = 0.02205\).

The first measures the variation in the diameters of individual pizzas; the second measures the precision of the sample mean when used to estimate the population mean.

Almost certainly not the same. Probably close to \(\bar{x}=11.486\) inches. More precisely, probably within three standard errors (\(3\times 0.022\)) of \(\bar{x}\).

Normal; mean \(\mu\); std. dev is the standard error of 0.02205.
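
A small simulation can make this sampling distribution concrete. It is only a sketch: the population values below (a normal population with mean 11.5 inches and standard deviation 0.25 inches) are assumptions chosen for illustration, since \(\mu\) and \(\sigma\) are unknown.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Assumed population values, for illustration only (not taken from the data).
MU, SIGMA = 11.5, 0.25
N = 125             # pizzas per sample, as in the study
NUM_SAMPLES = 2000  # number of repeated samples to simulate

def one_sample_mean():
    """Draw one sample of N diameters and return its mean."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))

sample_means = [one_sample_mean() for _ in range(NUM_SAMPLES)]

theoretical_se = SIGMA / math.sqrt(N)
simulated_sd = statistics.stdev(sample_means)
print(f"theoretical s.e. = {theoretical_se:.4f}")
print(f"SD of simulated sample means = {simulated_sd:.4f}")
```

The two printed values should be close: the standard error is the standard deviation of the sample means across repeated samples.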

The approximate 95% CI is \(11.486\pm (2\times 0.02205)\) or \(11.486\pm0.044\), which is from 11.44 to 11.53 inches.

Based on the sample, the approximate 95% confidence interval for the population mean pizza diameter is from \(11.44\) to \(11.53\) inches.

\(n>25\) **or** \(n\le 25\) and population has normal distribution.

We do not need to **assume** that \(n>25\) because we know that it is. (We do *not* require that the sample or the population has a normal distribution. We require that the *sample means* have an approximate normal distribution, which they will if \(n>25\).) So the CI is statistically valid.

Population mean diameter probably not 12 inches based on the CI.

Compute: \[n=\left(\frac{2\times 0.24658}{0.04}\right)^2 = 152.004\] at least, so we would need 153 pizzas.
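
The same calculation, including the rounding-up step, can be sketched as follows. The function name `sample_size_for_margin` is hypothetical; the multiplier of 2 matches the approximate 95% CI used in these answers.

```python
import math

def sample_size_for_margin(sd, margin, multiplier=2.0):
    """Return (exact n, whole n) so that multiplier * sd / sqrt(n) <= margin."""
    n_exact = (multiplier * sd / margin) ** 2
    # Always round up: rounding down would give a margin that is too wide.
    return n_exact, math.ceil(n_exact)

# Pizza diameters: s = 0.24658 inches, desired margin of error 0.04 inches.
n_exact, n_needed = sample_size_for_margin(0.24658, 0.04)
print(f"n = {n_exact:.3f} at least, so {n_needed} pizzas")
```

Rounding up rather than to the nearest integer is the key design choice: \(152\) pizzas would fall just short of the required precision.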

### Answers to Sect. 7.4

**Descriptive**: Every subject is treated the same way; we are not comparing two groups that have been treated differently.

- \(\mu_d\) is the mean difference in the target population; \(\bar{d}\) is the mean difference in this sample.
- Each subject is measured both before and after.
- \(32\); \(12\); \(24\); \(30\); \(8\); \(14\); \(14\); \(28\); \(38\); \(49\). (Differences in the other direction are also acceptable; it just changes the signs of these differences and so on. **Importantly, the direction should be stated somewhere.**)
- It makes more sense to define directions this way, so that the difference is the *increase* in 2MWT.
- \(\bar{d} = 24.9\); \(s_d=13.03372\)
- \(\text{s.e.}(\bar{d}) = s_d/\sqrt{n} = 13.03372/\sqrt{10} = 4.121623\). This is the standard deviation of the sample mean difference, a measurement of how precisely the sample mean difference measures the population mean difference.
- Almost impossible. Sample mean differences vary from sample to sample around the true mean difference, with an approximate normal distribution and a standard error of about \(4.12\). Since we don't know \(\mu_d\), the best we can say is that the sample mean difference will vary about our best guess of the population mean difference; in other words, the sample means vary around \(24.9\) with a standard deviation of about \(4.12\).
- Normal; mean \(\mu_d\), std deviation is the standard error of \(4.12\).
- \(24.9\pm (2\times 4.121623)\), or \(24.9\pm 8.243246\), or from \(16.65675\) to \(33.14325\) m.
- We are 95% confident that the population 2MWT *increases* by a mean amount between \(16.7\) and \(33.1\) m.
- Either (or both) of these must be true:
  - the population has a normal distribution, and/or
  - the sample size is large enough so that the sample means have a normal distribution, so about larger than 25.

- Since \(n<25\), we need to assume the population of differences has a normal distribution. A stem-and-leaf plot suggests this is not unreasonable, so the sample means quite possibly have an approx. normal distribution.
- (Recall we haven't done hypothesis testing in this context yet!) Looks pretty likely that the 2MWT distances are higher after receiving the implant.
- \(n=(2\times 13.03372\div 5)^2 = 27.18\), so need data from at least \(28\) amputees.
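
The summary statistics and the approximate CI in these answers can be reproduced from the listed differences. This is a minimal sketch using Python's standard `statistics` module; the multiplier of 2 is the approximate 95% multiplier used throughout these answers.

```python
import math
import statistics

# Differences in 2MWT (in metres), defined as the increase, as listed above.
differences = [32, 12, 24, 30, 8, 14, 14, 28, 38, 49]

n = len(differences)
d_bar = statistics.mean(differences)   # sample mean difference
s_d = statistics.stdev(differences)    # sample standard deviation (n - 1 divisor)
se = s_d / math.sqrt(n)                # standard error of the mean difference

# Approximate 95% CI, using a multiplier of 2.
lower, upper = d_bar - 2 * se, d_bar + 2 * se
print(f"mean = {d_bar}, s_d = {s_d:.5f}, s.e. = {se:.6f}")
print(f"approx. 95% CI: {lower:.1f} to {upper:.1f} m")
```

If the differences were defined in the other direction, every value in `differences` would change sign, and so would the mean and both CI limits; the standard deviation and standard error would be unchanged.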

### A.7.1 Answers to Sect. 7.6

**1.** \(\bar{x}=16.02\) m.

**2.** \(s=7.145\) m; \(\text{s.e.}(\bar{x}) = s/\sqrt{n} = 7.145/\sqrt{44} = 1.077\) m. The first is a measure of the variation in the original data; the second is a measure of the precision of the sample mean when estimating the population mean.

**3.** The CI is from 13.85 to 18.19 m.

**4.** 95% CI for population mean guess: 13.85 to 18.19 m.

**5.** The population of differences has a normal distribution, and/or \(n>30\) or so.

**6.** Since \(n>30\), all OK if the histogram isn't severely skewed. Probably OK.

**7.** Not really; the CI doesn't contain the true width. But was this just due to the metric units... or perhaps students are just very poor at estimating widths in general! In fact, the Professor also had the students estimate the width of the hall in imperial units, as a comparison.

### A.7.2 Answers to Sect. 7.7

### A.7.3 Answers to Sect. 7.8

Researchers (Nataraja et al. 1999) examined the strength of fibre reinforced concrete, by using a study design called an *experiment*.

In batch 1, a *sample* of size 30 was used; the sample mean number of blows till the first crack appeared in the test cylinders was 98, and the amount of variation in the number of blows was measured using the *standard* deviation as 54.

Because the data are a sample, the sample mean will estimate the population mean with some sampling *error*.

A type of study called an *experiment* compared the handwriting legibility for school children (Ryan et al., 2010) having cerebral palsy when using specialty school furniture with standard school furniture (which acted as a *control*).

They used a *random* sample of size 30 from children registered at their facility in Canada.

The *sample* mean for the difference in legibility was \(-0.1\), and a 95% *confidence* interval was from \(-0.8\) to \(0.6\).

Using the standard equipment, the smallest value recorded for legibility was 19, and the largest was 34, so the *range* was 15.

### A.7.4 Answers to Sect. 7.9

**1.** Because each method is used in each sea state.

**3.** \(\bar{d}=0.06167\); \(s_d = 0.2901\). The mean difference is positive: Method 1 measurements slightly higher (on average) than Method 2.

**4.** \(\text{s.e.}({\bar{d}})= s_d/\sqrt{n} = 0.2901/\sqrt{18} = 0.0684\). \(\text{s.e.}({\bar{d}})\) measures the precision with which the sample mean difference estimates the population mean difference.

**5.** \(0.06167\pm(2\times 0.0684)\), which is \(0.06167 \pm 0.137\), or from \(-0.075\) to \(0.199\) Newton-metres.

**6.** Since \(n<30\), we require that the *differences* in the population have a normal distribution.

**7.** The stem-and-leaf plot of the *sample* doesn't suggest the *population* is non-normal.

**8.** Since the CI includes zero, possibly the population mean difference could be zero.

### A.7.5 Answers to Sect. 7.10

**1.** \(\mu_d\) is the mean difference in the target population; \(\bar{d}\) is the mean difference in this sample.

**2.** Each runner's plasma \(\beta\) concentration is measured both before and after the race.

**3.** The *direction* of the difference should be clearly stated.

**4.** \(\bar{d} = 18.736\); \(s_d=8.3297\)

**5.** \(\text{s.e.}(\bar{d}) = s_d/\sqrt{n} = 8.3297/\sqrt{11} = 2.5115\). This is the standard deviation of the sample mean difference, a measurement of how precisely the sample mean difference measures the population mean difference.

**6.** Almost impossible. The sample mean difference would vary every time we took a sample, around the true mean difference, with a normal distribution having a standard error of about \(2.51\). Since we don't know the population mean difference, the best we can say is that the sample mean difference will vary about our best guess of the population mean difference; in other words, the sample means will vary around \(18.74\) with a standard deviation of about \(2.51\).

**7.** \(18.736\pm (2\times 2.5115)\), or \(18.736\pm5.023\), or from \(13.7\) to \(23.8\) pmol/litre.

**8.** We are 95% confident that the population plasma \(\beta\) concentration *increases* by a mean amount between \(13.7\) and \(23.8\) pmol/litre, during the fun run.

**9.** Either (or both) of these must be true: the population has a normal distribution, and/or the sample size is large enough so that the sample means have a normal distribution, so about larger than 25.

**10.** Since \(n<25\), we need to assume the population of differences has a normal distribution. The stem-and-leaf plot suggests this is not unreasonable, so the sample means quite possibly have an approx. normal distribution.

**11.** Looks likely that the plasma \(\beta\) concentrations are higher after the race.

**12.** \(n=(2\times 8.3297\div 2.5)^2 = 44.41\), so need data from at least \(45\) runners.