## A.10 Answer: TW 10 tutorial

1. The two groups are completely different; cannot determine how to pair. Indeed, the sample sizes are different.

Each rat, of course, must be in just one group (they measure lifetimes!).

It is paired: For each unit of analysis, there are two observations.

2. The null hypothesis is that there is no difference in the mean lifetimes of the two groups of rats. In symbols (where $$\mu$$ represents the mean lifetime in the population):

$$H_0$$: $$\mu_R = \mu_{FE}$$; and $$H_1$$: $$\mu_R > \mu_{FE}$$; one-tailed, because of the RQ.

3. If we took many samples of the same size from the population, the difference between the sample means would vary from sample to sample. The 'standard error of the difference' tells us how much this difference varies from sample to sample: it is the standard deviation of the differences between the sample means.

4. The variation in the sample means is small: The population means are estimated quite precisely. The means look different.

5. Either sampling variation, or the diets really are different.

6. We always use the not-equal variance (Welch's test) row: $$t=9.161$$; $$\text{df}=154.94$$; one-tailed $$P<0.0005$$ (test is one-tailed). The evidence contradicts $$H_0$$.

7. The differences are defined as the mean restricted-diet lifetime minus the mean free-eating diet lifetime. We can see this because the value in the (poorly-labelled) "Mean difference" column is positive, and the only way to get a positive value is to subtract in this order.

8. Very strong evidence exists in the sample ($$t=9.161$$ for two independent samples; $$\text{df}=154.94$$; one-tailed $$P<0.0005$$) that the population mean lifetime of rats on a restricted diet (mean lifetime: 968.75 days; std. dev.: 284.6) is greater than that of rats on a free-eating diet (684.01 days; 134.1) (95% CI for the difference from 223.3 to 346.1 days).

9. The two samples are independent, and the sample means have an approximate normal distribution; the latter holds when the populations have a normal distribution, and/or $$n>30$$ or so.

10. The sample sizes here are quite large so we should be OK, as the Figure suggests not very severe non-normality.

11. Rats from the same litter are likely to be similar to each other. The litter would probably be the unit of analysis then, not the individual rat.
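
The numbers quoted in answers 6 to 8 can be cross-checked against each other. The sketch below is only an illustration using the reported summary values (the variable names are made up for this example): it recovers the standard error of the difference from $$t$$, and the multiplier implied by the reported CI.

```python
# Summary values quoted in the answers above (taken from the software output).
mean_restricted = 968.75        # days
mean_free = 684.01              # days
t_stat = 9.161
ci_low, ci_high = 223.3, 346.1  # 95% CI for the difference, in days

# Difference in sample means, in the order restricted minus free-eating.
diff = mean_restricted - mean_free

# Since t = difference / s.e., the standard error can be recovered from t.
se_diff = diff / t_stat

# The CI is difference +/- (multiplier x s.e.), so its half-width gives the multiplier.
margin = (ci_high - ci_low) / 2
multiplier = margin / se_diff

print(f"difference         = {diff:.2f} days")
print(f"s.e. of difference = {se_diff:.2f} days")
print(f"CI midpoint        = {(ci_low + ci_high) / 2:.1f} days")
print(f"implied multiplier = {multiplier:.2f}")
```

The CI midpoint agrees with the difference between the sample means, and the implied multiplier is close to 2, as expected for a 95% CI with large samples.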

Null hypotheses:

• At Subway, is the mean length of a 12-inch sub really 12 inches?
$$\mu = 12$$
• At Subway, is the mean length of a 12-inch sub different for white and wholemeal subs?
$$\mu_{\text{white}} = \mu_{\text{wholemeal}}$$
• At Subway, is the proportion of 12-inch subs that are shorter than 12 inches different for white and wholemeal subs?
$$p_{\text{white}} = p_{\text{wholemeal}}$$
• At Subway, is the mean length of a 12-inch sub longer for white (compared to wholemeal) subs? $$\mu_{\text{white}} = \mu_{\text{wholemeal}}$$

Alternative hypotheses:

• At Subway, is the mean length of a 12-inch sub really 12 inches?
$$\mu \ne 12$$
• At Subway, is the mean length of a 12-inch sub different for white and wholemeal subs?
$$\mu_{\text{white}} \ne \mu_{\text{wholemeal}}$$
• At Subway, is the proportion of 12-inch subs that are shorter than 12 inches different for white and wholemeal subs?
$$p_{\text{white}} \ne p_{\text{wholemeal}}$$
• At Subway, is the mean length of a 12-inch sub longer for white (compared to wholemeal) subs? $$\mu_{\text{white}} > \mu_{\text{wholemeal}}$$

1. See Table A.4.

2. Using artificial limb: $$49/16 = 3.0625$$. Not using artificial limb: $$21/19 = 1.105263$$. The OR is $$3.0625/1.105263 = 2.771$$; that is, the odds of being alive after five years are almost three times as high for those using an artificial limb as for those who do not.

See Table A.5.

3. The CI for the OR is from $$1.198$$ to $$6.411$$.

4. Chi-squared: $$5.836$$. With one degree of freedom, this is like $$z = \sqrt{5.836/1} = 2.42$$, which is large, so $$P$$ is small (about $$0.016$$).

The sample provides evidence of an association between wearing an artificial limb and five-year mortality in the population ($$\text{chi-square}=5.836$$; $$\text{df}=1$$; $$P=0.016$$; OR: $$2.771$$ and 95% CI from 1.198 to 6.411).

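TABLE A.4: Five-year mortality for artificial limb users

| | Alive after 5 years | Died within 5 years | Total |
|---|---|---|---|
| Used art. limb | 49 | 16 | 65 |
| Did not use art. limb | 21 | 19 | 40 |
| Total | 70 | 35 | 105 |
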
TABLE A.5: Five-year mortality and use of an artificial limb: Numerical summary

| | Percentage alive after 5 years | Odds alive after 5 years | Sample size |
|---|---|---|---|
| Use artificial limb | 75.4 | 3.06 | 65 |
| Did not use artificial limb | 52.5 | 1.11 | 40 |
| Odds ratio | | 2.771 | |
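
The odds, the OR, its CI and the approximate $$P$$-value in the answers above can be reproduced from the counts in Table A.4. This is a sketch only: it assumes the CI is a Wald interval computed on the log-odds-ratio scale (the software output does not say which method was used), and it uses the normal approximation from answer 4 for the $$P$$-value. All variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts from Table A.4 (alive / not alive after five years).
alive_limb, dead_limb = 49, 16
alive_no_limb, dead_no_limb = 21, 19

# Odds of being alive in each group, and the odds ratio.
odds_limb = alive_limb / dead_limb
odds_no_limb = alive_no_limb / dead_no_limb
odds_ratio = odds_limb / odds_no_limb

# Assumed method: 95% Wald interval on the log(OR) scale, then back-transformed.
se_log_or = math.sqrt(
    1 / alive_limb + 1 / dead_limb + 1 / alive_no_limb + 1 / dead_no_limb
)
z_star = NormalDist().inv_cdf(0.975)
ci_low = math.exp(math.log(odds_ratio) - z_star * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + z_star * se_log_or)

# With df = 1, the square root of the chi-square value behaves like a z-score.
chi_square = 5.836
z = math.sqrt(chi_square)
p_two_tailed = 2 * (1 - NormalDist().cdf(z))

print(f"OR = {odds_ratio:.3f}, 95% CI from {ci_low:.3f} to {ci_high:.3f}")
print(f"z = {z:.2f}, two-tailed P = {p_two_tailed:.3f}")
```

Under these assumptions the results agree with the reported values to the precision given.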

1. Two-sample $$t$$-test.
2. A $$\chi^2$$ test.
3. A paired $$t$$-test.
4. A two-sample $$t$$-test.
5. None of the other options are correct: requires regression or correlation.

1. Incorrect is the one about means.
2. $$113/626 = 18.05\%$$.
3. 18.05% of 47 is 8.48. See Table A.6.
4. $$25\div 88 = 0.284$$.
5. $$22\div 491 = 0.0448$$.
6. The OR is $$0.284\div 0.0448 = 6.3$$. This OR means that the odds of having Hep. C are 6.3 times as great for students who have a tattoo as for those who do not have a tattoo.

TABLE A.6: Hep. C status by tattoo status

| | Had Hep. C | Did not have Hep. C | Total |
|---|---|---|---|
| Had tattoo | 25 | 88 | 113 |
| Did not have tattoo | 22 | 491 | 513 |
| Total | 47 | 579 | 626 |
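
The arithmetic in answers 2 to 6 can be checked with a few lines of code. This is a sketch using the counts quoted in the answers (25 and 88 for students with a tattoo; 22 and 491 for students without); the variable names are illustrative only.

```python
# Counts quoted in answers 4 and 5 (Hep. C / no Hep. C, by tattoo status).
hepc_tattoo, no_hepc_tattoo = 25, 88
hepc_no_tattoo, no_hepc_no_tattoo = 22, 491

# Row, column and overall totals.
n_tattoo = hepc_tattoo + no_hepc_tattoo
n_no_tattoo = hepc_no_tattoo + no_hepc_no_tattoo
n_hepc = hepc_tattoo + hepc_no_tattoo
n_total = n_tattoo + n_no_tattoo

# Answer 2: proportion of students with a tattoo.
prop_tattoo = n_tattoo / n_total

# Answer 3: expected count of students with both a tattoo and Hep. C,
# if tattoo status and Hep. C status were unrelated.
expected_tattoo_hepc = prop_tattoo * n_hepc

# Answers 4 to 6: odds of Hep. C in each group, and the odds ratio.
odds_tattoo = hepc_tattoo / no_hepc_tattoo
odds_no_tattoo = hepc_no_tattoo / no_hepc_no_tattoo
odds_ratio = odds_tattoo / odds_no_tattoo

print(f"proportion with tattoo = {prop_tattoo:.4f}")
print(f"expected count         = {expected_tattoo_hepc:.2f}")
print(f"odds (tattoo)          = {odds_tattoo:.3f}")
print(f"odds (no tattoo)       = {odds_no_tattoo:.4f}")
print(f"odds ratio             = {odds_ratio:.1f}")
```

The observed count (25) is about three times the expected count, which is consistent with an OR well above one.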

No evidence of a difference in the mean number of fatalities between female- and male-named hurricanes.

The sample sizes are both much larger than 25, so neither sample needs to be normally distributed for statistical validity: The sample means will have an approximate normal distribution. None of the graphs are needed.

1. $$H_0$$: $$\mu_E = \mu_U$$ (that is, the means are the same in the two populations) and $$H_1$$: $$\mu_E\ne\mu_U$$, which can also be expressed in words. Two-tailed.
2. The precision with which the sample mean battery life estimates the population mean battery life.
3. Use the second row (though it matters little here): $$t=-0.486$$; $$\text{df}=13.1$$ and $$P=0.635$$.
4. The sample presents no evidence ($$t = -0.486$$; $$\text{df} = 13.0$$ and $$P = 0.635$$) of a difference in the mean lifetimes (mean difference: $$-0.0544$$; s.e.: $$0.112$$) of the batteries in the population. (95% CI for the difference from $$-0.30$$ to $$0.19$$ mins in favour of Ultracell.)
5. Both populations have a normal distribution, and/or $$n > 30$$ or so.
6. Since $$n < 30$$ for both samples, we must assume the populations both have a normal distribution. Stem-and-leaf plots aren't convincing (possible outliers) but the samples are too small to know anything for sure.
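
As a check on the values quoted in answers 3 and 4, the $$t$$-score is the mean difference divided by its standard error:

$$t = \frac{-0.0544}{0.112} \approx -0.486.$$

The 95% CI is the mean difference plus or minus a multiplier times the standard error. Taking the multiplier as about $$2.16$$ (an assumption: this is the usual $$t$$-multiplier for about 13 degrees of freedom, and is not shown in the output):

$$-0.0544 \pm (2.16\times 0.112) \approx -0.0544 \pm 0.242,$$

or from about $$-0.30$$ to $$0.19$$, in agreement with the CI reported above.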

1. The third one. The first is about samples. The second is about means. $$H_0$$: $$\text{Pop. odds having byss.}_{\text{Smokers}} = \text{Pop. odds having byss.}_{\text{Non-Smokers}}$$ or $$\text{OR}=1$$; $$H_1$$: $$\text{Pop. odds having byss.}_{\text{Smokers}} > \text{Pop. odds having byss.}_{\text{Non-Smokers}}$$ or $$\text{OR}>1$$ for the OR suitably defined. (One-tailed, from RQ).
2. Smokers: $$\hat{p} = 125/3,189 = 0.0392$$; Non-smokers: $$\hat{p} = 40/2,230 = 0.0179$$.
3. For smokers: $$\text{Odds having byssinosis} = 125/3,064 = 0.0408$$.
This means: smokers are 0.041 times as likely to have byssinosis as not (or, inverting the ratio, 24.5 times as likely not to have byssinosis as to have it). For non-smokers: $$\text{odds} = 40/2,190 = 0.01826$$. Non-smokers are 0.018 times as likely to have byssinosis as not (or, inverting the ratio, 54.8 times as likely not to have byssinosis as to have it).
4. The OR is $$0.04080/0.01826 = 2.2$$, so the odds of a smoker having byssinosis are 2.2 times the odds of a non-smoker having byssinosis.
5. OR not one could be due to chance or because of a real difference in the population, due to smoking and/or other reasons.
6. The sample provides strong evidence (one-tailed $$P<0.001$$; $$\text{df}=1$$; $$\text{chi-square}=20.092$$; OR: $$2.234$$; $$95$$% CI: $$1.6$$ to $$3.2$$) that the population proportion of smokers with byssinosis ($$\hat{p} = 0.0392$$) is greater than the population proportion of non-smokers with byssinosis ($$\hat{p} = 0.0179$$).
7. Valid, since expected counts all exceed five (which they do: no SPSS warnings given for example).
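
The proportions, odds, OR and chi-square value in the answers above can be reproduced from the four counts. This is a sketch only: it assumes the reported chi-square is the Pearson statistic without a continuity correction, and it approximates the one-tailed $$P$$-value by treating the square root of chi-square as a $$z$$-score (reasonable here since $$\text{df}=1$$ and the sample OR is in the direction of $$H_1$$). All variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts from the answers above.
byss_smokers, total_smokers = 125, 3189
byss_nonsmokers, total_nonsmokers = 40, 2230
no_byss_smokers = total_smokers - byss_smokers            # 3064
no_byss_nonsmokers = total_nonsmokers - byss_nonsmokers   # 2190

# Sample proportions with byssinosis.
p_smokers = byss_smokers / total_smokers
p_nonsmokers = byss_nonsmokers / total_nonsmokers

# Odds of byssinosis in each group, and the odds ratio.
odds_smokers = byss_smokers / no_byss_smokers
odds_nonsmokers = byss_nonsmokers / no_byss_nonsmokers
odds_ratio = odds_smokers / odds_nonsmokers

# Assumed method: Pearson chi-square for a 2x2 table, no continuity correction:
# n (ad - bc)^2 / (product of the two row totals and the two column totals).
n = total_smokers + total_nonsmokers
cross_diff = byss_smokers * no_byss_nonsmokers - no_byss_smokers * byss_nonsmokers
chi_square = (n * cross_diff**2) / (
    total_smokers
    * total_nonsmokers
    * (byss_smokers + byss_nonsmokers)
    * (no_byss_smokers + no_byss_nonsmokers)
)

# One-tailed P-value via the normal approximation.
z = math.sqrt(chi_square)
p_one_tailed = 1 - NormalDist().cdf(z)

print(f"p-hat: smokers {p_smokers:.4f}, non-smokers {p_nonsmokers:.4f}")
print(f"odds:  smokers {odds_smokers:.4f}, non-smokers {odds_nonsmokers:.5f}")
print(f"OR = {odds_ratio:.3f}; chi-square = {chi_square:.3f}")
print(f"one-tailed P < 0.001: {p_one_tailed < 0.001}")
```

Under these assumptions, the computed values agree with those quoted in the answers.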

2. Using symbols, based on using proportions: $$H_0$$: $$p_{\text{EGTA}} = p_{\text{LM}}$$; $$H_1$$: $$p_{\text{EGTA}} \ne p_{\text{LM}}$$ (two-tailed).
3. The sample presents insufficient evidence ($$\text{chi-square}=2.63$$; $$\text{df}=1$$; $$P=0.104$$) that the success rates between EGTA and LM are different in the population.