3.9 Comparing Two Means

Recall that we may compare two means using either a t-test or a z-test, depending on some key characteristics of the sample.

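A z-test is appropriate when the population standard deviation is known, or when the sample is large enough (a common but arbitrary threshold is n > 30) for the sample standard deviation to stand in for it. A t-test is used when the population standard deviation is unknown and has to be estimated from the sample, which matters most for smaller samples. Both tests assume that the data are approximately normally distributed, or that the samples are large enough for the sample means to be.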

In practice, you will very rarely know the population standard deviation, so you will almost always use a t-test. The t-test can be used in any circumstances where a z-test can be used: as the degrees of freedom (and hence the sample size) increase, the t-distribution approaches the standard normal (z) distribution.
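The convergence of the t-distribution towards the z-distribution can be checked numerically. The following is a minimal sketch using the base R functions qt() and qnorm(); the particular degrees of freedom are arbitrary values chosen for illustration.

```r
# Two-sided 5% critical values of the t-distribution at increasing degrees of freedom
df <- c(5, 30, 100, 1000)   # illustrative values, not from the text
t_crit <- qt(0.975, df)
z_crit <- qnorm(0.975)      # standard normal critical value, about 1.96

# The gap between the t and z critical values shrinks as df grows
data.frame(df = df, t_crit = t_crit, gap = t_crit - z_crit)
```

The t critical value is always larger than the z critical value, which is why a t-test is slightly more conservative, but the difference becomes negligible for large samples.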

One key assumption of the standard (Student's) t-test is that the population standard deviations are the same in each group. One quirk of R is that t.test() assumes by default that the variances are NOT equal in each group (var.equal = FALSE) and carries out Welch's t-test, which uses an approximation (the Welch-Satterthwaite formula) to calculate the appropriate degrees of freedom. This is normally the desired behaviour because, in practice, we rarely meet the assumption of equal variance. Equality of variance can be tested using var.test(), which performs an F test of the null hypothesis that the two variances are equal.
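The two forms of the test can be compared side by side. This is a sketch on simulated data: the group names, sample sizes, means, standard deviations and the seed are all made up for illustration. It uses only t.test() with its var.equal argument and var.test().

```r
set.seed(42)  # arbitrary seed so the simulated data are reproducible

# Made-up data: two groups with clearly different spreads
group_a <- rnorm(20, mean = 10, sd = 1)
group_b <- rnorm(25, mean = 11, sd = 3)

welch   <- t.test(group_a, group_b)                    # default: var.equal = FALSE
student <- t.test(group_a, group_b, var.equal = TRUE)  # pooled-variance t-test

welch$parameter    # approximate degrees of freedom, usually not a whole number
student$parameter  # n1 + n2 - 2 = 43

# F test of the null hypothesis that the two variances are equal
var.test(group_a, group_b)$p.value
```

With unequal variances the approximate degrees of freedom fall below n1 + n2 - 2, so the default test gives a somewhat wider confidence interval than the pooled version would.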