1.2 Visualising the data and checking assumptions: Independent samples \(t\)-test

At this point, it is a good idea to visualise the data and look at some basic descriptive statistics. This can give us an idea of what we may expect when we carry out the hypothesis test, and also help us check the assumptions.

First of all, let's take a look at some boxplots, and also the sample size, sample mean, and standard deviation (SD) for both groups:

Table 1.1: Sample size, mean and standard deviation of cholesterol levels for high and low risk groups
Group      Sample size  Mean  SD
High risk  36           5.46  0.42
Low risk   36           4.8   0.31

From the above, we can observe the following:

  1. The boxplots and sample means suggest that average cholesterol is higher in the high risk group (5.46 versus 4.8). When we carry out the \(t\)-test, we will see whether or not this difference is statistically significant.
  2. From the boxplots, the data appear to be similarly spread out. The SDs are also similar to each other (neither one is double the other). This suggests the equal variances assumption has not been violated.
  3. The sample size in both groups is 36. This will be useful knowledge later when checking for normality.
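The rule of thumb in point 2 can be checked directly from the summary statistics in Table 1.1. The following is a minimal Python sketch for illustration only (the names `groups` and `sd_ratio` are hypothetical and not part of the original analysis):

```python
# Summary statistics copied from Table 1.1 (sample size, mean, SD).
groups = {
    "High risk": {"n": 36, "mean": 5.46, "sd": 0.42},
    "Low risk": {"n": 36, "mean": 4.8, "sd": 0.31},
}


def sd_ratio(sd_a, sd_b):
    """Return the larger SD divided by the smaller SD."""
    return max(sd_a, sd_b) / min(sd_a, sd_b)


high, low = groups["High risk"], groups["Low risk"]
mean_difference = high["mean"] - low["mean"]
ratio = sd_ratio(high["sd"], low["sd"])

print(f"Difference in sample means: {mean_difference:.2f}")
print(f"Ratio of larger SD to smaller SD: {ratio:.2f}")
# Rule of thumb from the text: be concerned if one SD is at least double the other.
print("One SD is at least double the other:", ratio >= 2)
```

Here the ratio is about 1.35, well below 2, which agrees with the informal reading of the boxplots.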

Using Levene's test to test for equal variances

We can also use a hypothesis test called Levene's test for equality of variances to help determine whether or not the equal variances assumption has been met. Consider the following null and alternative hypotheses:

  • \(H_0 : \text{The groups have equal variances}\)
  • \(H_1 : \text{The groups do not have equal variances}.\)

Since we start out by assuming the groups have equal variances, we reject this assumption only if we get a small \(p\)-value. That is, a small \(p\)-value indicates the groups do not have equal variances. To summarise:

Levene's test for equality of variances:

  • If \(p < 0.05\), equal variances cannot be assumed
  • If \(p \geq 0.05\), equal variances can be assumed
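To see what the test actually measures, here is a rough Python sketch of the median-centred form of Levene's statistic (sometimes called the Brown-Forsythe variant). It is a one-way ANOVA \(F\) statistic computed on each observation's absolute distance from its own group median. The function name `levene_median_f` and the data values are made up for illustration; they are NOT the cholesterol data.

```python
from statistics import mean, median


def levene_median_f(*groups):
    """Levene's statistic using absolute deviations from each group's median.

    Returns (F, df1, df2): a one-way ANOVA F statistic computed on the
    absolute deviations rather than on the raw values.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    group_means = [mean(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    between = sum(
        len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means)
    )
    within = sum(
        (z - m) ** 2 for d, m in zip(deviations, group_means) for z in d
    )
    df1, df2 = k - 1, n_total - k
    f_value = (between / df1) / (within / df2)
    return f_value, df1, df2


# Made-up illustrative values, NOT the cholesterol data.
group_a = [5.1, 5.6, 5.3, 5.9, 5.4, 5.0]
group_b = [4.7, 4.9, 4.6, 5.0, 4.8, 4.5]
f_value, df1, df2 = levene_median_f(group_a, group_b)
print(f"F = {f_value:.3f} on ({df1}, {df2}) degrees of freedom")
```

A large \(F\) value means the typical distance from the median differs a lot between groups, which is evidence against equal variances. Statistical software converts \(F\) into the \(p\)-value used in the decision rule above.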

Let's carry out Levene's test for the cholesterol data:

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1   0.545 0.4628
      70               

As we can see, we have \(p = 0.4628\). Since \(p > 0.05\), equal variances can be assumed. Given our observations from the boxplots and standard deviations, this is not a surprising result.
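As a sanity check, the reported \(p\)-value can be approximately recovered from the \(F\) value and degrees of freedom in the output. The sketch below, in Python, relies on the fact that an \(F\) statistic with 1 and \(\nu\) degrees of freedom is the square of a \(t\) statistic with \(\nu\) degrees of freedom, and integrates the \(t\) density numerically with Simpson's rule. The function names are hypothetical, and this shortcut only works when the first degrees of freedom equal 1 (that is, when comparing two groups).

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_f_one_df(f_value, df2, steps=2000):
    """Upper-tail p-value for an F statistic with (1, df2) degrees of freedom.

    Uses P(F > f) = 1 - 2 * integral from 0 to sqrt(f) of the t(df2) density,
    approximated with composite Simpson's rule (steps must be even).
    """
    upper = math.sqrt(f_value)
    h = upper / steps
    total = t_density(0.0, df2) + t_density(upper, df2)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df2)
    return 1 - 2 * (h / 3) * total


# F value and residual degrees of freedom taken from the Levene's test output.
p = p_value_f_one_df(0.545, 70)
print(f"p = {p:.4f}")
print("Equal variances can be assumed:", p > 0.05)
```

The result should land very close to the reported 0.4628. A small discrepancy is expected because the \(F\) value in the output is rounded to three decimal places.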

What happens if the equal variances assumption has been violated?

There are two versions of the independent samples \(t\)-test: one that assumes equal variances (often called the pooled or Student's \(t\)-test) and one that does not (often called Welch's \(t\)-test). If equal variances can be assumed, we use the first; if not, we use the second. We will have a chance to practise this in the computer lab.
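To make the difference between the two versions concrete, here is a small Python sketch that computes both test statistics from the rounded summary statistics in Table 1.1. The function names `pooled_t` and `welch_t` are hypothetical. The unequal-variances version is assumed to use the usual Welch-Satterthwaite degrees of freedom, which may differ in detail from what a particular software package reports. Because it works from rounded summaries rather than the raw data, the numbers are approximate.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variances (pooled) t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unequal-variances t statistic with Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df


# (n, mean, SD) for the high risk group, then the low risk group (Table 1.1).
summary = (36, 5.46, 0.42, 36, 4.8, 0.31)
t_pooled, df_pooled = pooled_t(*summary)
t_welch, df_welch = welch_t(*summary)

print(f"Equal variances assumed:     t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Equal variances not assumed: t = {t_welch:.2f}, df = {df_welch:.1f}")
```

When the two groups have the same sample size, as they do here, the two \(t\) statistics coincide and only the degrees of freedom differ. With unequal group sizes, the two versions can give noticeably different results.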

Your turn

Given our analysis above, which version of the independent samples \(t\)-test should we use for the cholesterol example?

Answer: the \(t\)-test that assumes equal variances, since Levene's test gave \(p = 0.4628 > 0.05\).

Checking for normality

For the independent samples \(t\)-test, normality needs to be checked for each group separately. Let's take a look at the histograms, Normal Q-Q plots, and Shapiro-Wilk test results:

Shapiro-Wilk test for high risk group:


    Shapiro-Wilk normality test

data:  heartattack$cholesterol[heartattack$risk == "high"]
W = 0.95429, p-value = 0.1427

Shapiro-Wilk test for low risk group:


    Shapiro-Wilk normality test

data:  heartattack$cholesterol[heartattack$risk == "low"]
W = 0.96931, p-value = 0.407

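The Shapiro-Wilk \(p\)-values are above 0.05 for both groups (\(p = 0.1427\) for the high risk group and \(p = 0.407\) for the low risk group), so we do not reject the hypothesis that each group is normally distributed. Taken together with the histograms, the Normal Q-Q plots, and the reasonably large sample size (\(n = 36\) in both groups), we can conclude that the normality assumption has been met.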
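For readers curious about what a Normal Q-Q plot shows, the Python sketch below computes its coordinates: the sorted observations paired with the values we would expect from a standard normal distribution at the same ranks. It also reports the correlation between the two, as an informal measure of how straight the plot is. This is NOT the Shapiro-Wilk statistic, only a related informal summary. The function names, the plotting positions \((i - 0.5)/n\), and the data values are all assumptions made for illustration; they are not the cholesterol data.

```python
import math
from statistics import NormalDist, mean


def qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a Normal Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)
    # Plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(sample):
    """Pearson correlation between theoretical and observed quantiles."""
    points = qq_points(sample)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative samples, NOT the cholesterol data.
roughly_normal = [4.2, 4.5, 4.6, 4.8, 4.8, 4.9, 5.0, 5.1, 5.3, 5.6]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 7, 12, 30]

print(f"Roughly normal sample: r = {qq_correlation(roughly_normal):.3f}")
print(f"Right-skewed sample:   r = {qq_correlation(right_skewed):.3f}")
```

Points that fall close to a straight line (a correlation near 1) are consistent with normality, while a curved pattern, as in the skewed sample, suggests a departure from it.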

We are now ready to carry out the independent samples \(t\)-test.