Chapter 6 Differences between means
6.1 Independent Samples t-Test when Sample Sizes are Equal
6.1.1 🧠💪Refresher + Worked example💪🧠
The purpose of an independent samples \(t\)-test is to see whether there is a significant difference between two sample means.
Here is the formula for an independent samples t-test (when sample sizes are equal):
\(t = \frac{\bar{x}_1 - \bar{x}_2}{SEDBM}\)
where \(\bar{x}_1\) is the sample mean of group 1, \(\bar{x}_2\) is the sample mean of group 2, and \(SEDBM\) is the Standard Error of the Difference Between Means. The SEDBM is a measure of how much variability we would expect in the difference between sample means if we repeated the experiment many times.
\(SEDBM = \sqrt{\frac{s^2_1}{N_1} + \frac{s^2_2}{N_2}}\)
where \(s^2_1\) is the sample variance for group 1 and \(N_1\) is the sample size of group 1. \(s^2_2\) and \(N_2\) are the same for group 2. Recall that the variance is just the standard deviation squared.
Let’s walk through an example:
Let’s say we are testing whether or not a particular advertising format (Banner Ad or Video Ad) results in a difference between sales for a clothing company. The banner advertising resulted in an average of 25 (\(SD = 3.22\)) sales across a 30-day period. The video ad resulted in an average of 23 (\(SD = 3.19\)) sales in the same 30-day period. The marketing department gave you the task of determining whether or not a particular advertising format resulted in a significantly different number of sales.
Step 1: State null and alternative hypotheses
\(H_0: \bar{x_1} - \bar{x_2} = 0\) There is no significant difference between the two means (banner or video)
\(H_1: \bar{x_1} - \bar{x_2} \neq 0\) There is a significant difference between the two means (banner or video)
Step 2: Figure out what you have and note it down
\(\bar{x_1} = 25.00\) and \(s_1 = 3.22\)
\(\bar{x_2} = 23.00\) and \(s_2 = 3.19\)
\(n_1 = 30\)
\(n_2 = 30\)
Step 3: Calculate degrees of freedom
Degrees of freedom for an independent samples t-test is the total sample size (including both groups) minus 2
\(N = 30 + 30\)
\(N = 60\)
\(df = N-2\)
\(df = 58\)
Step 4: Calculate the SEDBM
\(SEDBM = \sqrt{\frac{3.22^2}{30} + \frac{3.19^2}{30}}\)
\(SEDBM = \sqrt{\frac{10.37}{30} + \frac{10.18}{30}}\)
\(SEDBM = \sqrt{0.35 + 0.34}\)
\(SEDBM = \sqrt{0.69}\)
\(SEDBM = 0.83\)
Step 5: Calculate the difference between mean 1 and mean 2
\(\bar{x_1}-\bar{x_2}\)
\(25-23 = 2\)
Step 6: Divide the difference between the means by the SEDBM to calculate your test statistic, \(t\).
\(t = \frac{2}{0.83}\)
\(t = 2.41\)
Hand calculations can differ because of rounding. If you did this by hand while following along and it’s generally close (within a few hundredths), you probably did it correctly.
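The six steps above can be sketched in Python as a way to check a hand calculation. This is a minimal sketch, not a replacement for a stats package: the function name `independent_t_from_summary` is made up for illustration, and it assumes you only have summary statistics (means, SDs, and group sizes).

```python
import math

def independent_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, sedbm) for an independent samples t-test from summary stats."""
    # Step 3: degrees of freedom = total sample size minus 2
    df = n1 + n2 - 2
    # Step 4: SEDBM = sqrt(s1^2 / N1 + s2^2 / N2)
    sedbm = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Steps 5 and 6: difference between means divided by the SEDBM
    t = (mean1 - mean2) / sedbm
    return t, df, sedbm

# Banner ad vs. video ad example from the text
t, df, sedbm = independent_t_from_summary(25, 3.22, 30, 23, 3.19, 30)
print(f"SEDBM = {sedbm:.2f}, t({df}) = {t:.2f}")
```

Because the code does not round at intermediate steps, it gives a t-value of about 2.42 rather than 2.41. That is exactly the kind of small rounding difference to expect.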
Normally, at this point, we would seek to determine whether a t-value of 2.41 with 58 degrees of freedom is statistically significant. Historically, people did this by looking up “critical values” in a table of numbers 🤮. Nowadays, a stats program like JASP or R will just give you the p-value directly. We’re not going to worry about either for now.
6.1.2 📝Homework problems📝
- Two classes rated different quiz platforms on a 1–10 satisfaction scale.
- Moodle: \(M = 6.91\), \(SD = 2.11\), \(n = 30\)
- TopHat: \(M = 8.23\), \(SD = 2.19\), \(n = 30\)
Calculate a t-value and degrees of freedom for these data.
- Two instructors use different teaching styles. After one week of instruction, their students take the same 30-point quiz.
- Class A: \(M = 25.00\), \(SD = 4.36\), \(n = 25\)
- Class B: \(M = 23.00\), \(SD = 4.22\), \(n = 25\)
Calculate a t-value and degrees of freedom for these data.
- A teacher compares two notebook types by surveying two English classes (handwriting-heavy).
- Class B (Swiss-Bound): \(M = 7.37\), \(SD = 1.25\), \(n = 20\)
- Class A (Spiral-Bound): \(M = 6.23\), \(SD = 1.52\), \(n = 20\)
Calculate the t-value and degrees of freedom.
- Two classes were compared on anxiety scores. The difference between means was 1.8 points, and the t-value was 2.25. The standard error of the difference between means is missing. Can you calculate it?
- A t-value of 2.11 was computed for the difference between two sample means. The observed difference between means was 3.6. What was the standard error of the difference between means?
- You are told that the SEDBM is 0.71, and that the mean of Class A is 24.6 while the mean of Class B is 26.3. What is the t-value?
- The reported t-value is 2.00 and the SEDBM is 0.50. What is the raw difference between sample means?
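Problems with a missing piece only need the t formula rearranged. As a minimal sketch (the helper names are made up for illustration), here are the three rearrangements, checked against the rounded values from the banner vs. video worked example rather than the homework numbers:

```python
def t_from(diff, sedbm):
    """t = difference between means / SEDBM"""
    return diff / sedbm

def sedbm_from(diff, t):
    """SEDBM = difference between means / t"""
    return diff / t

def diff_from(t, sedbm):
    """difference between means = t * SEDBM"""
    return t * sedbm

# Worked example values: difference = 2, SEDBM = 0.83, t = 2.41
print(round(t_from(2, 0.83), 2))
print(round(sedbm_from(2, 2.41), 2))
print(round(diff_from(2.41, 0.83), 2))
```

Each line recovers one of the three worked example values from the other two, up to rounding.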
6.2 One-way, independent Samples ANOVA
6.2.1 🧠💪Refresher + Worked example💪🧠
Every additional t-test we run on the same data raises the family-wise error rate: the chance of getting at least one false positive somewhere in the set of tests. So if we have three or more groups, we use an ANOVA instead of several t-tests. An ANOVA tells us if there is a significant difference somewhere among our three (or more) group means. It doesn’t tell us where. If a significant difference is found, we conduct further post-hoc analyses to see where the difference occurs.
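To get a feel for the family-wise error problem, here is a small sketch of how the chance of at least one false positive grows with the number of pairwise t-tests. It assumes every test uses an alpha of .05 and that the tests are independent of each other, which is a simplification (pairwise comparisons on the same data overlap). The function name is made up for illustration.

```python
from math import comb

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

for k in (3, 4, 5):
    n_pairs = comb(k, 2)  # number of pairwise comparisons among k groups
    rate = familywise_error_rate(n_pairs)
    print(f"{k} groups -> {n_pairs} t-tests -> error rate of about {rate:.3f}")
```

With three groups there are three pairwise comparisons, and under these assumptions the error rate is already about 14% instead of 5%.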
6.2.1.1 Example
A small business is designing a new branding strategy. They want to see if there are any existing differences in the engagement they receive from advertisements on three different social media platforms: Facebook, Twitter, and Instagram. They use the same advertising and promotional schedule across all three platforms and track the number of sales generated using a survey question on Shopify: “Where did you hear about us?”
Facebook | Twitter | Instagram |
---|---|---|
8 | 5 | 7 |
10 | 6 | 9 |
6 | 4 | 5 |
9 | 5 | 8 |
7 | 6 | 6 |
Preliminary data analysis revealed descriptive statistics for Facebook (\(M = 8.00, SD = 1.58\)), Instagram (\(M = 7.00, SD = 1.58\)), and Twitter (\(M = 5.20, SD = 0.84\)). The overall average (AKA the “grand mean”) of sales in the advertising campaign was 6.73 per day.
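The group means and the grand mean can be checked with a few lines of Python. This sketch assumes the three columns of the data table are Facebook, Twitter, and Instagram, in that order. That ordering is an inference from the reported means, and the variable names are made up for illustration.

```python
from statistics import mean

# Daily sales per platform (column order is an assumption, see above)
sales = {
    "Facebook": [8, 10, 6, 9, 7],
    "Twitter": [5, 6, 4, 5, 6],
    "Instagram": [7, 9, 5, 8, 6],
}

group_means = {platform: mean(scores) for platform, scores in sales.items()}
all_scores = [x for scores in sales.values() for x in scores]
grand_mean = mean(all_scores)

for platform, m in group_means.items():
    print(f"{platform}: M = {m:.2f}")
print(f"Grand mean = {grand_mean:.2f}")
```

The grand mean is the mean of all 15 scores. Here it also equals the mean of the three group means, but only because every group has the same number of observations.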
6.2.1.2 Step 1: State null and alternative hypotheses
\(H_0\): There is no difference in sales based on the social media platform used for the ad campaign
\(H_1\): There is a significant difference in sales based on the social media platform used for the ad campaign
6.2.1.3 Step 2: Figure out what you have and note it down.
For ANOVAs by hand, I usually use a column on the right side of the paper out of the way of the calculations to not get lost. Also I refer to each level (Facebook, Instagram, and Twitter in this case) as \(\bar{x}_1\), \(\bar{x}_2\), \(\bar{x}_3\), etc. Same for standard deviations with \(s_1\) and so on. Keeping track of everything when hand calculating ANOVAs is super important. Use consistent rounding, otherwise it may throw off your answer. When hand calculating I generally round to 3 decimals all the way through and round my final answer to 2 decimals in accordance with APA style.
\(\bar{x}_{Grand} = 6.73\)
\(n = 5\) and \(N = 15\)
\(\bar{x}_1 = 8.00\), \(\bar{x}_2= 7.00\), \(\bar{x}_3 = 5.20\)
\(s_1 = 1.58\), \(s_2= 1.58\), \(s_3 = 0.84\)
6.2.1.4 Step 3: Draw an ANOVA Table on the page somewhere.
As we walk through each step, place the resulting value in the table. The ANOVA table should look like this:
Source | SS | df | MS | F |
---|---|---|---|---|
Model | | | | |
Residual | | | | |
Total | | | | |
6.2.1.5 Step 4: Calculate each Sum of Squares
\(SS_{Total} = \Sigma(x_i - \bar{x}_{Grand})^2\)
Take every individual score \((x_i)\) and subtract the grand mean from it. This is called a deviation. Square all the deviations and add them all together. The worked out formula should look like this:
\(SS_{Total} = (8-6.73)^2+(10-6.73)^2+(6-6.73)^2+(9-6.73)^2\)
\(+ (7-6.73)^2 + (5-6.73)^2 + (6-6.73)^2 + (4-6.73)^2 + (5-6.73)^2\)
\(+ (6-6.73)^2 + (7-6.73)^2 + (9-6.73)^2 + (5-6.73)^2 + (8-6.73)^2 + (6-6.73)^2\)
\(SS_{Total} = 42.93\)
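Squaring and adding 15 deviations by hand is error-prone, so here is a minimal sketch for checking \(SS_{Total}\). The scores are listed in the same order as in the written-out formula, and the variable names are made up for illustration.

```python
# All 15 scores, in the same order as the hand calculation
scores = [8, 10, 6, 9, 7, 5, 6, 4, 5, 6, 7, 9, 5, 8, 6]

grand_mean = sum(scores) / len(scores)

# Deviation of each score from the grand mean, squared, then summed
ss_total = sum((x - grand_mean) ** 2 for x in scores)

print(f"Grand mean = {grand_mean:.2f}")
print(f"SS_Total = {ss_total:.2f}")
```

The code uses the unrounded grand mean (6.7333...) instead of 6.73, but the result still rounds to 42.93.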
Next, we calculate \(SS_{Model}\), the degree to which your model (independent variable) can account for the variance in your dependent variable:
\(SS_{Model} = \Sigma n_k(\bar{x}_k - \bar{x}_{Grand})^2\)
Take the mean of each group and subtract the grand mean from it. Each of these is a deviation like before. But instead of “deviations of all data from the grand mean” we’re doing “deviations of all sample means from the grand mean”. Next, we square each of these deviations, multiply each of these squared deviations by the sample size of their respective groups, and then add these all together.
\(SS_{Model} = 5(8-6.73)^2 +5(7-6.73)^2 + 5(5.20-6.73)^2\)
\(SS_{Model} = 20.13\)
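The same calculation can be sketched in Python. The group labels follow the \(\bar{x}_1\), \(\bar{x}_2\), \(\bar{x}_3\) naming used above; the variable names are made up for illustration.

```python
# Group means and sizes from Step 2
group_means = {"x1": 8.00, "x2": 7.00, "x3": 5.20}
group_sizes = {"x1": 5, "x2": 5, "x3": 5}

# Grand mean as the size-weighted average of the group means
n_total = sum(group_sizes.values())
grand_mean = sum(group_sizes[g] * group_means[g] for g in group_means) / n_total

# n_k * (group mean - grand mean)^2, summed over groups
ss_model = sum(
    group_sizes[g] * (group_means[g] - grand_mean) ** 2 for g in group_means
)

print(f"SS_Model = {ss_model:.2f}")
```

Weighting each squared deviation by its group size is what lets this formula work even when the groups are not the same size.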
Lastly, we calculate \(SS_{Residual}\). To calculate this one by itself is kind of tedious. Thankfully, though, if you have 2 of the Sum of Squares, you can always infer the third one because \(SS_{Total} = SS_{Model} + SS_{Residual}\). Just do some re-arranging and plug in what you already know:
\(SS_{Residual} = SS_{Total} - SS_{Model}\)
\(SS_{Residual} = 42.93 - 20.13\)
\(SS_{Residual} = 22.80\)
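If you want to confirm the subtraction shortcut, the tedious direct route is easy for a computer: take each score's deviation from its own group mean, square it, and add everything up. This is a minimal sketch; the group labels are placeholders, since the result does not depend on which platform is which.

```python
groups = {
    "group_1": [8, 10, 6, 9, 7],
    "group_2": [5, 6, 4, 5, 6],
    "group_3": [7, 9, 5, 8, 6],
}

ss_residual = 0.0
for scores in groups.values():
    group_mean = sum(scores) / len(scores)
    # Squared deviations of each score from its OWN group mean
    ss_residual += sum((x - group_mean) ** 2 for x in scores)

print(f"SS_Residual = {ss_residual:.2f}")
```

The direct calculation gives 22.80, the same value as \(SS_{Total} - SS_{Model}\).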
Then, fill each Sum of Squares in the ANOVA table
Source | SS | df | MS | F |
---|---|---|---|---|
Model | 20.13 | | | |
Residual | 22.80 | | | |
Total | 42.93 | | | |
6.2.1.6 Step 5: Calculate degrees of freedom
\(df_{Model} = k -1\) where \(k\) is the number of groups
\(df_{Residual} = \Sigma(n-1)\) where \(n\) is the number of observations within each group
For \(df_{Model}\), see the calculation below
\(df_{Model} = 3 - 1\)
\(df_{Model} = 2\)
For \(df_{Residual}\), subtract 1 from the number of observations you have in each group
\(df_{Residual} = (5-1) + (5-1) + (5-1)\)
\(df_{Residual} = 4 + 4 + 4\)
\(df_{Residual} = 12\)
Personally, I find it easier and simpler to remember that, just as \(SS_{Total} = SS_{Model} + SS_{Residual}\), so does \(df_{Total} = df_{Model} + df_{Residual}\). \(df_{Total}\) is just the total sample size minus 1. Thus, it’s pretty easy to figure out “Number of groups minus one” and “Number of observations minus one”, then figure out that last piece of information by using \(df_{Residual} = df_{Total} - df_{Model}\).
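Both routes to the residual degrees of freedom can be checked against each other. This is a minimal sketch with made-up variable names:

```python
group_sizes = [5, 5, 5]  # n for each of the three platforms

k = len(group_sizes)        # number of groups
n_total = sum(group_sizes)  # total sample size, N

df_model = k - 1
df_total = n_total - 1

# Route 1: subtract 1 from each group's size and add the results
df_residual_direct = sum(n - 1 for n in group_sizes)
# Route 2: whatever is left over after the model's df
df_residual_shortcut = df_total - df_model

print(df_model, df_residual_direct, df_residual_shortcut, df_total)
```

Both routes give 12 residual degrees of freedom, and the model and residual degrees of freedom add up to the total of 14.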
Your ANOVA table should look like this now:
Source | SS | df | MS | F |
---|---|---|---|---|
Model | 20.13 | 2 | | |
Residual | 22.80 | 12 | | |
Total | 42.93 | 14 | | |
6.2.1.7 Step 6: Calculate the Mean Squares (MS)
Calculating each of your mean squares is pretty easy. Just divide each sum of squares by its respective degrees of freedom
For \(MS_{Model}\):
\(MS_{Model} = \frac{SS_{Model}}{df_{Model}}\)
\(MS_{Model} = \frac{20.13}{2}\)
\(MS_{Model} = 10.065\)
For \(MS_{Residual}\):
\(MS_{Residual} = \frac{SS_{Residual}}{df_{Residual}}\)
\(MS_{Residual} = \frac{22.80}{12}\)
\(MS_{Residual} = 1.90\)
Source | SS | df | MS | F |
---|---|---|---|---|
Model | 20.13 | 2 | 10.07 | |
Residual | 22.80 | 12 | 1.90 | |
Total | 42.93 | 14 | | |
6.2.1.8 Step 7: Calculate your \(F\)-statistic
\(F = \frac{MS_{Model}}{MS_{Residual}}\)
\(F = \frac{10.07}{1.90}\)
\(F = 5.30\)
Source | SS | df | MS | F |
---|---|---|---|---|
Model | 20.13 | 2 | 10.07 | 5.30 |
Residual | 22.80 | 12 | 1.90 | |
Total | 42.93 | 14 | | |
As far as hand calculations go, this is where our work ends. Normally, though, you’d want to know if this F-value is statistically significant. You can either look up a critical F-value for these degrees of freedom 🤮 or have a computer give you the p-value directly. In APA style, we’d write up the results as “F(2, 12) = 5.30” along with the p-value. The parentheses after the F have the numerator (Model) and denominator (Residual) degrees of freedom, in that order.
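To check a whole hand-calculated ANOVA table at once, the steps can be strung together in one function. This is a minimal sketch under the same assumptions as the worked example (one-way, independent groups); the function name `one_way_anova` and the dictionary keys are made up for illustration, and it stops at the F-value without computing a p-value.

```python
def one_way_anova(groups):
    """Return ANOVA table values for a dict of {label: list of scores}."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = sum(all_scores) / len(all_scores)

    # Step 4: sums of squares
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_model = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    ss_residual = ss_total - ss_model

    # Step 5: degrees of freedom
    df_model = len(groups) - 1
    df_total = len(all_scores) - 1
    df_residual = df_total - df_model

    # Steps 6 and 7: mean squares and F
    ms_model = ss_model / df_model
    ms_residual = ss_residual / df_residual
    return {
        "ss_model": ss_model, "ss_residual": ss_residual, "ss_total": ss_total,
        "df_model": df_model, "df_residual": df_residual, "df_total": df_total,
        "ms_model": ms_model, "ms_residual": ms_residual,
        "F": ms_model / ms_residual,
    }

# Placeholder labels: the result does not depend on which platform is which
table = one_way_anova({
    "group_1": [8, 10, 6, 9, 7],
    "group_2": [5, 6, 4, 5, 6],
    "group_3": [7, 9, 5, 8, 6],
})
print(f"F({table['df_model']}, {table['df_residual']}) = {table['F']:.2f}")
```

For the social media data this prints F(2, 12) = 5.30, matching the completed table.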
6.2.2 📝Homework problems📝
- You have three groups with the following sample means and observations:
Group | Observations | Sample Mean |
---|---|---|
A | 4, 5, 6 | 5 |
B | 2, 3, 3 | 2.67 |
C | 6, 7, 8 | 7 |
The grand mean across all observations is 4.89.
Calculate:
- The Sum of Squares Between Groups (SS Model)
- The Sum of Squares Within Groups (SS Residual)
- Three groups each have 3 observations:
Group | Observations | Sample Mean |
---|---|---|
X | 10, 12, 13 | 11.67 |
Y | 8, 9, 7 | 8 |
Z | 14, 15, 13 | 14 |
The grand mean is 11.22.
Calculate:
- SS Model (between groups)
- SS Residual (within groups)
- Given the following partial ANOVA table, fill in the missing degrees of freedom:
Source | SS | df | MS | F |
---|---|---|---|---|
Model | 24.5 | | 8.17 | 4.56 |
Residual | 42.6 | 15 | 2.84 | |
Total | 67.1 | | | |
What are the missing values of df for Model and Total?
- Fill in the missing Mean Square (MS) and degrees of freedom in the ANOVA table below:
Source | SS | df | MS | F |
---|---|---|---|---|
Model | 18.3 | 3 | 6.10 | |
Residual | 20 | 2.52 | ||
Total | 52.8 |
- An experiment has 4 groups, each with 5 observations (total N=20).
The total sum of squares (SS Total) is 90. The residual sum of squares (SS Residual) is 30.
Fill in this ANOVA table:
Source | SS | df | MS | F |
---|---|---|---|---|
Model | | | | |
Residual | | | | |
Total | | | | |