# Chapter 3 Jan 20–26: Hypothesis Tests and T-Tests

**Important Request:** As you read through this week’s chapter, please note down anything that you have questions about. Then we will address these questions when we have our video calls.

Possibly useful as you read:

- On a Windows or Linux computer, to open a link in a new browser tab, hold down the `control` key and then click on the link.
- On a Mac computer, to open a link in a new browser tab, hold down the `command` key and then click on the link.

## 3.1 Inference and Sampling Distributions

Begin by reading/watching the following items:

## 3.2 Hypothesis Tests

Read the following:

- Sections 1–10 (Sections 11–12 are optional) in Online Stat Book: Logic of Hypothesis Testing

### 3.2.1 One-Sample Hypothesis Tests

Watch/read the following:

- Z Tests for One Mean: An Example
- One Sample Z Test: How to Run One
- One Sample z-Test from statisticslectures.com

Optional (might be useful later):

### 3.2.2 Two-Sample Hypothesis Tests (T-Test)

Look through the following about t-tests:

- Educational Research Basics by Del Siegle: T-Tests (Siegle also provides Excel spreadsheets for calculating t-tests; you can use these if you want, but they aren’t required).
- Del Siegle’s PowerPoint on t-tests
- Online Stat Book: Testing Means
- STHDA t-test guide

T-tests are used when we want to evaluate the difference between two means. If you want to compare more than two means, you would use a different statistical test (ANOVA, which we will cover soon). We will focus on two forms of the t-test: independent samples and dependent samples t-tests.

### 3.2.3 Independent Samples T-Test

An independent samples t-test is used when you have means from two separate groups that you want to compare. For example, you might have math exam scores from some men and some women, and you want to see if there is a gender difference. Because you have two independent groups—men and women—an independent samples t-test would be appropriate. This is considered a *between-subjects design*.

The null hypothesis for a t-test is that the difference between means (of the women’s scores and men’s scores) is 0:

\[H_0: \mu_1 - \mu_2 = 0\]

In other words, we start with the assumption that there is no difference between men and women. If we find evidence that there is a difference, we will *reject* the null hypothesis (which is the assumption that they’re the same). If we do not find any evidence that the math scores of the sampled men and the sampled women are different, we *fail to reject* the null hypothesis.

*Why don’t we just say that we “accept the null hypothesis”?* Because we don’t know for sure if the null hypothesis is true or not. All we know is that we didn’t find any evidence to say otherwise. So the null hypothesis *might* be true, but maybe there is truly a difference between the two groups and we just didn’t have enough data to detect it. This is tricky stuff to get used to at first. We can talk more about this and go through more examples together on our video calls.

If two samples come from the same population, we expect them to have equal means (although with sampling variation, they may not be exactly equal). Under the null hypothesis, we expect that there are no differences between the groups (for example, that an experimental manipulation didn’t have an effect).
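To make sampling variation concrete, here is a minimal simulation sketch. The population values (mean 70, SD 10) and the sample size of 30 are hypothetical numbers chosen only for illustration. Both samples are drawn from the same population, so any difference between their means is due to chance alone.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result on every run

# Hypothetical population: normally distributed scores with mean 70 and SD 10.
POP_MEAN, POP_SD, N = 70, 10, 30

# Two independent random samples from the SAME population.
sample_a = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
sample_b = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]

mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

print(f"Mean of sample A: {mean_a:.2f}")
print(f"Mean of sample B: {mean_b:.2f}")
print(f"Difference:       {mean_a - mean_b:.2f}")
```

The two means should be close to 70 and close to each other, but not identical. A t-test asks whether an observed difference is larger than this kind of chance variation would plausibly produce.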

There are three assumptions of the independent samples t-test that must be met in order for the results to be valid and interpretable:

- Homogeneity of variances: We assume that the two groups have the same variance.
- Normality: We assume that the scores in each group come from a normally distributed population.
- Independence of scores: Each individual (or observation) contributes only one score (data point). If they submit more than one score, then those responses (scores) are correlated with each other and therefore not independent.
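For reference, when these assumptions hold, one common form of the test statistic is the pooled-variance version sketched below. The symbols are introduced here for illustration and do not appear in the readings in exactly this form: \(\bar{x}_1, \bar{x}_2\) are the sample means, \(s_1^2, s_2^2\) the sample variances, and \(n_1, n_2\) the group sizes.

\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]

\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2\]

The homogeneity of variances assumption is what justifies combining the two sample variances into the single pooled estimate \(s_p^2\).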

### 3.2.4 Dependent Samples T-test

A dependent samples t-test is used when the two means you want to compare come from the same people. For example, you might have pre-test and post-test scores for individuals who have undergone some sort of intervention. You want to know if their scores increased (or decreased) from pre-test to post-test. This is a *within-subjects* design.
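As a rough sketch of the arithmetic behind a dependent samples t-test, the example below uses made-up pre-test and post-test scores for six people (hypothetical data, not from any of the readings). The key idea is that the test is run on each person's *difference* score rather than on the two sets of scores separately.

```python
import math
import statistics

# Hypothetical pre-test and post-test scores for the same six people.
pre = [62, 70, 55, 81, 66, 74]
post = [68, 75, 54, 88, 71, 80]

# Work with the within-person differences (post minus pre).
diffs = [after - before for before, after in zip(pre, post)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)

# t statistic: mean difference divided by its standard error.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1  # degrees of freedom for a paired test

print(f"Mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, df = {df}")
```

The resulting t statistic would then be compared against the t distribution with `df` degrees of freedom, which is what statistical software does for you when it reports a p-value.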

### 3.2.5 Errors in Hypothesis Testing

There are two types of errors that occur in hypothesis testing:

- Type I error: rejecting the null hypothesis when it’s actually true.
- Type II error: failing to reject the null hypothesis when it’s actually false.
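A small simulation can help show what a Type I error looks like in practice. The sketch below assumes a one-sample z-test with a known population SD; the null mean (100), SD (15), sample size (25), and number of trials are all hypothetical values picked for illustration. Every sample is drawn from a population where the null hypothesis is true, so each rejection is a Type I error.

```python
import random
from statistics import NormalDist

random.seed(42)  # fixed seed for reproducibility

MU0, SIGMA, N = 100, 15, 25  # hypothetical null mean, known SD, sample size
ALPHA = 0.05
TRIALS = 10_000

# Two-sided critical value for the chosen alpha (about 1.96 when alpha = 0.05).
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

false_rejections = 0
for _ in range(TRIALS):
    # The null hypothesis is TRUE here: the sample really comes from N(MU0, SIGMA).
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    z = (sample_mean - MU0) / (SIGMA / N ** 0.5)
    if abs(z) > z_crit:
        false_rejections += 1  # rejecting a true null hypothesis = Type I error

type_i_rate = false_rejections / TRIALS
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate should land near `ALPHA`, which illustrates that the significance level is the Type I error rate we agree to tolerate. Simulating a Type II error would require drawing samples from a population where the null hypothesis is false.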

Look through the following resources on these types of errors:

- Introduction to Type I and Type II errors – This also has a good review of hypothesis testing.
- Type I and type II errors

## 3.3 Assignment

### 3.3.1 Dance of the Means

For the first part of the homework this week, you will complete the *Dance of the Means* activity. Please download the ESCI Dance of the Means NITOP.xlsm Excel file and Guided Activity Word file.^{10} This PowerPoint file may also help you get the activity up and running.

You may have to click on `Enable Content` or `Enable Macros` in the Excel file to get it to work.

Please follow along in the activity guide and then answer the following questions:

**Task 1**: What is the sampling distribution of the mean?

**Task 2**: Are sample means from random samples always normally distributed around the population mean? Why or why not?

**Task 3**: What factors influence the MoE and why?

**Task 4**: When will the sampling distribution of means be “normal”?

**Task 5**: Why is the central limit theorem important?

### 3.3.2 Comparing Distributions With a T-Test

For this part of the assignment, we will use the 2 Sample T-Test tool, which I will call the *tool* throughout this section of the assignment.

Imagine that we, some researchers, are trying to answer the following research question: *How does fertilizer affect plant growth?*

We conduct a randomized controlled trial in which some plants are given a fixed amount of fertilizer (treatment group) and other plants are given no fertilizer (control group). Then we measure how much each plant grows over the course of a month. Let’s say we have ten plants in each group and we find the following amounts of growth.

The 10 plants in the control group each grew this much (each number corresponds to one plant’s growth):

3.8641111

4.1185322

2.6598828

0.3559656

2.8639095

0.9020122

5.0527020

2.3293899

3.5117162

4.3417785

The 10 plants in the treatment group each grew this much:

7.532292

1.445972

6.875600

6.518691

1.193905

4.659153

3.512655

4.578366

8.791810

4.891557

Delete the numbers that are pre-populated in the tool. Copy and paste our control data in as Sample 1 and our treatment data in as Sample 2.

**Task 6**: What are the mean and standard deviation of the control data? What are the mean and standard deviation of the treatment data? Do not calculate these by hand. The tool reports them in the sample summary section.

You’ll see that the tool has drawn the distributions of the data for our treatment and control groups. That’s how you can visualize the effect size (impact) of an RCT. It has also given us a verdict at the bottom that the “Sample 2 mean is greater.” This means that this particular statistical test (a *t-test*) concludes that we are more than 95% certain that sample 1 (the control group) and sample 2 (the treatment group) are drawn from **separate populations**. In this case, the control group is sampled from the “population” of plants that didn’t get fertilizer and the treatment group is sampled from the “population” of those that did.

This process is called inference. We are making the inference, based on our 20-plant study, that in the broader population of plants, fertilizer is associated with more growth. The typical statistical threshold for inference is 95% certainty. In the difference of means section of the tool, you’ll see `p = 0.0468` written. This is called a *p-value*. The following formula gives us the percentage of certainty we have in a statistical estimate, based on the p-value (which is written as `p`): \(\text{Level of Certainty} = (1 - p) \times 100\). To be 95% certain or higher, the p-value must be equal to 0.05 or lower. That’s why you will often see *p<0.05* written in studies and/or results tables.

With these particular results, our experiment found statistically significant evidence that fertilizer is associated with plant growth.
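If you are curious where the tool's p-value comes from, the sketch below recomputes it from the raw data using only Python's standard library. It rests on two assumptions that are not confirmed by the tool itself: that the tool runs Welch's unequal-variance t-test, and that it reports a two-sided p-value. The helper functions (`welch_t`, `t_pdf`, `two_sided_p`) are written here for illustration, and the p-value is obtained by numerical integration rather than by a statistics library.

```python
import math
import statistics

control = [3.8641111, 4.1185322, 2.6598828, 0.3559656, 2.8639095,
           0.9020122, 5.0527020, 2.3293899, 3.5117162, 4.3417785]
treatment = [7.532292, 1.445972, 6.875600, 6.518691, 1.193905,
             4.659153, 3.512655, 4.578366, 8.791810, 4.891557]


def welch_t(a, b):
    """Return (t, df) for Welch's t-test of mean(b) - mean(a)."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Probability density of Student's t distribution at x."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central_area


t_stat, df = welch_t(control, treatment)
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df:.2f}, p = {p_value:.4f}")
```

Under those assumptions, the printed p-value should come out close to the `p = 0.0468` shown in the tool. In everyday work you would normally let a statistics package do this calculation; the point of the sketch is only to show that the p-value depends on the two means, the two standard deviations, and the sample sizes.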

**Task 7**: What was the null hypothesis in this t-test that we conducted?

**Task 8**: What was the alternate hypothesis in this t-test that we conducted?

Now, click on the radio buttons next to ‘Sample 1 summary’ and ‘Sample 2 summary.’ This will allow you to compare different distributions to each other quickly, without having to change the numbers (raw data) above. Let’s imagine that the treatment group had not had as much growth as it did. Change the Sample 2 mean from 5 to 4.5.

**Task 9**: What is the new p-value of this t-test, with the new mean for Sample 2? What is the conclusion of our experiment, with these new numbers? Use the proper statistical language to write your answer.

**Task 10**: Gradually reduce the standard deviation of Sample 2 until the results are statistically significant at the 95% certainty level. What is the relationship between the standard deviation of your samples and our ability to distinguish them from each other statistically?^{11}

### 3.3.3 Logistical Tasks

**Task 11**: At the start of this chapter, I requested that you write down any questions you have. Please include them with your submitted assignment. If you don’t have any, just state this in your answer.

**Task 12**: Please upload your Week 1 assignment (from last week) to the D2L Dropbox. I learned that we are required to use D2L for this, contrary to my instructions from last week. Sorry about any confusion! We will not be using the system I described last week. The dropbox is located at `Assessments` -> `Assignments` -> `Dropbox for all assignments` in D2L.

**Task 13**: Please upload this assignment that you just finished today to the very same D2L Dropbox used in the previous task. Please follow the same file naming convention that we used last week.

**Task 14**: Also e-mail your new completed assignment to me at akumar@mghihp.edu.

^{10} The source of both files is https://thenewstatistics.com/itns/esci/dance-of-the-means/.

^{11} Remember, when we are analyzing the data in an RCT, we are trying to figure out if the treatment and control groups had different or similar results. We are seeing if we can distinguish the two groups from each other in any way. The mean and standard deviation of the data in the two groups are the key parameters that help us tell the treatment and control groups apart, which is why you need to play around with the t-test tool to understand these relationships.