22 Day 22

Announcements

That’s all folks

This is the last new content we’ll cover
From here on, it’s review, review, review
Wednesday we’ll do a refresh of some of the stuff we probably completely forgot
Thanksgiving break is next week

Some of you will get the TEVAL soon (we send them out semi-randomly)

Review

Null Hypothesis $H_0$

The statement we are holding as known and established information

e.g., The average body weight of an adult cat is $10$ lbs.

$H_0:\mu=10$

Alternate Hypothesis $H_a$ or $H_1$

The statement we are testing to determine the accuracy of

I believe that the cats I interact with regularly have a different average body weight than the population

$H_a:\mu \neq 10$

Test Statistic $t^*$

A value calculated as part of the hypothesis testing process. We look it up in a $t$ -table (or $z$ -table, depending on the test) to get a $p$ -value.

$t^* = \frac{\bar{x} - \mu_0}{{s}/{\sqrt{n}}}$

I weighed $4$ of my friends' cats and my own cat and found that their average body weight was $8$ lbs., with a standard deviation of $2.49$ lbs.

$t^* = \frac{8 - 10}{{2.49}/{\sqrt{5}}}$

$t^*=-1.796039$
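As a quick check of the arithmetic above, here is a minimal Python sketch of the one-sample test statistic. The function name `one_sample_t` is only an illustrative choice, not something from the course materials.

```python
import math

def one_sample_t(x_bar, mu_0, s, n):
    """Test statistic t* = (x_bar - mu_0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (x_bar - mu_0) / standard_error

# Cat example: 5 cats, sample mean 8 lbs, sample std. dev. 2.49, H0: mu = 10
t_star = one_sample_t(x_bar=8, mu_0=10, s=2.49, n=5)
print(round(t_star, 4))
```

The negative sign just tells us the sample mean landed below the hypothesized mean.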


Significance level $\alpha$

The probability of a Type I error (rejecting $H_0$ when it is actually true) that we are willing to accept in our hypothesis testing process

I want to test my cat weight hypothesis at $\alpha=0.05$

P-value

The probability, assuming $H_0$ is true, of getting a test statistic at least as extreme as the one we calculated. We compare it to $\alpha$ to determine if we reject or fail to reject the null hypothesis

$2\cdot P(T>|t^*|)\approx 0.15$

$0.15>\alpha \quad \text{Fail to Reject} \ H_0$
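If you don't have a $t$ -table handy, the two-tailed P-value can be approximated numerically. The sketch below integrates the Student's $t$ density with Simpson's rule using only the standard library. The helper names `t_pdf` and `t_upper_tail` are made up for illustration, and it assumes $df = n - 1 = 4$ for the five cats.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 (steps must be even for Simpson's rule)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric about 0, so P(T > 0) = 0.5
    return 0.5 - area_0_to_t

t_star, df, alpha = -1.796039, 4, 0.05  # df = n - 1 is an assumption here
p_value = 2 * t_upper_tail(abs(t_star), df)
decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(round(p_value, 3), decision)
```

In practice a statistics package would do this for you. The point is only that the P-value is an area under the $t$ curve.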

Statistically Significant

We refer to a result as statistically significant if we tested it against a null hypothesis and proceeded to reject the null hypothesis

There is insufficient evidence to suggest that the average body weight of the cats that I interact with regularly differs from the average body weight of the population (i.e., the result is not statistically significant)

Hypothesis Tests for Difference Between Two Means (Independent)

We’ve covered hypothesis testing for a single population parameter
- (e.g., population mean $\mu$ )

Let’s look at testing a claim about the difference between two population means

$\mu_1-\mu_2$

We need two independent random samples from two distinct populations
- Independence implies that the observations in one sample have no effect on the observations in the other

As with everything we do in this class, we need to confirm that each sample mean can be treated as approximately normal, which here means each sample is large:

$n_1>30 \text{ and } n_2>30$

We want to see if population means $\mu_1$ and $\mu_2$ are equal:

$H_0:\mu_1=\mu_2$

There are three possible alternate hypotheses:
- Left-tailed: $H_1:\mu_1<\mu_2$
- Right-tailed: $H_1:\mu_1>\mu_2$
- Two-tailed: $H_1:\mu_1\neq\mu_2$

We need a test statistic, $t^*$ :

$t^*=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}$

Under $H_0$ , $\mu_1=\mu_2$ , so $\mu_1-\mu_2=0$

$t=\frac{(\bar{x}_1-\bar{x}_2)-0}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}$

$\mu_1$ , $\mu_2$ are population means (under the assumption that $H_0$ is true)
$\bar{x}_1$ , $\bar{x}_2$ are sample means
$s_1$ , $s_2$ are sample standard deviations
$n_1$ , $n_2$ are sample sizes

The test statistic measures how far the sample mean difference $(\bar{x}_1-\bar{x}_2)$ is from the hypothesized value of $\mu_1-\mu_2$ in $H_0$ , in units of standard error
The test statistic comes from a Student’s $t$ distribution with degrees of freedom:

$df=\min(n_1-1,n_2-1)$
- (i.e., the smaller of $n_1-1$ and $n_2-1$ ).

For the P-value calculation:

- Left-tailed, $H_1:\mu_1<\mu_2$ : $\text{P-value}=P(T<t)$
- Right-tailed, $H_1:\mu_1>\mu_2$ : $\text{P-value}=P(T>t)$
- Two-tailed, $H_1:\mu_1\neq\mu_2$ : $\text{P-value}=2\cdot P(T<-|t|) \text{ OR } 2\cdot P(T>|t|)$
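The three tail rules can be sketched as one small function. This is a hypothetical helper (`p_value_from_cdf` is not a library function) that assumes you already have $P(T<t)$ from a table or software.

```python
def p_value_from_cdf(cdf_at_t, tail):
    """Convert P(T < t) into a P-value for the chosen alternate hypothesis."""
    if tail == "left":
        return cdf_at_t
    if tail == "right":
        return 1 - cdf_at_t
    if tail == "two":
        # By symmetry, 2 * P(T > |t|) is twice the smaller tail area
        return 2 * min(cdf_at_t, 1 - cdf_at_t)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Hypothetical value: suppose P(T < t) = 0.90 for the observed t
for tail in ("left", "right", "two"):
    print(tail, round(p_value_from_cdf(0.90, tail), 2))
```

Notice how much the answer depends on the tail. That is why the alternate hypothesis has to be chosen before looking at the data.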

The steps for this hypothesis test are:

Step 1. State the null and alternate hypotheses

Step 2. Choose a significance level $\alpha$

Step 3. Compute the test statistic:

$t=\frac{(\bar{x}_1-\bar{x}_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}$

Step 4. Compute the P-value of the test statistic $t$

- Left-tailed: $\text{P-value}=P(T<t)$
- Right-tailed: $\text{P-value}=P(T>t)$
- Two-tailed: $\text{P-value}=2\cdot P(T<-|t|)$ or $2\cdot P(T>|t|)$

Note: The degrees of freedom of the $t$ distribution are $df=\min(n_1-1,n_2-1)$

Step 5. Determine whether to reject $H_0$ :
- Reject $H_0$ if $\text{P-value} \leq \alpha$ ; otherwise, fail to reject $H_0$

Step 6. State a conclusion

Example 1

The National Assessment of Educational Progress tested a sample of students who had used a computer in their mathematics classes, and another sample of students who had not used a computer. The sample mean score for students using a computer was 309, with a sample standard deviation of 29. For students not using a computer, the sample mean was 303, with a sample standard deviation of 32. Assume there were 60 students in the computer sample and 40 students in the sample that hadn’t used a computer.

At the 5% significance level, conduct a hypothesis test to determine whether the population mean scores differ between those students who use a computer and those who do not.

Step 1. State the null and alternative hypotheses

$H_0: \mu_1 = \mu_2$

$H_A: \mu_1 \neq \mu_2 \quad (\rightarrow \text{two-tailed})$

Step 2. The significance level is $\alpha=0.05$

Step 3. Compute the test statistic

$\begin{array}{|c|c|c|c|} \hline \text{} & \text{Sample Mean} & \text{Sample Std. Dev.} & \text{Sample Size} \\ \hline \text{With Computer} & \bar{x}_1 = 309 & s_1 = 29 & n_1 = 60 \\ \hline \text{Without Computer} & \bar{x}_2 = 303 & s_2 = 32 & n_2 = 40 \\ \hline \end{array}$

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

$t = \frac{(309 - 303) - (0)}{\sqrt{\frac{29^2}{60} + \frac{32^2}{40}}} \approx 0.953$
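To double-check Step 3, here is a short Python sketch of the two-sample test statistic computed from summary statistics. The function `two_sample_t` and its argument names are assumptions made for this example. It also returns the conservative degrees of freedom used in these notes.

```python
import math

def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2, hypothesized_diff=0.0):
    """Return (t, df) for two independent samples, given summary statistics."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((x_bar1 - x_bar2) - hypothesized_diff) / standard_error
    df = min(n1 - 1, n2 - 1)  # conservative df: the smaller of n1-1 and n2-1
    return t, df

# Example 1: with computer (309, 29, 60) vs. without computer (303, 32, 40)
t, df = two_sample_t(309, 29, 60, 303, 32, 40)
print(round(t, 3), df)
```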

Step 4. Compute the P-value

We use the t-table with $df=\min(n_1-1, n_2-1)=39$ . Then:

$P(T>0.953) \text{ is between } P(T>1.304)=0.10 \text{ and } P(T>0.681)=0.25$

For the two-tailed test, the P-value:

$\text{P-value} = 2 \cdot P(T>0.953) \text{ is between } 0.20 \text{ and } 0.50$
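The table lookup in Step 4 amounts to finding which two critical values our $t$ sits between. A rough sketch, using only the two $df=39$ table entries quoted above (the function name `bracket_upper_tail` is hypothetical):

```python
# (critical value, upper-tail area) pairs for df = 39, as quoted in Step 4
table_row = [(0.681, 0.25), (1.304, 0.10)]

def bracket_upper_tail(t, row):
    """Return (low, high) bounds on P(T > t) from a sorted t-table row."""
    for (c_small, area_big), (c_large, area_small) in zip(row, row[1:]):
        if c_small <= t <= c_large:
            return area_small, area_big
    raise ValueError("t falls outside the tabulated critical values")

low, high = bracket_upper_tail(0.953, table_row)
p_low, p_high = 2 * low, 2 * high  # two-tailed test doubles the tail area
print(p_low, p_high)
```

Even the lower bound on the P-value is well above $\alpha=0.05$ , so the table is precise enough to make the decision.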

Step 5. Determine whether to reject $H_0$

Since the P-value $> \alpha = 0.05$ , we fail to reject $H_0$

Step 6. State a conclusion

There is not enough evidence to conclude that the mean scores differ between those students who use a computer and those who do not (i.e., the mean scores may be the same)

Hypothesis Tests for Difference Between Two Means (Paired)

Next we turn our attention to a hypothesis test for paired (or matched) samples

Example: Gas mileage before and after tune-up for automobiles

$\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \text{Automobile} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \text{After Tune-up} & 35.44 & 35.17 & 31.07 & 31.57 & 26.48 & 23.11 & 25.18 & 32.39 \\ \hline \text{Before Tune-up} & 33.76 & 34.30 & 29.55 & 30.90 & 24.92 & 21.78 & 24.30 & 31.25 \\ \hline \end{array}$

Both mileages before and after tune-up are obtained from the same automobile (i.e., the values are paired within the subject)

- Now, we are interested in testing the population mean difference for the matched pairs
- Our hypothesis test will involve two paired random samples from a single population
- The set of differences between the values in the matched pairs is considered as the sample data
- Required assumption: The number of matched pairs is large (n > 30), or the differences in the matched pairs are normally distributed (at least approximately)

The population mean difference for the matched pairs is denoted $\mu_d$ (unknown value)
The sample mean of the differences is denoted $\bar{d}$
The sample std. deviation of the differences is denoted $s_d$

$\mu_d = \text{the population mean mileage difference (after tune-up minus before tune-up)}$

$\bar{d} = \frac{1.68 + 0.87 + \ldots + 1.14}{8} \approx 1.2063$

$s_d = \sqrt{\frac{(1.68 - 1.206)^2 + \ldots + (1.14 - 1.206)^2}{7}} \approx 0.3732$
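The differences, $\bar{d}$ , and $s_d$ above can be reproduced with Python's built-in `statistics` module. This sketch assumes each difference is taken as after minus before, which matches the $1.68, 0.87, \ldots, 1.14$ values shown.

```python
import statistics

after = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]

# Difference within each matched pair (after minus before)
differences = [a - b for a, b in zip(after, before)]

d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)  # sample std. dev., divides by n - 1
print(round(d_bar, 4), round(s_d, 4))
```

From here on, the two columns of data are gone. The test only ever sees the single list of differences.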

Step 1. State the null and alternate hypotheses. The null hypothesis is of the form

$H_0: \mu_d = \mu_0$

where $\mu_0$ is a prespecified value (e.g. $\mu_0 = 0$ is most common)

The alternate hypothesis:

Left-tailed: $H_1: \mu_d < \mu_0$
Right-tailed: $H_1: \mu_d > \mu_0$
Two-tailed: $H_1: \mu_d \neq \mu_0$

Step 2. Choose a significance level $\alpha$

Step 3. Compute the test statistic:

$t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}$

which follows a Student’s $t$ distribution with $df = n - 1$
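As an illustration, here is the paired test statistic for the tune-up data, using the rounded summary values $\bar{d}\approx 1.2063$ and $s_d\approx 0.3732$ from above. It assumes $\mu_0=0$ (no mean change in mileage), and `paired_t` is a hypothetical helper name.

```python
import math

def paired_t(d_bar, s_d, n, mu_0=0.0):
    """Return (t, df) for a paired-sample t test on n differences."""
    t = (d_bar - mu_0) / (s_d / math.sqrt(n))
    return t, n - 1

# Tune-up example: n = 8 automobiles, mu_0 = 0 is an assumption
t, df = paired_t(d_bar=1.2063, s_d=0.3732, n=8)
print(round(t, 2), df)
```

This is the same formula as the one-sample test statistic, just applied to the differences.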

Step 4. Compute the P-value of the test statistic $t$

Left-tailed: P-value = area under the Student’s $t$ distribution to the left of $t$ , i.e., $P(T < t)$
Right-tailed: P-value = area under the Student’s $t$ distribution to the right of $t$ , i.e., $P(T > t)$
Two-tailed: P-value = sum of the areas under the Student’s $t$ distribution to the left of $-|t|$ and right of $|t|$ , i.e., $2 \cdot P(T < -|t|)$ or $2 \cdot P(T > |t|)$

Step 5. Determine whether to reject $H_0$ :

Reject $H_0$ if P-value $\leq \alpha$

Step 6. State a conclusion

Go away