21 Day 22
Announcements
That’s all folks
This is the last new content we’ll cover
From here on, it’s review, review, review
Wednesday we’ll do a refresher on some of the material we’ve probably completely forgotten
Thanksgiving break is next week
Some of you will receive the TEVAL soon (we send them out semi-randomly)
Review
Null Hypothesis \(H_0\)
The statement we assume to be true: the established, status-quo claim
- e.g., the average body weight of an adult cat is \(10\) lbs.
\[H_0:\mu=10\]
Alternate Hypothesis \(H_a\) or \(H_1\)
The claim we are testing: the statement we are looking for evidence to support
- I believe that the cats I interact with regularly have a different average body weight than the population
\[H_a:\mu \neq 10\]
Test Statistic \(t^*\)
A value calculated as part of the hypothesis testing process. We look it up in a \(t\)-table (or a \(z\)-table, depending on the test) to get a \(p\)-value.
\[t^* = \frac{\bar{x} - \mu_0}{{s}/{\sqrt{n}}}\]
- I weighed \(4\) of my friends’ cats and my own cat (\(n=5\)) and found that their average body weight was \(8\) pounds, with a standard deviation of \(2.49\) pounds
\[t^* = \frac{8 - 10}{{2.49}/{\sqrt{5}}}\]
\[t^*=-1.796039\]
A reminder of our key study participant:
Significance level \(\alpha\)
The probability of making a Type I error (rejecting \(H_0\) when it is actually true) that we are willing to accept in our hypothesis test
- I want to test my cat weight hypothesis at \(\alpha=0.05\)
P-value
The final statistic calculated in a hypothesis test, used to determine if we reject or fail to reject the null hypothesis
\[2\cdot P(T>|t^*|)\approx 0.15 \quad (df=n-1=4)\]
\[0.15>\alpha \quad \text{Fail to Reject} \ H_0\]
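The \(t\)-table lookup for the cat example can be reproduced in code. This is a minimal sketch in Python, assuming only the standard library: since `math` has no Student’s \(t\) distribution, the hypothetical helpers `t_pdf` and `t_cdf` integrate the \(t\) density numerically with Simpson’s rule (a swapped-in technique, not the table method used in class). It recomputes \(t^*\) from the summary statistics and then the two-tailed p-value.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T < t): 0.5 plus the signed area under the density from 0 to t,
    approximated with Simpson's rule (steps must be even)."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3


# Cat example: sample mean 8, hypothesized mean 10, s = 2.49, n = 5
xbar, mu0, s, n = 8, 10, 2.49, 5
t_star = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

# Two-tailed p-value: twice the area beyond |t*|
p_value = 2 * t_cdf(-abs(t_star), df)
alpha = 0.05
decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(f"t* = {t_star:.6f}, p-value = {p_value:.3f}, {decision}")
```

The p-value comes out near \(0.147\), which rounds to the \(0.15\) used in the notes, so we fail to reject \(H_0\) at \(\alpha=0.05\).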
Statistically Significant
We refer to a result as statistically significant if we tested it against a null hypothesis and rejected the null hypothesis
- Our cat result is not statistically significant: there is insufficient evidence to suggest that the average body weight of the cats I interact with regularly differs from the population average
Hypothesis Tests for Difference Between Two Means (Independent)
We’ve covered hypothesis testing for a single population parameter
- (e.g., population mean \(\mu\))
- Let’s look at testing a claim about the difference between two population means
\[\mu_1-\mu_2\]
We need two independent random samples from two distinct populations
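- Independence means that the two samples have no effect on one another (knowing the values in one tells us nothing about the other)
- As with everything we do in this class, we need to confirm that the sampling distribution can be assumed approximately normal, e.g., because both samples are large:
\[n_1>30 \quad \text{and} \quad n_2>30\]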
- We want to see if population means \(\mu_1\) and \(\mu_2\) are equal:
\[H_0:\mu_1=\mu_2\]
There are three possible alternate hypotheses:
Left-tailed: \(H_1:\mu_1<\mu_2\)
Right-tailed: \(H_1:\mu_1>\mu_2\)
Two-tailed: \(H_1:\mu_1\neq\mu_2\)
We need a test statistic, \(t^*\):
\[t^*=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]
Under \(H_0\), \(\mu_1=\mu_2\), so \(\mu_1-\mu_2=0\)
\[t=\frac{(\bar{x}_1-\bar{x}_2)-0}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]
\(\mu_1\), \(\mu_2\) are population means (under \(H_0\), they are assumed equal)
\(\bar{x}_1\), \(\bar{x}_2\) are sample means
\(s_1\), \(s_2\) are sample standard deviations
\(n_1\), \(n_2\) are sample sizes
The test statistic measures how far the sample mean difference \((\bar{x}_1-\bar{x}_2)\) is from the hypothesized value \(\mu_1-\mu_2\) in \(H_0\), in units of its estimated standard error
Under \(H_0\), the test statistic approximately follows a Student’s \(t\) distribution with degrees of freedom:
\[df=\min(n_1-1,n_2-1)\]
- (i.e., the smaller of \(n_1-1\) and \(n_2-1\)).
For the P-value calculation:
Left-tailed: \(H_1:\mu_1<\mu_2\)
\[\text{P-value}=P(T<t)\]
Right-tailed: \(H_1:\mu_1>\mu_2\)
\[\text{P-value}=P(T>t)\]
Two-tailed: \(H_1:\mu_1\neq\mu_2\)
\[\text{P-value}=2\cdot P(T<-|t|) \text{ OR } 2\cdot P(T>|t|)\]
The steps for this hypothesis test are:
State the null and alternate hypotheses
Choose a significance level \(\alpha\)
Compute the test statistic:
\[t=\frac{(\bar{x}_1-\bar{x}_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]
Compute the P-value of the test statistic \(t\)
Left-tailed: \(\text{P-value}=P(T<t)\)
Right-tailed: \(\text{P-value}=P(T>t)\)
Two-tailed: \(\text{P-value}=2\cdot P(T<-|t|)\) or \(2\cdot P(T>|t|)\)
Note: The degrees of freedom of the \(t\) distribution is: \[df=\min(n_1-1,n_2-1)\]
- Determine whether to reject \(H_0\):
- Reject \(H_0\) if \(\text{P-value} \leq \alpha\).
- State a conclusion
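The steps above can be sketched as one Python function, assuming only summary statistics are available and \(H_0:\mu_1-\mu_2=0\). The names `two_sample_t_test`, `t_pdf`, and `t_cdf`, and the `alternative` labels `"less"`, `"greater"`, and `"two-sided"`, are hypothetical. The p-value comes from numerically integrating the \(t\) density with Simpson’s rule (a substitute for the \(t\)-table), and the degrees of freedom use the \(\min(n_1-1,n_2-1)\) rule from these notes.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T < t) via Simpson's rule on the density from 0 to t (even steps)."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3


def two_sample_t_test(x1, s1, n1, x2, s2, n2,
                      alternative="two-sided", alpha=0.05):
    """Independent two-sample t test of H0: mu1 = mu2 from summary statistics.

    Returns (t, df, p_value, reject_h0).
    """
    # Step 3: test statistic (mu1 - mu2 = 0 under H0)
    t = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
    df = min(n1 - 1, n2 - 1)

    # Step 4: p-value for the chosen alternative hypothesis
    if alternative == "less":          # H1: mu1 < mu2
        p = t_cdf(t, df)
    elif alternative == "greater":     # H1: mu1 > mu2
        p = 1 - t_cdf(t, df)
    elif alternative == "two-sided":   # H1: mu1 != mu2
        p = 2 * t_cdf(-abs(t), df)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")

    # Step 5: reject H0 if p-value <= alpha
    return t, df, p, p <= alpha


t, df, p, reject = two_sample_t_test(309, 29, 60, 303, 32, 40)
print(f"t = {t:.3f}, df = {df}, p-value = {p:.3f}, reject H0: {reject}")
```

With Example 1’s numbers, the call should give \(t\approx 0.953\), \(df=39\), and a two-tailed p-value inside the \(0.20\) to \(0.50\) range found with the table, so \(H_0\) is not rejected.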
Example 1
The National Assessment of Educational Progress tested a sample of students who had used a computer in their mathematics classes, and another sample of students who had not used a computer. The sample mean score for students using a computer was 309, with a sample standard deviation of 29. For students not using a computer, the sample mean was 303, with a sample standard deviation of 32. Assume there were 60 students in the computer sample and 40 students in the sample that hadn’t used a computer.
At the 5% significance level, conduct a hypothesis test to determine whether the population mean scores differ between students who use a computer and those who do not.
Step 1. State the null and alternative hypotheses
\[H_0: \mu_1 = \mu_2\]
\[H_A: \mu_1 \neq \mu_2 \quad (\rightarrow \text{two-tailed})\]
Step 2. The significance level is \(\alpha=0.05\)
Step 3. Compute the test statistic
\[ \begin{array}{|c|c|c|c|} \hline \text{} & \text{Sample Mean} & \text{Sample Std. Dev.} & \text{Sample Size} \\ \hline \text{With Computer} & \bar{x}_1 = 309 & s_1 = 29 & n_1 = 60 \\ \hline \text{Without Computer} & \bar{x}_2 = 303 & s_2 = 32 & n_2 = 40 \\ \hline \end{array} \]
\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]
\[t = \frac{(309 - 303) - (0)}{\sqrt{\frac{29^2}{60} + \frac{32^2}{40}}} \approx 0.953\]
Step 4. Compute the P-value
We use the t-table with \(df=\min(n_1-1, n_2-1)=39\). Then:
\[P(T>0.953) \text{ is between } P(T>1.304)=0.10 \text{ and } P(T>0.681)=0.25\]
For the two-tailed test, the P-value:
\[\text{P-value} = 2 \cdot P(T>0.953) \text{ is between } 0.20 \text{ and } 0.50\]
Step 5. Determine whether to reject \(H_0\)
Since the P-value \(> \alpha = 0.05\), we fail to reject \(H_0\)
Step 6. State a conclusion
There is not enough evidence to conclude that the mean scores differ between those students who use a computer and those who do not (i.e., the mean scores may be the same)
Hypothesis Tests for Difference Between Two Means (Paired)
- Next we turn our attention to a hypothesis test for paired (or matched) samples
- Example: Gas mileage before and after tune-up for automobiles
\[ \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \text{Automobile} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \text{After Tune-up} & 35.44 & 35.17 & 31.07 & 31.57 & 26.48 & 23.11 & 25.18 & 32.39 \\ \hline \text{Before Tune-up} & 33.76 & 34.30 & 29.55 & 30.90 & 24.92 & 21.78 & 24.30 & 31.25 \\ \hline \end{array} \]
- Both mileages before and after tune-up are obtained from the same automobile (i.e., the values are paired within the subject)
- Now, we are interested in testing the population mean difference for the matched pairs
- Our hypothesis test will involve two paired random samples from a single population
- The set of differences between the values in the matched pairs is treated as the sample data
- Required assumption: the number of pairs is large (\(n > 30\)), or the differences in the matched pairs are (at least approximately) normally distributed
The population mean difference for the matched pairs is denoted \(\mu_d\) (unknown value)
The sample mean of the differences is denoted \(\bar{d}\)
The sample std. deviation of the differences is denoted \(s_d\)
\[\mu_d = \text{the population mean mileage difference (after minus before tune-up)}\]
\[\bar{d} = \frac{1.68 + 0.87 + \ldots + 1.14}{8} \approx 1.2063\]
\[s_d = \sqrt{\frac{(1.68 - 1.206)^2 + \ldots + (1.14 - 1.206)^2}{7}} \approx 0.3732\]
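The difference arithmetic can be checked with Python’s standard `statistics` module. A minimal sketch, assuming differences are taken as after minus before (which matches the first difference, \(35.44-33.76=1.68\)); `statistics.stdev` divides by \(n-1\), matching the \(7\) in the denominator of \(s_d\).

```python
import statistics

# Gas mileage for automobiles 1-8, copied from the tune-up table
after = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]

# One difference per automobile: after minus before
d = [a - b for a, b in zip(after, before)]

d_bar = statistics.mean(d)  # sample mean of the differences
s_d = statistics.stdev(d)   # sample standard deviation (divides by n - 1)

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}")
```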
Step 1. State the null and alternate hypotheses. The null hypothesis is of the form
\[H_0: \mu_d = \mu_0\]
where \(\mu_0\) is a prespecified value (e.g. \(\mu_0 = 0\) is most common)
The alternate hypothesis:
Left-tailed: \(H_1: \mu_d < \mu_0\)
Right-tailed: \(H_1: \mu_d > \mu_0\)
Two-tailed: \(H_1: \mu_d \neq \mu_0\)
Step 2. Choose a significance level \(\alpha\)
Step 3. Compute the test statistic:
\[t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}\]
which, under \(H_0\), follows a Student’s \(t\) distribution with \(df = n - 1\), where \(n\) is the number of pairs
Step 4. Compute the P-value of the test statistic \(t\)
Left-tailed: P-value = area under the Student’s \(t\) distribution to the left of \(t\), i.e., \(P(T < t)\)
Right-tailed: P-value = area under the Student’s \(t\) distribution to the right of \(t\), i.e., \(P(T > t)\)
Two-tailed: P-value = sum of the areas under the Student’s \(t\) distribution to the left of \(-|t|\) and right of \(|t|\), i.e., \(2 \cdot P(T < -|t|)\) or \(2 \cdot P(T > |t|)\)
Step 5. Determine whether to reject \(H_0\):
- Reject \(H_0\) if P-value \(\leq \alpha\)
Step 6. State a conclusion
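Applying Step 3 to the tune-up data is a short computation. A sketch in Python, assuming \(H_0:\mu_d=0\) (the common choice mentioned in Step 1) and differences taken as after minus before; `paired_t_statistic` is a hypothetical helper. The cutoff \(3.499\) is the standard \(t\)-table critical value for \(df=7\) with upper-tail area \(0.005\); it is supplied here, not taken from these notes.

```python
import math
import statistics


def paired_t_statistic(after, before, mu0=0.0):
    """Paired t statistic for H0: mu_d = mu0, with d = after - before.

    Returns (t, df) where df = n - 1 and n is the number of pairs.
    """
    if len(after) != len(before):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    d_bar = statistics.mean(d)   # sample mean of the differences
    s_d = statistics.stdev(d)    # sample std. deviation (n - 1 denominator)
    t = (d_bar - mu0) / (s_d / math.sqrt(n))
    return t, n - 1


after = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]

t, df = paired_t_statistic(after, before)
T_CRIT_DF7_0005 = 3.499  # t-table value: P(T > 3.499) = 0.005 when df = 7
print(f"t = {t:.3f}, df = {df}, t > {T_CRIT_DF7_0005}: {t > T_CRIT_DF7_0005}")
```

The statistic comes out near \(t\approx 9.14\) with \(df=7\), far beyond \(3.499\), so \(P(T>t)<0.005\) and even a two-tailed p-value would be below \(0.01\).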
Go away