22 Day 22

Announcements

That’s all folks

  • This is the last new content we’ll cover

  • From here on, it’s review, review, review

  • Wednesday we’ll do a refresh of some of the stuff we probably completely forgot

  • Thanksgiving break is next week

Some of you will get the TEVAL soon (we hand them out semi-randomly)


Review

Null Hypothesis \(H_0\)

The statement we are holding as known and established information

  • e.g., the average body weight of an adult cat is \(10\) lbs.

\[H_0:\mu=10\]


Alternate Hypothesis \(H_a\) or \(H_1\)

The claim whose accuracy we are testing against the null hypothesis

  • I believe that the cats I interact with regularly have a different average body weight than the population

\[H_a:\mu \neq 10\]


Test Statistic \(t^*\)

A value calculated as part of the hypothesis testing process. We place it into a \(t\)-table (or \(z\)-table depending) to get a \(p\)-value.

\[t^* = \frac{\bar{x} - \mu_0}{{s}/{\sqrt{n}}}\]

  • I weighed \(4\) of my friends' cats and my own cat and found that their average body weight was \(8\) pounds, with a standard deviation of \(2.49\) pounds

\[t^* = \frac{8 - 10}{{2.49}/{\sqrt{5}}}\]

\[t^*=-1.796039\]
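The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `one_sample_t` and its parameter names are illustrative choices, not something from the course materials.

```python
import math

def one_sample_t(x_bar, mu_0, s, n):
    """Return t* = (x_bar - mu_0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (x_bar - mu_0) / standard_error

# Cat example: 5 cats, sample mean 8 lbs, sample std. dev. 2.49, H0: mu = 10
t_star = one_sample_t(x_bar=8, mu_0=10, s=2.49, n=5)
print(round(t_star, 4))
```

The printed value should agree with the hand calculation of \(t^*\) above, up to rounding.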


A reminder of our key study participant:


Significance level \(\alpha\)

The probability that we commit a Type I error (rejecting a true null hypothesis) in our hypothesis testing process

  • I want to test my cat weight hypothesis at \(\alpha=0.05\)


P-value

The final statistic calculated in a hypothesis test, used to determine if we reject or fail to reject the null hypothesis

\[2\cdot P(T>|t^*|)=0.15\]

\[0.15>\alpha \quad \text{Fail to Reject} \ H_0\]


Statistically Significant

We refer to a result as statistically significant if we tested it against a null hypothesis and proceeded to reject the null hypothesis

  • There is insufficient evidence to suggest that the cats I interact with regularly have a statistically significant difference in average body weight from the population



Hypothesis Tests for Difference Between Two Means (Independent)

  • We’ve covered hypothesis testing for a single population parameter

    • (e.g., population mean \(\mu\))


  • Let’s look at testing a claim about the difference between two population means

\[\mu_1-\mu_2\]

  • We need two independent random samples from two distinct populations

    • Independence implies that \(X\) and \(Y\) have no effect on one another


  • As with everything we do in this class, we need to confirm that each sample mean can be treated as approximately normal. A large sample size for each sample is enough:

\[n_1>30 \quad \text{and} \quad n_2>30\]


  • We want to see if population means \(\mu_1\) and \(\mu_2\) are equal:

\[H_0:\mu_1=\mu_2\]

  • There are three possible alternate hypotheses:

    • Left-tailed: \(H_1:\mu_1<\mu_2\)

    • Right-tailed: \(H_1:\mu_1>\mu_2\)

    • Two-tailed: \(H_1:\mu_1\neq\mu_2\)


We need a test statistic, \(t^*\):

\[t^*=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]

Under \(H_0\), \(\mu_1=\mu_2\), so \(\mu_1-\mu_2=0\)

\[t=\frac{(\bar{x}_1-\bar{x}_2)-0}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]

  • \(\mu_1\), \(\mu_2\) are population means (under the assumption that \(H_0\) is true)

  • \(\bar{x}_1\), \(\bar{x}_2\) are sample means

  • \(s_1\), \(s_2\) are sample standard deviations

  • \(n_1\), \(n_2\) are sample sizes


  • The test statistic measures how far the sample mean difference \((\bar{x}_1-\bar{x}_2)\) is from the hypothesized value of \(\mu_1-\mu_2\) in \(H_0\), in units of standard error

  • The test statistic comes from a Student’s \(t\) distribution with degrees of freedom:

    \[df=\min(n_1-1,n_2-1)\]

    • (i.e., the smaller of \(n_1-1\) and \(n_2-1\)).


For the P-value calculation:

  • Left-tailed: \(H_1:\mu_1<\mu_2\)

    \[\text{P-value}=P(T<t)\]

  • Right-tailed: \(H_1:\mu_1>\mu_2\)

    \[\text{P-value}=P(T>t)\]

  • Two-tailed: \(H_1:\mu_1\neq\mu_2\)

    \[\text{P-value}=2\cdot P(T<-|t|) \text{ OR } 2\cdot P(T>|t|)\]
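Without a \(t\)-table, these tail areas can be approximated numerically. The sketch below integrates the Student's \(t\) density with Simpson's rule using only the Python standard library. The helper names (`t_pdf`, `t_upper_tail`, `p_value`) and the step count are assumptions made for illustration; in practice a statistics package would normally do this job.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3          # area to the right of |t|
    return upper if t >= 0 else 1 - upper

def p_value(t, df, tail):
    """tail is 'left', 'right', or 'two'."""
    if tail == "left":
        return 1 - t_upper_tail(t, df)       # P(T < t)
    if tail == "right":
        return t_upper_tail(t, df)           # P(T > t)
    if tail == "two":
        return 2 * t_upper_tail(abs(t), df)  # 2 * P(T > |t|)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Cat example from the review: t* = -1.796 with df = 5 - 1 = 4, two-tailed
p = p_value(-1.796, df=4, tail="two")
print(round(p, 2))
```

For the cat example this should land close to the P-value of \(0.15\) reported in the review.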


The steps for this hypothesis test are:

  1. State the null and alternate hypotheses

  2. Choose a significance level \(\alpha\)

  3. Compute the test statistic:

    \[t=\frac{(\bar{x}_1-\bar{x}_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]

  4. Compute the P-value of the test statistic \(t\)

    • Left-tailed: \(\text{P-value}=P(T<t)\)

    • Right-tailed: \(\text{P-value}=P(T>t)\)

    • Two-tailed: \(\text{P-value}=2\cdot P(T<-|t|)\) or \(2\cdot P(T>|t|)\)

    Note: The degrees of freedom of the \(t\) distribution is: \[df=\min(n_1-1,n_2-1)\]

  5. Determine whether to reject \(H_0\):

    • Reject \(H_0\) if \(\text{P-value} \leq \alpha\).

  6. State a conclusion


Example 1

The National Assessment of Educational Progress tested a sample of students who had used a computer in their mathematics classes, and another sample of students who had not used a computer. The sample mean score for students using a computer was 309, with a sample standard deviation of 29. For students not using a computer, the sample mean was 303, with a sample standard deviation of 32. Assume there were 60 students in the computer sample and 40 students in the sample that hadn’t used a computer.

At the 5% significance level, conduct a hypothesis test to determine whether the population mean scores differ between students who use a computer and those who do not.


Step 1. State the null and alternative hypotheses

\[H_0: \mu_1 = \mu_2\]

\[H_A: \mu_1 \neq \mu_2 \quad (\rightarrow \text{two-tailed})\]


Step 2. The significance level is \(\alpha=0.05\)


Step 3. Compute the test statistic

\[ \begin{array}{|c|c|c|c|} \hline \text{} & \text{Sample Mean} & \text{Sample Std. Dev.} & \text{Sample Size} \\ \hline \text{With Computer} & \bar{x}_1 = 309 & s_1 = 29 & n_1 = 60 \\ \hline \text{Without Computer} & \bar{x}_2 = 303 & s_2 = 32 & n_2 = 40 \\ \hline \end{array} \]

\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

\[t = \frac{(309 - 303) - (0)}{\sqrt{\frac{29^2}{60} + \frac{32^2}{40}}} \approx 0.953\]


Step 4. Compute the P-value

We use the t-table with \(df=\min(n_1-1, n_2-1)=39\). Then:

\[P(T>0.953) \text{ is between } P(T>1.304)=0.10 \text{ and } P(T>0.681)=0.25\]

For the two-tailed test, the P-value:

\[\text{P-value} = 2 \cdot P(T>0.953) \text{ is between } 0.20 \text{ and } 0.50\]


Step 5. Determine whether to reject \(H_0\)

Since the P-value \(> \alpha = 0.05\), we fail to reject \(H_0\)


Step 6. State a conclusion

There is not enough evidence to conclude that the mean scores differ between those students who use a computer and those who do not (i.e., the mean scores may be the same)
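The steps of Example 1 can be reproduced end to end in Python. The sketch below is one possible way to do it with the standard library alone: it uses the conservative \(df=\min(n_1-1,n_2-1)\) from these notes and approximates the tail area by Simpson's rule instead of bracketing it with a table. The function names `two_sample_t` and `t_upper_tail` are hypothetical.

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Test statistic for H0: mu1 = mu2, plus the conservative df."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) / se, min(n1 - 1, n2 - 1)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 by Simpson's rule on the t density."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

alpha = 0.05                                    # Step 2
t, df = two_sample_t(309, 29, 60, 303, 32, 40)  # Step 3
p = 2 * t_upper_tail(abs(t), df)                # Step 4 (two-tailed)
decision = "reject H0" if p <= alpha else "fail to reject H0"  # Step 5
print(round(t, 3), df, decision)
```

The computed P-value should fall inside the 0.20 to 0.50 range read from the table, leading to the same decision.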



Hypothesis Tests for Difference Between Two Means (Paired)

  • Next we turn our attention to a hypothesis test for paired (or matched) samples


  • Example: Gas mileage before and after tune-up for automobiles

\[ \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \text{Automobile} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \text{After Tune-up} & 35.44 & 35.17 & 31.07 & 31.57 & 26.48 & 23.11 & 25.18 & 32.39 \\ \hline \text{Before Tune-up} & 33.76 & 34.30 & 29.55 & 30.90 & 24.92 & 21.78 & 24.30 & 31.25 \\ \hline \end{array} \]

  • Both mileages before and after tune-up are obtained from the same automobile (i.e., the values are paired within the subject)

  • Now, we are interested in testing the population mean difference for the matched pairs

  • Our hypothesis test will involve two paired random samples from a single population

  • The set of differences between the values in the matched pairs is treated as the sample data

  • Required assumption: The number of matched pairs is large (\(n > 30\)), or the differences in the matched pairs are normally distributed (at least approximately)

  • The population mean difference for the matched pairs is denoted \(\mu_d\) (unknown value)

  • The sample mean of the differences is denoted \(\bar{d}\)

  • The sample std. deviation of the differences is denoted \(s_d\)

\[\mu_d = \text{the mean mileage difference (after minus before tune-up)}\]

\[\bar{d} = \frac{1.68 + 0.87 + \ldots + 1.14}{8} \approx 1.2063\]

\[s_d = \sqrt{\frac{(1.68 - 1.206)^2 + \ldots + (1.14 - 1.206)^2}{7}} \approx 0.3732\]
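The two summary values above can be recomputed directly from the table. This is a small sketch using Python's `statistics` module; the variable names are illustrative, and the differences are taken as after minus before, which matches the \(1.68, 0.87, \ldots, 1.14\) shown in the formula.

```python
import statistics

after = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]

# Difference within each automobile (after minus before)
diffs = [a - b for a, b in zip(after, before)]

d_bar = statistics.mean(diffs)   # sample mean of the differences
s_d = statistics.stdev(diffs)    # sample std. dev. (divides by n - 1)

print(round(d_bar, 4), round(s_d, 4))
```

Note that `statistics.stdev` divides by \(n-1\), which matches the \(7\) in the denominator of \(s_d\).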

Step 1. State the null and alternate hypotheses. The null hypothesis is of the form

\[H_0: \mu_d = \mu_0\]

where \(\mu_0\) is a prespecified value (e.g. \(\mu_0 = 0\) is most common)


The alternate hypothesis:

  • Left-tailed: \(H_1: \mu_d < \mu_0\)

  • Right-tailed: \(H_1: \mu_d > \mu_0\)

  • Two-tailed: \(H_1: \mu_d \neq \mu_0\)

Step 2. Choose a significance level \(\alpha\)

Step 3. Compute the test statistic:

\[t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}\]

which follows a Student’s \(t\) distribution with \(df = n - 1\)

Step 4. Compute the P-value of the test statistic \(t\)

  • Left-tailed: P-value = area under the Student’s \(t\) distribution to the left of \(t\), i.e., \(P(T < t)\)

  • Right-tailed: P-value = area under the Student’s \(t\) distribution to the right of \(t\), i.e., \(P(T > t)\)

  • Two-tailed: P-value = sum of the areas under the Student’s \(t\) distribution to the left of \(-|t|\) and right of \(|t|\), i.e., \(2 \cdot P(T < -|t|)\) or \(2 \cdot P(T > |t|)\)

Step 5. Determine whether to reject \(H_0\):

  • Reject \(H_0\) if P-value \(\leq \alpha\)

Step 6. State a conclusion
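As a rough illustration of Step 3, the sketch below plugs the tune-up summary values (\(\bar{d} \approx 1.2063\), \(s_d \approx 0.3732\), \(n = 8\)) into the paired test statistic. The notes do not state the hypotheses for the tune-up data, so \(\mu_0 = 0\) is an assumption here, and the function name `paired_t` is hypothetical.

```python
import math

def paired_t(d_bar, s_d, n, mu_0=0.0):
    """Return (t, df) for the paired test: t = (d_bar - mu_0) / (s_d / sqrt(n))."""
    t = (d_bar - mu_0) / (s_d / math.sqrt(n))
    return t, n - 1

# Tune-up example, ASSUMING H0: mu_d = 0 (not stated in the notes)
t, df = paired_t(d_bar=1.2063, s_d=0.3732, n=8)
print(round(t, 2), df)
```

The P-value for this \(t\) then comes from a Student's \(t\) distribution with \(df = n - 1 = 7\), using the tail that matches the chosen alternate hypothesis.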



Go away