Day 22

Announcements

That’s all folks

  • This is the last new content we’ll cover

  • From here on, it’s review, review, review

  • Wednesday we’ll do a refresh of some of the stuff we probably completely forgot

  • Thanksgiving break is next week

Some of you will get the TEVAL soon (we send them out semi-randomly)


Review

Null Hypothesis H0

The statement we are holding as known and established information

  • e.g., the average body weight of an adult cat is 10 lbs.

H_0: \mu = 10


Alternate Hypothesis Ha or H1

The statement we are testing to determine the accuracy of

  • I believe that the cats I interact with regularly have a different average body weight than the population

H_a: \mu \neq 10


Test Statistic t

A value calculated as part of the hypothesis testing process. We look it up in a t-table (or a z-table, depending on the test) to get a p-value.

t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

  • I weighed 4 of my friends' cats and my own cat and found that their average body weight was 8 pounds, with a standard deviation of 2.49 pounds

t = \frac{8 - 10}{2.49 / \sqrt{5}}

t = -1.796039
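The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative and not part of the course materials.

```python
import math

x_bar = 8.0   # sample mean weight of the 5 cats (lbs)
mu_0 = 10.0   # population mean claimed by H0 (lbs)
s = 2.49      # sample standard deviation (lbs)
n = 5         # sample size

standard_error = s / math.sqrt(n)
t = (x_bar - mu_0) / standard_error

print(round(t, 3))  # about -1.796
```

The statistic is negative because the sample mean falls below the hypothesized mean.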


A reminder of our key study participant:


Significance level α

The probability of committing a Type I error (rejecting H0 when it is actually true) that we are willing to accept in our hypothesis testing process

  • I want to test my cat weight hypothesis at α=0.05


P-value

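The final statistic calculated in a hypothesis test: the probability, assuming the null hypothesis is true, of seeing a test statistic at least as extreme as ours. We compare it to α to decide whether we reject or fail to reject the null hypothesis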

2 \cdot P(T > |t|) \approx 0.15

0.15 > \alpha \Rightarrow \text{Fail to reject } H_0
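In class we read this P-value off a t-table. As a rough cross-check, the sketch below approximates the same tail area numerically with the standard library only, using df = n - 1 = 4 for the one-sample test. The helper names `t_pdf` and `upper_tail` are made up for this illustration; a statistics package would normally do this step for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n = 5
alpha = 0.05
t = (8 - 10) / (2.49 / math.sqrt(n))

# Two-tailed test: double the area beyond |t|
p_value = 2 * upper_tail(abs(t), df=n - 1)
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

print(round(p_value, 2), decision)  # about 0.15, fail to reject H0
```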


Statistically Significant

We refer to a result as statistically significant if we tested it against a null hypothesis and proceeded to reject the null hypothesis

  • There is insufficient evidence to suggest that the average body weight of the cats I interact with regularly differs from the population average (the difference is not statistically significant)



Hypothesis Tests for Difference Between Two Means (Independent)

  • We’ve covered hypothesis testing for a single population parameter

    • (e.g., population mean μ)


  • Let’s look at testing a claim about the difference between two population means

\mu_1 - \mu_2

  • We need two independent random samples from two distinct populations

    • Independence means that the observations in one sample have no effect on the observations in the other


  • As with everything we do in this class, we need to confirm that each sample mean can be treated as approximately normal, for example because both samples are large:

n_1 > 30 \text{ and } n_2 > 30


  • We want to see if population means \mu_1 and \mu_2 are equal:

H_0: \mu_1 = \mu_2

  • There are three possible alternate hypotheses:

    • Left-tailed: H_1: \mu_1 < \mu_2

    • Right-tailed: H_1: \mu_1 > \mu_2

    • Two-tailed: H_1: \mu_1 \neq \mu_2


We need a test statistic, t:

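t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{({s_1^2}/{n_1}) + ({s_2^2}/{n_2})}}

Under H_0, \mu_1 = \mu_2, so \mu_1 - \mu_2 = 0

t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{({s_1^2}/{n_1}) + ({s_2^2}/{n_2})}}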

  • \mu_1, \mu_2 are population means (under the assumption that H_0 is true)

  • \bar{x}_1, \bar{x}_2 are sample means

  • s_1, s_2 are sample standard deviations

  • n_1, n_2 are sample sizes


  • The test statistic measures how far the sample mean difference (\bar{x}_1 - \bar{x}_2) is from the hypothesized value of \mu_1 - \mu_2 in H_0, in units of standard error

  • The test statistic comes from a Student’s t distribution with degrees of freedom:

    df=\min(n_1-1, n_2-1)

    • (i.e., the smaller of n_1-1 and n_2-1).
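The test statistic and the degrees of freedom rule above can be wrapped in one small function. This is a minimal sketch; the function name `two_sample_t` and the numbers in the usage lines are invented for illustration only and do not come from any dataset in these notes.

```python
import math

def two_sample_t(x_bar1, x_bar2, s1, s2, n1, n2):
    """Return (t, df) for H0: mu_1 = mu_2 with independent samples.

    t uses the unpooled standard error sqrt(s1^2/n1 + s2^2/n2);
    df is the conservative choice min(n1 - 1, n2 - 1).
    """
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = (x_bar1 - x_bar2) / standard_error
    df = min(n1 - 1, n2 - 1)
    return t, df

# Made-up numbers, only to show the call
t, df = two_sample_t(x_bar1=5.0, x_bar2=4.0, s1=1.0, s2=1.0, n1=50, n2=50)
print(round(t, 3), df)
```

Swapping the two samples flips the sign of t but leaves df unchanged, which is why the direction of the alternate hypothesis has to match the order in which the samples are labeled.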


For the P-value calculation:

  • Left-tailed: H_1:\mu_1<\mu_2

    \text{P-value}=P(T<t)

  • Right-tailed: H_1:\mu_1>\mu_2

    \text{P-value}=P(T>t)

  • Two-tailed: H_1:\mu_1\neq\mu_2

    \text{P-value}=2\cdot P(T<-|t|) \text{ OR } 2\cdot P(T>|t|)
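The three P-value rules can be sketched as one function that switches on the direction of the alternate hypothesis. The helpers below (`t_pdf`, `t_cdf`, `p_value`) are hypothetical names, and the CDF is approximated by numerical integration rather than read from a t-table, so treat the results as approximate.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) using Simpson's rule and symmetry about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t, df, tail):
    """P-value for tail in {'left', 'right', 'two'}."""
    if tail == "left":
        return t_cdf(t, df)
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("tail must be 'left', 'right', or 'two'")

# The two-tailed value is twice the one-tailed value beyond |t|
print(p_value(1.5, 10, "right"), p_value(1.5, 10, "two"))
```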


The steps for this hypothesis test are:

  1. State the null and alternate hypotheses

  2. Choose a significance level \alpha

  3. Compute the test statistic:

    t=\frac{(\bar{x}_1-\bar{x}_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}

  4. Compute the P-value of the test statistic t

  • Left-tailed: \text{P-value}=P(T<t)

  • Right-tailed: \text{P-value}=P(T>t)

  • Two-tailed: \text{P-value}=2\cdot P(T<-|t|) or 2\cdot P(T>|t|)

Note: The degrees of freedom of the t distribution is: df=\min(n_1-1,n_2-1)

  5. Determine whether to reject H_0:
    • Reject H_0 if \text{P-value} \leq \alpha.
  6. State a conclusion


Example 1

The National Assessment of Educational Progress tested a sample of students who had used a computer in their mathematics classes, and another sample of students who had not used a computer. The sample mean score for students using a computer was 309, with a sample standard deviation of 29. For students not using a computer, the sample mean was 303, with a sample standard deviation of 32. Assume there were 60 students in the computer sample and 40 students in the sample that hadn’t used a computer.

At the 5% significance level, conduct a hypothesis test to determine whether the population mean scores differ between students who use a computer and those who do not.


Step 1. State the null and alternative hypotheses

H_0: \mu_1 = \mu_2

H_A: \mu_1 \neq \mu_2 \quad (\rightarrow \text{two-tailed})


Step 2. The significance level is \alpha=0.05


Step 3. Compute the test statistic

\begin{array}{|c|c|c|c|} \hline \text{} & \text{Sample Mean} & \text{Sample Std. Dev.} & \text{Sample Size} \\ \hline \text{With Computer} & \bar{x}_1 = 309 & s_1 = 29 & n_1 = 60 \\ \hline \text{Without Computer} & \bar{x}_2 = 303 & s_2 = 32 & n_2 = 40 \\ \hline \end{array}

t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

t = \frac{(309 - 303) - (0)}{\sqrt{\frac{29^2}{60} + \frac{32^2}{40}}} \approx 0.953


Step 4. Compute the P-value

We use the t-table with df=\min(n_1-1, n_2-1)=39. Then:

P(T>0.953) \text{ is between } P(T>1.304)=0.10 \text{ and } P(T>0.681)=0.25

For the two-tailed test, the P-value:

\text{P-value} = 2 \cdot P(T>0.953) \text{ is between } 0.20 \text{ and } 0.50


Step 5. Determine whether to reject H_0

Since the P-value > \alpha = 0.05, we fail to reject H_0


Step 6. State a conclusion

There is not enough evidence to conclude that the mean scores differ between those students who use a computer and those who do not (i.e., the mean scores may be the same)
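Steps 3 through 5 of Example 1 can be reproduced numerically. The sketch below uses the summary statistics from the table and approximates the tail area by integration instead of bracketing it with the t-table, so it should land inside the 0.20 to 0.50 range found above. The helper names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from Example 1
x_bar1, s1, n1 = 309, 29, 60   # with computer
x_bar2, s2, n2 = 303, 32, 40   # without computer
alpha = 0.05

# Step 3: test statistic (mu_1 - mu_2 = 0 under H0)
t = (x_bar1 - x_bar2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Step 4: two-tailed P-value with df = min(n1 - 1, n2 - 1)
df = min(n1 - 1, n2 - 1)
p_value = 2 * upper_tail(abs(t), df)

# Step 5: decision
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(round(t, 3), df, round(p_value, 2), decision)
```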



Hypothesis Tests for Difference Between Two Means (Paired)

  • Next we turn our attention to a hypothesis test for paired (or matched) samples


  • Example: Gas mileage before and after tune-up for automobiles

\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \text{Automobile} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \text{After Tune-up} & 35.44 & 35.17 & 31.07 & 31.57 & 26.48 & 23.11 & 25.18 & 32.39 \\ \hline \text{Before Tune-up} & 33.76 & 34.30 & 29.55 & 30.90 & 24.92 & 21.78 & 24.30 & 31.25 \\ \hline \end{array}

  • Both mileages before and after tune-up are obtained from the same automobile (i.e., the values are paired within the subject)

  • Now, we are interested in testing the population mean difference for the matched pairs

  • Our hypothesis test will involve two paired random samples from a single population

  • The set of differences between the values in the matched pairs is treated as the sample data

  • Required assumption: the number of matched pairs is large (n > 30), or the differences in the matched pairs are (at least approximately) normally distributed

  • The population mean difference for the matched pairs is denoted \mu_d (unknown value)

  • The sample mean of the differences is denoted \bar{d}

  • The sample std. deviation of the differences is denoted s_d

\mu_d = \text{the population mean mileage difference (after tune-up minus before tune-up)}

\bar{d} = \frac{1.68 + 0.87 + \ldots + 1.14}{8} \approx 1.2063

s_d = \sqrt{\frac{(1.68 - 1.206)^2 + \ldots + (1.14 - 1.206)^2}{7}} \approx 0.3732
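The two summary values above can be recomputed from the table. This sketch takes each difference as after minus before, which matches the first term 1.68 = 35.44 - 33.76, and uses the sample standard deviation (divisor n - 1 = 7) from Python's `statistics` module.

```python
import statistics

after = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]

# One difference per automobile: after tune-up minus before tune-up
differences = [a - b for a, b in zip(after, before)]

d_bar = statistics.mean(differences)   # sample mean of the differences
s_d = statistics.stdev(differences)    # sample std. dev. (divides by n - 1)

print(round(d_bar, 4), round(s_d, 4))
```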

Step 1. State the null and alternate hypotheses. The null hypothesis is of the form

H_0: \mu_d = \mu_0

where \mu_0 is a prespecified value (e.g. \mu_0 = 0 is most common)


The alternate hypothesis:

  • Left-tailed: H_1: \mu_d < \mu_0

  • Right-tailed: H_1: \mu_d > \mu_0

  • Two-tailed: H_1: \mu_d \neq \mu_0

Step 2. Choose a significance level \alpha

Step 3. Compute the test statistic:

t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}

which follows a Student’s t distribution with df = n - 1

Step 4. Compute the P-value of the test statistic t

  • Left-tailed: P-value = area under the Student’s t distribution to the left of t, i.e., P(T < t)

  • Right-tailed: P-value = area under the Student’s t distribution to the right of t, i.e., P(T > t)

  • Two-tailed: P-value = sum of the areas under the Student’s t distribution to the left of -|t| and right of |t|, i.e., 2 \cdot P(T < -|t|) or 2 \cdot P(T > |t|)

Step 5. Determine whether to reject H_0:

  • Reject H_0 if P-value \leq \alpha

Step 6. State a conclusion
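To see the six steps run end to end, the sketch below applies them to the tune-up data. The notes do not state hypotheses for that dataset, so the choices here are assumptions made purely for illustration: \mu_0 = 0, a right-tailed alternative (mileage improves after a tune-up), and \alpha = 0.05. The helper names are also hypothetical, and the tail area is approximated numerically rather than read from a t-table.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

after = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]
differences = [a - b for a, b in zip(after, before)]

# Steps 1-2 (assumed for illustration): H0: mu_d = 0, H1: mu_d > 0, alpha = 0.05
mu_0 = 0.0
alpha = 0.05

# Step 3: test statistic with df = n - 1
n = len(differences)
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)
t = (d_bar - mu_0) / (s_d / math.sqrt(n))

# Step 4: right-tailed P-value (valid here because t is positive)
p_value = upper_tail(t, df=n - 1)

# Step 5: decision
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(round(t, 2), n - 1, decision)
```

With only 8 pairs, this relies on the differences being approximately normally distributed, as the required assumption above notes.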



Go away