22 Day 22

Announcements

That’s all folks

  • This is the last new content we’ll cover

  • From here on, it’s review, review, review

  • Wednesday we’ll do a refresh of some of the stuff we probably completely forgot

  • Thanksgiving break is next week

Some of you will get the TEVAL soon (we send them out semi-randomly)


Review

Null Hypothesis H0

The statement we are holding as known and established information

  • e.g., The average body weight of an adult cat is 10 lbs.

H0:μ=10


Alternate Hypothesis Ha or H1

The statement whose accuracy we are testing

  • I believe that the cats I interact with regularly have a different average body weight than the population

$H_a: \mu \neq 10$


Test Statistic t

A value calculated as part of the hypothesis testing process. We look it up in a t-table (or a z-table, depending on the test) to get a p-value.

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

  • I weighed 4 of my friends' cats and my own cat and found that their average body weight was 8 pounds, with a standard deviation of 2.49

$t = \dfrac{8 - 10}{2.49 / \sqrt{5}}$

$t = -1.796039$
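As a quick check of the arithmetic, here is a minimal Python sketch of the one-sample test statistic using the cat numbers above. The function name `one_sample_t` is made up for illustration; it is not from any library.

```python
import math

def one_sample_t(x_bar, mu_0, s, n):
    """Return t = (x_bar - mu_0) / (s / sqrt(n))."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Cat example: sample mean 8, hypothesized mean 10, s = 2.49, n = 5 cats
t_cats = one_sample_t(x_bar=8, mu_0=10, s=2.49, n=5)
print(f"t = {t_cats:.3f}")
```

The sign comes out negative because the sample mean (8) is below the hypothesized mean (10).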


A reminder of our key study participant:


Significance level α

The probability of committing a Type I error (rejecting H0 when it is actually true) in our hypothesis test

  • I want to test my cat weight hypothesis at α=0.05


P-value

The final statistic calculated in a hypothesis test, used to determine if we reject or fail to reject the null hypothesis

$2P(T > |t|) = 0.15$

$0.15 > \alpha \Rightarrow$ Fail to reject $H_0$
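If you want to see where the 0.15 comes from without a t-table, the sketch below integrates the Student's t density numerically. It assumes df = n − 1 = 4 for the five cats. The helper `t_upper_tail` is a hypothetical name, and Simpson's rule is just one convenient way to approximate the tail area. In practice you would use a table or statistical software.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t using Simpson's rule (steps must be even)."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # Area from 0 to t, subtracted from the 0.5 that lies right of zero
    return 0.5 - total * h / 3

t_stat, df, alpha = -1.796039, 4, 0.05
p_value = 2 * t_upper_tail(abs(t_stat), df)  # two-tailed
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"P-value = {p_value:.2f} -> {decision}")
```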


Statistically Significant

We refer to a result as statistically significant if we tested it against a null hypothesis and proceeded to reject the null hypothesis

  • There is insufficient evidence to suggest that the cats I interact with regularly have a statistically significant difference in average body weight from the population



Hypothesis Tests for Difference Between Two Means (Independent)

  • We’ve covered hypothesis testing for a single population parameter

    • (e.g., population mean μ)


  • Let’s look at testing a claim about the difference between two population means

$\mu_1 - \mu_2$

  • We need two independent random samples from two distinct populations

    • Independence means that the observations in one sample (X) have no effect on the observations in the other (Y)


  • As with everything we do in this class, we need to confirm that our samples can be treated as approximately normal

$n_1 > 30$ and $n_2 > 30$


  • We want to see if population means μ1 and μ2 are equal:

H0:μ1=μ2

  • There are three possible alternate hypotheses:

    • Left-tailed: $H_1: \mu_1 < \mu_2$

    • Right-tailed: $H_1: \mu_1 > \mu_2$

    • Two-tailed: $H_1: \mu_1 \neq \mu_2$


We need a test statistic, t:

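$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{(s_1^2 / n_1) + (s_2^2 / n_2)}}$

Under $H_0$, $\mu_1 = \mu_2$, so $\mu_1 - \mu_2 = 0$

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{(s_1^2 / n_1) + (s_2^2 / n_2)}}$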

  • $\mu_1$, $\mu_2$ are population means (under the assumption that $H_0$ is true)

  • $\bar{x}_1$, $\bar{x}_2$ are sample means

  • $s_1$, $s_2$ are sample standard deviations

  • $n_1$, $n_2$ are sample sizes


  • The test statistic measures how far the sample mean difference $(\bar{x}_1 - \bar{x}_2)$ is from the hypothesized value of $\mu_1 - \mu_2$ in $H_0$

  • The test statistic comes from a Student’s t distribution with degrees of freedom:

    $df = \min(n_1 - 1, n_2 - 1)$

    • (i.e., the smaller of $n_1 - 1$ and $n_2 - 1$).
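To make the formula concrete, here is a small Python sketch that computes the two-sample test statistic and the degrees of freedom from summary statistics. The function name `two_sample_t` and its parameter names are assumptions for illustration. The numbers plugged in are the summary statistics from Example 1 in these notes.

```python
import math

def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2, hypothesized_diff=0.0):
    """Return (t, df) for two independent samples.

    hypothesized_diff is the value of mu_1 - mu_2 under H0 (0 when H0 says the means are equal).
    """
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = ((x_bar1 - x_bar2) - hypothesized_diff) / standard_error
    df = min(n1 - 1, n2 - 1)  # the rule used in these notes
    return t, df

# Summary statistics from Example 1 (with vs. without a computer)
t_stat, df = two_sample_t(x_bar1=309, s1=29, n1=60, x_bar2=303, s2=32, n2=40)
print(f"t = {t_stat:.3f}, df = {df}")
```

Note that statistical software often uses a different degrees-of-freedom formula for this test, so a software P-value can differ slightly from the one obtained with the min rule.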


For the P-value calculation:

  • Left-tailed: $H_1: \mu_1 < \mu_2$

    P-value $= P(T < t)$

  • Right-tailed: $H_1: \mu_1 > \mu_2$

    P-value $= P(T > t)$

  • Two-tailed: $H_1: \mu_1 \neq \mu_2$

    P-value $= 2P(T < -|t|)$ or, equivalently, $2P(T > |t|)$


The steps for this hypothesis test are:

  1. State the null and alternate hypotheses

  2. Choose a significance level α

  3. Compute the test statistic:

    $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1^2 / n_1) + (s_2^2 / n_2)}}$

  4. Compute the P-value of the test statistic t

    • Left-tailed: P-value $= P(T < t)$

    • Right-tailed: P-value $= P(T > t)$

    • Two-tailed: P-value $= 2P(T < -|t|)$ or $2P(T > |t|)$

    Note: The degrees of freedom of the t distribution are $df = \min(n_1 - 1, n_2 - 1)$

  5. Determine whether to reject $H_0$:
    • Reject $H_0$ if P-value $\leq \alpha$.
  6. State a conclusion


Example 1

The National Assessment of Educational Progress tested a sample of students who had used a computer in their mathematics classes, and another sample of students who had not used a computer. The sample mean score for students using a computer was 309, with a sample standard deviation of 29. For students not using a computer, the sample mean was 303, with a sample standard deviation of 32. Assume there were 60 students in the computer sample and 40 students in the sample that hadn’t used a computer.

At the 5% significance level, conduct a hypothesis test to determine whether the population mean scores differ between those students who use a computer and those who do not.


Step 1. State the null and alternative hypotheses

$H_0: \mu_1 = \mu_2$

$H_A: \mu_1 \neq \mu_2$ (two-tailed)


Step 2. The significance level is α=0.05


Step 3. Compute the test statistic

| | Sample Mean | Sample Std. Dev. | Sample Size |
|---|---|---|---|
| With Computer | $\bar{x}_1 = 309$ | $s_1 = 29$ | $n_1 = 60$ |
| Without Computer | $\bar{x}_2 = 303$ | $s_2 = 32$ | $n_2 = 40$ |

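$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

$t = \dfrac{(309 - 303) - (0)}{\sqrt{\dfrac{29^2}{60} + \dfrac{32^2}{40}}} \approx 0.953$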


Step 4. Compute the P-value

We use the t-table with $df = \min(n_1 - 1, n_2 - 1) = \min(59, 39) = 39$. Then:

P(T>0.953) is between P(T>1.304)=0.10 and P(T>0.681)=0.25

For the two-tailed test, the P-value:

P-value=2P(T>0.953) is between 0.20 and 0.50


Step 5. Determine whether to reject H0

Since the P-value >α=0.05, we fail to reject H0


Step 6. State a conclusion

There is not enough evidence to conclude that the mean scores differ between those students who use a computer and those who do not (i.e., the mean scores may be the same)
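The t-table only brackets the P-value in Step 4. As a rough cross-check, the sketch below approximates the tail area directly by numerical integration and applies the left/right/two-tailed rules. The names `t_upper_tail` and `p_value`, and the tail labels `"left"`, `"right"`, `"two"`, are hypothetical choices for this illustration. The inputs t = 0.953 and df = 39 are taken from Steps 3 and 4.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t using Simpson's rule (steps must be even)."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def p_value(t, df, tail):
    """P-value for a left-, right-, or two-tailed t test."""
    if tail == "left":
        return 1.0 - t_upper_tail(t, df)       # P(T < t)
    if tail == "right":
        return t_upper_tail(t, df)             # P(T > t)
    if tail == "two":
        return 2.0 * t_upper_tail(abs(t), df)  # 2 P(T > |t|)
    raise ValueError("tail must be 'left', 'right', or 'two'")

t_stat, df, alpha = 0.953, 39, 0.05
p_two = p_value(t_stat, df, "two")
decision = "reject H0" if p_two <= alpha else "fail to reject H0"
print(f"two-tailed P-value = {p_two:.3f} -> {decision}")
```

The computed value should land inside the 0.20 to 0.50 range read off the table, leading to the same decision.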



Hypothesis Tests for Difference Between Two Means (Paired)

  • Next we turn our attention to a hypothesis test for paired (or matched) samples


  • Example: Gas mileage before and after tune-up for automobiles

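| Automobile | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| After Tune-up | 35.44 | 35.17 | 31.07 | 31.57 | 26.48 | 23.11 | 25.18 | 32.39 |
| Before Tune-up | 33.76 | 34.30 | 29.55 | 30.90 | 24.92 | 21.78 | 24.30 | 31.25 |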

  • Both mileages before and after tune-up are obtained from the same automobile (i.e., the values are paired within the subject)

  • Now, we are interested in testing the population mean difference for the matched pairs

  • Our hypothesis test will involve two paired random samples from a single population

  • The set of differences between the values in the matched pairs is considered as the sample data

  • Required assumption: The sample size (number of pairs) is large (n > 30), or the differences in the matched pairs are normally distributed (at least approximately)

  • The population mean difference for the matched pairs is denoted $\mu_d$ (unknown value)

  • The sample mean of the differences is denoted $\bar{d}$

  • The sample std. deviation of the differences is denoted $s_d$

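$\mu_d$ = the population mean difference in mileage (after tune-up minus before tune-up)

$\bar{d} = \dfrac{1.68 + 0.87 + \cdots + 1.14}{8} \approx 1.2063$

$s_d = \sqrt{\dfrac{(1.68 - 1.206)^2 + \cdots + (1.14 - 1.206)^2}{7}} \approx 0.3732$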
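The summary values above can be reproduced from the tune-up table with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; the variable names are arbitrary, and the differences are taken as after minus before.

```python
import statistics

# Gas mileage for automobiles 1 through 8, from the table above
after = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]

# Paired differences (after - before), rounded to remove floating-point noise
diffs = [round(a - b, 2) for a, b in zip(after, before)]

d_bar = statistics.mean(diffs)   # sample mean of the differences
s_d = statistics.stdev(diffs)    # sample std. dev. (divides by n - 1 = 7)

print(diffs)
print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}")
```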

Step 1. State the null and alternate hypotheses. The null hypothesis is of the form

H0:μd=μ0

where μ0 is a prespecified value (e.g. μ0=0 is most common)


The alternate hypothesis:

  • Left-tailed: $H_1: \mu_d < \mu_0$

  • Right-tailed: $H_1: \mu_d > \mu_0$

  • Two-tailed: $H_1: \mu_d \neq \mu_0$

Step 2. Choose a significance level α

Step 3. Compute the test statistic:

$t = \dfrac{\bar{d} - \mu_0}{s_d / \sqrt{n}}$

which follows a Student’s t distribution with $df = n - 1$

Step 4. Compute the P-value of the test statistic t

  • Left-tailed: P-value = area under the Student’s t distribution to the left of t, i.e., P(T<t)

  • Right-tailed: P-value = area under the Student’s t distribution to the right of t, i.e., P(T>t)

  • Two-tailed: P-value = sum of the areas under the Student’s t distribution to the left of $-|t|$ and to the right of $|t|$, i.e., $2P(T < -|t|)$ or $2P(T > |t|)$

Step 5. Determine whether to reject H0:

  • Reject $H_0$ if P-value $\leq \alpha$

Step 6. State a conclusion
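As an illustration of the six steps, here is a sketch that runs the paired test on the tune-up differences (after minus before). The notes do not state the hypotheses for this data, so the choices below are assumptions made only for the example: $H_0: \mu_d = 0$, $H_1: \mu_d > 0$ (right-tailed), and $\alpha = 0.05$. The helper `t_upper_tail` is a hypothetical name; it approximates the tail area by numerical integration instead of a t-table.

```python
import math
import statistics

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t using Simpson's rule (steps must be even)."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Steps 1 and 2 (assumed for this example): H0: mu_d = 0, H1: mu_d > 0, alpha = 0.05
mu_0, alpha = 0.0, 0.05
diffs = [1.68, 0.87, 1.52, 0.67, 1.56, 1.33, 0.88, 1.14]  # after - before

# Step 3: test statistic with df = n - 1
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_stat = (d_bar - mu_0) / (s_d / math.sqrt(n))
df = n - 1

# Step 4: right-tailed P-value, P(T > t)
p_value = t_upper_tail(t_stat, df)

# Steps 5 and 6: decision and conclusion
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, df = {df}, P-value = {p_value:.6f} -> {decision}")
```

Under these assumed hypotheses the test statistic is large and the P-value is far below 0.05, so the data would support the claim that mean mileage is higher after a tune-up.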



Go away