18 Day 17

Review

Sampling Distribution

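Given any population $Y \sim N(\mu, \sigma^2)$

  • Sample $X \sim N(\mu_X, \sigma_X^2)$

  • Sample mean $\bar{x} \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}}^2)$

    • Where: $\mu_{\bar{x}} = \mu$

    • And: $\sqrt{\sigma_{\bar{x}}^2} = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$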


Central Limit Theorem

  • Let $\bar{x}$ be the mean of a large random sample ($n > 30$) from any population

    • With mean $\mu$ and standard deviation $\sigma$
  • The distribution of $\bar{x}$ is approximately normal

    • Mean $\mu_{\bar{x}} = \mu$

    • Standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$


If n is large enough, we have:

$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

  • Regardless of the original population’s distribution
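The claim above can be checked with a small simulation. This is a minimal sketch, not part of the original notes: it assumes an exponential population (clearly non-normal, with mean 1 and standard deviation 1), and the function name `simulate_sample_means`, the sample size 40, and the seed are all illustrative choices.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps, rate=1.0, seed=0):
    """Draw `reps` samples of size `n` from an exponential population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(reps)
    ]


# Exponential(rate=1) has mu = 1 and sigma = 1 (assumed population).
means = simulate_sample_means(n=40, reps=5000)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / math.sqrt(40), 3))
```

The mean of the simulated sample means should land close to $\mu = 1$, and their standard deviation close to $\sigma/\sqrt{n}$, even though the population itself is skewed.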


How large does n need to be?


Population Proportion

  • A proportion is a percentage of the population, written as a fraction or decimal


  • Say the percentage of the population who participate in early voting is 40%

    • $\frac{40}{100} = 0.40$

    • The proportion of the population who early vote, $p = 0.40$

  • If we poll a sample of 100 Manhattan residents and find that 31% early vote:

    • The proportion of our sample who early vote, $\hat{p} = 0.31$


Just like every other statistic, sample proportions are random variables

  • So their distribution is the sampling distribution of the proportion


All of our previous rules and ideas apply

  • As we take samples from our population we will see they aren’t consistent

  • The more we sample the closer we get to true values


  • Mean of the sample proportion $\hat{p}$ is:

$\mu_{\hat{p}} = p$ (population proportion)

  • Standard deviation of the sample proportion $\hat{p}$ is:

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$


Proportion Central Limit Theorem

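  • If $np \geq 10$ and $n(1-p) \geq 10$

  • Distribution of $\hat{p}$ is approximately normal

    • Mean $\mu_{\hat{p}} = p$

    • Standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

So:

$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$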
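As an illustration, the early-voting numbers from above ($p = 0.40$, $n = 100$) can be plugged into these formulas. This is a sketch only; the helper name `proportion_sampling_distribution` is hypothetical and not from the notes.

```python
import math


def proportion_sampling_distribution(p, n):
    """Return (mean, standard deviation, normal_ok) for the sample proportion.

    normal_ok is True when both np >= 10 and n(1 - p) >= 10 hold.
    """
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, normal_ok


# Early-voting example: p = 0.40, sample of n = 100
mean, sd, normal_ok = proportion_sampling_distribution(p=0.40, n=100)
print(f"mean = {mean}, sd = {sd:.4f}, normal approximation ok: {normal_ok}")
```

Here $np = 40$ and $n(1-p) = 60$, so both conditions hold and the standard deviation comes out to roughly 0.049.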


  • Point estimates are a deterministic result

    • Statistics deals with probabilistic results


Confidence Intervals

  • Since the value of $\bar{x}$ varies with each sample

    • We need to quantify the uncertainty associated with $\bar{x}$


Example:

A random sample of 120 students admitted to top business schools yielded an average GPA of 3.45

$\bar{x} = 3.45$. This is a point estimate of $\mu$

  • One number, no additional information provided



A confidence interval (CI) provides a range of values that contains:

  • The population parameter

  • With a certain level of confidence

    • We refer to this as the confidence level


Formula for the CI:

$\text{Point estimate} \pm \text{Margin of Error}$

  • The confidence interval for $\mu$:

$\bar{x} \pm \text{Margin of Error}$


$(\bar{x} - \text{Margin of Error},\ \bar{x} + \text{Margin of Error})$


Margin of error

The farthest distance we believe our estimate $\bar{x}$ is from $\mu$

  • The size of the margin of error is determined by the sampling distribution of $\bar{x}$ and the confidence level


  • Confidence level is denoted by $100(1-\alpha)\%$

    • Typically 90%, 95%, or 99%


For a population with unknown $\mu$ but known $\sigma$, a $100(1-\alpha)\%$ confidence interval for $\mu$ is computed as:

$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

Where $z_{\alpha/2}$ is the z-score with an area of $\alpha/2$ to its right
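The z-method can be sketched in a few lines using only the Python standard library. The GPA example gives $\bar{x} = 3.45$ and $n = 120$ but no $\sigma$, so the value `sigma=0.5` below is a made-up assumption for illustration, and the function name `z_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu when the population sigma is known."""
    alpha = 1 - confidence
    # z-score with area alpha/2 to its right under the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# x_bar and n come from the GPA example; sigma = 0.5 is assumed.
low, high = z_confidence_interval(x_bar=3.45, sigma=0.5, n=120)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")
```

Raising the confidence level increases $z_{\alpha/2}$ and widens the interval, while a larger $n$ narrows it.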


When constructing a confidence interval for μ

  • We have to consider our assumptions


At least one of the following must hold:

  1. The sample size is large (n>30)

  2. The original population is normally distributed


In most practical cases, σ is unknown, and we must use the sample standard deviation s

The formula for the confidence interval is:

$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$

Where $t_{\alpha/2}$ is the critical value from the Student’s t-distribution, and $s$ is the sample standard deviation


Student’s t-Distribution

  • The (Student’s) t-distribution is similar to the standard normal distribution

    • Unimodal

    • Symmetric around 0

  • But it has wider (or heavier) tails than the standard normal

    • Meaning it’s more spread out
  • The t-distribution is distinguished by degrees of freedom ($df = n - 1$)

    • As df increases, the t-distribution converges to the standard normal distribution


The critical value $t_{\alpha/2}$ is the t value that separates an area of $\alpha/2$ in the right tail of the t distribution

When using the t distribution to construct a confidence interval for $\mu$:

  • Degrees of freedom (df) is 1 less than the sample size


Example 1: Finding Critical Value

Find the critical value $t_{\alpha/2}$ for a 95% confidence interval with $n = 8$

  • Set $1 - \alpha = 0.95$, then $\alpha = 0.05$, and $\alpha/2 = 0.025$

  • For $n = 8$, $df = n - 1 = 7$

The critical value is $t_{\alpha/2} = 2.365$
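The table value can be cross-checked numerically. The sketch below is one possible approach, not the method used in the notes: it integrates the t density with Simpson's rule and then uses bisection to find the point with area $\alpha/2$ to its right. The function names are hypothetical, and in practice a statistics library would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] plus symmetry about 0."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(confidence, df):
    """t value with area alpha/2 to its right, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example 1: 95% confidence, n = 8, so df = 7
print(round(t_critical(0.95, df=7), 3))
```

For $df = 7$ this agrees with the table value of 2.365 to three decimal places.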


What if the df I’m looking for isn’t in the table?

  • Round down to the nearest value on the table

    • If $df = 59$, round down to $df = 50$

    • At 95% confidence, $t_{\alpha/2} = 2.009$
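The round-down rule can be sketched as a table lookup. The dictionary below is an abridged, illustrative table of standard two-sided 95% critical values; the rows included and the names `T_TABLE_95` and `t_from_table` are assumptions, not the table used in class.

```python
import bisect

# Abridged t-table for 95% confidence (area 0.025 in the right tail).
T_TABLE_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    20: 2.086, 30: 2.042, 40: 2.021, 50: 2.009, 60: 2.000,
}


def t_from_table(df):
    """Look up the critical value, rounding df down to the nearest table row."""
    rows = sorted(T_TABLE_95)
    i = bisect.bisect_right(rows, df) - 1  # index of largest row <= df
    if i < 0:
        raise ValueError("df must be at least 1")
    return T_TABLE_95[rows[i]]


print(t_from_table(59))  # uses the df = 50 row
print(t_from_table(7))   # exact row exists
```

Rounding down is the conservative choice: a smaller df gives a slightly larger critical value, so the resulting interval is a little wider rather than too narrow.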


Summary of CI for Population Mean μ:

Check your assumptions for constructing a CI of μ:

  • Sample size is large (n>30) or the population is normal


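A $100(1-\alpha)\%$ confidence interval is computed as:

  • Case 1: $\sigma$ is known, use the z-method:

    $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

  • Case 2: $\sigma$ is unknown, use the t-method:

    $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$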


Example 2: Constructing a CI

Given a sample of size $n = 5$ from a normal population, $\bar{x} = 4.31$, and $s = 2.7$, construct a 95% confidence interval for $\mu$


  1. Should we use the z-method or the t-method?
  • $\sigma$ is unknown, so we use the t-method


  2. Compute the margin of error for this 95% confidence interval:
  • With $df = 4$ and $t_{\alpha/2} = 2.776$, calculate:

$\text{Margin of Error} = 2.776 \times \frac{2.7}{\sqrt{5}} \approx 3.352$


  3. Construct a 95% confidence interval for $\mu$ and interpret your result:

$4.31 \pm 3.352$ or $(0.958, 7.662)$

We are 95% confident that the true population mean lies between 0.958 and 7.662


  4. If the population were not normal, would the confidence interval in step 3 be valid?
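The arithmetic in this example can be reproduced with a short script. This is a sketch: the critical value 2.776 is taken from the t-table as in the example, and the function name `t_confidence_interval` is hypothetical.

```python
import math


def t_confidence_interval(x_bar, s, n, t_crit):
    """Return (margin of error, (lower, upper)) for the t-method."""
    margin = t_crit * s / math.sqrt(n)
    return margin, (x_bar - margin, x_bar + margin)


# Example 2: n = 5, x_bar = 4.31, s = 2.7, df = 4, t = 2.776
margin, (low, high) = t_confidence_interval(x_bar=4.31, s=2.7, n=5, t_crit=2.776)
print(round(margin, 3), round(low, 3), round(high, 3))
# matches the hand calculation: about 3.352, 0.958, 7.662
```

Because $n = 5$ is far below 30, the calculation relies on the stated assumption that the population is normal.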


Interpreting a CI

  • Suppose we take many random samples and construct a 95% confidence interval from each sample

    • 95% of those intervals would contain the true population mean, μ


In practice:

  • We say that we’re 95% confident that the true value of μ is within our confidence interval
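The repeated-sampling interpretation can be illustrated by simulation. This is a sketch under assumed values: a normal population with $\mu = 50$ and known $\sigma = 10$, samples of size 30, and z-method intervals. The function name `coverage_rate` and all parameter values are illustrative only.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, reps, confidence=0.95, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one sample and build its confidence interval
        x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps


rate = coverage_rate(mu=50, sigma=10, n=30, reps=2000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The share should come out near 0.95. Any single interval either contains $\mu$ or it does not; the 95% describes the long-run behavior of the procedure.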