18 Day 17
Review
Sampling Distribution
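Given a normally distributed population $Y \sim N(\mu, \sigma^2)$
Sample $X \sim N(\mu_X, \sigma^2_X)$
Sample mean $\bar{x} \sim N(\mu_{\bar{x}}, \sigma^2_{\bar{x}})$
Where: $\mu_{\bar{x}} = \mu$
And: $\sqrt{\sigma^2_{\bar{x}}} = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$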
Central Limit Theorem
Let $\bar{x}$ be the mean of a large random sample ($n > 30$) from any population
- With mean $\mu$ and standard deviation $\sigma$
The distribution of $\bar{x}$ is approximately normal
Mean $\mu_{\bar{x}} = \mu$
Standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
If n is large enough, we have:
$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
- Regardless of the original population’s distribution
How large does n need to be?
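One way to explore this question is by simulation. The sketch below is an illustration only: it assumes a right-skewed Exponential population with $\mu = 1$ and $\sigma = 1$ (chosen for the example, not from the notes), draws many samples of size $n$, and compares the spread of the sample means with $\sigma/\sqrt{n}$. The `skew` column measures how far the distribution of $\bar{x}$ still is from symmetric.

```python
import random
import statistics

def sample_means(n, num_samples, rate=1.0, seed=0):
    """Means of num_samples random samples of size n from an Exponential(rate) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n))
            for _ in range(num_samples)]

def skewness(xs):
    """Sample skewness; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / sd) ** 3 for x in xs)

# Assumed population: Exponential(1), right-skewed, with mu = 1 and sigma = 1
sigma = 1.0
for n in (2, 10, 30, 100):
    means = sample_means(n, num_samples=5000)
    print(f"n={n:3d}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.stdev(means):.3f}  sigma/sqrt(n)={sigma / n ** 0.5:.3f}  "
          f"skew={skewness(means):.2f}")
```

For this particular population the skewness of $\bar{x}$ is $2/\sqrt{n}$, so the `skew` column should fall roughly from 1.4 at $n = 2$ to 0.2 at $n = 100$; a more skewed population would need a larger $n$ before $\bar{x}$ looks normal.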
Population Proportion
- A proportion is the fraction of the population with some characteristic: a percentage written as a decimal
Say the percentage of the population who participate in early voting is 40%
$\frac{40}{100} = 0.40$
The proportion of the population who early vote is $p = 0.40$
If we poll a sample of 100 Manhattan residents and find that 31% early vote:
- The proportion of our sample who early vote is $\hat{p} = 0.31$
Just like every other statistic, sample proportions are random variables
- Their distribution is called the sampling distribution of the sample proportion
All of our previous rules and ideas apply
Different samples from the same population give different values of $\hat{p}$
The larger the sample, the closer $\hat{p}$ tends to be to the true proportion $p$
- Mean of the sample proportion $\hat{p}$ is:
$$\mu_{\hat{p}} = p \quad \text{(the population proportion)}$$
- Standard deviation of the sample proportion $\hat{p}$ is:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
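These two formulas can be checked by simulation. The sketch below reuses the early-voting numbers from above ($p = 0.40$, $n = 100$), treats each poll as 100 independent yes/no draws (an assumption of the sketch), and compares the mean and standard deviation of many simulated $\hat{p}$ values with $p$ and $\sqrt{p(1-p)/n} \approx 0.049$.

```python
import random
import statistics

def simulate_p_hat(p, n, num_samples, seed=0):
    """Return num_samples sample proportions, each from n independent Bernoulli(p) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(num_samples)]

p, n = 0.40, 100                      # early-voting example from the notes
p_hats = simulate_p_hat(p, n, num_samples=10_000)
print("mean of p-hat:", round(statistics.fmean(p_hats), 4))   # close to p
print("sd of p-hat:  ", round(statistics.stdev(p_hats), 4))   # close to the formula below
print("sqrt(p(1-p)/n):", round((p * (1 - p) / n) ** 0.5, 4))
```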
Proportion Central Limit Theorem
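If $np \ge 10$ and $n(1-p) \ge 10$:
The distribution of $\hat{p}$ is approximately normal
Mean $\mu_{\hat{p}} = p$
Standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
So:
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$$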
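As a worked illustration (not part of the original example), the normal approximation can answer: if the true early-voting proportion really were $p = 0.40$, how likely is a sample of 100 with $\hat{p} \le 0.31$? The sketch checks both conditions, computes $\sigma_{\hat{p}}$, converts $\hat{p}$ to a z-score, and evaluates the standard normal CDF with `math.erf`.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p, n, p_hat = 0.40, 100, 0.31   # early-voting example from the notes

# Large-sample conditions: np >= 10 and n(1 - p) >= 10 (here 40 and 60)
assert n * p >= 10 and n * (1 - p) >= 10

se = sqrt(p * (1 - p) / n)      # sigma_p-hat = sqrt(0.0024), about 0.049
z = (p_hat - p) / se            # about -1.84
prob = normal_cdf(z)            # P(p-hat <= 0.31), about 0.033
print(f"SE = {se:.4f}, z = {z:.2f}, P(p_hat <= 0.31) = {prob:.3f}")
```

A probability of roughly 3% suggests the Manhattan sample would be unusual if its population's early-voting rate were really 40%.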
A point estimate is a single fixed number computed from the sample; by itself it conveys nothing about its uncertainty
- Statistics deals with probabilistic results
Confidence Intervals
Since the value of $\bar{x}$ varies from sample to sample:
- We need to quantify the uncertainty associated with $\bar{x}$
Example:
A random sample of 120 students admitted to top business schools yielded an average GPA of 3.45
$\bar{x} = 3.45$ is a point estimate of $\mu$
- It is a single number and gives no information about how close it is to $\mu$
A confidence interval (CI) provides a range of values that we believe contains:
The population parameter
With a certain level of confidence
- We refer to this as the confidence level
Formula for the CI:
$$\text{Point estimate} \pm \text{Margin of Error}$$
- The confidence interval for $\mu$:
$$\bar{x} \pm \text{Margin of Error}$$
$$(\bar{x} - \text{Margin of Error},\ \bar{x} + \text{Margin of Error})$$
Margin of error
The largest distance, at the chosen confidence level, that we believe separates our estimate $\bar{x}$ from $\mu$
- The size of the margin of error is determined by the sampling distribution of $\bar{x}$ and the confidence level
The confidence level is denoted by $100(1-\alpha)\%$
- Typically 90%, 95%, or 99%
For a population with unknown $\mu$ but known $\sigma$, a $100(1-\alpha)\%$ confidence interval for $\mu$ is computed as:
$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the z-score with an area of $\alpha/2$ to its right
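The critical value $z_{\alpha/2}$ and the resulting interval can be computed directly instead of read from a table. The sketch below uses Python's `statistics.NormalDist().inv_cdf`, which returns the z-score with a given area to its left, so $z_{\alpha/2}$ is `inv_cdf(1 - alpha/2)`. The usage line borrows $\bar{x} = 3.45$ and $n = 120$ from the GPA example but assumes a known $\sigma = 0.4$, which is hypothetical and not given in the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(conf_level):
    """z_{alpha/2}: the z-score with area alpha/2 to its right."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(x_bar, sigma, n, conf_level=0.95):
    """100(1 - alpha)% confidence interval for mu when sigma is known."""
    margin = z_critical(conf_level) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%}: z = {z_critical(cl):.3f}")   # 1.645, 1.960, 2.576

# Hypothetical: GPA example with an ASSUMED known sigma = 0.4
print(z_interval(3.45, sigma=0.4, n=120))
```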
When constructing a confidence interval for $\mu$
- We have to consider our assumptions
At least one of the following must hold:
The sample size is large (n>30)
The original population is normally distributed
In most practical cases, σ is unknown, and we must use the sample standard deviation s
The formula for the confidence interval is:
$$\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
where $t_{\alpha/2}$ is the critical value from Student's t-distribution and $s$ is the sample standard deviation
Student’s t-Distribution
The (Student’s) t-distribution is similar to the standard normal distribution
Unimodal
Symmetric around 0
But it has wider (or heavier) tails than the standard normal
- Meaning it’s more spread out
Each t-distribution is identified by its degrees of freedom ($df = n - 1$ when estimating $\mu$)
- As df increases, the t-distribution converges to the standard normal distribution
The critical value $t_{\alpha/2}$ is the t value with an area of $\alpha/2$ in the right tail of the t-distribution
When using the t-distribution to construct a confidence interval for $\mu$:
- The degrees of freedom (df) are one less than the sample size
Example 1: Finding Critical Value
Find the critical value $t_{\alpha/2}$ for a 95% confidence interval with $n = 8$
Set $1 - \alpha = 0.95$, so $\alpha = 0.05$ and $\alpha/2 = 0.025$
For $n = 8 \Rightarrow df = n - 1 = 7$
The critical value is $t_{\alpha/2} = 2.365$
What if the df I'm looking for isn't in the table?
Round down to the nearest df listed in the table (a smaller df gives a slightly larger, more conservative critical value)
If $df = 59$, round down to $df = 50$
At 95% confidence, $t_{\alpha/2} = 2.009$
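Printed t tables list only selected degrees of freedom. The sketch below computes $t_{\alpha/2}$ for any df using only Python's standard library: it integrates the t density with Simpson's rule to get the right-tail area and then bisects for the value whose tail area is $\alpha/2$. This numerical approach is an assumption of the sketch, not how the table was built; in practice `scipy.stats.t.ppf(1 - alpha/2, df)` gives the same number. It reproduces Example 1, shows that rounding $df = 59$ down to $50$ gives a slightly larger critical value, and shows the convergence toward $z_{0.025} = 1.960$.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t], via Simpson's rule."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

def t_critical(conf_level, df, lo=0.0, hi=50.0):
    """t_{alpha/2}: the t value with area alpha/2 to its right, found by bisection.
    The bracket [0, 50] covers the usual 90/95/99% levels for df >= 2."""
    target = (1 - conf_level) / 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > target:
            lo = mid   # too much area to the right of mid: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 7), 3))    # Example 1 (df = 7): 2.365
print(round(t_critical(0.95, 59), 3))   # exact df = 59
print(round(t_critical(0.95, 50), 3))   # table row df = 50: 2.009
for df in (4, 30, 100, 1000):
    print(df, round(t_critical(0.95, df), 3))   # shrinks toward z = 1.960
```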
Summary of CI for Population Mean μ:
Check your assumptions for constructing a CI for $\mu$:
- Sample size is large ($n > 30$) or the population is normal
A $100(1-\alpha)\%$ confidence interval is computed as:
Case 1: $\sigma$ is known, use the z-method:
$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Case 2: $\sigma$ is unknown, use the t-method:
$$\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
Example 2: Constructing a CI
Given a sample of size $n = 5$ from a normal population with $\bar{x} = 4.31$ and $s = 2.7$, construct a 95% confidence interval for $\mu$
- Should we use the z-method or the t-method?
- $\sigma$ is unknown, so we use the t-method
- Compute the margin of error for this 95% confidence interval:
- With $df = 4$ and $t_{\alpha/2} = 2.776$, calculate:
$$\text{Margin of Error} = 2.776 \times \frac{2.7}{\sqrt{5}} \approx 3.352$$
- Construct a 95% confidence interval for $\mu$ and interpret your result:
$$4.31 \pm 3.352 \quad \text{or} \quad (0.958,\ 7.662)$$
We are 95% confident that the true population mean $\mu$ lies between 0.958 and 7.662
- If the population were not normal, would this confidence interval be valid?
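The arithmetic in Example 2 can be checked with a short sketch. The helper `t_interval` is a hypothetical name introduced here; it takes the critical value as an input (2.776, read from the t table for $df = 4$) rather than computing it.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Return (margin, (lower, upper)) for x_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return margin, (x_bar - margin, x_bar + margin)

# Example 2: n = 5, x_bar = 4.31, s = 2.7, df = 4, t_{0.025} = 2.776 from the table
margin, (lower, upper) = t_interval(4.31, 2.7, 5, t_crit=2.776)
print(f"ME = {margin:.3f}, CI = ({lower:.3f}, {upper:.3f})")
# ME = 3.352, CI = (0.958, 7.662)
```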