18 Day 17
Review
Sampling Distribution
Given any population \(Y \sim N(\mu,\sigma^2)\)
Sample \(X \sim N(\mu_X,\sigma^2_X)\)
Sample mean \(\bar{x} \sim N(\mu_{\bar{x}},\sigma^2_{\bar{x}})\)
Where: \(\mu_{\bar{x}} = \mu\)
And: \({\sqrt{\sigma^2_{\bar{x}}}}=\sigma_{\bar{x}} = {\sigma \over \sqrt{n}}\)
Central Limit Theorem
Let \(\bar{x}\) be the mean of a large random sample (\(n > 30\)) from any population
- With mean \(\mu\) and standard deviation \(\sigma\)
The distribution of \(\bar{x}\) is approximately normal
Mean \(\mu_{\bar{x}} = \mu\)
Standard deviation \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\).
If \(n\) is large enough, we have:
\[\bar{x} \sim N(\mu, \frac{\sigma^2}{n})\]
- Regardless of the original population’s distribution
How large does \(n\) need to be?
Population Proportion
- Proportions are just percentages of the population
Say the percentage of the population who participate in early voting is \(40\%\)
\({40\over 100}=0.40\)
The proportion of the population who early vote, \(p=0.40\)
If we poll a sample of 100 Manhattan residents and find that \(31\%\) early vote:
- The proportion of our sample who early vote, \(\hat{p}=0.31\)
Just like every other statistic, sample proportions are random variables
- So their distribution is the sampling distribution of the proportion
All of our previous rules and ideas apply
As we take samples from our population we will see they aren’t consistent
The more we sample the closer we get to true values
- Mean of the sample proportion \(\hat{p}\) is:
\[\mu_{\hat{p}} = p \quad \text{(population proportion)}\]
- Standard deviation of sample proportion \(\hat{p}\) is:
\[\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}\]
Proportion Central Limit Theorem
If \(np \geq 10\) and \(n(1 - p) \geq 10\)
Distribution of \(\hat{p}\) is approximately normal
Mean \(\mu_{\hat{p}} = p\)
Standard deviation \(\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}\)
So:
\[\hat{p} \sim N \left( p, \frac{p(1 - p)}{n} \right)\]
Point estimates are a deterministic result
- Statistics deals with probabilistic results
Confidence Intervals
Since: the value of \(\bar{x}\) varies with each sample
- We need to quantify the uncertainty associated with \(\bar{x}\)
Example:
A random sample of \(120\) students admitted to top business schools yielded an average GPA of \(3.45\)
\[\bar{x} = 3.45\] This is a point estimate of \(\mu\)
- One number, no additional information provided
A confidence interval (CI) provides a range of values that contains:
The population parameter
With a certain level of confidence
- We refer to this as the confidence level
Formula for the CI:
\[\text{Point estimate} \pm \text{Margin of Error}\]
- The confidence interval for \(\mu\):
\[ \bar{x} \pm \text{Margin of Error} \]
\[ (\bar{x} - \text{Margin of Error}, \bar{x} + \text{Margin of Error}) \]
Margin of error
The farthest distance we believe our estimate \(\bar{x}\) is from \(\mu\)
- The size of the margin of error is determined by the sampling distribution of \(\bar{x}\) and the confidence level
Confidence level is denoted by \(100(1 - \alpha)\%\)
- Typically \(90\%\), \(95\%\), or \(99\%\)
For a population with unknown \(\mu\) but known \(\sigma\), a \(100(1 - \alpha)\%\) confidence interval for \(\mu\) is computed as:
\[\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]
Where \(z_{\alpha/2}\) is the z-score with an area of \(\alpha/2\) to its right
When construction a confidence interval for \(\mu\)
- We have to consider our assumptions
At least one of the following must hold:
The sample size is large (\(n > 30\))
The original population is normally distributed
In most practical cases, \(\sigma\) is unknown, and we must use the sample standard deviation \(s\)
The formula for the confidence interval is:
\[\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}\]
Where \(t_{\alpha/2}\) is the critical value from the Student’s t-distribution, and \(s\) is the sample standard deviation
Student’s t-Distribution
The (Student’s) t-distribution is similar to the standard normal distribution
Unimodal
Symmetric around \(0\)
But it has wider (or heavier) tails than the standard normal
- Meaning it’s more spread out
The t-distribution is distinguished by degrees of freedom (\(df = n - 1\))
- As \(df\) increase the t-distribution converges to a normal distribution
The critical value \(t_{\alpha/2}\) is a \(t\) value separating an area of \(\alpha/2\) in the right tail of the \(t\) distribution
When using the \(t\) distribution ot construct a confidence interval for \(\mu\):
- Degrees of freedom (\(df\)) is \(1\) less than the sample size
Example 1: Finding Critical Value
Find the critical value \(t_{\alpha/2}\) for a 95% confidence interval with \(n = 8\)
Set \(1 - \alpha = 0.95\), then \(\alpha = 0.05\), and \(\alpha/2 = 0.025\)
For \(n = 8 \Rightarrow df = n-1 = 7\)
The critical value is \(t_{\alpha/2} = 2.365\)
What is the \(df\) I’m looking for isn’t in the table?
Round down to the nearest value on the table
If \(df=59\), round down to \(df=50\)
At \(95\%\) confidence, \(t_{\alpha/2}=2.009\)
Summary of CI for Population Mean \(\mu\):
Check your assumptions for construction a CI of \(\mu\):
- Sample size is large (\(n>30\)) or the population is normal
\(100(1-\alpha)\%\) confidence interval is computed as:
Case 1: \(\sigma\) is known, use the z-method:
\[\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]
Case 2: \(\sigma\) is unknown, use the t-method:
\[\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}\]
Example 2: Constructing a CI
Given a sample of size \(n = 5\) from a normal population, \(\bar{x} = 4.31\), and \(s = 2.7\), construct a 95% confidence interval for \(\mu\)
- Should we use \(z\) method or \(t\) method?
- \(\sigma\) is unknown
- Compute the margin of error for this \(95\%\) confidence interval:
- With \(df = 4\) and \(t_{\alpha/2} = 2.776\), calculate:
\[\text{Margin of Error} = 2.776 \times \frac{2.7}{\sqrt{5}} \approx 3.352\]
- Construct a \(95\%\) confidence interval for \(\mu\) and interpret your result:
\[4.31 \pm 3.352 \quad \text{or} \quad (0.958, 7.662)\]
We are 95% confident that the true population mean lies between 0.958 and 7.662
- If the population were not normal, would the confidence interval in (c) be valid?
Interpreting a CI
Suppose we take many random samples and construct a \(95\%\) confidence interval from each sample
- \(95\%\) of those intervals would contain the true population mean, \(\mu\)
In practice:
- We say that we’re \(95\%\) confident that the true value of \(\mu\) is within our confidence interval