17 Day 16
Review
All random variables have a probability distribution
As all statistics are random variables:
- All statistics have a probability distribution
The probability distribution of a sample statistic is called its sampling distribution
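Let $\bar{x}$ be the mean of a random sample of size $n$, drawn from a population with mean $\mu$ and standard deviation $\sigma$
Since $\bar{x}$ is a random variable, it has a mean $\mu_{\bar{x}}$ and a standard deviation $\sigma_{\bar{x}}$
- The mean of $\bar{x}$ is $\mu$. That is, $\mu_{\bar{x}} = \mu$
- The standard deviation of $\bar{x}$ is $\frac{\sigma}{\sqrt{n}}$. That is, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$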
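Both facts follow from treating the observations in a random sample as independent draws from the same population. A short derivation, writing the observations as $x_1, \dots, x_n$ and the sample mean as $\bar{x}$ (this notation is assumed here, and independence is the key assumption):

$$
E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\, n\mu = \mu
$$

$$
\mathrm{Var}(\bar{x}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(x_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
$$

The variance step is where independence is used: the variance of a sum equals the sum of the variances only when the terms are independent (or at least uncorrelated).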
Sampling Distribution
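Given any population with mean $\mu$ and standard deviation $\sigma$
Sample $x_1, x_2, \dots, x_n$
Sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Where: $\mu_{\bar{x}} = \mu$
And: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$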
The intuition behind this may not be self-evident, but it’s easy to visualize:
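One way to see it is a small simulation: draw many samples, record each sample mean, and compare the spread of those means with $\sigma/\sqrt{n}$. The sketch below is illustrative only. The function name `simulate_sample_means`, the population values, and the number of repetitions are all assumptions chosen for the demonstration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size `n` from a normal population; return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Illustrative population (mean 20, standard deviation 4) and sample size 25.
mu, sigma, n = 20, 4, 25
means = simulate_sample_means(mu, sigma, n, reps=10_000)

print("mean of the sample means:", round(statistics.fmean(means), 3))
print("sd of the sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):         ", sigma / n ** 0.5)
```

The mean of the simulated sample means should land close to the population mean, and their standard deviation should land close to $\sigma/\sqrt{n} = 0.8$, which is much smaller than the population standard deviation of 4.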
Example:
Suppose we take a simple random sample of size 25 from a normal population with a mean of 20 and a standard deviation of 4.
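a. What is the distribution of $\bar{x}$?
Since the population is normal, $\bar{x}$ is normal with $\mu_{\bar{x}} = 20$ and $\sigma_{\bar{x}} = \frac{4}{\sqrt{25}} = 0.8$
b. Find the probability that we will observe a sample mean over 22.
$P(\bar{x} > 22) = P\left(z > \frac{22 - 20}{0.8}\right) = P(z > 2.5) = 1 - 0.9938 = 0.0062$
c. Find the 95th percentile of $\bar{x}$.
Look up 0.95 in the body of the z-table: $z \approx 1.645$
Convert $z$ to $\bar{x}$ as follows: $\bar{x} = \mu_{\bar{x}} + z\,\sigma_{\bar{x}} = 20 + 1.645(0.8) = 21.316$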
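The same three parts can be checked without a z-table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

mu, sigma, n = 20, 4, 25

# a. The sample mean is normal with mean mu and standard deviation sigma / sqrt(n).
se = sigma / n ** 0.5            # 0.8
xbar = NormalDist(mu, se)

# b. P(sample mean > 22) is the upper tail beyond 22.
p_over_22 = 1 - xbar.cdf(22)     # about 0.0062

# c. The 95th percentile is the value with 95% of the distribution below it.
pct_95 = xbar.inv_cdf(0.95)      # about 21.316

print(se, round(p_over_22, 4), round(pct_95, 3))
```

`inv_cdf` performs the "look up 0.95, then convert back" step in a single call, so no separate z-score is needed.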
What if the population we are sampling from isn’t normal?
- It’s easier to find a way to assume that $\bar{x}$ is a normal random variable
Given the Central Limit Theorem, we can do that under certain assumptions
Central Limit Theorem
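Let $\bar{x}$ be the mean of a large random sample ($n > 30$) from any population
- With mean $\mu$ and standard deviation $\sigma$
The distribution of $\bar{x}$ is approximately normal
Mean $\mu_{\bar{x}} = \mu$
Standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
If $n$ is large enough, we have: $\bar{x}$ is approximately $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, written as (mean, standard deviation)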
- Regardless of the original population’s distribution
How large does $n$ need to be?
This is an ongoing debate in statistics
As the skew of the population distribution increases, the sample size $n$ we require increases
As a general rule of thumb, $n > 30$ should be sufficient
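The link between skew and sample size can be illustrated by simulation. The sketch below samples from a strongly right-skewed population (an exponential distribution with rate 1, chosen here purely as an example) and measures how skewed the sample means are for several sample sizes. The helper names `skewness` and `sample_mean_skew` are hypothetical. For this particular population, theory gives the skewness of the sample mean as $2/\sqrt{n}$.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_mean_skew(n, reps=20_000, seed=1):
    """Skewness of the means of `reps` samples of size `n` from an Exponential(1) population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

for n in (2, 10, 40):
    print(n, round(sample_mean_skew(n), 2), "theory:", round(2 / n ** 0.5, 2))
```

The skewness of the sample means shrinks toward 0 (the skewness of a normal distribution) as $n$ grows, but it does not vanish at any fixed cutoff, which is part of why the threshold is a rule of thumb and not a theorem.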
Example:
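Recent data from the U.S. Census indicates that the mean age of college students is years, with a standard deviation of years. A simple random sample of 125 students is drawn. If $\bar{x}$ is the sample mean age of the students, what is the distribution of $\bar{x}$? (Justify your answer.)
Since $n = 125 > 30$, the CLT applies:
So: $\bar{x}$ is approximately normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{125}}$, where $\mu$ and $\sigma$ are the population mean and standard deviation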
Example:
The Internal Revenue Service reports that the mean federal income tax paid in a recent year was . Assume that the standard deviation is . The IRS plans to draw a sample of tax returns to study the effect of a new tax law.
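Let $\bar{x}$ be the mean tax for the sampled tax returns
- Then, by the CLT, $\bar{x}$ is approximately normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$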
a. What is the probability that the sample mean tax is between and ?
b. Would it be unusual if the sample mean were less than ?
Yes, because
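Questions of this kind ("between two values" and "would it be unusual") follow the same recipe once the CLT gives a normal distribution for the sample mean. The sketch below shows that recipe with placeholder numbers, which are NOT the figures from the IRS example. The function names are hypothetical, and the 0.05 cutoff for "unusual" is a common convention assumed here, not something stated above.

```python
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Approximate sampling distribution of the sample mean under the CLT."""
    return NormalDist(mu, sigma / n ** 0.5)

def prob_between(dist, low, high):
    """P(low < sample mean < high) for a normal sampling distribution."""
    return dist.cdf(high) - dist.cdf(low)

# Placeholder population values and sample size, for illustration only.
xbar = sample_mean_dist(mu=100, sigma=50, n=400)   # standard error = 50 / 20 = 2.5

# a. Probability that the sample mean falls between two values.
p_between = prob_between(xbar, 95, 105)
print(round(p_between, 4))

# b. An outcome is flagged "unusual" when its probability is below 0.05 (assumed cutoff).
p_low = xbar.cdf(94)
print(round(p_low, 4), "unusual" if p_low < 0.05 else "not unusual")
```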
Population Proportion
Proportions are a useful way to summarize information about a population or a sample while losing very little nuance:
Proportions are just percentages of the population
We’ve dealt with this a lot
Say the percentage of the population who participate in early voting is
The proportion of the population who early vote is denoted $p$
If we poll a sample of 100 Manhattan residents and find that early vote:
- The proportion of our sample who early vote is denoted $\hat{p}$
Just like every other statistic, sample proportions are random variables
- So their distribution is the sampling distribution of the proportion
All of our previous rules and ideas apply
As we take repeated samples from our population, we will see that the sample proportions vary from sample to sample
The larger the sample, the closer the sample proportion tends to be to the true value
- Mean of the sample proportion is: $\mu_{\hat{p}} = p$
- Standard deviation of the sample proportion is: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
The Central Limit Theorem will tell us the “shape” of the distribution of $\hat{p}$
Proportion Central Limit Theorem
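If $np \geq 10$ and $n(1-p) \geq 10$
Distribution of $\hat{p}$ is approximately normal
Mean $\mu_{\hat{p}} = p$
Standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
So: $\hat{p}$ is approximately $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$, written as (mean, standard deviation)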
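A quick simulation can check the mean and standard deviation formulas for the sample proportion. This is a sketch under assumed values ($p = 0.3$, $n = 100$, chosen only for illustration), and `simulate_sample_proportions` is a hypothetical helper name.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Return `reps` sample proportions, each from `n` independent yes/no draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(reps)
    ]

p, n = 0.3, 100   # illustrative values; n*p = 30 and n*(1-p) = 70
p_hats = simulate_sample_proportions(p, n, reps=10_000)

print("mean of sample proportions:", round(statistics.fmean(p_hats), 4))
print("sd of sample proportions:  ", round(statistics.stdev(p_hats), 4))
print("sqrt(p(1-p)/n):            ", round((p * (1 - p) / n) ** 0.5, 4))
```

The simulated mean should sit near $p$ and the simulated standard deviation near $\sqrt{p(1-p)/n}$.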
Example:
According to a Harris Poll, chocolate is the favorite ice cream flavor for 27% of Americans. If a sample of 100 Americans is taken, what is the probability that the sample proportion of those who prefer chocolate is greater than 0.30?
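Since $np = 100(0.27) = 27 \geq 10$ and $n(1-p) = 100(0.73) = 73 \geq 10$, we can apply the CLT. By the CLT, the distribution of $\hat{p}$ is approximately normal with: $\mu_{\hat{p}} = 0.27$ and $\sigma_{\hat{p}} = \sqrt{\frac{0.27(0.73)}{100}} \approx 0.0444$
Then, $P(\hat{p} > 0.30) = P\left(z > \frac{0.30 - 0.27}{0.0444}\right) = P(z > 0.68) = 1 - 0.7517 = 0.2483$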
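The chocolate example can be reproduced numerically. This is a minimal sketch with `statistics.NormalDist`; the variable names are assumptions. Because the code does not round the z-score to two decimals, its answer differs slightly from a z-table lookup at $z = 0.68$, which gives 0.2483.

```python
from statistics import NormalDist

p, n = 0.27, 100

# Conditions for the normal approximation: both expected counts are well above 10.
print(n * p, n * (1 - p))

# Sampling distribution of the sample proportion under the CLT.
se = (p * (1 - p) / n) ** 0.5
p_hat = NormalDist(p, se)

# P(sample proportion > 0.30) is the upper tail beyond 0.30.
prob = 1 - p_hat.cdf(0.30)
print(round(se, 4), round(prob, 4))
```

So roughly one sample in four would show more than 30% preferring chocolate, even though the population figure is 27%.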
We’ve studied point estimates, which are single-number estimates of population parameters (e.g., sample mean, sample proportion)
Point estimates are a deterministic result
- Statistics deals with probabilistic results
It would be more informative to provide a range of values
We generally call these confidence intervals, and we’ll be talking about them in more depth next lecture