6 Foundations for inference
In statistics, we are often interested in knowing some summary statistics about the population of interest (for example, the average income of all American adults, the median weight of all Nebraskans, the number of homeless people in Nebraska, …). It is very difficult to collect information on all American adults, on all Nebraskans, or on all homeless people in Nebraska, and in general it is not practical to collect information on the whole population. So we often rely on samples. But how likely is it that the summary statistics we get from a sample are representative of the corresponding summary statistics of the whole population? That is the subject of inferential statistics, which consists of generalizing from a sample to a population.
6.1 Variability in estimates (Sampling and Sampling Distribution)
Notice that if we take several samples from a population, each sample is likely to give us different summary statistics. Now, how can we generalize our findings if we have only a single sample of data? We will rely on the central limit theorem to make this generalization.
6.1.1 The Central Limit Theorem (CLT)
Basically, the CLT says that, for a large enough sample size $n$, the sample mean $\bar X$ has an approximately normal distribution with mean $\mu_{\bar X} = \mu$ and variance $\sigma^2_{\bar X} = \sigma^2/n$. That is, the mean of $\bar X$ ($\mu_{\bar X}$) is the same as the mean of the population ($\mu$), and the variance of $\bar X$ ($\sigma^2_{\bar X}$) is the variance of the population divided by the sample size $n$. Moreover, $\bar X$ has a bell-shaped distribution. What can we do with this information? (Hint: think about the z-score.)
Definitions:
$\mu$ and $\sigma^2$ refer to the population mean and variance, respectively. They are called population parameters. By definition, a parameter is a numerical summary of the population.
$\bar X$ and $s^2$ refer to the sample mean and variance, respectively. They are called sample statistics. By definition, a statistic is a numerical summary of a sample taken from the population.
The probability distribution of a statistic is called the “sampling distribution”.
Why do we need to know the CLT?
Answer: it provides the theoretical foundation of inferential statistics. By inferential statistics we mean generalizing from a sample to the general population.
Challenge: We are asked to think "as if" we had many samples. Generally we have only one sample, so we have only one mean. How, then, can we talk about the sampling distribution of the mean? Note that our sample is just one possible sample. If we could get a new sample, that sample would likely be different. Assume we have the luxury of taking hundreds of samples of the same size $n$ and computing the mean of each sample. The set of computed means has a distribution. The central limit theorem tells us that if the sample size is big enough, this distribution is approximately normal. What do we mean by big enough? There is no formal answer; textbooks commonly use $n \ge 30$ as a rule of thumb.
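To make the "as if" thought experiment concrete, here is a minimal Python sketch (standard library only) that actually draws many samples and looks at their means. The exponential population, its mean of 10, the sample size $n = 30$, and the 2,000 repetitions are arbitrary choices for illustration, and the function name `simulate_sample_means` is hypothetical.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, population_mean, seed=0):
    """Draw `num_samples` samples of size `n` from an exponential population
    and return the mean of each sample.

    The exponential population is an illustrative assumption: it is skewed,
    which shows that the CLT does not require a normal population.
    """
    rng = random.Random(seed)              # seeded so the run is reproducible
    rate = 1.0 / population_mean           # exponential: mean = sd = 1 / rate
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(rate) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

mu = sigma = 10.0      # for an exponential population, mean == standard deviation
n = 30
means = simulate_sample_means(n, num_samples=2000, population_mean=mu)

print(f"mean of the sample means: {statistics.mean(means):.3f}  (CLT says {mu})")
print(f"sd of the sample means:   {statistics.stdev(means):.3f}  "
      f"(CLT says {sigma / math.sqrt(n):.3f})")
```

Even though the exponential population is strongly skewed, the sample means cluster around $\mu$ with a spread close to $\sigma/\sqrt{n}$; a histogram of `means` would look roughly bell-shaped.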
Can you see why we spent some time on the normal distribution?
Examples:
Find the mean and the standard deviation of the sampling distribution. You take a simple random sample (SRS) of size 25 from a population with mean 200 and standard deviation 10. Find the mean and standard deviation of the sampling distribution of your sample mean.
The effect of increasing the sample size. In the setting of the previous exercise, repeat the calculations for a sample size of 100. Explain the effect of the increase on the mean and standard deviation of the sampling distribution.
Note: The standard deviation of a statistic is called its standard error.
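Both exercises reduce to the CLT formulas $\mu_{\bar X} = \mu$ and $\sigma_{\bar X} = \sigma/\sqrt{n}$. A small Python check, where the helper name `sampling_distribution` is hypothetical:

```python
import math

def sampling_distribution(mu, sigma, n):
    """Mean and standard error of the sample mean, as given by the CLT."""
    return mu, sigma / math.sqrt(n)

for n in (25, 100):
    mean, se = sampling_distribution(mu=200, sigma=10, n=n)
    print(f"n = {n:3d}: mean = {mean}, standard error = {se}")
# n =  25: mean = 200, standard error = 2.0
# n = 100: mean = 200, standard error = 1.0
```

Quadrupling the sample size from 25 to 100 leaves the mean of the sampling distribution unchanged and halves the standard error, because the standard error shrinks like $1/\sqrt{n}$.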
6.2 Confidence intervals
- Recall, from previous lectures, that if the random variable $X$ follows a normal distribution with mean $\mu_X$ and standard deviation $\sigma_X$, then we can convert $X$ to its standard form, i.e. $z_X = \dfrac{X - \mu_X}{\sigma_X}$.
One reason we want to convert our data to standard form is that we can easily read off probabilities of conditions on z-scores from the standard normal table (or the standard normal app) (see link http://www.mathsisfun.com/data/standard-normal-distribution.html). The picture below shows an example of normally distributed data and the standardized form of the same data. With the standardized form of the data, we can state probabilities of conditions on the z-scores. For example, what is the probability that $z \le -1$?

Standardized normal distribution
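Instead of a printed table, Python's standard library can read off normal probabilities: `statistics.NormalDist` (Python 3.8+) provides the cumulative distribution function `cdf`. A sketch for the question above:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p = Z.cdf(-1)  # P(z <= -1), the area to the left of -1
print(f"P(z <= -1) = {p:.4f}")  # P(z <= -1) = 0.1587
```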
Example: Assume the random variable $X$ follows a normal distribution with mean $\mu_X = 10$ and standard deviation $\sigma_X = 3$. What is the probability that $X \ge 16$?
What is the probability that $4 \le X \le 16$?
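One way to work this example, sketched in Python with `statistics.NormalDist`: compute the probabilities directly on the original scale, then confirm that standardizing with $z = (x - 10)/3$ gives the same answer.

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=3)   # the random variable in the example
Z = NormalDist()                 # standard normal: mean 0, sd 1

# Directly on the original scale
p_above_16 = 1 - X.cdf(16)             # P(X >= 16)
p_between = X.cdf(16) - X.cdf(4)       # P(4 <= X <= 16)

# Via z-scores: 16 -> (16 - 10) / 3 = 2 and 4 -> (4 - 10) / 3 = -2
z_hi = (16 - 10) / 3
z_lo = (4 - 10) / 3
p_between_z = Z.cdf(z_hi) - Z.cdf(z_lo)

print(f"P(X >= 16)      = {p_above_16:.4f}")   # P(X >= 16)      = 0.0228
print(f"P(4 <= X <= 16) = {p_between:.4f}")    # P(4 <= X <= 16) = 0.9545
print(f"same via z:       {p_between_z:.4f}")  # same via z:       0.9545
```

Because $X \ge 16$ corresponds to $z \ge 2$ and $4 \le X \le 16$ to $-2 \le z \le 2$, the second answer is the probability of falling within 2 standard deviations of the mean.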
Actually, we used 2 for simplicity when we said that about 95% of all observations fall within 2 standard deviations of the mean. Note that $P(-2 \le z \le 2)$ is not exactly 95% (check this with the online app on the normal distribution): $P(-2 \le z \le 2) \approx 95.4\%$. We should use 1.96 instead of 2 if we want exactly 95%, i.e. $P(-1.96 \le z \le 1.96) = 95\%$.
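The claim about 2 versus 1.96 can be checked the same way. `NormalDist.inv_cdf` goes in the opposite direction from `cdf`: given a cumulative probability, it returns the z value, so asking for the 97.5th percentile recovers 1.96.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

within_2 = Z.cdf(2) - Z.cdf(-2)          # P(-2 <= z <= 2)
within_196 = Z.cdf(1.96) - Z.cdf(-1.96)  # P(-1.96 <= z <= 1.96)
z_upper = Z.inv_cdf(0.975)               # z with 2.5% of the area above it

print(f"P(-2 <= z <= 2)       = {within_2:.4f}")    # 0.9545
print(f"P(-1.96 <= z <= 1.96) = {within_196:.4f}")  # 0.9500
print(f"z leaving 2.5% above  = {z_upper:.4f}")     # 1.9600
```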
- Recall also, from section 6.1.1, that the CLT says that the sample mean ($\bar X$) has an approximately normal distribution with mean $\mu_{\bar X} = \mu$ and variance $\sigma^2_{\bar X} = \sigma^2/n$. Here $\mu$ is the population mean (also called the true mean), and $\sigma^2$ is the population variance (also called the true variance).
The z-score for any $\bar X$ is then: $z_{\bar X} = \dfrac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} = \dfrac{\bar X - \mu}{\sigma/\sqrt{n}}$
Based on these two facts, we know that:
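$P(-1.96 \le z_{\bar X} \le 1.96) = 95\%$. From this, we can derive the formula of the confidence interval:

$$
\begin{aligned}
& P\left(-1.96 \le \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} \le 1.96\right) = 95\% \\
\iff\ & P\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 95\% && \text{(CLT: } \mu_{\bar X} = \mu,\ \sigma_{\bar X} = \sigma/\sqrt{n}\text{)} \\
\iff\ & P\left(-1.96\,\sigma/\sqrt{n} \le \bar X - \mu \le 1.96\,\sigma/\sqrt{n}\right) = 95\% && \text{(multiply by } \sigma/\sqrt{n}\text{)} \\
\iff\ & P\left(-1.96\,\sigma/\sqrt{n} - \bar X \le -\mu \le 1.96\,\sigma/\sqrt{n} - \bar X\right) = 95\% && \text{(subtract } \bar X\text{)} \\
\iff\ & P\left(1.96\,\sigma/\sqrt{n} + \bar X \ge \mu \ge -1.96\,\sigma/\sqrt{n} + \bar X\right) = 95\% && \text{(multiply by } -1\text{, reversing the inequalities)} \\
\iff\ & P\left(\bar X - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar X + 1.96\,\sigma/\sqrt{n}\right) = 95\% && \text{(rewrite in increasing order)}
\end{aligned}
$$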
The interval $\left[\bar X - 1.96\,\sigma/\sqrt{n},\ \bar X + 1.96\,\sigma/\sqrt{n}\right]$ is known as the 95% confidence interval, or the 95% interval estimate. The following picture illustrates the idea of a confidence interval:

Idea of confidence interval
Note: 1.96 is the value of $z$ that gives us a 95% confidence interval. It is denoted $z_{0.025}$ because 2.5% of observations lie above that $z$. More generally, we write $z_{\alpha/2}$ for the value of $z$ above which a proportion $\alpha/2$ of observations lies. Thus, the general formula of the confidence interval is: $\left[\bar X - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar X + z_{\alpha/2}\,\sigma/\sqrt{n}\right]$
Most of the time, we will use $\alpha = 1\%$, $5\%$, or $10\%$ (i.e. 0.01, 0.05, or 0.10), corresponding to 99%, 95%, and 90% confidence. Why these numbers? It is just a tradition.

Level of Confidence
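The general formula translates directly into a short function. This is a sketch assuming, as in this section, that the population standard deviation $\sigma$ is known; `z_confidence_interval` is a hypothetical name, and $z_{\alpha/2}$ is obtained with `NormalDist().inv_cdf(1 - alpha / 2)` rather than from a table. The sample mean of 198 in the usage loop is made up for illustration.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) for the z-based confidence interval of the mean.

    Assumes the population standard deviation `sigma` is known and that
    the sample mean is (approximately) normal, e.g. by the CLT.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}: upper-tail area alpha/2
    margin = z * sigma / math.sqrt(n)        # margin of error
    return xbar - margin, xbar + margin

# Hypothetical sample: xbar = 198 from a population with sigma = 10, n = 25
for level in (0.90, 0.95, 0.99):
    lo, hi = z_confidence_interval(xbar=198, sigma=10, n=25, confidence=level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")
# 90% CI: (194.71, 201.29)
# 95% CI: (194.08, 201.92)
# 99% CI: (192.85, 203.15)
```

Higher confidence gives a wider interval: the price of more confidence is a larger margin of error.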
Problem: How would you interpret the interval $\left[\bar X - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar X + z_{\alpha/2}\,\sigma/\sqrt{n}\right]$?
Wouldn’t we get a different interval if we had a different sample? The answer is yes! Then, what does our interpretation really mean?
Every sample will likely give us a different interval, and they cannot all mean what our naïve interpretation says.
Here is the interpretation of a confidence interval: if we repeated the sampling many times and computed such an interval from each sample, a proportion $(1-\alpha)$ of those intervals would contain the true value of the parameter (the mean $\mu$ in our case). This is a little bit technical.
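This repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with $\mu = 200$ and $\sigma = 10$ (the setting of the earlier exercise), draws many samples of size 25, builds a z-interval from each, and counts how often the interval contains $\mu$. The function name `coverage_rate` and the number of repetitions are illustrative choices.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(mu, sigma, n, confidence=0.95, num_samples=10_000, seed=1):
    """Fraction of z-intervals, each built from a fresh normal sample,
    that contain the true mean mu."""
    rng = random.Random(seed)                              # reproducible draws
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)     # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)                      # same for every sample
    hits = 0
    for _ in range(num_samples):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / num_samples

print(f"95% intervals containing mu: {coverage_rate(mu=200, sigma=10, n=25):.3f}")
print(f"90% intervals containing mu: "
      f"{coverage_rate(mu=200, sigma=10, n=25, confidence=0.90):.3f}")
```

The printed proportions should come out close to 0.95 and 0.90. Any single interval either contains $\mu$ or it does not; the confidence level describes the long-run behavior of the procedure.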
Here is another interpretation: We are $(1-\alpha)\times 100\%$ confident that the interval $\left[\bar X - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar X + z_{\alpha/2}\,\sigma/\sqrt{n}\right]$ includes the population mean.
$z_{\alpha/2}\,\sigma/\sqrt{n}$ is referred to as the margin of error, and $(1-\alpha)$ is referred to as the confidence level.
Practice Question:
Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a potential location for a new outlet based in part on the mean annual income of the individuals in the marketing area of the new location.
A sample of size n = 36 was taken; the sample mean income is $41,100. The population standard deviation is estimated to be $4,500, and the confidence coefficient to be used in the interval estimate is 0.95.
- What is the margin of error?
- What is the confidence interval for the population mean?
- Interpret the confidence interval.
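A sketch of the arithmetic for this question in Python, taking $z_{\alpha/2} = 1.96$ from the table of common confidence levels (using `NormalDist().inv_cdf(0.975)` instead would change the margin by only a few cents).

```python
import math

n = 36
xbar = 41_100      # sample mean income (dollars)
sigma = 4_500      # population standard deviation (dollars), as given
z = 1.96           # z_{alpha/2} for a 0.95 confidence coefficient

margin = z * sigma / math.sqrt(n)        # 1.96 * 4500 / 6 = 1.96 * 750
lower, upper = xbar - margin, xbar + margin

print(f"margin of error: ${margin:,.0f}")              # margin of error: $1,470
print(f"95% CI: (${lower:,.0f}, ${upper:,.0f})")       # 95% CI: ($39,630, $42,570)
```

So the margin of error is 1,470 dollars and the interval runs from 39,630 to 42,570 dollars. In words: we are 95% confident that the mean annual income in the marketing area is between 39,630 and 42,570 dollars.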
The values of $z_{\alpha/2}$ for the most commonly used confidence levels are:
Confidence Level | $\alpha$ (%) | $\alpha/2$ (%) | $z_{\alpha/2}$ |
---|---|---|---|
90% | 10 | 5 | 1.645 |
95% | 5 | 2.5 | 1.96 |
99% | 1 | 0.5 | 2.576 |
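The $z_{\alpha/2}$ column can be reproduced rather than memorized: it is the point with upper-tail area $\alpha/2$ under the standard normal curve. A sketch with `statistics.NormalDist.inv_cdf`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

z_values = {}
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z_values[confidence] = Z.inv_cdf(1 - alpha / 2)  # upper-tail area alpha/2
    print(f"{confidence:.0%}: alpha = {alpha:.2f}, z_alpha/2 = {z_values[confidence]:.3f}")
# 90%: alpha = 0.10, z_alpha/2 = 1.645
# 95%: alpha = 0.05, z_alpha/2 = 1.960
# 99%: alpha = 0.01, z_alpha/2 = 2.576
```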