Chapter 7 Interval Estimation

7.1 Introduction to Interval Estimation

We have focused on point estimation, i.e. estimating an unknown parameter $\theta$ with a single value $\hat{\theta}$. Now we turn our attention to interval estimation, where we will estimate $\theta$ with a range of plausible values. In fact, the confidence intervals we will consider this semester will all be of the form

$$\text{Statistic} \pm \text{Margin of Error}$$

where the ‘Statistic’ is a reasonable point estimator (statistic) for the parameter being estimated and the ‘Margin of Error’ is the standard error of the statistic multiplied by a ‘critical value’ based on that statistic’s sampling distribution.

Definition: Let $X_1, \ldots, X_n$ be a random sample from $X_i \sim f(x; \theta)$, $\theta \in \Omega$. Let $0 < \alpha < 1$ be specified (i.e. choose a constant $\alpha$, usually $\alpha = 0.05$). Then the interval $(L, U)$ is a $100(1-\alpha)\%$ confidence interval (CI) for $\theta$ if

$$P[\theta \in (L, U)] = 1 - \alpha$$

This is the frequentist definition, where we consider the parameter $\theta$ to be an unknown constant. The CI ‘captures’ the true value of $\theta$ $100(1-\alpha)\%$ of the time. It DOES NOT MEAN that there is a $100(1-\alpha)\%$ chance that the parameter $\theta$ is between the values $L$ and $U$. (This is a very common mistake in the interpretation of confidence intervals.)

The Bayesian school of statistical thought considers parameters such as $\theta$ to be random variables with their own distributions, rather than unknown constants. Bayesians compute their own interval estimates, often called credible intervals. The advantage of credible intervals over traditional confidence intervals is that the interpretation is clearer: for a Bayesian $100(1-\alpha)\%$ credible interval, I can correctly say that I am $100(1-\alpha)\%$ confident that $\theta$ is between $L$ and $U$. However, Bayesian intervals are more difficult to compute than the traditional frequentist intervals. We will concentrate on the traditional intervals for the remainder of this semester.

7.2 Confidence Intervals for Proportions

This is Section 7.3 of your book.

Suppose $X \sim \text{BIN}(n, \pi)$. A natural point estimator (it is unbiased!) for $\pi$ is

$$\hat{\pi} = \frac{X}{n}$$

When both $n\pi$ and $n(1-\pi)$ are 10 or greater, the distribution of $X$ is well approximated by a normal distribution:

$$X \overset{\cdot}{\sim} N(n\pi,\; n\pi(1-\pi))$$

We would rather base the formula for our confidence interval for $\pi$ on $\hat{\pi}$, so we need the sampling distribution of $\hat{\pi}$.

$$\mu_{\hat{\pi}} = E(\hat{\pi}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n}\, n\pi = \pi$$

$$\sigma^2_{\hat{\pi}} = \text{Var}(\hat{\pi}) = \text{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\text{Var}(X) = \frac{1}{n^2}\, n\pi(1-\pi) = \frac{\pi(1-\pi)}{n}$$

$$\sigma_{\hat{\pi}} = \sqrt{\frac{\pi(1-\pi)}{n}}$$

$$\hat{\pi} \overset{\cdot}{\sim} N\left(\pi,\; \frac{\pi(1-\pi)}{n}\right)$$
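As a quick sanity check of this approximate sampling distribution, here is a small simulation sketch in R; the values N = 50 and pi.true = 0.3 are just hypothetical choices for illustration.

# simulate many values of pi.hat and compare to the theory above
N <- 50                      # hypothetical sample size
pi.true <- 0.3               # hypothetical true proportion
p.hat.sim <- rbinom(n=10000, size=N, prob=pi.true)/N
mean(p.hat.sim)              # should be close to pi.true
sd(p.hat.sim)                # should be close to the theoretical standard error
sqrt(pi.true*(1-pi.true)/N)  # theoretical standard error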

We will base our confidence interval formula for $\pi$ on the standard normal distribution $Z \sim N(0,1)$, since the sampling distribution of $\hat{\pi}$ is approximately normal. Our formula will be the statistic plus/minus the margin of error, where the margin of error is the standard error multiplied by a constant called the critical value. This critical value $z^*$ is chosen such that $100(1-\alpha)\%$ of the standard normal distribution lies between $-z^*$ and $z^*$.

So the formula for a $100(1-\alpha)\%$ CI for $\pi$ is:

$$\hat{\pi} \pm z^* \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$

The usual choice is $\alpha = 0.05$, leading to a 95% confidence level. The appropriate critical value is $z^* = \pm 1.96$. You can find $z^*$ from the normal table or by using invNorm(0.025) and invNorm(0.975).

We can also find critical values with R for 95% confidence or other common choices such as $\alpha = 0.10$ or $\alpha = 0.01$, leading to 90% or 99% CIs, respectively.

alpha<-c(0.10,0.05,0.01)
qnorm(c(alpha/2,1-alpha/2))
## [1] -1.644854 -1.959964 -2.575829  1.644854  1.959964  2.575829

So we use $z^* = \pm 1.645$ for 90% confidence, $z^* = \pm 1.96$ for 95% confidence, and $z^* = \pm 2.576$ for 99% confidence.

Example: Suppose we have taken a random sample (i.i.d. or simple) of $n = 500$ voters, where $X = 220$ support Richard Guy, a candidate for political office. The point estimate is

$$\hat{\pi} = \frac{220}{500} = 0.44$$

ASIDE: Polling companies and news organizations do not use i.i.d. or simple random samples; their sampling designs are more complicated, and they compute CIs based on formulas with more complex forms of the standard error. In practice, you would see only a minor difference if you recomputed such a CI with our formula.

The interval estimate for 95% confidence is

$$\hat{\pi} \pm z^* \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.44 \pm 1.96\sqrt{\frac{0.44(1-0.44)}{500}} = 0.44 \pm 0.0435 = (0.3965, 0.4835)$$

The margin of error of our poll is $\pm 4.35\%$, and the entire interval lies below $\pi = 0.50$, which is bad news for Richard Guy in his election.

This can be done on the TI-84 calculator with the 1-PropZInt function in the STAT > TESTS menu.
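For comparison, here is a minimal R sketch of the same 95% Wald interval computed directly from the formula, using the poll counts above.

# 95% Wald CI for the Richard Guy poll, computed from the formula
x <- 220
n <- 500
p.hat <- x/n
z.star <- qnorm(0.975)
me <- z.star*sqrt(p.hat*(1-p.hat)/n)
c(p.hat - me, p.hat + me)    # matches the interval computed above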

What will happen to our interval if we change the confidence level from 95% to 99%?

As we’ll see in class, the margin of error increases and the confidence interval will be wider, since the critical value $z^*$ is bigger in order to capture a larger percentage of the standard normal curve.

What about if the sample size increases, with $\hat{\pi}$ remaining the same? (i.e. $n = 1000$ voters, $X = 440$)

As the sample size increases, the margin of error will decrease and the CI will be narrower.

If we want to cut the margin of error in half, what do we need to do to the sample size?

Unfortunately, if $\hat{\pi}$ remains the same, we have to quadruple our sample size to cut the margin of error in half; doubling the sample size is not sufficient. This is because the margin of error is proportional to $1/\sqrt{n}$, so halving it requires multiplying $n$ by 4.
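A quick R sketch illustrates this, holding $\hat{\pi} = 0.44$ fixed and varying the sample size:

# margin of error at several sample sizes, holding p.hat = 0.44 fixed
p.hat <- 0.44
n <- c(500, 1000, 2000)
1.96*sqrt(p.hat*(1-p.hat)/n)  # n = 2000 (quadruple 500) gives half the n = 500 margin of error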

What about the coverage of the Wald interval?

The problem with the Wald interval based on $\hat{\pi}$ as described so far is that it depends on the normal approximation to the binomial distribution, and its “coverage” can be less than the stated percentage. In other words, a nominal 95% interval may capture the true value of the parameter less than 95% of the time.

# let's simulate this
# suppose you guess on a multiple choice exam where each question has 5 choices
# so the true value of the parameter is 1/5, call this theta
theta <- 1/5

# let's randomly generate 10000 samples of size n=100 each and compute the 95% confidence intervals
set.seed(11162021)
samp <- 10000
p.hat <- numeric(samp)
L <- numeric(samp)
U <- numeric(samp)
capture <- logical(samp)
N <- 100

for (i in 1:samp){
  X <- rbinom(n=1,size=N,prob=theta)
  p.hat[i] <- X/N
  L[i] <-  p.hat[i] - 1.96*sqrt(p.hat[i]*(1-p.hat[i])/N)
  U[i] <-  p.hat[i] + 1.96*sqrt(p.hat[i]*(1-p.hat[i])/N)
  capture[i] <- (L[i]< theta & theta < U[i]) # this is TRUE when L<theta<U
}

# is theta contained in CI #i?
table(capture) # TRUE should be 95%, or about 9500
## capture
## FALSE  TRUE 
##   701  9299
# but coverage < 95%, so alpha > 5% "liberal"

# graph the first 100
require(ggplot2) # can't really do this graphing in base R
CI <- 1:100
p.hatCI <- p.hat[1:100]
LCI <- L[1:100]
UCI <- U[1:100]
captCI <- capture[1:100]
CIs <- data.frame(CI,p.hatCI,LCI,UCI,captCI)

ggplot(data=CIs,aes(x=CI,y=p.hatCI)) + 
  geom_pointrange(aes(ymin=LCI,ymax=UCI,color=captCI)) +
  geom_hline(yintercept=theta)

We can consider other estimators for $\pi$. A biased alternative, related to Wilson's score interval and often called the “plus-2/plus-4” (or Agresti-Coull) estimator, is based on adding 2 successes and 2 failures to the sample, thus adding 4 to the overall sample size. The “coverage” of the resulting interval is better, especially with small sample sizes $n$: it tends to exceed the stated percentage and is thus conservative.

$$\tilde{p} = \frac{x+2}{n+4}$$

The confidence interval is:

$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n}}$$
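Before simulating its coverage, here is a brief sketch applying this plus-4 interval to the earlier poll ($x = 220$, $n = 500$); with a sample this large it differs only slightly from the Wald interval.

# plus-2/plus-4 interval for the poll data
x <- 220
n <- 500
p.tilde <- (x+2)/(n+4)
me <- 1.96*sqrt(p.tilde*(1-p.tilde)/n)
c(p.tilde - me, p.tilde + me)   # very close to the Wald interval here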

# let's randomly generate 10000 samples of size n=100 each and compute the 95% confidence intervals
# using this estimator, compare to the usual p.hat
set.seed(15112021)
samp <- 10000
p.tilde <- numeric(samp)
L <- numeric(samp)
U <- numeric(samp)
capture <- logical(samp)
N <- 100

for (i in 1:samp){
  X <- rbinom(n=1,size=N,prob=theta)
  p.tilde[i] <- (X+2)/(N+4)
  L[i] <-  p.tilde[i] - 1.96*sqrt(p.tilde[i]*(1-p.tilde[i])/N)
  U[i] <-  p.tilde[i] + 1.96*sqrt(p.tilde[i]*(1-p.tilde[i])/N)
  capture[i] <- (L[i]< theta & theta < U[i]) # this is TRUE when L<theta<U
}

# is theta contained in CI i?
table(capture) # better, coverage > 95%, so alpha < 5% "conservative"
## capture
## FALSE  TRUE 
##   325  9675

7.3 Confidence Intervals for Means

This is section 7.1 of your book.

Suppose we are willing to make some very strong assumptions about our data. We will assume both normality and that the variance $\sigma^2$ (and standard deviation $\sigma$) is known. So $X \sim N(\mu, \sigma)$ with only $\mu$ unknown.

In this case, we can form a $100(1-\alpha)\%$ CI for $\mu$ by making some minor adjustments to the formula that we previously used for a proportion.

The format will still be statistic/point estimate $\pm$ margin of error, where the margin of error is $z^* \cdot SE$. Instead of using $\sigma_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$ as the standard error, we use the standard error $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, since $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$.

The $100(1-\alpha)\%$ CI for $\mu$ is:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$

This is available on the TI calculator as ZInterval.

Example: Suppose it is known that $X$ represents the score obtained on an ACT test, where $X \sim N(\mu, \sigma = 5)$, i.e. we know the true standard deviation is 5. Find a 95% CI for $\mu$ for a sample of size $n = 9$ with $\bar{x} = 22.0$.

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 22.0 \pm 1.96 \cdot \frac{5}{\sqrt{9}} = 22.0 \pm 3.27 = (18.73, 25.27)$$
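A one-line R check of this calculation, using the values in the example:

# 95% z-interval for the ACT example: x.bar = 22.0, sigma = 5, n = 9
22.0 + c(-1,1)*qnorm(0.975)*5/sqrt(9)   # approximately (18.73, 25.27)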

Confidence Interval for a Single Mean $\mu$ with $\sigma$ Unknown: The t Interval

However, $\sigma^2$ and hence $\sigma$ are usually unknown to us; it would be strange to have a scenario where $\sigma$ was known but $\mu$ was not. A natural point estimate for the unknown $\sigma$ is the sample standard deviation $S$. So instead of basing inference on $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, what if we base it on $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$?

What is the distribution of $T$? The answer came from the Guinness Brewery in Dublin, Ireland, during the early 20th century. William Gosset, aka ‘Student’, noticed that for small samples this standardized statistic was not quite normally distributed, and he derived the Student’s t distribution.

It turns out that there are a number of distributions ‘derived’ from the standard normal distribution that play key roles in statistical methods. Here, I will present but not derive how the $\chi^2$, t, and F distributions came about.

Suppose $Z$ is standard normal, i.e. $Z \sim N(0,1)$. It can be shown that $Z^2 \sim \chi^2(1)$, that is, a squared z-score follows a chi-squared distribution with 1 degree of freedom.

If I have an i.i.d. random sample of size $n$ from $Z \sim N(0,1)$, then by the mgf technique, the distribution of $V = Z_1^2 + Z_2^2 + \cdots + Z_n^2$ is $V \sim \chi^2(n)$.

Then (via either the cdf or pdf-to-pdf technique), if $Z \sim N(0,1)$ and $V \sim \chi^2(n)$ are independent, then $T = \frac{Z}{\sqrt{V/n}} \sim t(n)$, that is, $T$ has a Student’s t distribution with $n$ degrees of freedom.

Also (again via either the cdf or pdf-to-pdf technique), if $U \sim \chi^2(m)$ and $V \sim \chi^2(n)$ are independent, then $F = \frac{U/m}{V/n} \sim F(m,n)$, that is, $F$ has a Snedecor’s F distribution with $m$ and $n$ degrees of freedom.

Finally, if $T \sim t(n)$, then $T^2 \sim F(1,n)$.

The χ2, t, and F distributions traditionally appear in statistical tables and are all programmed into R with the usual d-, p-, q- and r- functions.
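For instance, here is a short sketch using the q- functions to check two of these relationships numerically; the degrees of freedom are arbitrary choices.

# numerical check: Z^2 ~ chi-square(1)
qnorm(0.975)^2           # squared upper 2.5% point of N(0,1)
qchisq(0.95, df=1)       # upper 5% point of chi-square(1); these two agree

# numerical check: if T ~ t(n), then T^2 ~ F(1,n)
qt(0.975, df=10)^2       # squared upper 2.5% point of t(10)
qf(0.95, df1=1, df2=10)  # upper 5% point of F(1,10); these two agree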

Let’s look at the t distribution graphically.

require(mosaic)
## Loading required package: mosaic
## 
## The 'mosaic' package masks several functions from core packages in order to add 
## additional features.  The original behavior of these functions should not be affected by this.
## 
## Attaching package: 'mosaic'
## The following object is masked from 'package:purrr':
## 
##     cross
## The following objects are masked from 'package:stats':
## 
##     binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
##     quantile, sd, t.test, var
## The following objects are masked from 'package:dplyr':
## 
##     count, do, tally
## The following object is masked from 'package:Matrix':
## 
##     mean
## The following object is masked from 'package:ggplot2':
## 
##     stat
## The following objects are masked from 'package:base':
## 
##     max, mean, min, prod, range, sample, sum
x<-seq(-3,3,by=0.001)
t5<-dt(x=x,df=5)
t15<-dt(x=x,df=15)
t30<-dt(x=x,df=30)
z<-dnorm(x=x,mean=0,sd=1)
my.colors<-c("blue","red","green","black")
my.key<-list(space = "bottom",
             lines=list(type="l",lty=c(1,1,1,2),col=my.colors),
             text=list(c("t(5)","t(15)","t(30)","z")))    
xyplot(t5+t15+t30+z~x,type="l",lty=c(1,1,1,2),col=my.colors,ylab="Density",
       xlab="X",key=my.key)

Visually, notice that the t distribution is centered at zero and is bell-shaped, but is flatter and has higher variance than the standard normal. As the degrees of freedom $df \to \infty$, the t distribution converges to the standard normal.
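We can also see the convergence numerically by watching the 97.5th percentile of the t distribution approach the standard normal value of 1.96 as the degrees of freedom grow:

# t critical values approach the z critical value as df increases
qt(0.975, df=c(5, 15, 30, 100, 1000))
qnorm(0.975)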

To form a confidence interval for a mean $\mu$ from a random sample of size $n$ from $X \sim N(\mu, \sigma)$ with unknown variance, we will use $s$ and a critical value from the t distribution with $n - 1$ df.

The $100(1-\alpha)\%$ CI for $\mu$ is:

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

Example: I want a 95% CI for $\mu$ for a random sample of size $n = 16$ where $\bar{x} = 32.4$ and $s = 4.4$. To find $t^*$, either use a t table, the invT function on a calculator, or the qt function for quantiles from the t distribution in R.

On the calculator, use invT(0.025,15) and invT(0.975,15), obtaining $t^* = \pm 2.131$ if 95% confidence is desired.

# t* for 95%
qt(p=0.975,df=15)         
## [1] 2.13145

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 32.4 \pm 2.131 \cdot \frac{4.4}{\sqrt{16}} = 32.4 \pm 2.34 = (30.06, 34.74)$$

This is also available on the TI calculator as TInterval and as part of the output that the R function t.test provides. We’ll consider that function later.
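If you just want to verify the arithmetic without the full t.test output, a direct computation in R is:

# 95% t-interval computed directly: x.bar = 32.4, s = 4.4, n = 16
32.4 + c(-1,1)*qt(0.975, df=15)*4.4/sqrt(16)   # approximately (30.06, 34.74)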

7.4 Bootstrapping

put notes and R code here
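(In the meantime, here is a minimal sketch of one common approach, a percentile bootstrap CI for a mean. The data vector x below is purely hypothetical; the details of bootstrapping will be filled in later.)

# percentile bootstrap 95% CI for a mean (sketch with hypothetical data)
set.seed(1)
x <- c(22, 25, 19, 31, 27, 24, 30, 18, 26, 23)  # hypothetical sample
B <- 10000                                      # number of bootstrap resamples
boot.means <- replicate(B, mean(sample(x, size=length(x), replace=TRUE)))
quantile(boot.means, probs=c(0.025, 0.975))     # percentile bootstrap interval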

7.5 Confidence Intervals for the Difference of Two Means

This is section 7.2 in your book.

Inference can also be done with a confidence interval rather than a hypothesis test. If we want a confidence interval for the difference in means, which is the parameter $\mu_1 - \mu_2$, we have the same issue with the variances as in the hypothesis test.

We can assume equal variances, use $df = n_1 + n_2 - 2$ and the pooled variance:

$$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{s^2_p\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$

Most modern textbooks recommend assuming unequal variances, and instead computing the confidence interval without the pooled variance (i.e. similar to Welch’s t-test).

$$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}$$

To obtain the critical value $t^*$, one could use the conservative degrees of freedom $df_{\min} = \min(n_1 - 1,\, n_2 - 1)$. This results in a critical value larger than necessary, and thus a margin of error that is somewhat too large.

It is preferable to use technology with the software-computed df (the Welch approximation), which can be done on a TI calculator. Go to STAT, then TESTS, and choose 0:2-SampTInt. Use Pooled=No for unequal variances, which is also the default in R.

Suppose we wanted to form a confidence interval for the difference in means between students in the same course who took a final exam on Monday (the first day of finals week) versus Friday (the last day of finals week).

\text{Friday:} \: \: 76 \: 72 \: 65 \: 96 \: 45 \: 62 \: 82 \: 67 \: 68 \: 93 \: 39 \: 68 \: 47 \: 80 \: 87

\text{Monday:} \: \: 100 \: 86 \: 87 \: 89 \: 75 \: 88 \: 81 \: 71 \: 87 \: 97 \: 83 \: 81 \: 49 \: 71 \: 63 \: 50 \: 77

scores <- c(76,72,65,96,45,62,82,67,68,93,39,68,47,80,87,
                 100,86,87,89,75,88,81,71,87,97,83,81,49,71,63,50,77)
day <- c(rep("Friday",15),rep("Monday",17))
exam <- data.frame(scores,day)
require(mosaic)
favstats(scores~day,data=exam,type=2)
##      day min Q1 median Q3 max     mean       sd  n missing
## 1 Friday  39 62     68 82  96 69.80000 16.89970 15       0
## 2 Monday  49 71     81 87 100 78.52941 14.33578 17       0
boxplot(scores~day,data=exam,horizontal=TRUE)

Doing this problem by hand assuming unequal variances, my “conservative” df will be $\min(15-1,\, 17-1) = 14$ and the critical value for a 95% confidence interval is $t^* = 2.145$. So:

$$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}} = 69.800 - 78.529 \pm 2.145 \sqrt{\frac{16.89970^2}{15} + \frac{14.33578^2}{17}}$$

$$= -8.729 \pm 11.968$$

$$= (-20.697,\; 3.239)$$
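This hand calculation can be verified in R with the summary statistics from favstats and the conservative df:

# verify the conservative-df interval using the summary statistics above
diff.means <- 69.80000 - 78.52941
se <- sqrt(16.89970^2/15 + 14.33578^2/17)
diff.means + c(-1,1)*qt(0.975, df=14)*se   # approximately (-20.70, 3.24)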

Notice that the null value of zero IS contained in this confidence interval, so we would not conclude that the difference in mean exam scores between the two days is statistically significant. We do not have enough evidence to defend such a claim.

With R:

t.test(scores~day,data=exam) # assumes unequal variances, does not use pooled variance
## 
##  Welch Two Sample t-test
## 
## data:  scores by day
## t = -1.5646, df = 27.664, p-value = 0.129
## alternative hypothesis: true difference in means between group Friday and group Monday is not equal to 0
## 95 percent confidence interval:
##  -20.164445   2.705622
## sample estimates:
## mean in group Friday mean in group Monday 
##             69.80000             78.52941
# t.test(scores~day,data=exam,var.equal=TRUE) # assumes equal variances, uses pooled variance

Notice that the difference in the calculation is due to R using $df = 27.664$ rather than our conservative estimate of $df = 14$.
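As a check, recomputing the interval with the same summary statistics but the approximate df from the output reproduces R’s answer (up to rounding in the summaries):

# recompute the Welch interval using df = 27.664 from the t.test output
diff.means <- 69.80000 - 78.52941
se <- sqrt(16.89970^2/15 + 14.33578^2/17)
diff.means + c(-1,1)*qt(0.975, df=27.664)*se   # approximately (-20.16, 2.71)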