A Confidence intervals

A.1 Concepts

A.1.1 Standard error

When we estimate some unknown parameter θ, for example the mean of a distribution, the point estimate ˆθ is usually imprecise. Repeating the same experiment over and over would give us a range of estimates. The standard error se(ˆθ) of ˆθ is an estimate of the standard deviation of ˆθ in repeated random samples. It is very important to understand that the terms “standard error” and “standard deviation” are different concepts (Altman and Bland, 2005).

A.1.2 Confidence interval

The point estimate and the standard error can be used to compute a γ·100% confidence interval (CI) for the unknown parameter θ. In practice, we choose a large confidence level γ (between 0 and 1), usually γ = 0.95. A CI is an interval estimate. Its width describes the precision of the point estimate ˆθ.

A CI contains a range of plausible values of θ. The exact frequentist interpretation of a CI is: For repeated random samples from a distribution with unknown parameter θ, a γ·100% CI will cover θ in γ·100% of all cases.

A.2 Methods

A.2.1 Normal approximation

The most common construction of a CI for a parameter θ is a Wald CI, which is symmetric around the point estimate ˆθ:

ˆθ − zγ · se(ˆθ)   to   ˆθ + zγ · se(ˆθ)   (A.1)

This construction relies on the normal approximation of the estimator for large sample sizes. Any normal random variable X ∼ N(μ, σ²) can be transformed into a standard normal random variable by (X − μ)/σ. Since a standard normal random variable Z is symmetric around 0, there is a value zγ such that Z takes a value between −zγ and zγ with probability γ. This value zγ is the (1 + γ)/2-quantile of the standard normal distribution. For γ = 0.95, we have zγ ≈ 1.96.

For small sample size n, the t-distribution with n − 1 degrees of freedom is more appropriate than the normal approximation. Then a symmetric CI as in (A.1) is used, but with tγ instead of zγ (as in a t-test):

ˆθ − tγ · se(ˆθ)   to   ˆθ + tγ · se(ˆθ)   (A.2)
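As a quick check of the quantiles involved, here is a base-R sketch (no packages assumed): the t-quantile tγ exceeds zγ for small n and approaches it as n grows.

```r
## Quantiles used in the symmetric CIs for gamma = 0.95
gamma <- 0.95
z <- qnorm((1 + gamma) / 2)               # approx. 1.96
t_small <- qt((1 + gamma) / 2, df = 9)    # wider CI for n = 10
t_large <- qt((1 + gamma) / 2, df = 249)  # close to z for n = 250
round(c(z = z, t_small = t_small, t_large = t_large), 3)
```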

A.2.2 Delta method

Suppose we know the standard error of an estimator ˆθ. Then, a standard error of the transformed estimator h(ˆθ) is given by

se(h(ˆθ)) = se(ˆθ) · |dh(ˆθ)/dθ|   (A.3)

if ˆθ is a consistent estimator of θ and dh(θ)/dθ ≠ 0. This is called the delta method (Held and Sabanés Bové, 2020, Section 3.2.4, p. 63).
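As an illustrative sketch (the helper name se_log is ours, not from any package): for h(θ) = log(θ), the delta method gives se(log(ˆθ)) = se(ˆθ)/ˆθ.

```r
## Delta method for h(theta) = log(theta): |dh/dtheta| = 1/theta,
## so the standard error on the log scale is se(thetahat)/thetahat.
se_log <- function(theta_hat, se_theta) se_theta / theta_hat

## Example: proportion 0.4 estimated from n = 100
p_hat <- 0.4
se_p <- sqrt(p_hat * (1 - p_hat) / 100)
se_log(p_hat, se_p)
```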

A.2.3 Difference

Suppose we know the standard errors of two estimators ˆθ1 and ˆθ2 from independent groups. Since Var(X − Y) = Var(X) + Var(Y) for two independent random variables X and Y, the standard error of the difference Δ = ˆθ1 − ˆθ2 can be calculated as

se(Δ) = √( se(ˆθ1)² + se(ˆθ2)² ).   (A.4)

This standard error can be used to compute a symmetric CI for the difference (in an unpaired scenario) as in (A.1) or (A.2).
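The square-and-sum of standard errors above is a one-liner in R (the helper name se_diff is ours):

```r
## Standard error of the difference of two independent estimators:
## square the standard errors, add, take the square root.
se_diff <- function(se1, se2) sqrt(se1^2 + se2^2)
se_diff(3, 4)  # 3-4-5 triangle: returns 5
```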

A.2.3.1 Square-and-add method

Suppose that, for two independent groups, we know the lower and upper limits l1 and u1 of a CI for θ1, and l2 and u2 of a CI for θ2. The square-and-add method computes a CI for the difference Δ = ˆθ1 − ˆθ2 using these limits as follows:

Δ − √( (ˆθ1 − l1)² + (u2 − ˆθ2)² )   to   Δ + √( (ˆθ2 − l2)² + (u1 − ˆθ1)² )

When the limits of Wald CIs for θ1 and θ2 are used, the square-and-add method reproduces the Wald CI for the difference; this is how the method was originally discovered. However, it is recommended to use the limits of Wilson CIs instead to improve the properties of the resulting CI (then also called Newcombe's method, described in Newcombe (1998a) and Altman et al (2000, Section 6, p. 49)). This CI inherits the boundary-respecting property of the Wilson CIs (no overshoot).
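A minimal base-R sketch of the square-and-add construction (function name ours): fed with Wald limits it reproduces the Wald CI of the difference; fed with Wilson limits it gives Newcombe's CI.

```r
## Square-and-add: combine CI limits (l1, u1) for theta1 and
## (l2, u2) for theta2 into a CI for theta1 - theta2.
squareAndAddCI <- function(est1, l1, u1, est2, l2, u2) {
  diff <- est1 - est2
  c(lower = diff - sqrt((est1 - l1)^2 + (u2 - est2)^2),
    upper = diff + sqrt((est2 - l2)^2 + (u1 - est1)^2))
}

## Check with Wald limits: result equals the Wald CI of the difference
z <- qnorm(0.975)
squareAndAddCI(0.5, 0.5 - z * 0.1, 0.5 + z * 0.1,
               0.3, 0.3 - z * 0.05, 0.3 + z * 0.05)
## same as 0.2 +/- z * sqrt(0.1^2 + 0.05^2)
```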

A more complicated variant of the square-and-add method (with Wilson CIs) can also be used in the paired case, where both groups consist of the same patients (see Newcombe (1998b) or Altman et al (2000, Section 6, p. 52)).

A.2.4 Substitution method

Sometimes it is preferable to compute a CI for a transformation of the unknown parameter θ. For example, it is usually easier to look at log-transformed ratios such as log(RR) since the logarithm transforms a ratio into a difference. It is a useful trick to compute CIs on the log-scale with the log-transformed point estimate and then back-transform by exponentiating the limits of the CI. Note that when we exponentiate the limits of a symmetric CI, the obtained CI is no longer symmetric around the point estimate ˆθ.

Suppose we have calculated a 95% Wald CI for a log-transformed parameter log(θ). Back-transformation to the original scale gives the 95% CI

ˆθ / EF.95   to   ˆθ · EF.95

for θ, where

EF.95 = exp( zγ · se(log(ˆθ)) )

denotes the 95% error factor.
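A small base-R sketch of the error-factor calculation (the numbers are made up for illustration):

```r
## Substitution method: CI for a ratio via the log scale.
theta_hat    <- 2.0   # assumed point estimate of a ratio
se_log_theta <- 0.25  # assumed standard error of log(theta_hat)
EF <- exp(qnorm(0.975) * se_log_theta)  # 95% error factor
c(lower = theta_hat / EF, estimate = theta_hat, upper = theta_hat * EF)
```

Note that the back-transformed interval is not symmetric around theta_hat.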

The substitution method can also be useful with other transformations, such as the logit transformation or Fisher's z-transformation. While a transformation of the parameter also transforms the CI, the P-value of the corresponding hypothesis test is not affected; only the reference value of the test is transformed along with the parameter.

A.2.5 Bootstrap

Remember that a CI relies on the concept of repeated sampling. We can also construct a CI by simulation; this is called a bootstrap CI. A bootstrap CI is especially helpful if the sample size is small or if we cannot assume an underlying normal distribution. We draw a large number of samples of the same size with replacement from the data. For each of these samples, we compute the point estimate, for example a mean. The simplest 95% bootstrap CI is then given by the 2.5% and 97.5% quantiles of these estimates. There are better bootstrap methods that reduce bias, but they are not discussed here.

Bootstrap CIs have by construction a good empirical coverage and do not overshoot. No overshoot means that the limits of the CI lie in the range of values that the unknown parameter can take. For example, the limits of a CI for a proportion should be between 0 and 1. Bootstrap CIs are usually not symmetric around the point estimate.
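The percentile bootstrap described above can be sketched in base R as follows (a small skewed sample is simulated for illustration):

```r
## Simplest (percentile) bootstrap CI for a mean
set.seed(1)
x <- rexp(40)  # small, skewed sample

## Resample with replacement and collect the means
boot_means <- replicate(10000, mean(sample(x, replace = TRUE)))
quantile(boot_means, probs = c(0.025, 0.975))
```

The resulting interval is typically not symmetric around mean(x).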

A.3 Continuous outcome

In the following, we discuss commonly used CIs for continuous outcomes.

A.3.1 One group

Example A.1 The data set PimaIndiansDiabetes2 from the package mlbench contains diastolic blood pressure levels from n = 252 diabetic women and m = 481 non-diabetic women (missing values excluded). We are interested in the mean blood pressure μD in the diabetic group. Denote the observations in this group by x1, …, xn and let s²D be the sample variance of x1, …, xn. Then:

  • ˆμD = x̄
  • se(ˆμD) = sD/√n

We build a Wald CI for μD as in Equation (A.1):

ˆμD − zγ · se(ˆμD)   to   ˆμD + zγ · se(ˆμD)

The Wald CI can be computed in R:

## Data
library(mlbench)
data("PimaIndiansDiabetes2")

## Exclude missing values
ind <- which(!is.na(PimaIndiansDiabetes2$pressure))
pima <- PimaIndiansDiabetes2[ind, ]

summary(pima[, c("pressure", "diabetes")])
##     pressure      diabetes 
##  Min.   : 24.00   neg:481  
##  1st Qu.: 64.00   pos:252  
##  Median : 72.00            
##  Mean   : 72.41            
##  3rd Qu.: 80.00            
##  Max.   :122.00
## Blood pressure levels for the diabetic group
ind <- which(pima$diabetes == "pos")
bpDiabetic <- pima$pressure[ind]

## Wald CI
n <- length(bpDiabetic)
mu <- mean(bpDiabetic)
se <- sd(bpDiabetic) / sqrt(n)
z <- qnorm((1+0.95)/2)
mu + (z * c(-se, +se))
## [1] 73.80281 76.84005

A symmetric CI based on the t-distribution can be conveniently computed in R using a one-sample t-test, which also reports a CI.

t.test(bpDiabetic, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  bpDiabetic
## t = 97.212, df = 251, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  73.79545 76.84740
## sample estimates:
## mean of x 
##  75.32143

Wald and t-test CIs are very similar due to the large sample size n=252.

A.3.2 Two unpaired groups with equal variance

Example A.2 We consider the same example as before (Example A.1). Denote the observations in the non-diabetic group by y1, …, ym, their mean by μD̄ and their sample variance by s²D̄. This group is independent of the diabetic group (unpaired). We are interested in the mean difference Δ = μD − μD̄. If the two groups have equal variances (hence equal standard deviations), then:

  • ˆΔ = x̄ − ȳ
  • se(ˆΔ) = s · √(1/m + 1/n), where s² = ((m − 1)s²D̄ + (n − 1)s²D) / (m + n − 2)

We build a symmetric CI for Δ using the t-distribution with m + n − 2 degrees of freedom:

ˆΔ − tγ · se(ˆΔ)   to   ˆΔ + tγ · se(ˆΔ)
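To make the pooled-variance formula concrete, here is a hand computation on simulated data (variable names ours), checked against t.test(var.equal = TRUE):

```r
## Pooled two-sample t CI computed by hand and via t.test()
set.seed(7)
x <- rnorm(30, mean = 5)  # first group,  n = 30
y <- rnorm(40, mean = 4)  # second group, m = 40
n <- length(x); m <- length(y)

s2 <- ((m - 1) * var(y) + (n - 1) * var(x)) / (m + n - 2)  # pooled variance
se <- sqrt(s2) * sqrt(1 / m + 1 / n)
tq <- qt(0.975, df = m + n - 2)
ci <- (mean(x) - mean(y)) + c(-1, 1) * tq * se

all.equal(ci, as.numeric(t.test(x, y, var.equal = TRUE)$conf.int))
## TRUE
```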

In R, such a CI is reported by a two-sample t-test, where we need to specify the argument var.equal = TRUE (the default value is FALSE):

## Check ordering of groups
levels(pima$diabetes)
## [1] "neg" "pos"
pima$diabetes <- factor(pima$diabetes, levels = c("pos", "neg"))

levels(pima$diabetes)
## [1] "pos" "neg"
t.test(pressure ~ diabetes, data = pima, 
       conf.level = 0.95, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  pressure by diabetes
## t = 4.6808, df = 731, p-value = 3.405e-06
## alternative hypothesis: true difference in means between group pos and group neg is not equal to 0
## 95 percent confidence interval:
##  2.58017 6.30801
## sample estimates:
## mean in group pos mean in group neg 
##          75.32143          70.87734
## Mean difference
res <- t.test(pressure ~ diabetes, data = pima, 
              conf.level = 0.95, var.equal = TRUE)
mean(res$conf.int)
## [1] 4.44409

A.3.3 Two unpaired groups with unequal variance

If the variances in the two groups are unequal, the standard error of the difference Δ is:

  • se(ˆΔ) = √(s²D̄/m + s²D/n)

A Welch test, using the t-distribution with Satterthwaite-adjusted degrees of freedom, should then be used.

Example A.3 In R, again with the data of Example A.1, we can use t.test with the default argument var.equal = FALSE to compute the CI:

t.test(pressure ~ diabetes, data = pima,
       conf.level = 0.95, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  pressure by diabetes
## t = 4.6643, df = 504.72, p-value = 3.972e-06
## alternative hypothesis: true difference in means between group pos and group neg is not equal to 0
## 95 percent confidence interval:
##  2.572156 6.316023
## sample estimates:
## mean in group pos mean in group neg 
##          75.32143          70.87734

A.3.4 Two paired groups

In a paired design, we look at the differences for each patient. Then, we construct a CI as in the case of one group. See Example 5.1.

A.4 Binary outcome

Now, we discuss commonly used CIs for binary outcomes. Examples in this section come from Altman et al (2000) (Section 6).

A.4.1 One group

Example A.4 Of n=29 female prisoners who did not inject drugs, x=1 was found to be positive on testing for HIV on discharge. We are interested in the probability π with which such cases happen. Then:

  • ˆπ = x/n
  • se(ˆπ) = √( ˆπ(1 − ˆπ)/n )

We can build a Wald CI for π as in (A.1). However, for proportions there is a better CI, called the Wilson CI. Especially if n is small or if ˆπ is very small (close to 0) or very large (close to 1), the Wilson CI has better properties: better empirical coverage and no overshoot. It is of the form:

(x + z²γ/2) / (n + z²γ)   ±   zγ√n / (n + z²γ) · √( ˆπ(1 − ˆπ) + z²γ/(4n) )

This formula is complicated. In practice, we use the package biostatUZH where both Wald and Wilson CIs for proportions are implemented.

library(biostatUZH)

## Data
x <- 1
n <- 29

wald(x, n, conf.level = 0.95)
##      lower       prop      upper 
## 0.00000000 0.03448276 0.10089224
wilson(x, n, conf.level = 0.95)
##       lower        prop       upper 
## 0.006113214 0.034482759 0.171755219

Note that when the Wald CI overshoots, the function wald() returns a CI that is truncated such that the limits lie between 0 and 1.
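If biostatUZH is not available, the Wilson formula above can be coded directly in base R (the function name wilsonCI is ours):

```r
## Wilson CI for a proportion, directly from the formula
wilsonCI <- function(x, n, conf.level = 0.95) {
  z <- qnorm((1 + conf.level) / 2)
  p <- x / n
  centre <- (x + z^2 / 2) / (n + z^2)
  halfwidth <- z * sqrt(n) / (n + z^2) * sqrt(p * (1 - p) + z^2 / (4 * n))
  c(lower = centre - halfwidth, upper = centre + halfwidth)
}
wilsonCI(1, 29)  # ~0.0061 to ~0.1718, as above
```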

The function prop.test() also computes the Wilson CI:

prop.test(x, n, conf.level = 0.95, correct = FALSE)
## 
##  1-sample proportions test without continuity
##  correction
## 
## data:  x out of n, null probability 0.5
## X-squared = 25.138, df = 1, p-value = 5.337e-07
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.006113214 0.171755219
## sample estimates:
##          p 
## 0.03448276

A.4.2 Two unpaired groups

Example A.5 A collaborative clinical trial assessed the value of extracorporeal membrane oxygenation for term neonates with severe respiratory failure. 63 out of 93 patients randomised to active treatment survived to one year compared with 38 out of 92 infants who received conventional management. Let π1 and π2 be the probabilities to survive up to one year in the two groups.

The two groups can be compared using different effect measures such as the absolute risk reduction (or risk difference or probability difference), the relative risk reduction (or risk ratio or relative risk) and the (sample) odds ratio. These effect measures are computed from the following 2×2 table:

              Survived      Not survived   Total
Treatment     a = x1 = 63   b = 30         n1 = 93
Control       c = x2 = 38   d = 54         n2 = 92

For the absolute risk reduction ARR=π1π2, we have:

  • ˆARR = x1/n1 − x2/n2
  • se(ˆARR) = √( ˆπ1(1 − ˆπ1)/n1 + ˆπ2(1 − ˆπ2)/n2 )

The standard error is computed using equation (A.4) for the difference.

Wald and Wilson CIs can be computed in R using the package biostatUZH. A Wald CI uses the above standard error; a Wilson CI uses the square-and-add method with Wilson CIs for π1 and π2. Here too, Wilson CIs have better properties.

## Data
tabRespiratory <- matrix(c(63, 38, 30, 54), nrow = 2)
colnames(tabRespiratory) <- c("Survived", "Not survived")
rownames(tabRespiratory) <- c("Treatment", "Control")

print(tabRespiratory)
##           Survived Not survived
## Treatment       63           30
## Control         38           54
confIntRiskDiff(x = tabRespiratory[, 1], 
                n = rowSums(tabRespiratory), 
                conf.level = 0.95)
## $rd
##           [,1]
## [1,] 0.2643759
## 
## $CIs
##     type     lower     upper
## 1   Wald 0.1259949 0.4027569
## 2 Wilson 0.1211604 0.3928557

For the risk ratio RR=π1/π2, we have:

  • ˆRR = (x1/n1) / (x2/n2)
  • log(ˆRR) = log( (x1/n1) / (x2/n2) )
  • se(log(ˆRR)) = √( 1/x1 − 1/n1 + 1/x2 − 1/n2 )

The standard error is computed using equation (A.4) for the difference of two log proportions. The standard error of a log proportion can be computed using the standard error of a proportion and the delta method:

  • se(log(ˆπ1)) = √( 1/x1 − 1/n1 )

We use the substitution method to compute a CI:

CI for log(ˆRR): log(ˆRR) − zγ · se(log(ˆRR))   to   log(ˆRR) + zγ · se(log(ˆRR))

CI for the RR: exp( log(ˆRR) − zγ · se(log(ˆRR)) )   to   exp( log(ˆRR) + zγ · se(log(ˆRR)) )

which is the same as

ˆRR / EF.95   to   ˆRR · EF.95
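For the data of Example A.5, this Wald CI for the RR is easy to compute by hand in base R; it matches the "Relative Risk" line reported by twoby2() below.

```r
## Wald CI for the risk ratio in Example A.5 via the log scale
x1 <- 63; n1 <- 93   # treatment: survived / total
x2 <- 38; n2 <- 92   # control:   survived / total

RR <- (x1 / n1) / (x2 / n2)
seLogRR <- sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
EF <- exp(qnorm(0.975) * seLogRR)  # 95% error factor
round(c(lower = RR / EF, RR = RR, upper = RR * EF), 4)
## lower 1.2382, RR 1.6401, upper 2.1724
```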

For the odds ratio OR = ( π1/(1 − π1) ) / ( π2/(1 − π2) ), we have:

  • ˆOR = (a·d)/(b·c)
  • log(ˆOR) = log( (a·d)/(b·c) )
  • se(log(ˆOR)) = √( 1/a + 1/b + 1/c + 1/d )

The standard error is computed using equation (A.4) for the difference of two log odds. The standard error of the log odds can be computed from the standard error of a proportion and the delta method:

  • se(log( ˆπ1/(1 − ˆπ1) )) = √( 1/a + 1/b )

We use the substitution method to compute a CI:

CI for log(ˆOR): log(ˆOR) − zγ · se(log(ˆOR))   to   log(ˆOR) + zγ · se(log(ˆOR))

CI for the OR: exp( log(ˆOR) − zγ · se(log(ˆOR)) )   to   exp( log(ˆOR) + zγ · se(log(ˆOR)) )

which is the same as

ˆOR / EF.95   to   ˆOR · EF.95
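The analogous hand computation for the odds ratio in Example A.5 (matching the "Sample Odds Ratio" line of twoby2() below):

```r
## Wald CI for the odds ratio in Example A.5
a <- 63; b <- 30   # treatment: survived / not survived
c <- 38; d <- 54   # control:   survived / not survived

OR <- (a * d) / (b * c)
seLogOR <- sqrt(1 / a + 1 / b + 1 / c + 1 / d)
EF <- exp(qnorm(0.975) * seLogOR)
## R still finds the function c() despite the variable named c
round(c(lower = OR / EF, OR = OR, upper = OR * EF), 4)
## lower 1.6361, OR 2.9842, upper 5.4433
```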

In R, Wald CIs for the OR and RR and a Wilson CI for the ARR can be computed using the function twoby2() from the package Epi.

library(Epi)

twoby2(tabRespiratory, alpha = 0.05)
## 2 by 2 table analysis: 
## ------------------------------------------------------ 
## Outcome   : Survived 
## Comparing : Treatment vs. Control 
## 
##           Survived Not survived    P(Survived) 95% conf.
## Treatment       63           30         0.6774    0.5762
## Control         38           54         0.4130    0.3173
##           interval
## Treatment   0.7644
## Control     0.5159
## 
##                                    95% conf. interval
##              Relative Risk: 1.6401    1.2382   2.1724
##          Sample Odds Ratio: 2.9842    1.6361   5.4433
## Conditional MLE Odds Ratio: 2.9658    1.5688   5.6953
##     Probability difference: 0.2644    0.1212   0.3929
## 
##              Exact P-value: 0.0004 
##         Asymptotic P-value: 0.0004 
## ------------------------------------------------------

A.4.3 Two paired groups

Example A.6 In a reliability exercise carried out as part of the Multicenter Study of Silent Myocardial Ischemia, 41 patients were randomly selected from those who had undergone a thallium-201 stress test. The 41 sets of images were classified as normal or ischemic by the core thallium laboratory and, independently, by clinical investigators from different centers who had participated in training sessions to support standardization.

This is a paired design. The results for one of the participating clinical investigators compared to the core laboratory are presented below as a contingency table showing the frequency distribution of these two classifications.

The square-and-add method for paired data can be used to compute a Newcombe CI. This method is implemented in the package biostatUZH where the CI is computed from the contingency table for “Ischemic vs. Normal”:

## Data
tabIschemia <- matrix(c(14, 0, 5, 22), nrow = 2)
colnames(tabIschemia) <- c("Lab ischemic", "Lab normal")
rownames(tabIschemia) <- c("Clin ischemic", "Clin normal")

print(tabIschemia)
##               Lab ischemic Lab normal
## Clin ischemic           14          5
## Clin normal              0         22
confIntPairedProportion(tabIschemia, conf.level = 0.95)
## $p1
## [1] 0.4634146
## 
## $p2
## [1] 0.3414634
## 
## $diff
## [1] 0.1219512
## 
## $newcombeCI
## [1] 0.01148324 0.22649268

A.5 Confidence interval for a sample variance and standard deviation

Let S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)² denote the sample variance. Standard statistical theory shows (Held and Sabanés Bové, 2020, Section 3.2) that, under normality and independence of Xi ∼ N(μ, σ²), i = 1, …, n,

((n − 1)/σ²) · S² ∼ χ²(n − 1)   (A.8)

holds, so E(S²) = σ², i.e. S² is unbiased for σ².

Let χ2α(k) denote the α-quantile of the χ2(k) distribution. It follows that

[ (n − 1)S² / χ²_{(1+γ)/2}(n − 1) ,  (n − 1)S² / χ²_{(1−γ)/2}(n − 1) ]   (A.9)

is a γ·100% CI for σ². Likewise, taking the square root of the limits in (A.9) gives a γ·100% CI for the standard deviation σ.

We may also construct Wald CIs based on the standard errors of S² and S, respectively. From (A.8) it follows that

((n − 1)²/σ⁴) · Var(S²) = 2(n − 1), so Var(S²) = 2σ⁴/(n − 1) and therefore se(S²) = √(2/(n − 1)) · S².

Application of the delta method shows that

se(S) = se(S²) · 1/(2S) = S / √(2(n − 1)) ≈ S / √(2n).

These results allow us to construct standard Wald CIs for the within-subject standard deviation sw (Section 3.1.1) and for the limits of agreement in a Bland-Altman plot (Section 3.3.1), for example.
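As a sketch on simulated data (no packages assumed), the exact chi-squared interval and the Wald interval for σ² can be compared side by side:

```r
## Exact chi-squared CI and Wald CI for a variance
set.seed(3)
x <- rnorm(50, sd = 2)
n <- length(x); s2 <- var(x); gamma <- 0.95

exact  <- (n - 1) * s2 / qchisq(c((1 + gamma) / 2, (1 - gamma) / 2), df = n - 1)
waldCI <- s2 + c(-1, 1) * qnorm((1 + gamma) / 2) * sqrt(2 / (n - 1)) * s2
rbind(exact = exact, wald = waldCI)  # sqrt() of the limits gives CIs for sigma
```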

References

Altman, D G and Bland, M J 2005 Statistics Notes: Standard deviations and standard errors. BMJ, 331: 903.
Altman, D G, Machin, D, Bryant, T N, and Gardner, M J 2000 Statistics with Confidence. Second Edition. BMJ Books.
Held, L and Sabanés Bové, D 2020 Likelihood and Bayesian Inference - With Applications in Biology and Medicine. Second Edition. Springer.
Newcombe, R G 1998a Interval estimation for the difference between independent proportions: Comparison of eleven methods. Stat Med, 17(8): 873–890.
Newcombe, R G 1998b Improved confidence intervals for the difference between binomial proportions based on paired data. Stat Med, 17(22): 2635–2650.