A Confidence intervals

A.1 Concepts

A.1.1 Standard error

When we estimate some unknown parameter \(\theta\), for example the mean of a distribution, the point estimate \(\widehat \theta\) is usually imprecise: repeating the same experiment over and over would give a range of estimates. The standard error \(\mbox{se}(\widehat \theta)\) of \(\widehat \theta\) is an estimate of the standard deviation of \(\widehat \theta\) in repeated random samples. It is important to keep in mind that “standard error” and “standard deviation” refer to different things (Douglas G. Altman and Bland 2005).

A.1.2 Confidence interval

The point estimate and the standard error can be used to compute a \(\gamma\cdot 100\) % confidence interval (CI) for the unknown parameter \(\theta\). In practice, we choose a large confidence level \(\gamma\) (between \(0\) and \(1\)), usually \(\gamma = 0.95\). A CI is an interval estimate. Its width describes the precision of the point estimate \(\widehat \theta\).

A CI contains a range of plausible values of \(\theta\). The exact frequentist interpretation of a CI is: For repeated random samples from a distribution with unknown parameter \(\theta\), a \(\gamma\cdot 100\) % CI will cover \(\theta\) in \(\gamma\cdot 100\) % of all cases.

A.2 Methods

A.2.1 Normal approximation

The most common construction of a CI for a parameter \(\theta\) is a Wald CI, which is symmetric around the point estimate \(\widehat\theta\):

\[\begin{equation} \tag{A.1} \widehat\theta - z_\gamma\cdot \mbox{se}(\widehat\theta)\;\;\; \text{to}\;\;\; \widehat\theta + z_\gamma\cdot \mbox{se}(\widehat\theta) \end{equation}\]

This construction relies on the normal approximation of the estimator for large sample sizes. Any normal random variable \(X\sim \mathop{\mathrm{N}}(\mu, \sigma^2)\) can be transformed into a standard normal random variable by \((X-\mu)/\sigma\). Since the distribution of a standard normal random variable \(Z\) is symmetric around \(0\), there is a value \(z_\gamma\) such that \(Z\) takes a value between \(-z_\gamma\) and \(z_\gamma\) with probability \(\gamma\). This value \(z_\gamma\) is the \((1+\gamma)/2\)-quantile of the standard normal distribution. For \(\gamma = 0.95\), we have \(z_\gamma \approx 1.96\).
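In R, the quantile \(z_\gamma\) is obtained with qnorm:

## (1 + gamma)/2 quantile of the standard normal distribution
gamma <- 0.95
qnorm((1 + gamma) / 2)
## [1] 1.959964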

For small sample size \(n\), the \(t\)-distribution with \(n-1\) degrees of freedom is more appropriate than the normal approximation. Then, a symmetric CI as in (A.1) is used, with the \((1+\gamma)/2\)-quantile \(t_\gamma\) of the \(t\)-distribution instead of \(z_\gamma\) (\(t\)-test):

\[\begin{equation} \tag{A.2} \widehat\theta - t_\gamma\cdot \mbox{se}(\widehat\theta)\;\;\; \text{to}\;\;\; \widehat\theta + t_\gamma\cdot \mbox{se}(\widehat\theta) \end{equation}\]

A.2.2 Delta method

Suppose we know the standard error of an estimator \(\widehat{\theta}\). Then, a standard error of the transformed estimator \(h(\widehat{\theta})\) is given by

\[\begin{equation} \tag{A.3} \mbox{se}\bigr(h(\widehat{\theta})\bigr) = \mbox{se}(\widehat{\theta}) \cdot \Bigr| \dfrac{dh(\widehat{\theta})}{d\theta} \Bigr| \end{equation}\]

if \(\widehat{\theta}\) is a consistent estimator of \(\theta\) and \(\dfrac{dh(\theta)}{d\theta}\neq 0\). This is called the delta method.
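As a small illustration with assumed values, consider \(h(\theta) = \log(\theta)\), so that \(|dh(\theta)/d\theta| = 1/\theta\):

## Assumed point estimate and standard error
theta <- 0.2
se <- 0.05

## Delta method for h(theta) = log(theta)
se * abs(1 / theta)
## [1] 0.25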

A.2.3 Difference

Suppose we know the standard errors of two estimators \(\widehat{\theta}_1, \widehat{\theta}_2\) from independent groups. Since \(\mathop{\mathrm{Var}}(X-Y) = \mathop{\mathrm{Var}}(X) + \mathop{\mathrm{Var}}(Y)\) for two independent random variables \(X\) and \(Y\), the standard error of the difference \(\Delta = \widehat{\theta}_1 - \widehat{\theta}_2\) can be calculated as

\[\begin{equation} \tag{A.4} \mbox{se}(\Delta) = \sqrt{\mbox{se}(\widehat{\theta}_1)^2 + \mbox{se}(\widehat{\theta}_2)^2}. \end{equation}\]

This standard error can be used to compute a symmetric CI for the difference (in an unpaired scenario) as in (A.1) or (A.2).
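For example, a sketch with assumed estimates and standard errors:

## Assumed estimates and standard errors from two independent groups
theta1 <- 5.0; se1 <- 0.8
theta2 <- 3.5; se2 <- 0.6

## Standard error of the difference, equation (A.4)
delta <- theta1 - theta2
seDelta <- sqrt(se1^2 + se2^2)

## Wald CI for the difference as in (A.1)
delta + qnorm(0.975) * c(-seDelta, seDelta)
## [1] -0.459964  3.459964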

A.2.3.1 Square-and-add method

Suppose that, for two independent groups, we know the lower and upper limits \(l_1\) and \(u_1\) of a CI for \(\theta_1\) and \(l_2\) and \(u_2\) of a CI for \(\theta_2\). The square-and-add method computes a CI for the difference \(\widehat{\theta}_1 - \widehat{\theta}_2\) using these limits as follows:

\[\begin{equation} \tag{A.5} \Delta - \sqrt{(\widehat{\theta}_1 - l_1)^2 + (u_2 - \widehat{\theta}_2)^2}\;\;\; \text{to}\;\;\; \Delta + \sqrt{(\widehat{\theta}_2 - l_2)^2 + (u_1 - \widehat{\theta}_1)^2} \end{equation}\]

When the limits of Wald CIs for \(\theta_1\) and \(\theta_2\) are used, the square-and-add method reproduces the Wald CI for the difference; this is how the method was originally derived. However, it is recommended to use the limits of Wilson CIs instead to improve the properties of the CI (also called Newcombe’s method, described in Robert G. Newcombe (1998b) or Douglas G. Altman et al. (2000, Section 6, p. 49)). For example, this CI inherits the boundary-respecting property of the Wilson CIs (no overshoot).
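Equation (A.5) is easily implemented; the following helper function (our own sketch, not part of a package) takes the point estimates and CI limits of the two groups:

## Square-and-add CI for theta1 - theta2 from the groupwise CI limits
squareAndAdd <- function(est1, l1, u1, est2, l2, u2) {
  delta <- est1 - est2
  lower <- delta - sqrt((est1 - l1)^2 + (u2 - est2)^2)
  upper <- delta + sqrt((u1 - est1)^2 + (est2 - l2)^2)
  c(lower = lower, upper = upper)
}

With Wilson limits for two proportions, this reproduces Newcombe’s method.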

A more complicated variant of the square-and-add method (with Wilson CIs) can also be used for the paired case, where both groups consist of the same patients (see Robert G. Newcombe (1998a) or Douglas G. Altman et al. (2000) (Section 6, p.52)).

A.2.4 Substitution method

Sometimes it is preferable to compute a CI for a transformation of the unknown parameter \(\theta\). For example, it is usually easier to look at log-transformed ratios such as log(RR) since the logarithm transforms a ratio into a difference. It is a useful trick to compute CIs on the log-scale with the log-transformed point estimate and then back-transform by exponentiating the limits of the CI. Note that when we exponentiate the limits of a symmetric CI, the obtained CI is no longer symmetric around the point estimate \(\widehat{\theta}\).

Suppose we have calculated a \(95\)% Wald CI for a log-transformed parameter \(\log({\theta})\). Back-transformation to the original scale gives the \(95\)% CI

\[\begin{equation} \tag{A.6} \widehat{\theta}/\text{EF}_{.95}\; \;\; \text{to}\;\;\; \widehat{\theta}\cdot \text{EF}_{.95} \end{equation}\]

for \(\theta\), where

\[\text{EF}_{.95} = \exp\Bigr(z_\gamma \cdot \mbox{se}\bigr(\log(\widehat{\theta})\bigr)\Bigr)\]

denotes the \(95\)% error factor.
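As a numerical sketch with assumed values (an estimate \(\widehat{\theta} = 2\) with \(\mbox{se}\bigr(\log(\widehat{\theta})\bigr) = 0.25\)):

## Assumed: estimate on the original scale, se on the log scale
theta <- 2.0
seLog <- 0.25

## 95% error factor and back-transformed CI
EF <- exp(qnorm(0.975) * seLog)
c(theta / EF, theta * EF)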

The substitution method can also be useful with other transformations such as the logit transformation or Fisher’s \(z\)-transformation. Transformations of the parameter also transform the CI; the \(p\)-value of the corresponding hypothesis test is not affected, but the reference value of the test is transformed too.

A.2.5 Bootstrap

Remember that a CI relies on the concept of repeated sampling. We can also construct a CI by simulation; this is called a bootstrap CI. A bootstrap CI is especially helpful if the sample size is small or if we cannot assume an underlying normal distribution. We draw a large number of samples of the same size with replacement from the data and compute the point estimate, for example a mean, for each of them. The simplest \(95\)% bootstrap CI is given by the \(2.5\)% and \(97.5\)% quantiles of these estimates (the percentile method). There are better bootstrap methods that reduce bias, but they will not be discussed here.
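A minimal sketch of the percentile bootstrap, using simulated placeholder data:

## Placeholder data
set.seed(44)
x <- rnorm(20, mean = 10, sd = 2)

## Draw B bootstrap samples with replacement and compute the mean
B <- 10000
bootMeans <- replicate(B, mean(sample(x, replace = TRUE)))

## Percentile bootstrap CI
quantile(bootMeans, probs = c(0.025, 0.975))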

Bootstrap CIs have by construction a good empirical coverage and do not overshoot. No overshoot means that the limits of the CI lie in the range of values that the unknown parameter can take. For example, the limits of a CI for a proportion should be between \(0\) and \(1\). Bootstrap CIs are usually not symmetric around the point estimate.

A.3 Continuous outcome

In the following, we discuss commonly used CIs for continuous outcomes.

A.3.1 One group

Example A.1 The dataset PimaIndiansDiabetes2 contains diastolic blood pressure levels from \(n=252\) diabetic women and \(m=481\) non-diabetic women (missing values excluded). We are interested in the mean blood pressure \(\mu_D\) in the diabetic group. Denote the observations in this group by \(x_1,\ldots, x_n\) and let \(s_D^2\) be their sample variance. Then:

  • \(\widehat{\mu}_D = \bar{x}\)
  • \(\mbox{se}(\widehat{\mu}_D) = \dfrac{s_D}{\sqrt{n}}\)

We build a Wald CI for \(\mu_D\) as in (A.1):

\[\widehat{\mu}_D - z_\gamma\cdot \mbox{se}(\widehat{\mu}_D)\;\;\; \text{to}\;\;\; \widehat{\mu}_D + z_\gamma\cdot \mbox{se}(\widehat{\mu}_D)\]

The Wald CI can be computed in R:

## Data
library(mlbench)
data("PimaIndiansDiabetes2")

## Exclude missing values
ind <- which(!is.na(PimaIndiansDiabetes2$pressure))
pima <- PimaIndiansDiabetes2[ind, ]

summary(pima[, c("pressure", "diabetes")])
##     pressure      diabetes 
##  Min.   : 24.00   neg:481  
##  1st Qu.: 64.00   pos:252  
##  Median : 72.00            
##  Mean   : 72.41            
##  3rd Qu.: 80.00            
##  Max.   :122.00
## Blood pressure levels for the diabetic group
ind <- which(pima$diabetes == "pos")
bpDiabetic <- pima$pressure[ind]

## Wald CI
n <- length(bpDiabetic)
mu <- mean(bpDiabetic)
se <- sd(bpDiabetic) / sqrt(n)
z <- qnorm((1+0.95)/2)
mu + (z * c(-se, +se))
## [1] 73.80281 76.84005

A symmetric CI based on the \(t\)-distribution can be conveniently computed in R with a one-sample \(t\)-test, which also reports a CI.

t.test(bpDiabetic, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  bpDiabetic
## t = 97.212, df = 251, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  73.79545 76.84740
## sample estimates:
## mean of x 
##  75.32143

Wald and \(t\)-test CIs are very similar due to the large sample size \(n=252\).

A.3.2 Two unpaired groups with equal variance

Example A.2 We consider the same data as in Example A.1. Denote the observations in the non-diabetic group by \(y_1,\ldots, y_m\), their population mean by \(\mu_{\bar{D}}\) and the sample variance of \(y_1,\ldots, y_m\) by \(s_{\bar{D}}^2\). This group is independent of the diabetic group (unpaired). We are interested in the mean difference \(\Delta=\mu_D-\mu_{\bar{D}}\). If the two groups have equal variances (hence equal standard deviations), then:

  • \(\widehat{\Delta} = \bar{x}-\bar{y}\)
  • \(\mbox{se}(\widehat{\Delta}) = s \sqrt{\dfrac{1}{m} + \dfrac{1}{n}}\), where \(s^2=\dfrac{(m-1)\cdot s_{\bar{D}}^2 + (n-1)\cdot s_D^2}{m+n-2}\)

We build a symmetric CI for \(\Delta\) using the \(t\)-distribution with \(m+n-2\) degrees of freedom:

\[\widehat{\Delta} - t_\gamma\cdot \mbox{se}(\widehat{\Delta})\;\;\; \text{to}\;\;\; \widehat{\Delta} + t_\gamma\cdot \mbox{se}(\widehat{\Delta})\]

In R, such a CI is reported by a two-sample \(t\)-test where we need to specify the argument var.equal = TRUE (the default value is FALSE):

## Check ordering of groups
levels(pima$diabetes)
## [1] "neg" "pos"
pima$diabetes <- factor(pima$diabetes, levels = c("pos", "neg"))

levels(pima$diabetes)
## [1] "pos" "neg"
t.test(pressure ~ diabetes, data = pima, 
       conf.level = 0.95, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  pressure by diabetes
## t = 4.6808, df = 731, p-value = 3.405e-06
## alternative hypothesis: true difference in means between group pos and group neg is not equal to 0
## 95 percent confidence interval:
##  2.58017 6.30801
## sample estimates:
## mean in group pos mean in group neg 
##          75.32143          70.87734
## Mean difference
res <- t.test(pressure ~ diabetes, data = pima, 
              conf.level = 0.95, var.equal = TRUE)
mean(res$conf.int)
## [1] 4.44409

A.3.3 Two unpaired groups with unequal variance

If the variances in the two groups are unequal, the standard error of the difference \(\Delta\) is:

  • \(\mbox{se}(\widehat{\Delta}) = \sqrt{\dfrac{s_{\bar{D}}^2}{m} + \dfrac{s_D^2}{n}}\)

Welch’s \(t\)-test, which uses the \(t\)-distribution with (Satterthwaite-)adjusted degrees of freedom, should then be used.

Example A.3 In R, continuing with the data from Example A.1, we can use t.test with the default argument var.equal = FALSE to compute the CI:

t.test(pressure ~ diabetes, data = pima,
       conf.level = 0.95, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  pressure by diabetes
## t = 4.6643, df = 504.72, p-value = 3.972e-06
## alternative hypothesis: true difference in means between group pos and group neg is not equal to 0
## 95 percent confidence interval:
##  2.572156 6.316023
## sample estimates:
## mean in group pos mean in group neg 
##          75.32143          70.87734

A.3.4 Two paired groups

In a paired design, we look at the differences for each patient. Then, we construct a CI as in the case of one group. See Example 5.1.
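For example, a sketch with assumed paired measurements:

## Assumed paired measurements (e.g. before and after treatment)
before <- c(140, 150, 135, 160, 145)
after <- c(135, 148, 130, 155, 142)

## CI for the mean difference from a paired t-test,
## equivalent to t.test(before - after)
t.test(before, after, paired = TRUE, conf.level = 0.95)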

A.4 Binary outcome

Now, we discuss commonly used CIs for binary outcomes. Examples in this section come from Douglas G. Altman et al. (2000) (Section 6).

A.4.1 One group

Example A.4 Of \(n=29\) female prisoners who did not inject drugs, \(x=1\) was found to be positive on testing for HIV on discharge. We are interested in the underlying probability \(\pi\) of being HIV positive. Then:

  • \(\widehat{\pi} = \dfrac{x}{n}\)
  • \(\mbox{se}(\widehat{\pi}) = \sqrt{\dfrac{\widehat{\pi}(1-\widehat{\pi})}{n}}\)

We can build a Wald CI for \(\pi\) as in (A.1). However, for proportions there is a better CI, called the Wilson CI. Especially if \(n\) is small or if \(\widehat{\pi}\) is very small (close to \(0\)) or very large (close to \(1\)), the Wilson CI has better properties: better empirical coverage and no overshoot. It is of the form:

\[\begin{equation} \tag{A.7} \dfrac{x + z_\gamma^2/2}{n + z_\gamma^2} \pm \dfrac{z_\gamma \sqrt{n}}{n + z_\gamma^2} \sqrt{\widehat{\pi}(1-\widehat{\pi}) + \dfrac{z_\gamma^2}{4n}} \end{equation}\]

This formula is complicated. In practice, we use the library biostatUZH where both Wald and Wilson CIs for proportions are implemented.

library(biostatUZH)

## Data
x <- 1
n <- 29

wald(x, n, conf.level = 0.95)
##      lower       prop      upper 
## 0.00000000 0.03448276 0.10089224
wilson(x, n, conf.level = 0.95)
##       lower        prop       upper 
## 0.006113214 0.034482759 0.171755219

Note that when the Wald CI overshoots, the function wald returns a CI that is truncated such that the limits lie between \(0\) and \(1\).
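For illustration, equation (A.7) can also be evaluated directly (reusing x and n from above):

## Wilson CI for x = 1, n = 29 from equation (A.7)
z <- qnorm((1 + 0.95) / 2)
p <- x / n
center <- (x + z^2 / 2) / (n + z^2)
halfwidth <- z * sqrt(n) / (n + z^2) * sqrt(p * (1 - p) + z^2 / (4 * n))
center + c(-halfwidth, halfwidth)
## [1] 0.006113214 0.171755219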

The function prop.test also computes the Wilson CI:

prop.test(x, n, conf.level = 0.95, correct = FALSE)
## 
##  1-sample proportions test without continuity
##  correction
## 
## data:  x out of n, null probability 0.5
## X-squared = 25.138, df = 1, p-value = 5.337e-07
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.006113214 0.171755219
## sample estimates:
##          p 
## 0.03448276

A.4.2 Two unpaired groups

Example A.5 A collaborative clinical trial assessed the value of extracorporeal membrane oxygenation for term neonates with severe respiratory failure. \(63\) out of \(93\) patients randomised to active treatment survived to one year compared with \(38\) out of \(92\) infants who received conventional management. Let \(\pi_1\) and \(\pi_2\) be the probabilities of surviving up to one year in the two groups.

The two groups can be compared using different effect measures such as the absolute risk reduction (or risk difference or probability difference), the relative risk reduction (or risk ratio or relative risk) and the (sample) odds ratio. These effect measures are computed from the following \(2\times 2\) table, where \(x_1\) and \(x_2\) denote the numbers of events in the two groups:

              Event        No event       Total
Group 1       a = x_1      b = n_1 - x_1  n_1
Group 2       c = x_2      d = n_2 - x_2  n_2

For the absolute risk reduction \(\mbox{ARR}= \pi_1 - \pi_2\), we have:

  • \(\widehat{\mbox{ARR}} = \dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\)
  • \(\mbox{se}(\widehat{\mbox{ARR}}) = \sqrt{\dfrac{\widehat{\pi}_1(1-\widehat{\pi}_1)}{n_1} + \dfrac{\widehat{\pi}_2(1-\widehat{\pi}_2)}{n_2}}\)

The standard error is computed using equation (A.4) for the difference.

Wald and Wilson CIs can be computed in R using the library biostatUZH. The Wald CI uses the standard error above; the Wilson CI uses the square-and-add method with Wilson CIs for \(\pi_1\) and \(\pi_2\). Here too, the Wilson-based CI has better properties.

## Data
tabRespiratory <- matrix(c(63, 38, 30, 54), nrow = 2)
colnames(tabRespiratory) <- c("Survived", "Not survived")
rownames(tabRespiratory) <- c("Treatment", "Control")

print(tabRespiratory)
##           Survived Not survived
## Treatment       63           30
## Control         38           54
confIntRiskDiff(x = tabRespiratory[, 1], 
                n = rowSums(tabRespiratory), 
                conf.level = 0.95)
## $rd
##           [,1]
## [1,] 0.2643759
## 
## $CIs
##     type     lower     upper
## 1   Wald 0.1259949 0.4027569
## 2 Wilson 0.1211604 0.3928557

For the risk ratio \(\mbox{RR}=\pi_1/\pi_2\), we have:

  • \(\widehat{\mbox{RR}} = \dfrac{x_1/n_1}{x_2/n_2}\)
  • \(\log(\widehat{\mbox{RR}}) = \log\Bigr(\dfrac{x_1/n_1}{x_2/n_2}\Bigr)\)
  • \(\mbox{se}\bigr(\log(\widehat{\mbox{RR}})\bigr) = \sqrt{\dfrac{1}{x_1} - \dfrac{1}{n_1} + \dfrac{1}{x_2} - \dfrac{1}{n_2}}\)

The standard error is computed using equation (A.4) for the difference of two log proportions. The standard error of a log proportion can be computed using the standard error of a proportion and the delta method:

  • \(\mbox{se}\bigr(\log(\widehat{\pi}_1)\bigr) = \sqrt{\dfrac{1}{x_1} - \dfrac{1}{n_1}}\)

We use the substitution method to compute a CI:

CI for the \(\log(\widehat{\mbox{RR}})\): \[\log(\widehat{\mbox{RR}}) - z_\gamma\cdot \mbox{se}\bigr(\log(\widehat{\mbox{RR}})\bigr)\;\;\; \text{to}\;\;\; \log(\widehat{\mbox{RR}}) + z_\gamma\cdot \mbox{se}\bigr(\log(\widehat{\mbox{RR}})\bigr)\]

CI for the \(\widehat{\mbox{RR}}\): \[\exp\Bigr(\log(\widehat{\mbox{RR}}) - z_\gamma\cdot \mbox{se}\bigr(\log(\widehat{\mbox{RR}})\bigr)\Bigr)\;\;\; \text{to}\;\;\; \exp\Bigr(\log(\widehat{\mbox{RR}}) + z_\gamma\cdot \mbox{se}\bigr(\log(\widehat{\mbox{RR}})\bigr)\Bigr)\]

which is the same as

\[\widehat{\mbox{RR}}/\text{EF}_{.95}\;\;\; \text{to}\;\;\; \widehat{\mbox{RR}}\cdot \text{EF}_{.95}\]
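With the data from this example, the Wald CI for the RR can be computed step by step:

## Data: survived / total in treatment and control group
x1 <- 63; n1 <- 93
x2 <- 38; n2 <- 92

## RR, standard error on the log scale and error factor
RR <- (x1 / n1) / (x2 / n2)
seLogRR <- sqrt(1/x1 - 1/n1 + 1/x2 - 1/n2)
EF <- exp(qnorm(0.975) * seLogRR)

round(c(RR / EF, RR, RR * EF), 4)
## [1] 1.2382 1.6401 2.1724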

For the odds ratio \(\mbox{OR}=\dfrac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}\), we have:

  • \(\widehat{\mbox{OR}} = \dfrac{a\cdot d}{b\cdot c}\)
  • \(\log(\widehat{\mbox{OR}}) = \log\Bigr(\dfrac{a\cdot d}{b\cdot c}\Bigr)\)
  • \(\mbox{se}\bigr(\log(\widehat{\mbox{OR}})\bigr) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}\)

The standard error is computed using equation (A.4) for the difference of two log odds. The standard error of the log odds can be computed from the standard error of a proportion and the delta method:

  • \(\mbox{se}\bigr(\log\bigr(\widehat{\pi}_1/(1-\widehat{\pi}_1)\bigr)\bigr) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b}}\)

We use the substitution method to compute a CI:

CI for the \(\log(\widehat{\mbox{OR}})\): \[\log(\widehat{\mbox{OR}}) - z_\gamma\cdot \mbox{se}\bigr(\log(\widehat{\mbox{OR}})\bigr)\;\;\; \text{to}\;\;\; \log(\widehat{\mbox{OR}}) + z_\gamma\cdot \mbox{se}\bigr(\log(\widehat{\mbox{OR}})\bigr)\]

CI for the \(\widehat{\mbox{OR}}\): \[\exp\Bigr(\log(\widehat{\mbox{OR}}) - z_\gamma\cdot \mbox{se}\bigr(\log(\widehat{\mbox{OR}})\bigr)\Bigr)\;\;\; \text{to}\;\;\; \exp\Bigr(\log(\widehat{\mbox{OR}}) + z_\gamma\cdot \mbox{se}\bigr(\log(\widehat{\mbox{OR}})\bigr)\Bigr)\]

which is the same as

\[\widehat{\mbox{OR}}/\text{EF}_{.95}\;\;\; \text{to}\;\;\; \widehat{\mbox{OR}}\cdot \text{EF}_{.95}\]
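Similarly for the OR, using the cells of the \(2\times 2\) table:

## a = 63, b = 30, c = 38, d = 54
OR <- (63 * 54) / (30 * 38)
seLogOR <- sqrt(1/63 + 1/30 + 1/38 + 1/54)
EF <- exp(qnorm(0.975) * seLogOR)

round(c(OR / EF, OR, OR * EF), 4)
## [1] 1.6361 2.9842 5.4433

These values match the Wald CI for the sample odds ratio reported by twoby2 below.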

In R, Wald CIs for the OR and RR and a Wilson (square-and-add) CI for the ARR can be computed using the function twoby2 from the library Epi.

library(Epi)

twoby2(tabRespiratory, alpha = 0.05)
## 2 by 2 table analysis: 
## ------------------------------------------------------ 
## Outcome   : Survived 
## Comparing : Treatment vs. Control 
## 
##           Survived Not survived    P(Survived) 95% conf.
## Treatment       63           30         0.6774    0.5762
## Control         38           54         0.4130    0.3173
##           interval
## Treatment   0.7644
## Control     0.5159
## 
##                                    95% conf. interval
##              Relative Risk: 1.6401    1.2382   2.1724
##          Sample Odds Ratio: 2.9842    1.6361   5.4433
## Conditional MLE Odds Ratio: 2.9658    1.5688   5.6953
##     Probability difference: 0.2644    0.1212   0.3929
## 
##              Exact P-value: 0.0004 
##         Asymptotic P-value: 0.0004 
## ------------------------------------------------------

A.4.3 Two paired groups

Example A.6 In a reliability exercise carried out as part of the Multicenter Study of Silent Myocardial Ischemia, \(41\) patients were randomly selected from those who had undergone a thallium-\(201\) stress test. The \(41\) sets of images were classified as normal or ischemic by the core thallium laboratory and, independently, by clinical investigators from different centers who had participated in training sessions to support standardization.

This is a paired design. The results for one of the participating clinical investigators compared to the core laboratory are presented in a contingency table showing the frequency distribution of these two variables:

              Lab ischemic   Lab normal   Total
Clin ischemic           14            5      19
Clin normal              0           22      22
Total                   14           27      41

The square-and-add method for paired data can be used to compute a Newcombe CI. This method is implemented in the library biostatUZH where the CI is computed from the contingency table for “Ischemic vs. Normal”:

## Data
tabIschemia <- matrix(c(14, 0, 5, 22), nrow = 2)
colnames(tabIschemia) <- c("Lab ischemic", "Lab normal")
rownames(tabIschemia) <- c("Clin ischemic", "Clin normal")

print(tabIschemia)
##               Lab ischemic Lab normal
## Clin ischemic           14          5
## Clin normal              0         22
confIntPairedProportion(tabIschemia, conf.level = 0.95)
## $p1
## [1] 0.4634146
## 
## $p2
## [1] 0.3414634
## 
## $diff
## [1] 0.1219512
## 
## $newcombeCI
## [1] 0.01148324 0.22649268

A.5 Confidence interval for a sample variance and standard deviation

Let \(S^2 = \frac{1}{n - 1} \sum_{i = 1}^{n}(X_i - \bar X)^2\) denote the sample variance. Standard statistical theory shows (Held and Sabanés Bové 2020, sec. 3.2) that, under normality and independence of \(X_i \sim \mathop{\mathrm{N}}(\mu, \sigma^2)\), \(i = 1, \ldots, n\),

\[ \tag{A.8} \frac{n - 1}{\sigma^2} S^2 \sim \chi^2(n - 1) \] holds, so \(\mathop{\mathrm{\mathsf{E}}}(S^2) = \sigma^2\), i.e. \(S^2\) is unbiased for \(\sigma^2\).

Let \(\chi_\alpha^2(k)\) denote the \(\alpha\)-quantile of the \(\chi^2(k)\) distribution. It follows that

\[ \tag{A.9} \left[\frac{(n - 1)S^2}{\chi^2_{(1 + \gamma)/2} (n - 1) }, \frac{(n - 1)S^2}{\chi^2_{(1 - \gamma)/2} (n - 1) }\right] \]

is a \(\gamma \times 100\%\) CI for \(\sigma^2\). Likewise, taking the square root of the limits in (A.9) gives a \(\gamma \times 100\%\) CI for the standard deviation \(\sigma\).
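For example, for the blood pressure levels in the diabetic group of Example A.1:

## CI for the variance and standard deviation of the blood
## pressure levels in the diabetic group (Example A.1)
n <- length(bpDiabetic)
S2 <- var(bpDiabetic)
ciVar <- (n - 1) * S2 / qchisq(c((1 + 0.95)/2, (1 - 0.95)/2), df = n - 1)
ciVar        ## CI for the variance
sqrt(ciVar)  ## CI for the standard deviation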

We may also construct Wald CIs based on the standard errors of \(S^2\) and \(S\), respectively. From (A.8) it follows that

\[ \frac{(n - 1)^2}{\sigma^4} \mathop{\mathrm{Var}}(S^2) = 2(n - 1), \] so \(\mathop{\mathrm{Var}}(S^2) = \frac{2}{n - 1} \sigma^4\) and therefore \(\mbox{se}(S^2) = \sqrt{\frac{2}{n - 1}}S^2\).

Application of the delta method shows that

\[ \mbox{se}(S) = \mbox{se}(S^2) \cdot \frac{1}{2S} = \frac{S}{\sqrt{2(n - 1)}} \approx \frac{S}{\sqrt{2n}}. \] These results allow us to construct standard Wald CIs for the within-subject standard deviation \(s_w\) (Section 3.1.1) and the limits of agreement in a Bland-Altman plot (Section 3.3.1), for example.
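Continuing the previous code chunk, the corresponding Wald CIs are:

## Wald CI for the variance
seS2 <- sqrt(2 / (n - 1)) * S2
S2 + qnorm(0.975) * c(-seS2, seS2)

## Wald CI for the standard deviation
S <- sqrt(S2)
seS <- S / sqrt(2 * (n - 1))
S + qnorm(0.975) * c(-seS, seS)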

References

Altman, Douglas G., and J. Martin Bland. 2005. “Statistics Notes: Standard deviations and standard errors.” BMJ 331: 903.
Altman, Douglas G., David Machin, Trevor N. Bryant, and Martin J. Gardner. 2000. Statistics with Confidence. Second. BMJ Books.
Held, Leonhard, and Daniel Sabanés Bové. 2020. Likelihood and Bayesian Inference - With Applications in Biology and Medicine. Second. Springer.
Newcombe, Robert G. 1998a. “Improved confidence intervals for the difference between binomial proportions based on paired data.” Stat Med 17 (22): 2635–50.
———. 1998b. “Interval estimation for the difference between independent proportions: Comparison of eleven methods.” Stat Med 17 (8): 873–90.