## 6.1 Goodness-of-fit tests for distribution models

Assume that an iid sample \(X_1,\ldots,X_n\) from an arbitrary distribution \(F\) is given.^{194} We next address the **one-sample problem** of testing a statement about the *unknown* distribution \(F\).

### 6.1.1 Simple hypothesis tests

We first address tests for the *simple null hypothesis*

\[\begin{align} H_0:F=F_0 \tag{6.1} \end{align}\]

against the most general alternative^{195}

\[\begin{align*} H_1:F\neq F_0, \end{align*}\]

where here and henceforth “\(F\neq F_0\)” means that there exists *at least one* \(x\in\mathbb{R}\) such that \(F(x)\neq F_0(x)\), and \(F_0\) is a **pre-specified**, **not-data-dependent** distribution model. This latter aspect is very important:

If some parameters of \(F_0\) are estimated from the sample, the presented tests for (6.1) will not respect the significance level \(\alpha\) for which they are constructed and, as a consequence, they will be highly conservative.^{196}

Recall that the ecdf (1.1) of \(X_1,\ldots,X_n\), \(F_n(x)=\frac{1}{n}\sum_{i=1}^n1_{\{X_i\leq x\}}\), is a nonparametric estimator of \(F\), as seen in Section 1.6. Therefore, a measure of proximity of \(F_n\) (driven by the sample) and \(F_0\) (specified by \(H_0\)) will be indicative of the veracity of (6.1): a “large” distance between \(F_n\) and \(F_0\) evidences that \(H_0\) is likely to be false.^{197} \(\!\!^,\)^{198}

The following three well-known goodness-of-fit tests arise from the same principle: considering as the test statistic a particular type of distance between the *functions* \(F_n\) and \(F_0\).

**Kolmogorov–Smirnov test**

*Test purpose*. Given \(X_1,\ldots,X_n\sim F\), it tests \(H_0: F=F_0\) vs. \(H_1: F\neq F_0\) consistently against all the alternatives in \(H_1\).

*Statistic definition*. The test statistic uses the **supremum distance**^{199}\(\!\!^,\)^{200} between \(F_n\) and \(F_0\):

\[\begin{align*} D_n:=\sqrt{n}\sup_{x\in\mathbb{R}} \left|F_n(x)-F_0(x)\right|. \end{align*}\]

If \(H_0:F=F_0\) holds, then \(D_n\) tends to be small. Conversely, when \(F\neq F_0\), larger values of \(D_n\) are expected, and the test rejects \(H_0\) when \(D_n\) is large.

*Statistic computation*. The computation of \(D_n\) can be efficiently achieved by realizing that the maximum difference between \(F_n\) and \(F_0\) happens at \(x=X_i\), for a certain \(X_i\) (observe Figure 6.1). From here, sorting the sample and applying the probability transformation \(F_0\) gives^{201}

\[\begin{align} D_n=&\,\max(D_n^+,D_n^-),\tag{6.2}\\ D_n^+:=&\,\sqrt{n}\max_{1\leq i\leq n}\left\{\frac{i}{n}-U_{(i)}\right\},\nonumber\\ D_n^-:=&\,\sqrt{n}\max_{1\leq i\leq n}\left\{U_{(i)}-\frac{i-1}{n}\right\},\nonumber \end{align}\]

where \(U_{(j)}\) stands for the \(j\)-th *sorted* \(U_i:=F_0(X_i)\), \(i=1,\ldots,n\).

*Distribution under \(H_0\)*. If \(H_0\) holds and \(F_0\) is **continuous**, then \(D_n\) has an asymptotic^{202} cdf given by the *Kolmogorov–Smirnov \(K\) function*:^{203}

\[\begin{align} \lim_{n\to\infty}\mathbb{P}[D_n\leq x]=K(x):=1-2\sum_{j=1}^\infty (-1)^{j-1}e^{-2j^2x^2}.\tag{6.3} \end{align}\]
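
To fix ideas, a minimal sketch of (6.2) and (6.3) is given below; `Dn_stat` and `K_cdf` are ad hoc helper names, not functions from any package, and the series in (6.3) is simply truncated.

```
# Minimal sketch of (6.2) and (6.3); Dn_stat() and K_cdf() are ad hoc
# helpers, not part of any package
Dn_stat <- function(x, F0) {
  n <- length(x)
  U_sort <- sort(F0(x)) # Sorted U_i = F0(X_i)
  Dn_plus <- max(1:n / n - U_sort)
  Dn_minus <- max(U_sort - (1:n - 1) / n)
  sqrt(n) * max(Dn_plus, Dn_minus)
}
K_cdf <- function(x, terms = 50) { # Truncated series (6.3), valid for x > 0
  j <- 1:terms
  sapply(x, function(xi) 1 - 2 * sum((-1)^(j - 1) * exp(-2 * j^2 * xi^2)))
}

# Cross-check with ks.test, whose reported statistic is D_n / sqrt(n)
set.seed(42)
x <- runif(n = 100)
Dn <- Dn_stat(x = x, F0 = punif)
Dn / sqrt(length(x)) # Matches ks.test's statistic
ks.test(x = x, y = "punif")$statistic
1 - K_cdf(Dn) # Asymptotic p-value; close to ks.test's for moderate n
```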

*Highlights and caveats*. The Kolmogorov–Smirnov test is a *distribution-free* test because its distribution under \(H_0\) *does not depend* on \(F_0\). However, this is the case only **if \(F_0\) is continuous** and the sample \(X_1,\ldots,X_n\) is also continuous, i.e., if the **sample has no ties**.^{204} If these assumptions are met, then the iid sample \(X_1,\ldots,X_n\stackrel{H_0}{\sim} F_0\) generates the iid sample \(U_1,\ldots,U_n\stackrel{H_0}{\sim} \mathcal{U}(0,1)\). As a consequence, the distribution of (6.2) **does not depend on \(F_0\)**.^{205}\(\!\!^,\)^{206} If \(F_0\) is not continuous or there are ties in the sample, the \(K\) function is *not* the true asymptotic cdf. An alternative for discrete \(F_0\) is given below. In case there are ties in the sample, a possibility is to slightly perturb the sample in order to remove such ties.^{207}

*Implementation in R*. For continuous data and continuous \(F_0\), the test statistic \(D_n\) and the asymptotic \(p\)-value^{208} are readily available through the `ks.test` function. The asymptotic cdf \(K\) is internally coded as the `pkolmogorov1x` function within the source code of `ks.test`. For discrete \(F_0\), see `dgof::ks.test`.

The construction of the Kolmogorov–Smirnov test statistic is illustrated in the following chunk of code.

```
# Sample data
n <- 10
mu0 <- 2
sd0 <- 1
set.seed(54321)
samp <- rnorm(n = n, mean = mu0, sd = sd0)

# Fn vs F0
plot(ecdf(samp), main = "", ylab = "Probability", xlim = c(-1, 6),
     ylim = c(0, 1.3))
curve(pnorm(x, mean = mu0, sd = sd0), add = TRUE, col = 4)

# Add Dn+ and Dn-
samp_sorted <- sort(samp)
Ui <- pnorm(samp_sorted, mean = mu0, sd = sd0)
Dn_plus <- (1:n) / n - Ui
Dn_minus <- Ui - (1:n - 1) / n
i_plus <- which.max(Dn_plus)
i_minus <- which.max(Dn_minus)
lines(rep(samp_sorted[i_plus], 2),
      c(i_plus / n, pnorm(samp_sorted[i_plus], mean = mu0, sd = sd0)),
      col = 3, lwd = 2, pch = 16, type = "o", cex = 0.75)
lines(rep(samp_sorted[i_minus], 2),
      c((i_minus - 1) / n, pnorm(samp_sorted[i_minus], mean = mu0, sd = sd0)),
      col = 2, lwd = 2, pch = 16, type = "o", cex = 0.75)
rug(samp)
legend("topleft", lwd = 2, col = c(1, 4, 3, 2),
       legend = latex2exp::TeX(c("$F_n$", "$F_0$", "$D_n^+$", "$D_n^-$")))
```

**Exercise 6.1** Modify the parameters `mu0` and `sd0` in the previous code in order to have \(F\neq F_0\). What happens with \(\sup_{x\in\mathbb{R}} \left|F_n(x)-F_0(x)\right|\)?

Let’s see an example of the use of `ks.test` (also available as `stats::ks.test`), the function in base R.

```
# Sample data from a N(0, 1)
n <- 50
set.seed(3245678)
x <- rnorm(n = n)

# Kolmogorov-Smirnov test for H_0: F = N(0, 1). Does not reject
(ks <- ks.test(x = x, y = "pnorm")) # In "y" we specify F0 as a function
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.050298, p-value = 0.9989
## alternative hypothesis: two-sided

# Structure of "htest" class
str(ks)
## List of 5
##  $ statistic  : Named num 0.0503
##   ..- attr(*, "names")= chr "D"
##  $ p.value    : num 0.999
##  $ alternative: chr "two-sided"
##  $ method     : chr "One-sample Kolmogorov-Smirnov test"
##  $ data.name  : chr "x"
##  - attr(*, "class")= chr "htest"

# Kolmogorov-Smirnov test for H_0: F = N(0.5, 1). Rejects
ks.test(x = x, y = "pnorm", mean = 0.5)
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.24708, p-value = 0.003565
## alternative hypothesis: two-sided

# Kolmogorov-Smirnov test for H_0: F = Exp(2). Strongly rejects
ks.test(x = x, y = "pexp", rate = 1/2)
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.53495, p-value = 6.85e-14
## alternative hypothesis: two-sided
```

The following chunk of code shows how to call `dgof::ks.test` to perform a correct Kolmogorov–Smirnov test when \(F_0\) is discrete.

```
# Sample data from a Pois(5)
n <- 100
set.seed(3245678)
x <- rpois(n = n, lambda = 5)

# Kolmogorov-Smirnov test for H_0: F = Pois(5) without specifying that the
# distribution is discrete. Rejects (!?) giving a warning message
ks.test(x = x, y = "ppois", lambda = 5)
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.20596, p-value = 0.0004135
## alternative hypothesis: two-sided

# We rely on dgof::ks.test, which works as stats::ks.test if the "y" argument
# is not marked as a "stepfun" object, the way the dgof package codifies
# discrete distribution functions

# Step function containing the cdf of the Pois(5). The "x" stands for the
# location of the steps and "y" for the value of the steps. "y" needs to have
# an extra point for the initial value of the function before the first step
x_eval <- 0:20
ppois_stepfun <- stepfun(x = x_eval, y = c(0, ppois(q = x_eval, lambda = 5)))
plot(ppois_stepfun, main = "Cdf of a Pois(5)")
```

```
# Kolmogorov-Smirnov test for H_0: F = Pois(5) adapted for discrete data,
# with data coming from a Pois(5)
dgof::ks.test(x = x, y = ppois_stepfun)
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.032183, p-value = 0.9999
## alternative hypothesis: two-sided

# If data is normally distributed, the test rejects H_0
dgof::ks.test(x = rnorm(n = n, mean = 5), y = ppois_stepfun)
##
## One-sample Kolmogorov-Smirnov test
##
## data: rnorm(n = n, mean = 5)
## D = 0.38049, p-value = 5.321e-13
## alternative hypothesis: two-sided
```

**Exercise 6.2** Implement the Kolmogorov–Smirnov test by:

- Coding a function to compute the test statistic (6.2) from a sample \(X_1,\ldots,X_n\) and a cdf \(F_0\).
- Implementing the \(K\) function (6.3).
- Calling the previous functions from a routine that returns the asymptotic \(p\)-value of the test.

Check that the implementations coincide with the ones of the `ks.test` function when `exact = FALSE`, for data simulated from a \(\mathcal{U}(0,1)\) and any \(n\). *Note*: `ks.test` computes \(D_n/\sqrt{n}\) instead of \(D_n\).

The following code chunk exemplifies with a small simulation study how the distribution of the \(p\)-values^{209} is uniform if \(H_0\) holds, and how it becomes more concentrated about \(0\) when \(H_1\) holds.

```
# Simulation of p-values when H_0 is true
set.seed(131231)
n <- 100
M <- 1e4
pvalues_H0 <- sapply(1:M, function(i) {
  x <- rnorm(n) # N(0, 1)
  ks.test(x, "pnorm")$p.value
})

# Simulation of p-values when H_0 is false -- the data does not
# come from a N(0, 1) but from a N(0, 2)
pvalues_H1 <- sapply(1:M, function(i) {
  x <- rnorm(n, mean = 0, sd = sqrt(2)) # N(0, 2)
  ks.test(x, "pnorm")$p.value
})

# Comparison of p-values
par(mfrow = 1:2)
hist(pvalues_H0, breaks = seq(0, 1, l = 20), probability = TRUE,
     main = latex2exp::TeX("$H_0$"), ylim = c(0, 4))
abline(h = 1, col = 2)
hist(pvalues_H1, breaks = seq(0, 1, l = 20), probability = TRUE,
     main = latex2exp::TeX("$H_1$"), ylim = c(0, 4))
abline(h = 1, col = 2)
```

**Exercise 6.3** Modify the parameters of the normal distribution used to sample under \(H_1\) in order to increase and decrease the deviation from \(H_0\). What do you observe in the resulting distributions of the \(p\)-values?

**Cramér–von Mises test**

*Test purpose*. Given \(X_1,\ldots,X_n\sim F\), it tests \(H_0: F=F_0\) vs. \(H_1: F\neq F_0\) consistently against all the alternatives in \(H_1\).

*Statistic definition*. The test statistic uses a **quadratic distance**^{210} between \(F_n\) and \(F_0\):

\[\begin{align} W_n^2:=n\int(F_n(x)-F_0(x))^2\,\mathrm{d}F_0(x).\tag{6.4} \end{align}\]

If \(H_0: F=F_0\) holds, then \(W_n^2\) tends to be small. Hence, rejection happens for large values of \(W_n^2\).

*Statistic computation*. The computation of \(W_n^2\) can be significantly simplified by expanding the square in the integrand of (6.4) and then applying the change of variables \(u=F_0(x)\):

\[\begin{align} W_n^2=\sum_{i=1}^n\left\{U_{(i)}-\frac{2i-1}{2n}\right\}^2+\frac{1}{12n},\tag{6.5} \end{align}\]

where again \(U_{(j)}\) stands for the \(j\)-th sorted \(U_i=F_0(X_i)\), \(i=1,\ldots,n\).
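
As a quick sanity check of (6.5), the following sketch computes \(W_n^2\) from the sorted \(U_i\) and compares it with the statistic reported by `goftest::cvm.test`; the helper `W2_stat` is ad hoc, not part of `goftest`.

```
# Minimal sketch of (6.5); W2_stat() is an ad hoc helper
W2_stat <- function(x, F0) {
  n <- length(x)
  U_sort <- sort(F0(x)) # Sorted U_i = F0(X_i)
  sum((U_sort - (2 * 1:n - 1) / (2 * n))^2) + 1 / (12 * n)
}

# Cross-check with goftest::cvm.test
set.seed(42)
x <- rnorm(n = 50)
W2_stat(x = x, F0 = pnorm)
goftest::cvm.test(x = x, null = "pnorm")$statistic
```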

*Distribution under \(H_0\)*. If \(H_0\) holds and \(F_0\) is **continuous**, then \(W_n^2\) has an asymptotic cdf given by^{211}

\[\begin{align} \lim_{n\to\infty}\mathbb{P}[W_n^2\leq x]=&\,1-\frac{1}{\pi}\sum_{j=1}^\infty (-1)^{j-1}W_j(x),\tag{6.6}\\ W_j(x):=&\,\int_{(2j-1)^2\pi^2}^{4j^2\pi^2}\sqrt{\frac{-\sqrt{y}}{\sin\sqrt{y}}}\frac{e^{-\frac{xy}{2}}}{y}\,\mathrm{d}y.\nonumber \end{align}\]

*Highlights and caveats*. By a reasoning analogous to the one done for the Kolmogorov–Smirnov test, the Cramér–von Mises test can be seen to be **distribution-free if \(F_0\) is continuous** and the **sample has no ties**. Otherwise, (6.6) is *not* the true asymptotic distribution. Although the Kolmogorov–Smirnov test is the most popular ecdf-based test, empirical evidence suggests that the Cramér–von Mises test is often **more powerful than the Kolmogorov–Smirnov test** for a broad class of alternative hypotheses.^{212}

*Implementation in R*. For continuous data, the test statistic \(W_n^2\) and the asymptotic \(p\)-value are implemented in the `goftest::cvm.test` function. The asymptotic cdf (6.6) is given in `goftest::pCvM` (`goftest::qCvM` computes its inverse). For discrete \(F_0\), see `dgof::cvm.test`.

The following chunk of code points to the implementation of the Cramér–von Mises test.

```
# Sample data from a N(0, 1)
set.seed(3245678)
n <- 50
x <- rnorm(n = n)

# Cramér-von Mises test for H_0: F = N(0, 1). Does not reject
goftest::cvm.test(x = x, null = "pnorm")
##
## Cramer-von Mises test of goodness-of-fit
## Null hypothesis: Normal distribution
## Parameters assumed to be fixed
##
## data: x
## omega2 = 0.022294, p-value = 0.9948

# Comparison with Kolmogorov-Smirnov
ks.test(x = x, y = "pnorm")
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.050298, p-value = 0.9989
## alternative hypothesis: two-sided

# Sample data from a Pois(5)
set.seed(3245678)
n <- 100
x <- rpois(n = n, lambda = 5)

# Cramér-von Mises test for H_0: F = Pois(5) without specifying that the
# distribution is discrete. Rejects (!?) without giving a warning message
goftest::cvm.test(x = x, null = "ppois", lambda = 5)
##
## Cramer-von Mises test of goodness-of-fit
## Null hypothesis: Poisson distribution
## with parameter lambda = 5
## Parameters assumed to be fixed
##
## data: x
## omega2 = 0.74735, p-value = 0.009631

# We rely on dgof::cvm.test and run a Cramér-von Mises test for H_0: F = Pois(5)
# adapted for discrete data, with data coming from a Pois(5)
x_eval <- 0:20
ppois_stepfun <- stepfun(x = x_eval, y = c(0, ppois(q = x_eval, lambda = 5)))
dgof::cvm.test(x = x, y = ppois_stepfun)
##
## Cramer-von Mises - W2
##
## data: x
## W2 = 0.038256, p-value = 0.9082
## alternative hypothesis: Two.sided

# Plot the asymptotic null distribution function
curve(goftest::pCvM(x), from = 0, to = 1, n = 300)
```

**Exercise 6.4** Implement the Cramér–von Mises test by:

- Coding the Cramér–von Mises statistic (6.5) from a sample \(X_1,\ldots,X_n\) and a cdf \(F_0\). Check that the implementation coincides with the one of the `goftest::cvm.test` function.
- Computing the *asymptotic* \(p\)-value using `goftest::pCvM`. Compare this asymptotic \(p\)-value with the *exact* \(p\)-value given by `goftest::cvm.test`, observing that for a large \(n\) the difference is inappreciable.

**Exercise 6.5** Verify the correctness of the asymptotic null distribution of \(W_n^2\) that was given in (6.6). To do so, numerically implement \(W_j(x)\) and compute (6.6). Validate your implementation by:

- Simulating \(M=1,000\) samples of size \(n=200\) under \(H_0\), obtaining \(M\) statistics \(W_{n;1}^{2},\ldots, W_{n;M}^{2}\), and plotting their ecdf.
- Overlaying your asymptotic cdf and the one provided by `goftest::pCvM`.
- Checking that the three curves approximately coincide.

**Anderson–Darling test**

*Test purpose*. Given \(X_1,\ldots,X_n\sim F\), it tests \(H_0: F=F_0\) vs. \(H_1: F\neq F_0\) consistently against all the alternatives in \(H_1\).

*Statistic definition*. The test statistic uses a **quadratic distance** between \(F_n\) and \(F_0\) **weighted** by \(w(x)=(F_0(x)(1-F_0(x)))^{-1}\):

\[\begin{align*} A_n^2:=n\int\frac{(F_n(x)-F_0(x))^2}{F_0(x)(1-F_0(x))}\,\mathrm{d}F_0(x). \end{align*}\]

If \(H_0\) holds, then \(A_n^2\) tends to be small. Hence, rejection happens for large values of \(A_n^2\). Note that, compared with \(W_n^2\), \(A_n^2\) places more weight on the deviations between \(F_n(x)\) and \(F_0(x)\) that happen on the tails, that is, when \(F_0(x)\approx 0\) or \(F_0(x)\approx 1\).^{213}\(\!\!^,\)^{214}

*Statistic computation*. The computation of \(A_n^2\) can be significantly simplified:

\[\begin{align} A_n^2=-n-\frac{1}{n}\sum_{i=1}^n&\Big\{(2i-1)\log(U_{(i)})\nonumber\\ &+(2n+1-2i)\log(1-U_{(i)})\Big\}.\tag{6.7} \end{align}\]
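
Analogously to the previous statistics, (6.7) can be coded in a few lines and cross-checked against `goftest::ad.test`; the helper `A2_stat` is ad hoc, not part of `goftest`.

```
# Minimal sketch of (6.7); A2_stat() is an ad hoc helper
A2_stat <- function(x, F0) {
  n <- length(x)
  U_sort <- sort(F0(x)) # Sorted U_i = F0(X_i)
  i <- 1:n
  -n - mean((2 * i - 1) * log(U_sort) + (2 * n + 1 - 2 * i) * log(1 - U_sort))
}

# Cross-check with goftest::ad.test
set.seed(42)
x <- rnorm(n = 50)
A2_stat(x = x, F0 = pnorm)
goftest::ad.test(x = x, null = "pnorm")$statistic
```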

*Distribution under \(H_0\)*. If \(H_0\) holds and \(F_0\) is **continuous**, then the asymptotic cdf of \(A_n^2\) is the cdf of the random variable

\[\begin{align} \sum_{j=1}^\infty\frac{Y_j}{j(j+1)},\quad \text{where } Y_j\sim\chi^2_1,\,j\geq 1,\text{ are iid}.\tag{6.8} \end{align}\]

Unfortunately, the cdf of (6.8) does not admit a simple analytical expression. It can, however, be approximated by Monte Carlo by sampling from the random variable (6.8).
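
For example, a crude Monte Carlo approximation of the cdf of (6.8) follows from truncating the series at a large \(J\) and simulating the \(\chi^2_1\) variables; the sketch below compares it with `goftest::pAD` at a single point (the truncation level `J = 1e3` is an arbitrary choice for illustration).

```
# Monte Carlo approximation of the cdf of (6.8), truncating the series at J
set.seed(42)
M <- 1e4
J <- 1e3
A2_limit <- replicate(M, sum(rchisq(n = J, df = 1) / (1:J * (1:J + 1))))

# Both probabilities should be close
mean(A2_limit <= 2.492)
goftest::pAD(2.492)
```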

*Highlights and caveats*. As with the previous tests, the Anderson–Darling test is also **distribution-free if \(F_0\) is continuous** and there are **no ties in the sample**. Otherwise, the null asymptotic distribution is different from the one of (6.8). As for the Cramér–von Mises test, there is also empirical evidence indicating that the Anderson–Darling test is **more powerful than the Kolmogorov–Smirnov test** for a broad class of alternative hypotheses. In addition, due to its construction, the Anderson–Darling test is better able to detect situations in which \(F_0\) and \(F\) differ in the tails (that is, for extreme data).

*Implementation in R*. For continuous data, the test statistic \(A_n^2\) and the asymptotic \(p\)-value are implemented in the `goftest::ad.test` function. The asymptotic cdf of (6.8) is given in `goftest::pAD` (`goftest::qAD` computes its inverse). For discrete \(F_0\), see `dgof::cvm.test` with `type = "A2"`.

The following code chunk illustrates the implementation of the Anderson–Darling test.

```
# Sample data from a N(0, 1)
set.seed(3245678)
n <- 50
x <- rnorm(n = n)

# Anderson-Darling test for H_0: F = N(0, 1). Does not reject
goftest::ad.test(x = x, null = "pnorm")
##
## Anderson-Darling test of goodness-of-fit
## Null hypothesis: Normal distribution
## Parameters assumed to be fixed
##
## data: x
## An = 0.18502, p-value = 0.994

# Sample data from a Pois(5)
set.seed(3245678)
n <- 100
x <- rpois(n = n, lambda = 5)

# Anderson-Darling test for H_0: F = Pois(5) without specifying that the
# distribution is discrete. Rejects (!?) without giving a warning message
goftest::ad.test(x = x, null = "ppois", lambda = 5)
##
## Anderson-Darling test of goodness-of-fit
## Null hypothesis: Poisson distribution
## with parameter lambda = 5
## Parameters assumed to be fixed
##
## data: x
## An = 3.7279, p-value = 0.01191

# We rely on dgof::cvm.test and run an Anderson-Darling test for H_0: F = Pois(5)
# adapted for discrete data, with data coming from a Pois(5)
x_eval <- 0:20
ppois_stepfun <- stepfun(x = x_eval, y = c(0, ppois(q = x_eval, lambda = 5)))
dgof::cvm.test(x = x, y = ppois_stepfun, type = "A2")
##
## Cramer-von Mises - A2
##
## data: x
## A2 = 0.3128, p-value = 0.9057
## alternative hypothesis: Two.sided

# Plot the asymptotic null distribution function
curve(goftest::pAD(x), from = 0, to = 5, n = 300)
```

**Exercise 6.6** Implement the Anderson–Darling test by:

- Coding the Anderson–Darling statistic (6.7) from a sample \(X_1,\ldots,X_n\) and a cdf \(F_0\). Check that the implementation coincides with the one of the `goftest::ad.test` function.
- Computing the *asymptotic* \(p\)-value using `goftest::pAD`. Compare this asymptotic \(p\)-value with the *exact* \(p\)-value given by `goftest::ad.test`, observing that for large \(n\) the difference is inappreciable.

**Exercise 6.7** Verify the correctness of the asymptotic representation of \(A_n^2\) that was given in (6.8). To do so:

- Simulate \(M=1,000\) samples of size \(n=200\) under \(H_0\), obtain \(M\) statistics \(A_{n;1}^{2},\ldots, A_{n;M}^{2}\), and draw their histogram.
- Simulate \(M\) samples from the random variable (6.8) and draw their histogram.
- Check that the two histograms approximately coincide.

**Exercise 6.8** Let’s investigate empirically the performance of the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling tests for a specific scenario. We consider \(H_0:X\sim \mathcal{N}(0,1)\) and the following simulation study:

- Generate \(M=1,000\) samples of size \(n\) from \(\mathcal{N}(\mu,1)\).
- For each sample, test \(H_0\) with the three tests.
- Obtain the relative frequency of rejections at level \(\alpha\) for each test.
- Draw three histograms for the \(p\)-values in the spirit of Figure 6.3.

In order to cleanly perform the previous steps for several choices of \((n,\mu,\alpha)\), code a function that performs the simulation study from those arguments and gives something similar to Figure 6.3 as output. Then, use the following settings and accurately comment on the outcome of each of them:

\(H_0\) holds.

- Take \(n=25\), \(\mu=0\), and \(\alpha=0.05\). Check that the relative frequency of rejections is about \(\alpha\).
- Take \(n=25\), \(\mu=0\), and \(\alpha=0.10\).
- Take \(n=100\), \(\mu=0\), and \(\alpha=0.10\).

\(H_0\) does not hold.

- \(n=25\), \(\mu=0.25\), \(\alpha=0.05\). Check that the relative frequency of rejections is above \(\alpha\).
- \(n=50\), \(\mu=0.25\), \(\alpha=0.05\).
- \(n=25\), \(\mu=0.50\), \(\alpha=0.05\).
- Replace \(\mathcal{N}(\mu,1)\) with \(t_{10}\). Take \(n=50\) and \(\alpha=0.05\). Which test is clearly better? Why?

### 6.1.2 Normality tests

The tests seen in the previous section have a very notable practical limitation: they require complete knowledge of \(F_0\), the hypothesized distribution for \(X\). In practice, such a precise knowledge about \(X\) is unrealistic. Practitioners are more interested in answering more general questions, one of them being:

Is the data “normally distributed”?

With the statement “\(X\) is normally distributed” we refer to the fact that \(X\sim \mathcal{N}(\mu,\sigma^2)\) for *certain* parameters \(\mu\in\mathbb{R}\) and \(\sigma\in\mathbb{R^+}\), possibly unknown. This statement is notably more general than “\(X\) is distributed as a \(\mathcal{N}(0,1)\)” or “\(X\) is distributed as a \(\mathcal{N}(0.1,1.3)\).”

The test for normality is formalized as follows. Given an iid sample \(X_1,\ldots,X_n\) from the distribution \(F\), we test the null hypothesis

\[\begin{align} H_0:F\in\mathcal{F}_{\mathcal{N}}:=\left\{\Phi\left(\frac{\cdot-\mu}{\sigma}\right):\mu\in\mathbb{R},\sigma\in\mathbb{R}^+\right\},\tag{6.9} \end{align}\]

where \(\mathcal{F}_{\mathcal{N}}\) is the *class of normal distributions*, against the most general alternative

\[\begin{align*} H_1:F\not\in\mathcal{F}_{\mathcal{N}}. \end{align*}\]

Comparing (6.9) with (6.1), it is evident that the former is more general, as a *range* of possible values for \(F\) is included in \(H_0\). This is the reason why \(H_0\) is called a **composite null hypothesis**, rather than a simple null hypothesis, as (6.1) was.

The testing of (6.9) requires the estimation of the unknown parameters \((\mu,\sigma)\). Their maximum likelihood estimators are \((\bar{X},S)\), where \(S:=\sqrt{\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2}\) is the sample standard deviation. Therefore, a tempting possibility is to apply the tests seen in Section 6.1.1 to

\[\begin{align*} H_0:F=\Phi\left(\frac{\cdot-\bar{X}}{S}\right). \end{align*}\]

However, as warned at the beginning of Section 6.1, this naive approach would result in **seriously conservative tests** that would fail to reject \(H_0\) adequately. The intuitive explanation is simple: when estimating \((\mu,\sigma)\) from the sample we are obtaining the *closest* normal cdf to \(F\), that is, \(\Phi\left(\frac{\cdot-\bar{X}}{S}\right)\). Therefore, setting \(F_0=\Phi\left(\frac{\cdot-\bar{X}}{S}\right)\) and then running one of the ecdf-based tests on \(H_0:F=F_0\) will ignore this **estimation step**, which will **bias the test decision towards \(H_0\)**.
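
This conservativeness is easy to visualize with a small simulation in the spirit of Figure 6.3: under \(H_0\), the \(p\)-values of the naive test are far from uniform and concentrate towards one, so \(H_0\) is almost never rejected.

```
# p-values of the naive ks.test with estimated parameters, under H_0
set.seed(123456)
pvalues <- sapply(1:500, function(i) {
  x <- rnorm(n = 50)
  ks.test(x = x, y = "pnorm", mean = mean(x), sd = sd(x))$p.value
})
hist(pvalues, breaks = seq(0, 1, l = 20), probability = TRUE)
abline(h = 1, col = 2) # Clearly non-uniform: mass piles up towards 1
```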

Adequately accounting for the estimation of \((\mu,\sigma)\) amounts to studying the asymptotic distributions under \(H_0\) of the test statistics that are computed with the *data-dependent* cdf \(\Phi\left(\frac{\cdot-\bar{X}}{S}\right)\) in place of \(F_0\). This is precisely what is done by the tests that adapt the ecdf-based tests seen in Section 6.1.1 by

- replacing \(F_0\) with \(\Phi\left(\frac{\cdot-\bar{X}}{S}\right)\) in the formulation of the test statistics;
- replacing \(U_i\) with \(P_i:=\Phi\left(\frac{X_i-\bar{X}}{S}\right)\), \(i=1,\ldots,n\), in the computational forms of the statistics;^{215}
- obtaining a *different* null asymptotic distribution that is usually more convoluted and is approximated by tabulation or simulation.

**Lilliefors test (Kolmogorov–Smirnov test for normality)**

*Test purpose*. Given \(X_1,\ldots,X_n\), it tests \(H_0: F\in\mathcal{F}_\mathcal{N}\) vs. \(H_1: F\not\in\mathcal{F}_\mathcal{N}\) consistently against all the alternatives in \(H_1\).

*Statistic definition and computation*. The test statistic uses the supremum distance between \(F_n\) and \(\Phi\left(\frac{\cdot-\bar{X}}{S}\right)\):

\[\begin{align*} D_n=&\,\sqrt{n}\sup_{x\in\mathbb{R}} \left|F_n(x)-\Phi\left(\frac{x-\bar{X}}{S}\right)\right|=\max(D_n^+,D_n^-),\\ D_n^+=&\,\sqrt{n}\max_{1\leq i\leq n}\left\{\frac{i}{n}-P_{(i)}\right\},\\ D_n^-=&\,\sqrt{n}\max_{1\leq i\leq n}\left\{P_{(i)}-\frac{i-1}{n}\right\}, \end{align*}\]

where \(P_{(j)}\) stands for the \(j\)-th sorted \(P_i=\Phi\left(\frac{X_i-\bar{X}}{S}\right)\), \(i=1,\ldots,n\). Clearly, if \(H_0\) holds, then \(D_n\) tends to be small. Hence, rejection happens when \(D_n\) is large.

*Distribution under \(H_0\)*. If \(H_0\) holds, the critical values at significance level \(\alpha\) can be obtained from Lilliefors (1967).^{216} Dallal and Wilkinson (1986) provide an analytical approximation to the null distribution.

*Highlights and caveats*. The Lilliefors test is *not* distribution-free, in the sense that the null distribution of the test statistic clearly depends on the *normality* assumption. However, it is *parameter-free*, since the distribution does not depend on the actual parameters \((\mu,\sigma)\) for which \(F=\Phi\left(\frac{\cdot-\mu}{\sigma}\right)\) is satisfied. This result comes from the iid sample \(X_1,\ldots,X_n\stackrel{H_0}{\sim} \mathcal{N}(\mu,\sigma^2)\) generating the id sample \(\frac{X_1-\bar{X}}{S},\ldots,\frac{X_n-\bar{X}}{S}\stackrel{H_0}{\sim} \tilde{f}_0\), where \(\tilde{f}_0\) is a certain pdf that does not depend on \((\mu,\sigma)\).^{217} Therefore, \(P_1,\ldots,P_n\) is an id sample that does not depend on \((\mu,\sigma)\).

*Implementation in R*. The `nortest::lillie.test` function implements the test.
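
A minimal sketch of the statistic follows, cross-checked with `nortest::lillie.test`; the helper `lillie_stat` is ad hoc, and it omits the \(\sqrt{n}\) factor and standardizes with `sd()` (the \(n-1\) version of \(S\)) to mimic what `nortest` appears to report.

```
# Minimal sketch of the Lilliefors statistic; lillie_stat() is an ad hoc
# helper. No sqrt(n) factor and sd() is used, to mimic nortest::lillie.test
lillie_stat <- function(x) {
  n <- length(x)
  P_sort <- sort(pnorm((x - mean(x)) / sd(x))) # Sorted P_i
  max(max(1:n / n - P_sort), max(P_sort - (1:n - 1) / n))
}

# Cross-check with nortest::lillie.test
set.seed(42)
x <- rnorm(n = 50)
lillie_stat(x)
nortest::lillie.test(x = x)$statistic
```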

**Cramér–von Mises normality test**

*Test purpose*. Given \(X_1,\ldots,X_n\), it tests \(H_0: F\in\mathcal{F}_\mathcal{N}\) vs. \(H_1: F\not\in\mathcal{F}_\mathcal{N}\) consistently against all the alternatives in \(H_1\).

*Statistic definition and computation*. The test statistic uses the quadratic distance between \(F_n\) and \(\Phi\left(\frac{\cdot-\bar{X}}{S}\right)\):

\[\begin{align*} W_n^2=&\,n\int\left(F_n(x)-\Phi\left(\frac{x-\bar{X}}{S}\right)\right)^2\,\mathrm{d}\Phi\left(\frac{x-\bar{X}}{S}\right)\\ =&\,\sum_{i=1}^n\left\{P_{(i)}-\frac{2i-1}{2n}\right\}^2+\frac{1}{12n}, \end{align*}\]

where \(P_{(j)}\) stands for the \(j\)-th sorted \(P_i=\Phi\left(\frac{X_i-\bar{X}}{S}\right)\), \(i=1,\ldots,n\). Rejection happens when \(W_n^2\) is large.

*Distribution under \(H_0\)*. If \(H_0\) holds, the \(p\)-values of the test can be approximated using Table 4.9 in D’Agostino and Stephens (1986).

*Highlights and caveats*. Like the Lilliefors test, the Cramér–von Mises normality test is also *parameter-free*. Its usual power superiority over the Kolmogorov–Smirnov test also extends to the testing of normality.

*Implementation in R*. The `nortest::cvm.test` function implements the test.
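
An analogous sketch, under the same assumption that `nortest` standardizes with `sd()`; `W2_norm_stat` is an ad hoc helper.

```
# Minimal sketch of the Cramér-von Mises normality statistic; W2_norm_stat()
# is an ad hoc helper and sd() is used to mimic nortest::cvm.test
W2_norm_stat <- function(x) {
  n <- length(x)
  P_sort <- sort(pnorm((x - mean(x)) / sd(x))) # Sorted P_i
  sum((P_sort - (2 * 1:n - 1) / (2 * n))^2) + 1 / (12 * n)
}

# Cross-check with nortest::cvm.test
set.seed(42)
x <- rnorm(n = 50)
W2_norm_stat(x)
nortest::cvm.test(x = x)$statistic
```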

**Anderson–Darling normality test**

*Test purpose*. Given \(X_1,\ldots,X_n\), it tests \(H_0: F\in\mathcal{F}_\mathcal{N}\) vs. \(H_1: F\not\in\mathcal{F}_\mathcal{N}\) consistently against all the alternatives in \(H_1\).

*Statistic definition and computation*. The test statistic uses a weighted quadratic distance between \(F_n\) and \(\Phi\left(\frac{\cdot-\bar{X}}{S}\right)\):

\[\begin{align*} A_n^2=&\,n\int\frac{\left(F_n(x)-\Phi\left(\frac{x-\bar{X}}{S}\right)\right)^2}{\Phi\left(\frac{x-\bar{X}}{S}\right)\left(1-\Phi\left(\frac{x-\bar{X}}{S}\right)\right)}\,\mathrm{d}\Phi\left(\frac{x-\bar{X}}{S}\right)\\ =&\,-n-\frac{1}{n}\sum_{i=1}^n\left\{(2i-1)\log(P_{(i)})+(2n+1-2i)\log(1-P_{(i)})\right\}. \end{align*}\]

Rejection happens when \(A_n^2\) is large.

*Distribution under \(H_0\)*. If \(H_0\) holds, the \(p\)-values of the test can be approximated using Table 4.9 in D’Agostino and Stephens (1986).

*Highlights and caveats*. The test is also parameter-free. Since the test statistic places more weight on the tails than the Cramér–von Mises statistic, it is better suited for detecting heavy-tailed deviations from normality.

*Implementation in R*. The `nortest::ad.test` function implements the test.

The following code chunk gives the implementation of these normality tests.

```
# Sample data from a N(10, 1)
set.seed(123456)
n <- 100
x <- rnorm(n = n, mean = 10)

# Normality tests -- do not reject H0
nortest::lillie.test(x = x)
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: x
## D = 0.054535, p-value = 0.6575
nortest::cvm.test(x = x)
##
## Cramer-von Mises normality test
##
## data: x
## W = 0.032898, p-value = 0.8027
nortest::ad.test(x = x)
##
## Anderson-Darling normality test
##
## data: x
## A = 0.22888, p-value = 0.8052

# Sample data from a Student's t with 3 degrees of freedom
x <- rt(n = n, df = 3)

# Normality tests -- reject H0
nortest::lillie.test(x = x)
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: x
## D = 0.08453, p-value = 0.07499
nortest::cvm.test(x = x)
##
## Cramer-von Mises normality test
##
## data: x
## W = 0.19551, p-value = 0.005965
nortest::ad.test(x = x)
##
## Anderson-Darling normality test
##
## data: x
## A = 1.2842, p-value = 0.002326

# Flawed normality tests -- do not reject because the null distribution
# that is employed is wrong!
ks.test(x = x, y = "pnorm", mean = mean(x), sd = sd(x))
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.08453, p-value = 0.4725
## alternative hypothesis: two-sided
goftest::cvm.test(x = x, null = "pnorm", mean = mean(x), sd = sd(x))
##
## Cramer-von Mises test of goodness-of-fit
## Null hypothesis: Normal distribution
## with parameters mean = 0.0151268546198777, sd = 1.33709561396292
## Parameters assumed to be fixed
##
## data: x
## omega2 = 0.19551, p-value = 0.2766
goftest::ad.test(x = x, null = "pnorm", mean = mean(x), sd = sd(x))
##
## Anderson-Darling test of goodness-of-fit
## Null hypothesis: Normal distribution
## with parameters mean = 0.0151268546198777, sd = 1.33709561396292
## Parameters assumed to be fixed
##
## data: x
## An = 1.2842, p-value = 0.2375
```

**Shapiro–Francia normality test**

We now consider a different normality test that is not based on the ecdf, the Shapiro–Francia normality test. This test is a modification of the wildly popular Shapiro–Wilk test, but the Shapiro–Francia test is easier to interpret and explain.^{218}

The test is based on the **QQ-plot** of the sample \(X_1,\ldots,X_n\). The plot is composed of the pairs \(\left(\Phi^{-1}\left(\frac{i}{n+1}\right),X_{(i)}\right)\), for \(i=1,\ldots,n\).^{219} That is, the QQ-plot plots **\(\mathcal{N}(0,1)\)-quantiles against sample quantiles**.

Since the \(\alpha\)-*lower* quantile of a \(\mathcal{N}(\mu,\sigma^2)\), \(z_{\alpha;\mu,\sigma}\), satisfies

\[\begin{align} z_{\alpha;\mu,\sigma}=\mu+\sigma z_{\alpha;0,1},\tag{6.10} \end{align}\]

then, if the sample is distributed as a \(\mathcal{N}(\mu,\sigma^2)\), the points of the QQ-plot are expected to align about the straight line (6.10). This is illustrated in the code below.

```
n <- 100
mu <- 10
sigma <- 2
set.seed(12345678)
x <- rnorm(n, mean = mu, sd = sigma)
qqnorm(x)
abline(a = mu, b = sigma, col = 2)
```

The QQ-plot can therefore be used as a graphical check for normality: if the data is distributed about *some* straight line, then it likely is normally distributed. However, this graphical check is subjective and, to complicate things further, it is usual for departures from the diagonal to be larger in the extremes than in the center, even under normality, although these departures are more evident if the data is non-normal.^{220} Figure 6.5, generated with the code below, depicts the uncertainty behind the QQ-plot.

```
M <- 1e3
n <- 100
plot(0, 0, xlim = c(-3.5, 3.5), ylim = c(-3.5, 3.5), type = "n",
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Confidence bands for QQ-plot")
x <- matrix(rnorm(M * n), nrow = n, ncol = M)
matpoints(qnorm(ppoints(n)), apply(x, 2, sort), pch = 19, cex = 0.5,
          col = gray(0, alpha = 0.01))
abline(a = 0, b = 1)
p <- seq(0, 1, l = 1e4)
xi <- qnorm(p)
lines(xi, xi - qnorm(0.975) / sqrt(n) * sqrt(p * (1 - p)) / dnorm(xi),
      col = 2, lwd = 2)
lines(xi, xi + qnorm(0.975) / sqrt(n) * sqrt(p * (1 - p)) / dnorm(xi),
      col = 2, lwd = 2)
```

The Shapiro–Francia test evaluates how tightly the QQ-plot is distributed about the linear trend that must arise if the data is normally distributed.

*Test purpose*. Given \(X_1,\ldots,X_n\), it tests \(H_0: F\in\mathcal{F}_\mathcal{N}\) vs. \(H_1: F\not\in\mathcal{F}_\mathcal{N}\) consistently against all the alternatives in \(H_1\).

*Statistic definition and computation*. The test statistic, \(W_n'\), is simply the **squared correlation coefficient** for the sample^{221}

\[\begin{align*} \left(\Phi^{-1}\left(\frac{i - 3/8}{n-1/4}\right),X_{(i)}\right),\quad i=1,\ldots,n. \end{align*}\]

*Distribution under \(H_0\)*. Royston (1993) derived a simple analytical formula that serves to approximate the null distribution of \(W_n'\) for \(5\leq n\leq 5,000\).

*Highlights and caveats*. The test is also *parameter-free*, since its distribution does not depend on the actual parameters \((\mu,\sigma)\) for which the equality in \(H_0\) holds.

*Implementation in R*. The `nortest::sf.test` function implements the test. The condition \(5\leq n\leq 5,000\) is enforced by the function.^{222}

The use of the function is described below.

```
# Does not reject H0
set.seed(123456)
n <- 100
x <- rnorm(n = n, mean = 10)
nortest::sf.test(x)
##
## Shapiro-Francia normality test
##
## data: x
## W = 0.99336, p-value = 0.8401

# Rejects H0
x <- rt(n = n, df = 3)
nortest::sf.test(x)
##
## Shapiro-Francia normality test
##
## data: x
## W = 0.91545, p-value = 2.754e-05

# Test statistic
cor(x = sort(x), y = qnorm(ppoints(n, a = 3/8)))^2
## [1] 0.9154466
```

### 6.1.3 Bootstrap-based approach to goodness-of-fit testing

Each normality test discussed in the previous section is an instance of a **goodness-of-fit test for a parametric distribution model**. Precisely, the normality tests are goodness-of-fit tests for the normal distribution model. Therefore, they address a specific instance of the test of the null *composite* hypothesis

\[\begin{align*} H_0:F\in\mathcal{F}_{\Theta}:=\left\{F_\theta:\theta\in\Theta\subset\mathbb{R}^q\right\}, \end{align*}\]

where \(\mathcal{F}_{\Theta}\) denotes a certain *parametric* family of distributions indexed by the (possibly multidimensional) parameter \(\theta\in\Theta\), versus the most general alternative

\[\begin{align*} H_1:F\not\in\mathcal{F}_{\Theta}. \end{align*}\]

The normality case demonstrated that the goodness-of-fit tests for the *simple* hypothesis were not readily applicable, and that the null distributions were affected by the estimation of the unknown parameters \(\theta=(\mu,\sigma)\). In addition, these null distributions tend to be cumbersome, requiring **analytical approximations** that have to be done on a **test-by-test basis** in order to obtain computable \(p\)-values.

To make things worse, the **derivations** that were done to obtain the asymptotic distributions of the normality test are **not reusable if \(\mathcal{F}_\Theta\) is different**. For example, if we wanted to test *exponentiality*, that is,

\[\begin{align} H_0:F\in\mathcal{F}_{\mathbb{R}^+}=\left\{F_\theta(x)=(1-e^{-\theta x})1_{\{x>0\}}:\theta\in\mathbb{R}^+\right\},\tag{6.11} \end{align}\]

then new asymptotic distributions with their corresponding new analytical approximations would need to be derived. Clearly, this is not a very practical approach if we are to evaluate the goodness-of-fit of several parametric models. A practical computational-based solution is to rely on **bootstrap**, specifically, on *parametric bootstrap*.

Since the main problem is to establish the distribution of the test statistic under \(H_0\), a possibility is to approximate this distribution by the distribution of the *bootstrapped statistic*. Precisely, let \(T_n\) be the statistic computed from the sample

\[\begin{align*} X_1,\ldots,X_n\stackrel{H_0}{\sim}F_\theta \end{align*}\]

and let \(T_n^*\) be its bootstrapped version, that is, the statistic computed from the simulated sample^{223}

\[\begin{align} X^*_1,\ldots,X^*_n \sim F_{\hat\theta}.\tag{6.12} \end{align}\]

Then, **if \(H_0\) holds, the distribution of \(T_n\) is approximated^{224} by the one of \(T_n^*\)**. In addition, since the sampling in (6.12) is completely under our control, we can **approximate arbitrarily well the distribution of \(T_n^*\) by Monte Carlo**. For example, the computation of the upper tail probability \(\mathbb{P}^*[T_n^*>x]\), required to obtain \(p\)-values in all the tests we have seen, can be done by

\[\begin{align*} \mathbb{P}^*[T_n^* > x]\approx\frac{1}{B}\sum_{b=1}^B 1_{\{T_n^{*b}> x\}}, \end{align*}\]

where \(T_n^{*1},\ldots,T_n^{*B}\) is a sample from \(T_n^*\) obtained by simulating \(B\) bootstrap samples from (6.12).

The whole approach can be summarized in the following **bootstrap-based procedure** for performing a goodness-of-fit test for a parametric distribution model:

1. Estimate \(\theta\) from the sample \(X_1,\ldots,X_n\), obtaining \(\hat\theta\) (for example, use maximum likelihood).

2. Compute the statistic \(T_n\) from \(X_1,\ldots,X_n\) and \(\hat\theta\).

3. Enter the “bootstrap world.” For \(b=1,\ldots,B\):

   - Simulate a bootstrap sample \(X_1^{*b},\ldots,X_n^{*b}\) from \(F_{\hat\theta}\).
   - Compute \(\hat\theta^{*b}\) from \(X_1^{*b},\ldots,X_n^{*b}\) exactly in the same form that \(\hat\theta\) was computed from \(X_1,\ldots,X_n\).
   - Compute \(T_n^{*b}\) from \(X_1^{*b},\ldots,X_n^{*b}\) and \(\hat\theta^{*b}\).

4. Obtain the \(p\)-value approximation

   \[\begin{align*} p\text{-value}\approx\frac{1}{B}\sum_{b=1}^B 1_{\{T_n^{*b}> T_n\}} \end{align*}\]

   and emit a test decision from it. Modify it accordingly if rejection of \(H_0\) does not happen for large values of \(T_n\).

The following chunk of code provides a template function for implementing the previous algorithm. The template is initialized with the specifics for testing (6.11), for which \(\hat\theta_{\mathrm{ML}}=1/{\bar X}\). The function uses the `boot::boot` function for carrying out the parametric bootstrap.

```
# A goodness-of-fit test of the exponential distribution using the
# Cramér-von Mises statistic
cvm_exp_gof <- function(x, B = 5e3, plot_boot = TRUE) {

  # Test statistic function (depends on the data only)
  Tn <- function(data) {

    # Maximum likelihood estimator -- TO MODIFY
    theta_hat <- 1 / mean(data)

    # Test statistic -- TO MODIFY
    goftest::cvm.test(x = data, null = "pexp", rate = theta_hat)$statistic

  }

  # Function to simulate bootstrap samples X_1^*, ..., X_n^*. Requires TWO
  # arguments, one being the data X_1, ..., X_n and the other containing
  # the parameter theta
  r_mod <- function(data, theta) {

    # Simulate from an exponential. In this case, the function only uses
    # the sample size from the data argument; theta is the estimated
    # rate -- TO MODIFY
    rexp(n = length(data), rate = theta)

  }

  # Estimate of theta -- TO MODIFY
  theta_hat <- 1 / mean(x)

  # Perform bootstrap resampling with the aid of boot::boot
  Tn_star <- boot::boot(data = x, statistic = Tn, sim = "parametric",
                        ran.gen = r_mod, mle = theta_hat, R = B)

  # Test information -- TO MODIFY
  method <- "Bootstrap-based Cramér-von Mises test for exponentiality"
  alternative <- "any alternative to exponentiality"

  # p-value: modify if rejection does not happen for large values of the
  # test statistic
  pvalue <- mean(Tn_star$t > Tn_star$t0)

  # Construct an "htest" result
  result <- list(statistic = Tn_star$t0, p.value = pvalue,
                 theta_hat = theta_hat, statistic_boot = drop(Tn_star$t),
                 B = B, alternative = alternative, method = method,
                 data.name = deparse(substitute(x)))
  class(result) <- "htest"

  # Plot the position of the original statistic with respect to the
  # bootstrap replicates?
  if (plot_boot) {

    hist(result$statistic_boot, probability = TRUE,
         main = paste("p-value:", result$p.value),
         xlab = latex2exp::TeX("$T_n^*$"))
    rug(result$statistic_boot)
    abline(v = result$statistic, col = 2)

  }

  # Return "htest"
  return(result)

}

# Check the test for H0 true
set.seed(123456)
x <- rgamma(n = 100, shape = 1, scale = 1)
gof0 <- cvm_exp_gof(x = x, B = 1e3)
```

```
gof0
##
## Bootstrap-based Cramér-von Mises test for exponentiality
##
## data: x
## omega2 = 0.022601, p-value = 0.979
## alternative hypothesis: any alternative to exponentiality

# Check the test for H0 false
x <- rgamma(n = 100, shape = 2, scale = 1)
gof1 <- cvm_exp_gof(x = x, B = 1e3)
```

```
gof1
##
## Bootstrap-based Cramér-von Mises test for exponentiality
##
## data: x
## omega2 = 0.49536, p-value = 0.001
## alternative hypothesis: any alternative to exponentiality
```

Another example is given below. It adapts the previous template for the flexible class of mixtures of normal distributions.

```
# A goodness-of-fit test of a mixture of m normals using the
# Cramér-von Mises statistic
cvm_nm_gof <- function(x, m, B = 1e3, plot_boot = TRUE) {

  # Test statistic function (depends on the data only)
  Tn <- function(data) {

    # EM algorithm for fitting normal mixtures. With trace = 0 we disable the
    # default convergence messages or otherwise they will saturate the screen
    # during the bootstrap loop. Be aware that this is a potentially dangerous
    # practice, as we may lose important information about the convergence of
    # the EM algorithm
    theta_hat <- nor1mix::norMixEM(x = data, m = m, trace = 0)

    # Test statistic
    goftest::cvm.test(x = data, null = nor1mix::pnorMix,
                      obj = theta_hat)$statistic

  }

  # Function to simulate bootstrap samples X_1^*, ..., X_n^*. Requires TWO
  # arguments, one being the data X_1, ..., X_n and the other containing
  # the parameter theta
  r_mod <- function(data, theta) {

    nor1mix::rnorMix(n = length(data), obj = theta)

  }

  # Estimate of theta
  theta_hat <- nor1mix::norMixEM(x = x, m = m, trace = 0)

  # Perform bootstrap resampling with the aid of boot::boot
  Tn_star <- boot::boot(data = x, statistic = Tn, sim = "parametric",
                        ran.gen = r_mod, mle = theta_hat, R = B)

  # Test information
  method <- "Bootstrap-based Cramér-von Mises test for normal mixtures"
  alternative <- paste("any alternative to a", m, "normal mixture")

  # p-value: modify if rejection does not happen for large values of the
  # test statistic
  pvalue <- mean(Tn_star$t > Tn_star$t0)

  # Construct an "htest" result
  result <- list(statistic = Tn_star$t0, p.value = pvalue,
                 theta_hat = theta_hat, statistic_boot = drop(Tn_star$t),
                 B = B, alternative = alternative, method = method,
                 data.name = deparse(substitute(x)))
  class(result) <- "htest"

  # Plot the position of the original statistic with respect to the
  # bootstrap replicates?
  if (plot_boot) {

    hist(result$statistic_boot, probability = TRUE,
         main = paste("p-value:", result$p.value),
         xlab = latex2exp::TeX("$T_n^*$"))
    rug(result$statistic_boot)
    abline(v = result$statistic, col = 2)

  }

  # Return "htest"
  return(result)

}

# Check the test for H0 true
set.seed(123456)
x <- c(rnorm(n = 100, mean = 2, sd = 0.5), rnorm(n = 100, mean = -2))
gof0 <- cvm_nm_gof(x = x, m = 2, B = 1e3)
```

```
gof0
##
## Bootstrap-based Cramér-von Mises test for normal mixtures
##
## data: x
## omega2 = 0.018769, p-value = 0.806
## alternative hypothesis: any alternative to a 2 normal mixture

# Graphical assessment that H0 (parametric fit) and data (kde) match
plot(gof0$theta_hat, p.norm = FALSE, ylim = c(0, 0.5))
plot(ks::kde(x), col = 2, add = TRUE)
```

```
# Check the test for H0 false
x <- rgamma(n = 100, shape = 2, scale = 1)
gof1 <- cvm_nm_gof(x = x, m = 1, B = 1e3)
```

```
gof1
##
## Bootstrap-based Cramér-von Mises test for normal mixtures
##
## data: x
## omega2 = 0.44954, p-value < 2.2e-16
## alternative hypothesis: any alternative to a 1 normal mixture

# Graphical assessment that H0 (parametric fit) and data (kde) do not match
plot(gof1$theta_hat, p.norm = FALSE, ylim = c(0, 0.5))
plot(ks::kde(x), col = 2, add = TRUE)
```

**Exercise 6.9** Adapt the previous template to perform the following goodness-of-fit tests:

- Test the goodness-of-fit of the Poisson distribution using the Kolmogorov–Smirnov test. Check by simulation that the test is correctly implemented (such that the significance level \(\alpha\) is respected if \(H_0\) holds).
- Test the goodness-of-fit of the beta distribution using the Anderson–Darling test. Check by simulation that the test is correctly implemented.
- Test the goodness-of-fit of the lognormal distribution using the Cramér–von Mises test. Check by simulation that the test is correctly implemented. Apply this test to inspect if `temps-other.txt` is distributed according to a lognormal.

### References

del Barrio, E. (2007). “Empirical and Quantile Processes in the Asymptotic Theory of Goodness-of-Fit Tests.” In *Lectures on Empirical Processes*, 1–92. EMS Series of Lectures in Mathematics. Zürich: European Mathematical Society. https://doi.org/10.4171/027.

D’Agostino, R. B., and M. A. Stephens, eds. (1986). *Goodness-of-Fit Techniques*. Vol. 68. Statistics: Textbooks and Monographs. New York: Marcel Dekker. https://www.routledge.com/Goodness-of-Fit-Techniques/DAgostino/p/book/9780367580346.

Dallal, G. E., and L. Wilkinson (1986). “An Analytic Approximation to the Distribution of Lilliefors’s Test Statistic for Normality.” *The American Statistician* 40(4): 294–296. https://doi.org/10.1080/00031305.1986.10475419.

Lilliefors, H. W. (1967). “On the Kolmogorov–Smirnov Test for Normality with Mean and Variance Unknown.” *Journal of the American Statistical Association* 62(318): 399–402. https://doi.org/10.1080/01621459.1967.10482916.

Royston, P. (1993). “A Pocket-Calculator Algorithm for the Shapiro–Francia Test for Non-Normality: An Application to Medicine.” *Statistics in Medicine* 12(2): 181–184. https://doi.org/10.1002/sim.4780120209.

Shao, J. (1999). *Mathematical Statistics*. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/b97553.

Stephens, M. A. (1974). “EDF Statistics for Goodness of Fit and Some Comparisons.” *Journal of the American Statistical Association* 69(347): 730–737. https://doi.org/10.2307/2286009.

\(F\) does not need to be continuous.↩︎

The fact that \(F=F_0\) is tested in full generality against *any* distribution that is different from \(F_0\) raises the *omnibus test* (all-purpose test) terminology for the kind of tests that are able to detect all the alternatives to \(H_0\) that are within \(H_1\).↩︎

That is, the tests will have a rejection probability lower than \(\alpha\) when \(H_0\) is true.↩︎

If \(F\) and \(F_0\) were continuous, the density approach to this problem would compare the kde \(\hat{f}(\cdot;h)\) seen in Chapter 2 with the density \(f_0\). While using a kde for testing \(H_0:F=F_0\) via \(H_0:f=f_0\) is a perfectly valid approach and has certain advantages (e.g., better detection of local violations of \(H_0\)), it is also more demanding (it requires selecting the bandwidth \(h\)) and limited (applicable to continuous random variables only). The cdf perspective, although not appropriate for visualization or for the applications described in Section 3.5, suffices for conducting hypothesis testing in a nonparametric way.↩︎

A popular approach to goodness-of-fit tests, not covered in these notes, is the chi-squared test (see `?chisq.test`). This test is based on aggregating the values of \(X\) in \(k\) classes \(I_i\), \(i=1,\ldots,k\), where \(\cup_{i=1}^k I_i\) covers the support of \(X\). Worryingly, one can modify the outcome of the test by altering the form of these classes and the choice of \(k\), a sort of tuning parameter. The choice of \(k\) also affects the quality of the asymptotic null distribution. Therefore, for continuous and discrete random variables, the chi-squared test has significant drawbacks when compared to the ecdf-based tests. Nevertheless, the “chi-squared philosophy” of testing is remarkably general: as opposed to ecdf-based tests, it can be readily applied to the analysis of categorical variables and contingency tables.↩︎

Or the \(\infty\)-distance \(\|F_n-F_0\|_\infty=\sup_{x\in\mathbb{R}} \left|F_n(x)-F_0(x)\right|\).↩︎

With the addition of the \(\sqrt{n}\) factor in order to standardize \(D_n\) in terms of its asymptotic distribution under \(H_0\). Many authors do not consider this factor as a part of the test statistic itself.↩︎

Notice that \(F_n(X_{(i)})=\frac{i}{n}\) and \(F_n(X_{(i)}^-)=\frac{i-1}{n}\).↩︎

When the sample size \(n\) is large: \(n\to\infty\).↩︎

The infinite series converges quite fast and \(50\) terms are usually enough to evaluate (6.3) with high accuracy for \(x\in\mathbb{R}\).↩︎

A tie occurs when two elements of the sample are numerically equal, an event with probability zero if the sample comes from a truly continuous random variable.↩︎

More precisely, the pdf of \(D_n\) would follow from the joint pdf of the sorted sample \((U_{(1)},\ldots,U_{(n)})\) that is generated from a \(\mathcal{U}(0,1)\) and for which the transformation (6.2) is employed. The exact analytical formula, for a given \(n\), is very cumbersome, hence the need for an asymptotic approximation.↩︎

That is, if \(F_0\) is the cdf of a \(\mathcal{N}(0,1)\) or the cdf of an \(\mathrm{Exp}(\lambda)\), the distribution of \(D_n\) under \(H_0\) is exactly the same.↩︎

Ties may appear as a consequence of a measuring process of a continuous quantity having low precision.↩︎

Which is \(\lim_{n\to\infty}\mathbb{P}[D_n>d_n]=1-K(d_n)\), where \(d_n\) is the observed statistic and \(D_n\) is the random variable (6.2).↩︎

Recall that the \(p\)-value of any statistical test is a random variable, since it depends on the sample.↩︎

Recall that \(\mathrm{d}F_0(x)=f_0(x)\,\mathrm{d}x\) if \(F_0\) is continuous.↩︎

See, e.g., Section 5 in Stephens (1974) or page 110 in D’Agostino and Stephens (1986). Observe this is only empirical evidence for certain scenarios.↩︎

The motivation for considering \(w(x)=(F_0(x)(1-F_0(x)))^{-1}\) stems from recalling that \(\mathbb{E}[(F_n(x)-F_0(x))^2]\stackrel{H_0}{=}F_0(x)(1-F_0(x))/n\) (see Example 1.7). That is, estimating \(F_0\) on the tails is less variable than at its center. \(A_n^2\) weights \((F_n(x)-F_0(x))^2\) with its variability by incorporating \(w\).↩︎

Actually, due to the allocation of greater weight to the tails, the statistic \(A_n^2\) is somehow on the edge of existence. If \(F_0\) is continuous, the weight function \(w(x)=(F_0(x)(1-F_0(x)))^{-1}\) is *not integrable* on \(\mathbb{R}\). Indeed, \(\int w(x)\,\mathrm{d}F_0(x)=\int_0^1 (x(1-x))^{-1}\,\mathrm{d}x=\infty\). It is only thanks to the factor \((F_n(x)-F_0(x))^2\) in the integrand that \(A_n^2\) is well defined.↩︎

Importantly, recall that \(P_1,\ldots,P_n\) are *not* independent, since \(\bar{X}\) and \(S\) depend on the whole sample \(X_1,\ldots,X_n\). To dramatize this point, imagine that \(X_1\) is very large. Then, it would drive \(\frac{X_i-\bar{X}}{S}\), \(i\neq 1\), towards large negative values, revealing the dependence of \(P_2,\ldots,P_n\) on \(P_1\). This is a very remarkable qualitative difference with respect to \(U_1,\ldots,U_n\), which was an iid sample. \(P_1,\ldots,P_n\) is an id sample, but not an iid sample.↩︎

\(H_0\) is rejected with significance level \(\alpha\) if the statistic is above the critical value for \(\alpha\).↩︎

The precise form of \(\tilde{f}_0\) (if \(S\) stands for the *quasi*-variance) can be seen in equation (3.1) in Shao (1999).↩︎

The Shapiro–Wilk test is implemented in the `shapiro.test` R function. The test has the same highlights and caveats as `nortest::sf.test`.↩︎

Minor modifications of the \(\frac{i}{n+1}\)-*lower* quantile \(\Phi^{-1}\left(\frac{i}{n+1}\right)\) may be considered.↩︎

For \(X\sim F\), the \(p\)-th quantile \(x_p=F^{-1}(p)\) of \(X\) is estimated through the sample quantile \(\hat{x}_p:=F_n^{-1}(p)\). If \(X\sim f\) is continuous, then \(\sqrt{n}(\hat{x}_p-x_p)\) is asymptotically \(\mathcal{N}\left(0,\frac{p(1-p)}{f(x_p)^2}\right)\). Therefore, the variance of \(\hat{x}_p\) grows if \(p\to0,1\) and more variability is expected in the extremes of the QQ-plot.↩︎

Observe that `qnorm(ppoints(x, a = 3 / 8))` is used as an approximation to the expected ordered quantiles from a \(\mathcal{N}(0,1)\), whereas `qnorm(ppoints(x, a = 1 / 2))` is employed internally in `qqplot`.↩︎

However, the R code of the function allows computing the statistic and carrying out a bootstrap-based test as described in Section 6.1.3 for \(n\geq5,000\). The `shapiro.test` function, on the contrary, does not allow this simple extraction, as the core of the function is written in C (function `C_SWilk`) and internally enforces the condition \(5\leq n\leq 5,000\).↩︎

Notice that \(X^*_i\) is *always* distributed as \(F_{\hat\theta}\), no matter whether \(H_0\) holds or not.↩︎

With increasing precision as \(n\to\infty\).↩︎