Chapter 1 The Chi-squared distribution

Before going further, we will look at the sampling distribution used for chi-squared tests: the chi-squared distribution. "Chi" is the Greek letter \(\chi\), pronounced "ky" (rhyming with "sky").

The chi-squared distribution is defined by a single parameter, the degrees of freedom (df). If a random variable \(X^2\) follows a chi-squared distribution, we write \(X^2 \sim \chi^2_{\text{df}}\). For example, with df = 5 we would write \(X^2 \sim \chi^2_5\). The figure below shows some example density curves of the \(\chi^2\) distribution for varying degrees of freedom:

As we can see, the chi-squared distribution is positively skewed. However, as the degrees of freedom increase, the density curve becomes flatter and more symmetric, and begins to resemble the normal distribution. The chi-squared distribution takes only non-negative values.
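The change in shape can also be checked numerically. The following is a minimal sketch in Python, using only the standard library. It assumes the standard chi-squared density, \(f(x) = \frac{x^{\text{df}/2 - 1} e^{-x/2}}{2^{\text{df}/2}\,\Gamma(\text{df}/2)}\) for \(x > 0\), and the standard skewness result \(\sqrt{8/\text{df}}\), neither of which is stated in this chapter. The function names `chi2_pdf` and `chi2_skewness` and the chosen degrees of freedom are illustrative only.

```python
import math


def chi2_pdf(x, df):
    """Chi-squared density at x for the given degrees of freedom.

    Assumes the standard density formula (not given in the chapter).
    Returns 0.0 for x <= 0 as a simplifying convention.
    """
    if x <= 0:
        return 0.0
    half_df = df / 2.0
    # Work on the log scale to avoid overflow for large df.
    log_density = (
        (half_df - 1.0) * math.log(x)
        - x / 2.0
        - half_df * math.log(2.0)
        - math.lgamma(half_df)
    )
    return math.exp(log_density)


def chi2_skewness(df):
    """Skewness of the chi-squared distribution: sqrt(8 / df)."""
    return math.sqrt(8.0 / df)


# Illustrative degrees of freedom; the grid covers x from 0.01 to 60.
for df in (3, 5, 10, 30):
    peak = max(chi2_pdf(i / 100, df) for i in range(1, 6001))
    print(f"df={df:2d}  skewness={chi2_skewness(df):.3f}  peak density={peak:.3f}")
```

If the assumptions hold, both the skewness and the peak density should shrink as df grows, which matches the flatter, more symmetric curves described above.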

When we carry out a chi-squared test, the observed test statistic, \(\chi^2\), is placed within the context of the corresponding sampling distribution, and we calculate the \(p\)-value as the upper-tail probability \(p = P(X^2 \geq \chi^2)\). This means that a large test statistic will result in a small \(p\)-value (and a significant result, if the \(p\)-value falls below the chosen significance level), whereas a small test statistic will result in a large \(p\)-value (and a non-significant result).
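To make the upper-tail calculation concrete, here is a rough sketch in Python that computes \(P(X^2 \geq \chi^2)\) using only the standard library. It assumes the chi-squared cumulative probability can be written as a regularised lower incomplete gamma function and evaluates it with a power series. This is one possible approach, not necessarily how statistical software does it. The function name `chi2_sf`, the test statistic values, and df = 5 are made up for illustration. In practice a statistics library (for example `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi2_sf(stat, df, max_terms=10_000, tol=1e-15):
    """Approximate P(X^2 >= stat) for X^2 ~ chi-squared(df).

    Sketch only: uses the series for the regularised lower incomplete
    gamma function P(a, x) with a = df / 2 and x = stat / 2. Intended for
    moderate statistics; very large values may lose accuracy or overflow.
    """
    if stat <= 0:
        return 1.0  # the whole distribution lies at or above zero
    a = df / 2.0
    x = stat / 2.0
    # P(a, x) = x^a e^{-x} / Gamma(a + 1) * sum_{n>=0} x^n / ((a+1)...(a+n))
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    cdf = total * math.exp(log_prefactor)
    # Clamp to [0, 1] to guard against rounding error.
    return min(1.0, max(0.0, 1.0 - cdf))


# Hypothetical test statistics, all with df = 5.
for stat in (1.0, 5.0, 11.07, 20.0):
    print(f"chi-squared = {stat:5.2f}, df = 5, p = {chi2_sf(stat, 5):.4f}")
```

The printed \(p\)-values should decrease as the statistic increases, in line with the relationship between large test statistics and small \(p\)-values described above.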