2.1 Variance and Standard Deviation

The variance is a measure that tells us how spread out a given distribution of data is. The sample variance is normally denoted by \(s^2\) and defined by the following formula:

\[s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\overline{x})^2,\]

where \(n\), \(i\), \(x_i\) and \(\overline{x}\) are as defined earlier. The formula takes the difference between each value and the sample mean, squares each of these differences, adds them up and divides by \(n-1\). In other words, it estimates the average squared distance of the values from the mean. So if many values are far away from the mean, the variance will be high. On the other hand, if most of the values are very close to the mean, the variance will be low.
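To see the formula in action, here is a minimal sketch in Python. The language, the function name `sample_variance` and the two small data sets are our own illustrative choices and are not part of these notes. The function follows the formula step by step, and each result is checked against `statistics.variance` from Python's standard library.

```python
import statistics

def sample_variance(xs):
    """Sample variance s^2 = (1/(n-1)) * sum of (x_i - x_bar)^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(xs) / n                             # sample mean
    squared_diffs = [(x - x_bar) ** 2 for x in xs]  # (x_i - x_bar)^2 for each value
    return sum(squared_diffs) / (n - 1)             # divide by n - 1, not n

# Hypothetical data sets, chosen only for illustration
spread_out = [2, 4, 4, 4, 5, 5, 7, 9]
clustered = [4, 5, 5, 6]

print(sample_variance(spread_out), statistics.variance(spread_out))
print(sample_variance(clustered), statistics.variance(clustered))
```

Each line prints a matching pair of values. The clustered set has the much smaller variance (about 0.67 against about 4.57) because all of its values lie within 1 of its mean of 5.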

Comparing our two histograms again, the variance of the data in Histogram A is 907.947, whereas the variance of the data in Histogram B is 106.5411.

The standard deviation is simply the square root of the variance, and the sample standard deviation is usually denoted \(s\). That is, we have that

\[s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\overline{x})^2},\]

so the standard deviations of the data in Histograms A and B are 30.13216 and 10.32187 respectively. As expected, the variance and standard deviation of the data in Histogram B are both much lower than those for Histogram A.
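Computing \(s\) from data adds just one step to the variance calculation: taking the square root. The sketch below is Python, with a hypothetical function name `sample_std_dev` and made-up data used only for illustration. It computes \(s^2\) from the formula and then takes its square root, and it compares the result with `statistics.stdev` from the standard library.

```python
import math
import statistics

def sample_std_dev(xs):
    """Sample standard deviation s: the square root of the sample variance s^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample standard deviation needs at least two values")
    x_bar = sum(xs) / n                                      # sample mean
    s_squared = sum((x - x_bar) ** 2 for x in xs) / (n - 1)  # s^2 from the formula
    return math.sqrt(s_squared)                              # s = sqrt(s^2)

# Hypothetical data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std_dev(data), statistics.stdev(data))
```

The two printed values should agree, up to floating-point rounding.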

Although the variance and standard deviation are closely related, the standard deviation is usually easier to interpret because it is expressed in the same units as the data, whereas the variance is expressed in squared units. For example, if the data in Histogram A represented heights in cm, then the associated standard deviation and variance would be 30.13216 cm and 907.947 cm² respectively. In the context of heights, a standard deviation of 30.13216 cm is of course much easier to interpret than a variance of 907.947 cm².

The population variance is usually denoted \(\sigma^2\) and the population standard deviation is usually denoted \(\sigma\). We usually do not know the true values of \(\sigma^2\) and \(\sigma\), but we can use the sample variance \(s^2\) and the sample standard deviation \(s\) to estimate them.
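For comparison, when every member of a population is observed, the population variance is conventionally computed by dividing by the population size \(N\) instead of by \(N-1\). Python's standard library provides both versions. `statistics.variance` and `statistics.stdev` use the sample formulas with \(n-1\), while `statistics.pvariance` and `statistics.pstdev` use the population formulas. The sketch below treats the same made-up list first as a sample and then, purely for illustration, as if it were an entire population.

```python
import statistics

# Hypothetical data, used only to contrast the two divisors
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treated as a sample: the sum of squared differences is divided by n - 1
s_squared = statistics.variance(data)
s = statistics.stdev(data)

# Treated as an entire population: divided by N instead
sigma_squared = statistics.pvariance(data)
sigma = statistics.pstdev(data)

print("sample:     s^2 =", s_squared, "  s =", s)
print("population: sigma^2 =", sigma_squared, "  sigma =", sigma)
```

Dividing by the smaller number \(n-1\) makes the sample variance a little larger than the divide-by-\(N\) version. Here it is about 4.57 compared with 4.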

Calculating these measures of spread is more complicated than calculating, for example, the mean or median. Thankfully, we will be using statistical software packages to calculate measures of spread in this subject. However, you will be expected to know how to convert a given variance into a standard deviation, and vice versa. You can have a go now:

Your turn:

Suppose a sample standard deviation was calculated to be 8. Then what would be the associated variance?
Suppose a sample variance was calculated to be 144. Then what would be the associated standard deviation?

Remember that the standard deviation is the square root of the variance. This means that the variance is the standard deviation squared.
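These two conversions can be written as a pair of one-line helpers. This is a minimal Python sketch, and the names `variance_to_sd` and `sd_to_variance` are hypothetical. As a check, it converts the variances reported for Histograms A and B, which should reproduce the standard deviations quoted earlier up to rounding.

```python
import math

def variance_to_sd(variance):
    """The standard deviation is the square root of the variance."""
    if variance < 0:
        raise ValueError("a variance cannot be negative")
    return math.sqrt(variance)

def sd_to_variance(sd):
    """The variance is the standard deviation squared."""
    if sd < 0:
        raise ValueError("a standard deviation cannot be negative")
    return sd ** 2

# The variances reported for Histograms A and B
print(variance_to_sd(907.947))   # approximately 30.13216
print(variance_to_sd(106.5411))  # approximately 10.32187
```

You can check your answers to the questions above by passing them through these helpers.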