Bringing it all together, an important question to ask is: which measure should we use, and when?
An important consideration here is that the mean, variance, and standard deviation are calculated from every value in a data set. This makes them comprehensive measures, but it also means they are easily affected by skewed data or extreme values (outliers). The median and IQR, on the other hand, depend only on the values found at particular positions (ranks) once the data are put in order. This makes them less comprehensive, but more robust when data are skewed or contain outliers. So, a basic guideline is as follows:
Which measure should I use?
Symmetric data: Use mean, standard deviation, variance
Skewed data, or data with extreme values: Use median, IQR
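To illustrate the guideline for the measures of spread, here is a minimal Python sketch that compares the standard deviation and the IQR on two small lists. The data values and the variable names are made up for illustration and do not come from this text. Note also that quartiles can be calculated under several conventions; this sketch uses the default of Python's statistics.quantiles, so other software may give a slightly different IQR.

```python
import statistics

# Hypothetical data, invented for this illustration only.
clean = [10, 12, 13, 15, 16, 18, 20]
with_outlier = [10, 12, 13, 15, 16, 18, 90]  # largest value replaced by an extreme one


def iqr(values):
    """Interquartile range (Q3 - Q1) using the default quartile method
    of statistics.quantiles; other conventions can differ slightly."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


sd_clean = statistics.stdev(clean)            # sample standard deviation
sd_outlier = statistics.stdev(with_outlier)
iqr_clean = iqr(clean)
iqr_outlier = iqr(with_outlier)

print("Standard deviation:", round(sd_clean, 2), "->", round(sd_outlier, 2))
print("IQR:", iqr_clean, "->", iqr_outlier)
```

With these made-up values, the standard deviation becomes several times larger once the extreme value is introduced, while the IQR stays the same, because the extreme value does not change which values sit at the quartile positions.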
The following exercise demonstrates this idea.
Consider the following set of numbers: 2, 3, 4, 5, 6
- What is the mean?
- What is the median?
- Now suppose that the number 6 above was recorded incorrectly and should in fact have been 26. We now have the following set of numbers: 2, 3, 4, 5, 26
- What is the mean now?
- What is the median now?
This example shows just how strongly the mean can be affected by a single extreme value: it doubled, from 4 to 8, after changing just one value in the sample. The median, on the other hand, is much less affected by extreme values. Here it did not change at all (it is 4 for both sets of numbers) and, in this situation, was better able to tell us what a 'typical' value was.
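The answers to the exercise can be checked with a few lines of Python. This is only a sketch using the standard library's statistics module; the variable names are chosen here for illustration.

```python
import statistics

original = [2, 3, 4, 5, 6]
corrected = [2, 3, 4, 5, 26]  # the 6 replaced by 26

mean_original = statistics.mean(original)       # (2+3+4+5+6) / 5 = 4
median_original = statistics.median(original)   # middle value of the ordered list = 4

mean_corrected = statistics.mean(corrected)     # (2+3+4+5+26) / 5 = 8
median_corrected = statistics.median(corrected) # middle value is still 4

print("Original:  mean =", mean_original, " median =", median_original)
print("Corrected: mean =", mean_corrected, " median =", median_corrected)
```

Only the largest value changed, so the middle value of the ordered list stays where it was, while the sum used to calculate the mean rises from 20 to 40.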