5 Statistical Dispersion
While measures of central tendency identify the “middle” of a dataset, measures of dispersion describe how widely the values are spread around that center. In other words, dispersion quantifies the variability within the data. Two datasets can share the same average yet have completely different distributions: one tightly clustered, the other broadly scattered.
The most commonly used measures of dispersion are range, variance, and standard deviation.
- Range is the simplest measure, obtained by subtracting the smallest value from the largest. Although easy to compute, it depends only on the two extreme values, so a single outlier can change it drastically.
- Variance is the average of the squared deviations from the mean. Squaring makes every deviation non-negative, so deviations above and below the mean cannot cancel each other out, and it gives more weight to values far from the mean. The drawback is that the unit of measurement changes: variance is expressed in squared units.
- Standard deviation is the square root of the variance, restoring the same unit as the original data. Because of this, it is often preferred in practice and widely used in statistical analysis.
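The three measures above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `dispersion_summary` and the sample values are hypothetical, and the sketch assumes the population form of the variance (dividing by n), which matches the "average of the squared deviations" definition given here. For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics


def dispersion_summary(values):
    """Return the range, population variance, and population standard deviation."""
    data = list(values)
    if not data:
        raise ValueError("values must contain at least one number")
    return {
        # Largest value minus smallest value.
        "range": max(data) - min(data),
        # Mean of the squared deviations from the mean (divides by n).
        "variance": statistics.pvariance(data),
        # Square root of the variance, back in the original unit.
        "std_dev": statistics.pstdev(data),
    }


# Hypothetical data, chosen so the results are easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = dispersion_summary(scores)
print(summary)
```

For this data the mean is 5 and the squared deviations sum to 32, so the variance is 32 / 8 = 4, the standard deviation is 2, and the range is 9 - 2 = 7.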
Understanding dispersion (see Figure 5.1) is critical when interpreting averages. For instance, two classes may both have an average exam score of 70. If one class has a small standard deviation, its students’ scores are consistent and lie close to 70. A large standard deviation in the other class indicates much greater variation among students, with some scoring far above and others far below the average.
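The two-class comparison can be made concrete with a short sketch. The scores below are invented for illustration and are not taken from any real dataset; the population standard deviation is used as an assumption, as in the definition above.

```python
import statistics

# Hypothetical exam scores: both classes average exactly 70.
class_a = [68, 69, 70, 71, 72]    # tightly clustered
class_b = [40, 55, 70, 85, 100]   # widely scattered

results = {}
for name, scores in (("A", class_a), ("B", class_b)):
    mean = statistics.mean(scores)
    spread = statistics.pstdev(scores)  # population standard deviation
    results[name] = (mean, spread)
    print(f"Class {name}: mean = {mean:.1f}, standard deviation = {spread:.2f}")

# Class A: mean = 70.0, standard deviation = 1.41
# Class B: mean = 70.0, standard deviation = 21.21
```

The means are identical, but the standard deviation of class B is about fifteen times that of class A, which the average alone would never reveal.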
Graphical tools provide a clear way to illustrate variability in datasets. For example:
- Boxplots show the spread through quartiles, highlight the interquartile range (IQR), and identify potential outliers.
- Histograms display the distribution of frequencies across intervals, making it easy to see whether the data are tightly clustered or widely spread.
- Scatterplots can reveal how variation appears when comparing two variables, highlighting whether data points are dispersed or concentrated around a trend.
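The numbers behind a boxplot and a histogram can be computed without any plotting library, as in the sketch below. It is only an illustration under stated assumptions: the scores are hypothetical, the quartiles use the "inclusive" method of `statistics.quantiles` (other tools may use a different convention and report slightly different quartiles), and potential outliers are flagged with the common rule of 1.5 times the IQR beyond the quartiles.

```python
import statistics
from collections import Counter

# Hypothetical exam scores, invented for illustration only.
scores = [52, 58, 61, 64, 66, 68, 70, 71, 73, 75, 78, 99]

# Quartile cut points; "inclusive" is one of several conventions.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Common boxplot convention: flag points beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

# Histogram counts using bins of width 10 (50-59, 60-69, ...).
bin_width = 10
bins = Counter((x // bin_width) * bin_width for x in scores)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Potential outliers: {outliers}")
for start in sorted(bins):
    # One '#' per observation gives a rough text histogram.
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```

With this data the score of 99 falls above the upper fence and is flagged as a potential outlier, while the text histogram shows most scores concentrated in the 60s and 70s.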
Combining measures of central tendency with measures of dispersion and these graphical tools gives readers both numerical and visual insight, enabling a more accurate and complete interpretation of their data [1, statisticsbyjim, dispersion_pubhealth, gfg_mean_variance_std].