3.1 Numerical Measures
A population and a sample are described by different quantities:
Measures of | Category | Population | Sample |
---|---|---|---|
- | What is it? | The whole of reality under study | A small fraction of reality, used for inference |
- | Characteristics described by | Parameters | Statistics |
Central Tendency | Mean | \(\mu = E(Y)\) | \(\hat{\mu} = \overline{y}\) |
Central Tendency | Median | 50th percentile | \(y_{(\frac{n+1}{2})}\) for odd \(n\); \(\frac{1}{2}\left(y_{(\frac{n}{2})}+y_{(\frac{n}{2}+1)}\right)\) for even \(n\) |
Dispersion | Variance | \[\begin{aligned} \sigma^2 &= var(Y) \\ &= E(Y-\mu)^2 \end{aligned}\] | \(s^2=\frac{1}{n-1} \sum_{i = 1}^{n} (y_i-\overline{y})^2\) |
Dispersion | Coefficient of Variation | \(\frac{\sigma}{\mu}\) | \(\frac{s}{\overline{y}}\) |
Dispersion | Interquartile Range | 75th percentile minus 25th percentile; robust to outliers | sample 75th percentile minus sample 25th percentile |
Shape | Skewness (standardized 3rd central moment, unitless) | \(g_1=\frac{\mu_3}{\mu_2^{3/2}}\) | \(\hat{g_1}=\frac{m_3}{m_2\sqrt{m_2}}\) |
Shape | Central moments | \(\mu=E(Y)\), \(\mu_2 = \sigma^2=E(Y-\mu)^2\), \(\mu_3 = E(Y-\mu)^3\), \(\mu_4 = E(Y-\mu)^4\) | \(m_2=\sum_{i=1}^{n}(y_i-\overline{y})^2/n\), \(m_3=\sum_{i=1}^{n}(y_i-\overline{y})^3/n\), \(m_4=\sum_{i=1}^{n}(y_i-\overline{y})^4/n\) |
Shape | Kurtosis (standardized 4th central moment; peakedness and tail thickness) | \(g_2^*=\frac{E(Y-\mu)^4}{\sigma^4}\) | \(\hat{g_2}=\frac{m_4}{m_2^2}-3\) (estimates the excess kurtosis \(g_2^*-3\)) |
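As a minimal sketch of the sample column of the table, the function below computes these statistics with the Python standard library only. The function name `sample_summary` and the data are hypothetical. The IQR uses `statistics.quantiles` with its default method, which is only one of several percentile conventions, so other software may report a slightly different IQR.

```python
import math
import statistics


def sample_summary(y):
    """Sample statistics from the table (a sketch, not a library API).

    Central moments m2, m3, m4 use the divisor n, as in the table;
    the sample variance s^2 uses the divisor n - 1.
    """
    n = len(y)
    ybar = sum(y) / n                         # \hat{mu} = \bar{y}
    dev = [yi - ybar for yi in y]
    m2 = sum(d ** 2 for d in dev) / n         # 2nd central moment
    m3 = sum(d ** 3 for d in dev) / n         # 3rd central moment
    m4 = sum(d ** 4 for d in dev) / n         # 4th central moment
    s2 = sum(d ** 2 for d in dev) / (n - 1)   # sample variance s^2
    q1, _, q3 = statistics.quantiles(y, n=4)  # one percentile convention among several
    return {
        "mean": ybar,
        "median": statistics.median(y),
        "variance": s2,
        "cv": math.sqrt(s2) / ybar,             # s / \bar{y}
        "iqr": q3 - q1,
        "skewness": m3 / (m2 * math.sqrt(m2)),  # \hat{g_1}
        "kurtosis": m4 / m2 ** 2 - 3,           # \hat{g_2}, excess kurtosis
    }


summary = sample_summary([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
```

For these eight values the sketch gives \(\overline{y}=5\), median 4.5, \(s^2=32/7\), IQR 2.5, \(\hat{g_1}=0.65625\) and \(\hat{g_2}=-0.21875\).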
Note:
Order statistics: \(y_{(1)},y_{(2)},\dots,y_{(n)}\) are the observations sorted in increasing order, \(y_{(1)}\le y_{(2)}\le\dots\le y_{(n)}\)
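The median row of the table can be written directly in terms of order statistics. This is a small sketch that assumes the common convention of averaging the two middle order statistics when \(n\) is even. The function name is hypothetical.

```python
def median_from_order_stats(y):
    """Sample median expressed through the order statistics y_(1) <= ... <= y_(n)."""
    ys = sorted(y)      # ys[k - 1] holds the order statistic y_(k)
    n = len(ys)
    if n % 2 == 1:
        return ys[(n + 1) // 2 - 1]              # y_((n+1)/2)
    return (ys[n // 2 - 1] + ys[n // 2]) / 2     # average of y_(n/2) and y_(n/2+1)
```

The index shift by one is needed because Python lists are 0-indexed while order statistics are numbered from 1.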
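Coefficient of variation: the standard deviation divided by the mean. It is a dimensionless statistic, useful for comparing variability between variables measured in different units or on different scales. It is meaningful only for positive data whose mean is not close to zero.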
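To illustrate why the coefficient of variation is unit-free, the sketch below computes \(s/\overline{y}\) for the same hypothetical heights recorded in centimetres and in inches. Both give the same value because rescaling the data multiplies \(s\) and \(\overline{y}\) by the same constant. The function name and data are hypothetical.

```python
import statistics


def coefficient_of_variation(y):
    """s / ybar; assumes positive data whose mean is well away from zero."""
    return statistics.stdev(y) / statistics.mean(y)  # stdev uses the n - 1 divisor


# Hypothetical heights, recorded once in centimetres and once in inches.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
heights_in = [h / 2.54 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)  # same value: the units cancel
```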
Symmetric: mean = median, skewness = 0
Skewed right: mean > median, skewness > 0
Skewed left: mean < median, skewness < 0
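A quick check of these rules on small hypothetical samples, computing the mean, the median, and \(\hat{g_1}\). This is a sketch; `sample_skewness` is an illustrative helper, not a library function.

```python
import math
import statistics


def sample_skewness(y):
    """g1_hat = m3 / m2**1.5, with central moments using the divisor n."""
    n = len(y)
    ybar = sum(y) / n
    m2 = sum((v - ybar) ** 2 for v in y) / n
    m3 = sum((v - ybar) ** 3 for v in y) / n
    return m3 / (m2 * math.sqrt(m2))


# Hypothetical samples illustrating the three cases.
samples = {
    "symmetric": [1, 2, 3, 4, 5],
    "right-skewed": [1, 2, 2, 3, 3, 3, 4, 10],
    "left-skewed": [10, 9, 9, 8, 8, 8, 7, 1],  # mirror image of the right-skewed sample
}
for label, y in samples.items():
    print(label, statistics.mean(y), statistics.median(y), round(sample_skewness(y), 3))
```

The symmetric sample has mean equal to median and \(\hat{g_1}=0\). The right-skewed sample has mean 3.5 > median 3 and \(\hat{g_1}>0\), and its mirror image reverses both signs.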
Central moments: \(\mu=E(Y)\) , \(\mu_2 = \sigma^2=E(Y-\mu)^2\) , \(\mu_3 = E(Y-\mu)^3\), \(\mu_4 = E(Y-\mu)^4\)
For normal distributions, \(\mu_3=0\), so \(g_1=0\)
If the sample comes from a normal population, \(\hat{g_1}\) is approximately distributed as \(N(0,6/n)\); the approximation is adequate when \(n > 150\).
- For large samples, inference on skewness can be based on normal tables, with a 95% confidence interval for \(g_1\) given by \(\hat{g_1}\pm1.96\sqrt{6/n}\)
- For small samples, use the special tables in Snedecor and Cochran (1989), Table A 19(i), or a Monte Carlo test
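The large-sample rule can be sketched as follows, assuming the sample comes from a normal population and \(n > 150\). The helper name and the values \(\hat{g_1}=0.3\), \(n=600\) are hypothetical.

```python
import math


def skewness_normal_inference(g1_hat, n, z_crit=1.96):
    """Large-sample inference for g1, using g1_hat ~ N(0, 6/n) under normality.

    Returns the interval g1_hat +/- z_crit * sqrt(6/n) and the z statistic
    for H0: g1 = 0. Intended only for large samples (n > 150).
    """
    se = math.sqrt(6 / n)
    ci = (g1_hat - z_crit * se, g1_hat + z_crit * se)
    z = g1_hat / se
    return ci, z


# Hypothetical result: g1_hat = 0.3 from a sample of n = 600.
ci, z = skewness_normal_inference(0.3, 600)
reject_symmetry = abs(z) > 1.96  # two-sided test at the 5% level
```

Here \(\sqrt{6/600}=0.1\), so the interval is \(0.3\pm0.196=(0.104,\,0.496)\) and \(z=3\); since the interval excludes 0, symmetry would be rejected at the 5% level.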
Kurtosis | Tails | Comparison |
---|---|---|
Kurtosis > 0 (leptokurtic) | heavier tails | relative to a normal distribution with the same \(\sigma\) (e.g., the t-distribution) |
Kurtosis < 0 (platykurtic) | lighter tails | relative to a normal distribution with the same \(\sigma\) |
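Two standard cases illustrate the two rows, stated in terms of the excess kurtosis \(g_2^*-3\): a Student t distribution with \(\nu > 4\) degrees of freedom is leptokurtic, and a continuous uniform distribution is platykurtic.

\[
\begin{aligned}
t_\nu:&\quad g_2^*-3 = \frac{6}{\nu-4} > 0 \quad (\nu > 4),\\
\text{Uniform}(a,b):&\quad g_2^*-3 = -\frac{6}{5} < 0.
\end{aligned}
\]

For example, \(t_{10}\) has excess kurtosis \(6/6 = 1\), and the value shrinks toward 0 as \(\nu \to \infty\), consistent with the t-distribution approaching the normal.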
For a normal distribution, \(g_2^*=3\), so kurtosis is often redefined as the excess kurtosis \(g_2=\frac{E(Y-\mu)^4}{\sigma^4}-3\), which equals 0 for a normal distribution. The 4th central moment is estimated by \(m_4=\sum_{i=1}^{n}(y_i-\overline{y})^4/n\).
- If the sample comes from a normal population, the asymptotic sampling distribution of \(\hat{g_2}\) is approximately \(N(0,24/n)\) (adequate when \(n > 1000\))
- Large-sample inference on kurtosis uses standard normal tables
- Small-sample inference uses the tables in Snedecor and Cochran (1989), Table A 19(ii), or Geary (1936)
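A sketch of the large-sample kurtosis test, assuming \(n > 1000\) and the normal-population null distribution \(N(0,24/n)\) for \(\hat{g_2}\). The helper names and the evenly spaced data are hypothetical, chosen so the result is deterministic.

```python
import math


def excess_kurtosis(y):
    """g2_hat = m4 / m2**2 - 3, with central moments using the divisor n."""
    n = len(y)
    ybar = sum(y) / n
    m2 = sum((v - ybar) ** 2 for v in y) / n
    m4 = sum((v - ybar) ** 4 for v in y) / n
    return m4 / m2 ** 2 - 3


def kurtosis_z(y):
    """z = g2_hat / sqrt(24/n); valid only for large n under a normal null."""
    return excess_kurtosis(y) / math.sqrt(24 / len(y))


# Hypothetical evenly spaced data (a discrete uniform shape), n = 2000.
y = [float(i) for i in range(2000)]
g2_hat = excess_kurtosis(y)  # close to -1.2: lighter tails than a normal
z = kurtosis_z(y)            # |z| > 1.96 rejects g2 = 0 at the 5% level
```

Evenly spaced values have excess kurtosis close to \(-1.2\), the continuous uniform value, so \(z\approx-11\) and the data are clearly platykurtic.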