3.1 Numerical Measures

A population and a sample are described by different measures:

| Measures of | Category | Population | Sample |
|---|---|---|---|
| What is it? | | Reality | A small fraction of reality (used for inference) |
| Characteristics described by | | Parameters | Statistics |
| Central tendency | Mean | \(\mu = E(Y)\) | \(\hat{\mu} = \overline{y}\) |
| Central tendency | Median | 50th percentile | \(y_{(\frac{n+1}{2})}\) for odd \(n\); the average of \(y_{(n/2)}\) and \(y_{(n/2+1)}\) for even \(n\) |
| Dispersion | Variance | \(\sigma^2 = var(Y) = E(Y-\mu)^2\) | \(s^2=\frac{1}{n-1} \sum_{i = 1}^{n} (y_i-\overline{y})^2 = \frac{1}{n-1} \left(\sum_{i = 1}^{n} y_i^2-n\overline{y}^2\right)\) |
| Dispersion | Coefficient of variation | \(\frac{\sigma}{\mu}\) | \(\frac{s}{\overline{y}}\) |
| Dispersion | Interquartile range | 75th percentile minus 25th percentile; robust to outliers | Same, using sample percentiles |
| Shape | Skewness | Standardized 3rd central moment (unitless): \(g_1=\frac{\mu_3}{\mu_2^{3/2}}\) | \(\hat{g_1}=\frac{m_3}{m_2^{3/2}}\) |
| Shape | Central moments | \(\mu=E(Y)\), \(\mu_2 = \sigma^2=E(Y-\mu)^2\), \(\mu_3 = E(Y-\mu)^3\), \(\mu_4 = E(Y-\mu)^4\) | \(m_2=\sum_{i=1}^{n}(y_i-\overline{y})^2/n\), \(m_3=\sum_{i=1}^{n}(y_i-\overline{y})^3/n\), \(m_4=\sum_{i=1}^{n}(y_i-\overline{y})^4/n\) |
| Shape | Kurtosis (peakedness and tail thickness) | Standardized 4th central moment: \(g_2^*=\frac{E(Y-\mu)^4}{\sigma^4}\) | \(\hat{g_2}=\frac{m_4}{m_2^2}-3\) (excess kurtosis) |
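The sample column of the table can be computed by hand in base R. The sketch below uses a simulated vector `y` (hypothetical data, not from the text) and checks the hand-written formulas against the built-in `mean()`, `median()`, `var()`, `sd()`, and `IQR()`. Note that `quantile()` and `IQR()` use R's default type-7 percentile definition, which is one of several conventions.

```r
set.seed(1)                      # reproducible hypothetical data
y <- rexp(51, rate = 0.5)        # n = 51 (odd); positive values so the CV is meaningful
n <- length(y)

# Central tendency
y_bar    <- sum(y) / n           # sample mean, mu-hat = y-bar
y_sorted <- sort(y)              # order statistics y_(1) <= ... <= y_(n)
med      <- y_sorted[(n + 1) / 2]  # y_((n+1)/2); valid here because n is odd

# Dispersion
s2_def   <- sum((y - y_bar)^2) / (n - 1)          # definitional form of s^2
s2_short <- (sum(y^2) - n * y_bar^2) / (n - 1)    # computational shortcut for s^2
cv       <- sqrt(s2_def) / y_bar                  # coefficient of variation s / y-bar
iqr      <- unname(quantile(y, 0.75) - quantile(y, 0.25))  # 75th minus 25th percentile

c(mean = y_bar, median = med, var = s2_def, cv = cv, iqr = iqr)
```

Both variance forms agree up to floating-point error; the shortcut can lose precision when the mean is large relative to the spread, which is why the definitional form is usually preferred in code.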

Note:

  • Order statistics: \(y_{(1)},y_{(2)},...,y_{(n)}\) are the observations sorted in ascending order, so \(y_{(1)}\le y_{(2)}\le...\le y_{(n)}\)

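  • Coefficient of variation: the standard deviation divided by the mean. Because it is dimensionless, it can compare relative variability across variables measured in different units or on different scales; it becomes unstable when the mean is close to zero.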

  • Symmetric: mean = median, skewness = 0

  • Skewed right: typically mean > median, skewness > 0

  • Skewed left: typically mean < median, skewness < 0

  • Central moments: \(\mu=E(Y)\) , \(\mu_2 = \sigma^2=E(Y-\mu)^2\) , \(\mu_3 = E(Y-\mu)^3\), \(\mu_4 = E(Y-\mu)^4\)

  • For normal distributions, \(\mu_3=0\), so \(g_1=0\)

  • \(\hat{g_1}\) is approximately distributed as \(N(0, 6/n)\) if the sample is drawn from a normal population (the approximation is adequate for \(n > 150\)).

    • For large samples, inference on skewness can use standard normal tables; an approximate 95% confidence interval for \(g_1\) is \(\hat{g_1}\pm1.96\sqrt{6/n}\).
    • For small samples, use the special tables in Snedecor and Cochran (1989), Table A19(i), or a Monte Carlo test.
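A minimal base-R sketch of the large-sample skewness interval, assuming hypothetical normal data with \(n = 1000\) (above the \(n > 150\) guideline). `m_k` is a helper defined here for the k-th sample central moment with divisor \(n\), matching \(m_2\) and \(m_3\) in the table. The last lines use a hypothetical exponential sample to illustrate the right-skewed case: the exponential distribution has mean 1, median \(\log 2\), and skewness 2.

```r
set.seed(42)
y <- rnorm(1000)                            # hypothetical sample from a normal population
n <- length(y)

# k-th sample central moment with divisor n (m_2, m_3, ... in the table)
m_k <- function(x, k) mean((x - mean(x))^k)

g1_hat <- m_k(y, 3) / m_k(y, 2)^(3/2)       # g1-hat = m3 / m2^(3/2)
se_g1  <- sqrt(6 / n)                       # approximate SE of g1-hat under normality
ci_g1  <- g1_hat + c(-1, 1) * 1.96 * se_g1  # approximate 95% CI for g1
ci_g1

# Right-skewed example: expect mean > median and positive skewness
x <- rexp(10000)
c(mean = mean(x), median = median(x), g1 = m_k(x, 3) / m_k(x, 2)^(3/2))
```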
| Excess kurtosis | Name | Tails compared with a normal distribution with the same \(\sigma\) |
|---|---|---|
| > 0 | leptokurtic | heavier (e.g., the t-distribution) |
| < 0 | platykurtic | lighter |
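  • For a normal distribution, \(g_2^*=3\), so kurtosis is often redefined as the excess kurtosis \(g_2=\frac{E(Y-\mu)^4}{\sigma^4}-3\), which equals 0 for a normal distribution. Its sample estimate is \(\hat{g_2}=\frac{m_4}{m_2^2}-3\), where the 4th central moment is estimated by \(m_4=\sum_{i=1}^{n}(y_i-\overline{y})^4/n\).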

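    • For a sample from a normal population, the asymptotic sampling distribution of \(\hat{g_2}\) is approximately \(N(0, 24/n)\); the approximation is adequate for \(n > 1000\).
    • Large-sample inference on kurtosis uses standard normal tables, e.g., an approximate 95% confidence interval \(\hat{g_2}\pm1.96\sqrt{24/n}\).
    • Small-sample inference uses the tables in Snedecor and Cochran (1989), Table A19(ii), or Geary (1936).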
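The large-sample kurtosis procedure can be sketched in base R, assuming hypothetical normal data with \(n = 5000\) (above the \(n > 1000\) guideline). `m_k` is again a locally defined helper for the k-th central moment with divisor \(n\). For contrast, the last lines draw from a t-distribution with 5 degrees of freedom, whose population excess kurtosis is \(6/(5-4) = 6\); sample kurtosis from such heavy-tailed data is quite variable, but it should clearly be positive.

```r
set.seed(7)
y <- rnorm(5000)                               # hypothetical sample from a normal population
n <- length(y)

m_k <- function(x, k) mean((x - mean(x))^k)    # k-th central moment, divisor n

g2_hat <- m_k(y, 4) / m_k(y, 2)^2 - 3          # excess kurtosis estimate, near 0 here
z      <- g2_hat / sqrt(24 / n)                # standardized using Var(g2-hat) ~ 24/n
p_val  <- 2 * pnorm(-abs(z))                   # two-sided p-value from N(0, 1)
ci_g2  <- g2_hat + c(-1, 1) * 1.96 * sqrt(24 / n)  # approximate 95% CI for g2
c(g2_hat = g2_hat, z = z, p = p_val)

# Heavy-tailed (leptokurtic) comparison
t_sample <- rt(5000, df = 5)
m_k(t_sample, 4) / m_k(t_sample, 2)^2 - 3
```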
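library(e1071)                 # provides skewness() and kurtosis()
data = rnorm(100)              # 100 draws from N(0, 1); no seed, so values differ per run
skewness(data, type = 1)       # type 1: g1-hat = m3 / m2^(3/2)
## [1] 0.2128681
kurtosis(data, type = 1)       # type 1: excess kurtosis g2-hat = m4 / m2^2 - 3
## [1] 0.5225319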
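For small samples, the notes above mention a Monte Carlo test as an alternative to the Snedecor and Cochran tables. A minimal sketch, assuming the null hypothesis that the data come from a normal population: simulate many normal samples of the same size \(n\), compute the statistic for each, and compare the observed value with that simulated null distribution. `g1_stat`, `g2_stat`, and `mc_pvalue` are hypothetical helper names defined here; they implement the same type-1 formulas as the table. Because both statistics are unchanged by shifting and (positively) rescaling the data, simulating from \(N(0, 1)\) is enough. Doubling the smaller tail is one simple convention for a two-sided p-value; it is used here because the null distribution of \(\hat{g_2}\) is not symmetric.

```r
set.seed(123)
y <- rnorm(20)                 # hypothetical small sample (n = 20)

# Type-1 sample skewness and excess kurtosis
g1_stat <- function(x) { d <- x - mean(x); mean(d^3) / mean(d^2)^(3/2) }
g2_stat <- function(x) { d <- x - mean(x); mean(d^4) / mean(d^2)^2 - 3 }

# Two-sided Monte Carlo p-value under H0: normal population
mc_pvalue <- function(stat, x, B = 9999) {
  obs   <- stat(x)
  null  <- replicate(B, stat(rnorm(length(x))))  # null distribution at the same n
  upper <- (sum(null >= obs) + 1) / (B + 1)      # +1 counts the observed sample itself
  lower <- (sum(null <= obs) + 1) / (B + 1)
  min(1, 2 * min(upper, lower))                  # double the smaller tail
}

p_skew <- mc_pvalue(g1_stat, y)
p_kurt <- mc_pvalue(g2_stat, y)
c(skewness = p_skew, kurtosis = p_kurt)
```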