3.1 Numerical Measures

A population and a sample differ in what they represent and in how they are described:

| Measures of | Category | Population | Sample |
|---|---|---|---|
| - | What is it? | Reality | A small fraction of reality (inference) |
| - | Characteristics described by | Parameters | Statistics |
| Central Tendency | Mean | $$\mu = E(Y)$$ | $$\hat{\mu} = \overline{y}$$ |
| Central Tendency | Median | 50th percentile | $$y_{(\frac{n+1}{2})}$$ (for odd $$n$$) |
| Dispersion | Variance | $$\sigma^2 = var(Y) = E(Y-\mu)^2$$ | $$s^2=\frac{1}{n-1} \sum_{i = 1}^{n} (y_i-\overline{y})^2$$ |
| Dispersion | Coefficient of Variation | $$\frac{\sigma}{\mu}$$ | $$\frac{s}{\overline{y}}$$ |
| Dispersion | Interquartile Range | Difference between the 25th and 75th percentiles; robust to outliers | |
| Shape | Skewness: standardized 3rd central moment (unitless) | $$g_1=\frac{\mu_3}{\mu_2^{3/2}}$$ | $$\hat{g_1}=\frac{m_3}{m_2\sqrt{m_2}}$$ |
| Shape | Central moments | $$\mu=E(Y)$$, $$\mu_2 = \sigma^2=E(Y-\mu)^2$$, $$\mu_3 = E(Y-\mu)^3$$, $$\mu_4 = E(Y-\mu)^4$$ | $$m_2=\sum_{i=1}^{n}(y_i-\overline{y})^2/n$$, $$m_3=\sum_{i=1}^{n}(y_i-\overline{y})^3/n$$ |
| Shape | Kurtosis (peakedness and tail thickness): standardized 4th central moment | $$g_2^*=\frac{E(Y-\mu)^4}{\sigma^4}$$ | $$\hat{g_2}=\frac{m_4}{m_2^2}-3$$ |
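
The sample formulas in the table can be checked numerically. The following is a minimal sketch in base R, assuming a small made-up vector `y`; the helper name `sample_measures` is hypothetical and does not come from any package.

```r
# Sketch: sample versions of the measures in the table (base R only).
# `sample_measures` is a hypothetical helper name, not a package function.
sample_measures <- function(y) {
  n    <- length(y)
  ybar <- mean(y)
  # Sample central moments use divisor n (not n - 1)
  m2 <- sum((y - ybar)^2) / n
  m3 <- sum((y - ybar)^3) / n
  m4 <- sum((y - ybar)^4) / n
  list(
    mean     = ybar,
    median   = median(y),
    variance = var(y),               # s^2, divisor n - 1
    cv       = sd(y) / ybar,         # coefficient of variation
    iqr      = IQR(y),               # 75th minus 25th percentile (R's default quantile type)
    skewness = m3 / (m2 * sqrt(m2)), # g1-hat
    kurtosis = m4 / m2^2 - 3         # g2-hat (excess kurtosis)
  )
}

y   <- c(1, 2, 3, 4, 10)   # made-up data with one large value
res <- sample_measures(y)
str(res)
```

In this made-up sample the single large value pulls the mean (4) above the median (3), and the skewness estimate is positive, which is what one would expect for right-skewed data.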

Note:

• Order statistics: the sample values sorted in increasing order, $$y_{(1)},y_{(2)},\dots,y_{(n)}$$ where $$y_{(1)}\le y_{(2)}\le\dots\le y_{(n)}$$

• Coefficient of variation: the standard deviation divided by the mean. It is a stable, dimensionless statistic, which makes it useful for comparing dispersion across variables measured in different units.

• Symmetric: mean = median, skewness = 0

• Skewed right: typically mean > median, skewness > 0

• Skewed left: typically mean < median, skewness < 0

• Central moments: $$\mu=E(Y)$$ , $$\mu_2 = \sigma^2=E(Y-\mu)^2$$ , $$\mu_3 = E(Y-\mu)^3$$, $$\mu_4 = E(Y-\mu)^4$$

• For normal distributions, $$\mu_3=0$$, so $$g_1=0$$

• If the sample is from a normal population, $$\hat{g_1}$$ is distributed approximately as $$N(0,6/n)$$ (valid when $$n > 150$$)

• For large samples, inference on skewness can be based on normal tables with 95% confidence interval for $$g_1$$ as $$\hat{g_1}\pm1.96\sqrt{6/n}$$
• For small samples, use the special tables in Snedecor and Cochran (1989), Table A 19(i), or a Monte Carlo test
• Kurtosis > 0 (leptokurtic): heavier tails than a normal distribution with the same $$\sigma$$ (e.g., the t-distribution)
• Kurtosis < 0 (platykurtic): lighter tails than a normal distribution with the same $$\sigma$$
• For a normal distribution, $$g_2^*=3$$. Kurtosis is often redefined as: $$g_2=\frac{E(Y-\mu)^4}{\sigma^4}-3$$ where the 4th central moment is estimated by $$m_4=\sum_{i=1}^{n}(y_i-\overline{y})^4/n$$

• If the sample is from a normal population, the asymptotic sampling distribution of $$\hat{g_2}$$ is approximately $$N(0,24/n)$$ (valid when $$n > 1000$$)
• For large samples, inference on kurtosis can use standard normal tables
• For small samples, use the tables in Snedecor and Cochran (1989), Table A 19(ii), or Geary (1936)
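
The large-sample results above can be sketched in base R as follows. The helper name `moment_ci` and the simulated data are illustrative assumptions, and the normal approximations are only reasonable for the sample sizes noted above.

```r
# Sketch: large-sample intervals for skewness and excess kurtosis, using
# g1-hat ~ N(0, 6/n) and g2-hat ~ N(0, 24/n) under a normal population.
# `moment_ci` is a hypothetical helper name, not a package function.
moment_ci <- function(y, level = 0.95) {
  n  <- length(y)
  d  <- y - mean(y)
  m2 <- mean(d^2)   # sample central moments with divisor n
  m3 <- mean(d^3)
  m4 <- mean(d^4)
  g1 <- m3 / m2^1.5
  g2 <- m4 / m2^2 - 3
  z  <- qnorm(1 - (1 - level) / 2)   # about 1.96 when level = 0.95
  rbind(
    skewness = c(estimate = g1,
                 lower = g1 - z * sqrt(6 / n),
                 upper = g1 + z * sqrt(6 / n)),
    kurtosis = c(estimate = g2,
                 lower = g2 - z * sqrt(24 / n),
                 upper = g2 + z * sqrt(24 / n))
  )
}

set.seed(1)          # arbitrary seed, for reproducibility only
y  <- rnorm(2000)    # simulated data; n chosen to exceed both thresholds
ci <- moment_ci(y)
print(ci)
```

An interval that excludes 0 suggests that the skewness or kurtosis of the data differs from what a normal population would produce.
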
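library(e1071)  # provides skewness() and kurtosis()
# 100 draws from N(0, 1); no seed is set, so the values below vary between runs
data <- rnorm(100)
skewness(data)
#> [1] -0.2046225
kurtosis(data)  # excess kurtosis: 0 for a normal distribution
#> [1] -0.6313715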
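
For small samples, the notes above mention a Monte Carlo test as an alternative to published tables. One way this could look is sketched below in base R: simulate many normal samples of the same size and compare the observed skewness with the simulated values. The function names `g1_hat` and `mc_skewness_test`, the number of replicates, and the example data are all assumptions made for illustration.

```r
# Sample skewness from central moments (divisor n): m3 / m2^(3/2)
g1_hat <- function(y) {
  d <- y - mean(y)
  mean(d^3) / mean(d^2)^1.5
}

# Monte Carlo test of H0: the data come from a normal population.
# Skewness is location- and scale-invariant, so N(0, 1) draws suffice.
# `mc_skewness_test` is a hypothetical helper name.
mc_skewness_test <- function(y, n_sim = 5000) {
  n   <- length(y)
  obs <- g1_hat(y)
  sim <- replicate(n_sim, g1_hat(rnorm(n)))
  # Two-sided p-value; the +1 terms keep it strictly above 0
  p <- (1 + sum(abs(sim) >= abs(obs))) / (n_sim + 1)
  list(statistic = obs, p.value = p)
}

set.seed(123)           # arbitrary seed, for reproducibility only
y_small <- rexp(30)     # made-up sample of size 30 from a right-skewed distribution
out <- mc_skewness_test(y_small)
out
```

The same pattern could be applied to kurtosis by swapping in the corresponding statistic; the choice of 5000 replicates is arbitrary and trades run time against the resolution of the p-value.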