3.1 Numerical Measures

Differences between a population and a sample
Measures of	Category	Population	Sample
	What is it?	Reality	A small fraction of reality (inference)
	Characteristics described by	Parameters	Statistics
Central Tendency	Mean	\(\mu = E(Y)\)	\(\hat{\mu} = \overline{y}\)
Central Tendency	Median	50th percentile	\(y_{(\frac{n+1}{2})}\) for odd \(n\); \(\frac{1}{2}\left(y_{(\frac{n}{2})} + y_{(\frac{n}{2}+1)}\right)\) for even \(n\)
Dispersion	Variance	\(\sigma^2 = var(Y) = E[(Y-\mu)^2]\)	\(s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \overline{y})^2\)
Dispersion	Coefficient of Variation	\(\frac{\sigma}{\mu}\)	\(\frac{s}{\overline{y}}\)
Dispersion	Interquartile Range	Difference between the 75th and 25th percentiles; robust to outliers	Difference between the sample 75th and 25th percentiles
Shape	Skewness (standardized 3rd central moment; unitless)	\(g_1 = \frac{\mu_3}{\sigma^3}\)	\(\hat{g_1} = \frac{m_3}{m_2^{3/2}}\)
Shape	Central moments	\(\mu=E(Y)\), \(\mu_2 = \sigma^2 = E[(Y-\mu)^2]\), \(\mu_3 = E[(Y-\mu)^3]\), \(\mu_4 = E[(Y-\mu)^4]\)	\(m_2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \overline{y})^2\), \(m_3 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \overline{y})^3\), \(m_4 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \overline{y})^4\)
Shape	Kurtosis (peakedness and tail thickness; standardized 4th central moment)	\(g_2^* = \frac{E[(Y-\mu)^4]}{\sigma^4}\); excess kurtosis \(g_2 = g_2^* - 3\)	\(\hat{g_2} = \frac{m_4}{m_2^2} - 3\) (estimates the excess kurtosis \(g_2\))
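
The sample column of the table can be computed directly in base R. The following is a minimal sketch on a made-up sample (the values of `y` and the helper name `central_moment` are illustrative assumptions, not taken from the text). Note that `var()` and `sd()` use the divisor \(n-1\), while the sample central moments \(m_k\) use the divisor \(n\).

```r
# Illustrative sample (made-up values, odd n so the median is a single order statistic)
y <- c(2.1, 3.4, 3.9, 4.4, 5.0, 5.7, 6.3, 7.8, 9.6)
n <- length(y)

# Central tendency
y_bar <- mean(y)
y_med <- median(y)          # for odd n: the ((n + 1) / 2)-th order statistic

# Dispersion
s2  <- var(y)               # sample variance, divisor n - 1
cv  <- sd(y) / y_bar        # coefficient of variation
iqr <- IQR(y)               # sample 75th percentile minus sample 25th percentile

# Sample central moments m_k, divisor n (hypothetical helper)
central_moment <- function(y, k) mean((y - mean(y))^k)
m2 <- central_moment(y, 2)
m3 <- central_moment(y, 3)
m4 <- central_moment(y, 4)

# Shape
g1_hat <- m3 / m2^(3 / 2)   # sample skewness
g2_hat <- m4 / m2^2 - 3     # sample excess kurtosis

round(c(mean = y_bar, median = y_med, variance = s2, cv = cv,
        iqr = iqr, skewness = g1_hat, excess_kurtosis = g2_hat), 4)
```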

Notes:

Order Statistics: \(y_{(1)}, y_{(2)}, \ldots, y_{(n)}\), the sample values sorted so that \(y_{(1)} \le y_{(2)} \le \ldots \le y_{(n)}\) (ties are possible).
Coefficient of Variation:
- Defined as the standard deviation divided by the mean.
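- A unitless statistic, useful for comparing variability across variables measured in different units or on different scales; it becomes unstable when the mean is close to zero.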
Symmetry:
- Symmetric distributions: Mean = Median; Skewness = 0 (when these moments exist).
- Skewed Right: Skewness > 0; typically Mean > Median.
- Skewed Left: Skewness < 0; typically Mean < Median.
Central Moments:
- \(\mu = E(Y)\)
- \(\mu_2 = \sigma^2 = E[(Y-\mu)^2]\)
- \(\mu_3 = E[(Y-\mu)^3]\)
- \(\mu_4 = E[(Y-\mu)^4]\)
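
As a small illustration of the notes on order statistics and symmetry, the sketch below uses a made-up right-skewed sample (the values are an assumption chosen for easy arithmetic) and its mirror image.

```r
# Made-up right-skewed sample: a few large values pull the mean upward
y <- c(1, 2, 2, 3, 4, 5, 9, 14)

# Order statistics y_(1) <= y_(2) <= ... <= y_(n)
y_sorted <- sort(y)

# Sample skewness m_3 / m_2^(3/2), with divisor n in both moments
sample_skewness <- function(y) {
  d <- y - mean(y)
  mean(d^3) / mean(d^2)^(3 / 2)
}

y_bar  <- mean(y)     # 5
y_med  <- median(y)   # 3.5, the average of the two middle order statistics
g1_hat <- sample_skewness(y)

y_bar > y_med         # TRUE: mean above median
g1_hat > 0            # TRUE: positive skewness

# Mirroring the sample flips the direction of the skew
y_left <- -y
mean(y_left) < median(y_left)   # TRUE
sample_skewness(y_left) < 0     # TRUE
```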

Skewness (\(\hat{g_1}\))

Sampling Distribution:
For samples drawn from a normal population:
- \(\hat{g_1}\) is approximately distributed as \(N(0, \frac{6}{n})\) when \(n > 150\).
Inference:
- Large Samples: Inference on skewness can be based on the standard normal distribution.
  The 95% confidence interval for \(g_1\) is given by: \[ \hat{g_1} \pm 1.96 \sqrt{\frac{6}{n}} \]
- Small Samples: For small samples, consult special tables such as:
  - Snedecor and Cochran (1989), Table A 19(i)
  - Monte Carlo test results
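
The large-sample procedure above can be sketched in R as follows. The function name `skewness_inference`, the seed, and the simulated data are assumptions made for illustration; the standard error \(\sqrt{6/n}\) relies on the normal-population approximation stated above, so this is a rough check rather than a definitive test.

```r
# Hypothetical helper: large-sample inference on skewness,
# using the approximation g1_hat ~ N(0, 6 / n) for normal populations
skewness_inference <- function(y) {
  n  <- length(y)
  d  <- y - mean(y)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  g1_hat <- m3 / m2^(3 / 2)

  se <- sqrt(6 / n)
  ci <- g1_hat + c(-1, 1) * 1.96 * se   # 95% interval from the text
  z  <- g1_hat / se                     # z statistic for H0: g1 = 0
  list(g1_hat = g1_hat, se = se, ci = ci, z = z,
       p_value = 2 * pnorm(-abs(z)))
}

set.seed(1)                 # arbitrary seed, only for reproducibility
y <- rnorm(500)             # simulated normal sample with n > 150
res <- skewness_inference(y)
res$ci
res$p_value
```

If the interval excludes 0 (equivalently, \(|z| > 1.96\)), the sample shows more skewness than would be expected from a normal population at the 5% level.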

Kurtosis (\(\hat{g_2}\))

Definitions and Relationships:
- A normal distribution has kurtosis \(g_2^* = 3\).
  Kurtosis is therefore often redefined as excess kurtosis, which equals 0 for a normal distribution: \[ g_2 = \frac{E[(Y - \mu)^4]}{\sigma^4} - 3 \] where the 4th central moment \(\mu_4 = E[(Y - \mu)^4]\) is estimated by: \[ m_4 = \frac{\sum_{i=1}^n (y_i - \overline{y})^4}{n} \]
Sampling Distribution:
For large samples (\(n > 1000\)) drawn from a normal population:
- \(\hat{g_2}\) is approximately distributed as \(N(0, \frac{24}{n})\).
Inference:
- Large Samples: Inference for kurtosis can use standard normal tables; for example, an approximate 95% interval is \(\hat{g_2} \pm 1.96 \sqrt{\frac{24}{n}}\).
- Small Samples: Refer to specialized tables such as:
  - Snedecor and Cochran (1989), Table A 19(ii)
  - Geary (1936)
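
A minimal sketch of the large-sample z test for excess kurtosis follows. The function name `kurtosis_z_test`, the seed, and both simulated samples are illustrative assumptions; the \(t\)-distributed sample is included only to show how heavy tails show up in the statistic.

```r
# Hypothetical helper: z test of H0: g2 = 0 (normal tails),
# using the approximation g2_hat ~ N(0, 24 / n) for large normal samples
kurtosis_z_test <- function(y) {
  n  <- length(y)
  d  <- y - mean(y)
  m2 <- mean(d^2)
  m4 <- mean(d^4)
  g2_hat <- m4 / m2^2 - 3       # sample excess kurtosis

  se <- sqrt(24 / n)
  z  <- g2_hat / se
  list(g2_hat = g2_hat, se = se, z = z,
       p_value = 2 * pnorm(-abs(z)))
}

set.seed(2)                                       # arbitrary seed
res_normal <- kurtosis_z_test(rnorm(5000))        # normal tails
res_heavy  <- kurtosis_z_test(rt(5000, df = 5))   # heavier tails than normal

c(normal = res_normal$z, heavy = res_heavy$z)
```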

Kurtosis types and comparison to the normal distribution
Kurtosis Value	Tail Behavior	Comparison to Normal Distribution
\(g_2 > 0\) (Leptokurtic)	Heavier Tails	Examples: \(t\)-distributions
\(g_2 < 0\) (Platykurtic)	Lighter Tails	Examples: Uniform or certain bounded distributions
\(g_2 = 0\) (Mesokurtic)	Normal Tails	Exactly matches the normal distribution
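
The three rows of the table can be checked numerically from the population definition \(g_2 = \mu_4 / \sigma^4 - 3\). The sketch below integrates each density with base R's `integrate()`; the helper name `excess_kurtosis` and the choice of a \(t\)-distribution with 10 degrees of freedom are assumptions for illustration. The results are numerical approximations.

```r
# Hypothetical helper: population excess kurtosis by numerical integration
excess_kurtosis <- function(density, lower, upper) {
  # k-th moment of the density about `center`
  moment <- function(k, center = 0) {
    integrate(function(x) (x - center)^k * density(x), lower, upper)$value
  }
  mu  <- moment(1)
  mu2 <- moment(2, mu)   # variance
  mu4 <- moment(4, mu)   # 4th central moment
  mu4 / mu2^2 - 3
}

k_normal  <- excess_kurtosis(dnorm, -Inf, Inf)                       # mesokurtic, about 0
k_t10     <- excess_kurtosis(function(x) dt(x, df = 10), -Inf, Inf)  # leptokurtic, about 1
k_uniform <- excess_kurtosis(dunif, 0, 1)                            # platykurtic, about -1.2

round(c(normal = k_normal, t10 = k_t10, uniform = k_uniform), 3)
```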

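# Generate random data from a normal distribution
# (no seed is set, so the values printed below change from run to run)
data <- rnorm(100)

# Load the e1071 package for skewness and kurtosis functions
library(e1071)

# Calculate skewness
# e1071 defaults to type = 3, m_3 / s^3; type = 1 gives m_3 / m_2^(3/2)
skewness_value <- skewness(data)
cat("Skewness:", skewness_value, "\n")
#> Skewness: -0.2663921

# Calculate excess kurtosis
# e1071 defaults to type = 3, m_4 / s^4 - 3; type = 1 gives m_4 / m_2^2 - 3
kurtosis_value <- kurtosis(data)
cat("Kurtosis:", kurtosis_value, "\n")
#> Kurtosis: -0.3584637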
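
The `skewness()` and `kurtosis()` functions in e1071 take a `type` argument that defaults to 3, which standardizes by the sample standard deviation \(s\) rather than by \(\sqrt{m_2}\). The sketch below, on a made-up sample, checks that `type = 1` reproduces the formulas \(m_3 / m_2^{3/2}\) and \(m_4 / m_2^2 - 3\) used in this section, and that the default differs only by a factor depending on \(n\). It is guarded so that it runs only when e1071 is installed.

```r
if (requireNamespace("e1071", quietly = TRUE)) {
  y <- c(1, 2, 2, 3, 4, 5, 9, 14)   # made-up sample
  n <- length(y)
  d <- y - mean(y)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m4 <- mean(d^4)

  g1_hat <- m3 / m2^(3 / 2)
  g2_hat <- m4 / m2^2 - 3

  # type = 1 matches the moment-based formulas
  print(all.equal(e1071::skewness(y, type = 1), g1_hat))
  print(all.equal(e1071::kurtosis(y, type = 1), g2_hat))

  # The default (type = 3) rescales by powers of (n - 1) / n
  print(all.equal(e1071::skewness(y), g1_hat * ((n - 1) / n)^(3 / 2)))
  print(all.equal(e1071::kurtosis(y), (g2_hat + 3) * (1 - 1 / n)^2 - 3))
}
```

For large \(n\) the two types are nearly identical, so the choice matters mainly for small samples.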

References

Geary, R. C. 1936. “Moments of the Ratio of the Mean Deviation to the Standard Deviation for Normal Samples.” Biometrika, 295–307.

Snedecor, George W., and William G. Cochran. 1989. “Statistical Methods.”