3.1 Numerical Measures

There are differences between a population and a sample:

| Measures of | Category | Population | Sample |
|---|---|---|---|
| What is it? | | Reality | A small fraction of reality (inference) |
| Characteristics described by | | Parameters | Statistics |
| Central Tendency | Mean | \(\mu = E(Y)\) | \(\hat{\mu} = \overline{y}\) |
| Central Tendency | Median | 50th percentile | \(y_{(\frac{n+1}{2})}\) |
| Dispersion | Variance | \(\sigma^2 = var(Y) = E[(Y-\mu)^2]\) | \(s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \overline{y})^2\) |
| Dispersion | Coefficient of Variation | \(\frac{\sigma}{\mu}\) | \(\frac{s}{\overline{y}}\) |
| Dispersion | Interquartile Range | Difference between 25th and 75th percentiles; robust to outliers | |
| Shape | Skewness: standardized 3rd central moment (unitless) | \(g_1 = \frac{\mu_3}{\sigma^3}\) | \(\hat{g_1} = \frac{m_3}{m_2^{3/2}}\) |
| Shape | Central moments | \(\mu=E(Y)\), \(\mu_2 = \sigma^2 = E[(Y-\mu)^2]\), \(\mu_3 = E[(Y-\mu)^3]\), \(\mu_4 = E[(Y-\mu)^4]\) | \(m_2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \overline{y})^2\), \(m_3 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \overline{y})^3\) |
| Shape | Kurtosis (peakedness and tail thickness): standardized 4th central moment | \(g_2^* = \frac{E[(Y-\mu)^4]}{\sigma^4}\) | \(\hat{g_2} = \frac{m_4}{m_2^2} - 3\) |
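
The sample-side measures of central tendency and dispersion can be computed directly from their definitions. The following is a minimal sketch in base R; the vector `y` is a hypothetical sample, and the interquartile range relies on R's default percentile definition (`quantile()` with `type = 7`), which is an assumption here since other percentile conventions give slightly different values.

```r
# Hypothetical sample with an odd number of observations
y <- c(2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2, 9.6, 12.3)
n <- length(y)

y_bar    <- mean(y)                    # sample mean
y_sorted <- sort(y)                    # order statistics y_(1), ..., y_(n)
y_med    <- y_sorted[(n + 1) / 2]      # middle order statistic (valid because n is odd)
s2       <- sum((y - y_bar)^2) / (n - 1)  # sample variance with divisor n - 1
cv       <- sqrt(s2) / y_bar           # coefficient of variation s / y_bar

# Interquartile range: 75th percentile minus 25th percentile (R's default type = 7)
iqr <- unname(diff(quantile(y, c(0.25, 0.75))))

cat("Mean:", y_bar, " Median:", y_med, "\n")
cat("Variance:", s2, " CV:", cv, " IQR:", iqr, "\n")
```

The built-in functions `median()`, `var()`, and `IQR()` should return the same values for this sample; the manual versions are shown only to mirror the formulas in the table.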

Notes:

  1. Order Statistics: \(y_{(1)}, y_{(2)}, \ldots, y_{(n)}\) are the sample values sorted in ascending order, so that \(y_{(1)} \le y_{(2)} \le \ldots \le y_{(n)}\).

  2. Coefficient of Variation:

    • Defined as the standard deviation divided by the mean.
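    • A unitless statistic, useful for comparing relative variability across variables with different units or scales; it becomes unreliable when the mean is close to zero.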
  3. Symmetry:

    • Symmetric distributions: Mean = Median; Skewness = 0.
    • Skewed Right: typically Mean > Median; Skewness > 0.
    • Skewed Left: typically Mean < Median; Skewness < 0.
  4. Central Moments:

    • \(\mu = E(Y)\)
    • \(\mu_2 = \sigma^2 = E[(Y-\mu)^2]\)
    • \(\mu_3 = E[(Y-\mu)^3]\)
    • \(\mu_4 = E[(Y-\mu)^4]\)
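
The sample central moments and the shape statistics built from them can be sketched in a few lines of base R. The helper `central_moment()` and the data vector `y` are hypothetical names introduced for illustration; the divisor is \(n\), matching the definitions of \(m_2\) and \(m_3\) above.

```r
# Hypothetical right-skewed sample
y <- c(1, 2, 2, 3, 3, 3, 4, 5, 8, 15)
n <- length(y)

# k-th sample central moment with divisor n (illustrative helper)
central_moment <- function(y, k) mean((y - mean(y))^k)

m2 <- central_moment(y, 2)
m3 <- central_moment(y, 3)
m4 <- central_moment(y, 4)

g1_hat <- m3 / m2^(3 / 2)   # sample skewness
g2_hat <- m4 / m2^2 - 3     # sample excess kurtosis

cat("Mean:", mean(y), " Median:", median(y), "\n")
cat("Skewness:", g1_hat, " Excess kurtosis:", g2_hat, "\n")
```

For this sample the mean exceeds the median and the skewness is positive, in line with the right-skewed case in note 3. Note that `m2` uses divisor \(n\), so it equals `var(y) * (n - 1) / n` rather than `var(y)`.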

Skewness (\(\hat{g_1}\))

  1. Sampling Distribution:
    For samples drawn from a normal population:
    • \(\hat{g_1}\) is approximately distributed as \(N(0, \frac{6}{n})\) when \(n > 150\).
  2. Inference:
    • Large Samples: Inference on skewness can be based on the standard normal distribution.
      The 95% confidence interval for \(g_1\) is given by: \[ \hat{g_1} \pm 1.96 \sqrt{\frac{6}{n}} \]
    • Small Samples: Consult special tables such as:
      • Snedecor and Cochran (1989), Table A 19(i)
      • Monte Carlo test results
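
The large-sample interval above can be sketched as follows. The sample size `n = 500`, the seed, and the simulated normal data are assumptions chosen only for illustration; `qnorm(0.975)` is used in place of the rounded value 1.96.

```r
set.seed(42)                 # arbitrary seed, for reproducibility only
n <- 500                     # hypothetical sample size, above the n > 150 guideline
y <- rnorm(n)                # hypothetical data from a normal population

# Moment-based sample skewness m_3 / m_2^(3/2)
d      <- y - mean(y)
g1_hat <- mean(d^3) / mean(d^2)^(3 / 2)

se_g1 <- sqrt(6 / n)                              # approximate standard error under normality
ci    <- g1_hat + c(-1, 1) * qnorm(0.975) * se_g1 # approximate 95% confidence interval

z       <- g1_hat / se_g1                         # z statistic for H0: g1 = 0
p_value <- 2 * pnorm(-abs(z))                     # two-sided p-value

cat("Skewness:", g1_hat, "\n")
cat("95% CI: [", ci[1], ",", ci[2], "]\n")
cat("p-value:", p_value, "\n")
```

Because the standard error \(\sqrt{6/n}\) is derived under normality, this interval is best read as a check of whether the data are compatible with zero skewness, not as a general-purpose interval for skewed populations.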

Kurtosis (\(\hat{g_2}\))

  1. Definitions and Relationships:
    • A normal distribution has kurtosis \(g_2^* = 3\).
      For this reason, kurtosis is often redefined as excess kurtosis, which equals 0 for a normal distribution: \[ g_2 = \frac{E[(Y - \mu)^4]}{\sigma^4} - 3 \] where the 4th central moment is estimated by: \[ m_4 = \frac{1}{n} \sum_{i=1}^n (y_i - \overline{y})^4 \]
  2. Sampling Distribution:
    For large samples (\(n > 1000\)) drawn from a normal population:
    • \(\hat{g_2}\) is approximately distributed as \(N(0, \frac{24}{n})\).
  3. Inference:
    • Large Samples: Inference for kurtosis can use standard normal tables.
    • Small Samples: Refer to specialized tables such as:
      • Snedecor and Cochran (1989), Table A 19(ii)
      • Geary (1936)
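
A small simulation offers one way to check the approximate variances \(6/n\) and \(24/n\) quoted for \(\hat{g_1}\) and \(\hat{g_2}\). This is a sketch under stated assumptions: the sample size, number of replicates, seed, and the helper `shape_stats()` are all hypothetical choices, not values from the references.

```r
set.seed(2024)   # arbitrary seed, for reproducibility only
n    <- 1000     # sample size per replicate (hypothetical choice)
reps <- 2000     # number of simulated samples (hypothetical choice)

# Moment-based skewness and excess kurtosis for one sample (illustrative helper)
shape_stats <- function(y) {
  d  <- y - mean(y)
  m2 <- mean(d^2)
  c(g1 = mean(d^3) / m2^(3 / 2),
    g2 = mean(d^4) / m2^2 - 3)
}

# Each column of `sims` holds (g1, g2) for one normal sample of size n
sims <- replicate(reps, shape_stats(rnorm(n)))

# Ratios close to 1 suggest the approximate variances are reasonable
ratio_g1 <- var(sims["g1", ]) / (6 / n)
ratio_g2 <- var(sims["g2", ]) / (24 / n)

cat("Simulated / approximate variance, skewness:", ratio_g1, "\n")
cat("Simulated / approximate variance, kurtosis:", ratio_g2, "\n")
```

Rerunning with a smaller `n` shows how the approximation degrades, which is why small samples call for the specialized tables.
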

| Kurtosis Value | Tail Behavior | Comparison to Normal Distribution |
|---|---|---|
| \(g_2 > 0\) (Leptokurtic) | Heavier Tails | Examples: \(t\)-distributions |
| \(g_2 < 0\) (Platykurtic) | Lighter Tails | Examples: Uniform or certain bounded distributions |
| \(g_2 = 0\) (Mesokurtic) | Normal Tails | Exactly matches the normal distribution |

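```r
# Generate random data from a normal distribution
# (no seed is set, so the values printed below change from run to run)
data <- rnorm(100)

# Load the e1071 package for skewness and kurtosis functions
library(e1071)

# Calculate skewness
# (e1071 defaults to type = 3; pass type = 1 to get m_3 / m_2^(3/2))
skewness_value <- skewness(data)
cat("Skewness:", skewness_value, "\n")
#> Skewness: 0.362615

# Calculate excess kurtosis
# (e1071 defaults to type = 3; pass type = 1 to get m_4 / m_2^2 - 3)
kurtosis_value <- kurtosis(data)
cat("Kurtosis:", kurtosis_value, "\n")
#> Kurtosis: -0.3066409
```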
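
The three tail-behavior categories can be illustrated by simulating from a heavy-tailed, a bounded, and a normal distribution. This is a sketch in base R; the helper `excess_kurtosis()`, the sample size, the seed, and the choice of a \(t\)-distribution with 5 degrees of freedom are assumptions made for illustration.

```r
set.seed(7)      # arbitrary seed, for reproducibility only
n <- 1e5         # large hypothetical sample size to keep sampling noise small

# Moment-based excess kurtosis m_4 / m_2^2 - 3 (illustrative helper)
excess_kurtosis <- function(y) {
  d <- y - mean(y)
  mean(d^4) / mean(d^2)^2 - 3
}

k_t    <- excess_kurtosis(rt(n, df = 5))  # leptokurtic: theoretical value 6 / (df - 4) = 6
k_unif <- excess_kurtosis(runif(n))       # platykurtic: theoretical value -1.2
k_norm <- excess_kurtosis(rnorm(n))       # mesokurtic: theoretical value 0

cat("t(5):   ", k_t, "\n")
cat("Uniform:", k_unif, "\n")
cat("Normal: ", k_norm, "\n")
```

The estimate for the \(t\)-distribution varies considerably between runs because heavy tails make the fourth moment hard to estimate, but its sign should remain positive.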

References

Geary, R. C. 1936. “Moments of the Ratio of the Mean Deviation to the Standard Deviation for Normal Samples.” Biometrika, 295–307.
Snedecor, George W., and William G. Cochran. 1989. “Statistical Methods.”