3.1 Numerical Measures

There are differences between a population and a sample:

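| Measures of | Category | Population | Sample |
|---|---|---|---|
| What is it? | | Reality | A small fraction of reality (inference) |
| Characteristics described by | | Parameters | Statistics |
| Central Tendency | Mean | $\mu = E(Y)$ | $\hat{\mu} = \bar{y}$ |
| Central Tendency | Median | 50th percentile | $y_{\left(\frac{n+1}{2}\right)}$ |
| Dispersion | Variance | $\sigma^2 = \mathrm{var}(Y) = E[(Y-\mu)^2]$ | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$ |
| Dispersion | Coefficient of Variation | $\frac{\sigma}{\mu}$ | $\frac{s}{\bar{y}}$ |
| Dispersion | Interquartile Range | Difference between 25th and 75th percentiles; robust to outliers | |
| Shape | Skewness: standardized 3rd central moment (unitless) | $g_1 = \frac{\mu_3}{\sigma^3}$ | $\hat{g}_1 = \frac{m_3}{m_2^{3/2}}$ |
| Shape | Central moments | $\mu = E(Y)$, $\mu_2 = \sigma^2 = E[(Y-\mu)^2]$, $\mu_3 = E[(Y-\mu)^3]$, $\mu_4 = E[(Y-\mu)^4]$ | $m_2 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2$, $m_3 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^3$ |
| Shape | Kurtosis (peakedness and tail thickness): standardized 4th central moment | $g_2 = \frac{E[(Y-\mu)^4]}{\sigma^4}$ | $\hat{g}_2 = \frac{m_4}{m_2^2} - 3$ |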
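
The sample-side formulas in the table can be checked by hand on a small data set. The following is a minimal base R sketch; the vector `y` is a hypothetical example chosen so the arithmetic is easy to follow, and the variable names are illustrative only.

```r
# Small illustrative sample (hypothetical values, not from the text)
y <- c(2, 4, 4, 5, 7, 9, 11)
n <- length(y)

y_bar <- mean(y)                       # sample mean
y_med <- sort(y)[(n + 1) / 2]          # median as an order statistic (n is odd here)
s2    <- sum((y - y_bar)^2) / (n - 1)  # sample variance, divisor n - 1
cv    <- sqrt(s2) / y_bar              # coefficient of variation, s / y_bar
iqr   <- IQR(y)                        # 75th percentile minus 25th percentile

# Sample central moments use divisor n, not n - 1
m2 <- mean((y - y_bar)^2)
m3 <- mean((y - y_bar)^3)
m4 <- mean((y - y_bar)^4)

g1_hat <- m3 / m2^(3 / 2)  # sample skewness
g2_hat <- m4 / m2^2 - 3    # sample kurtosis in the "minus 3" form

round(c(mean = y_bar, median = y_med, var = s2, cv = cv, iqr = iqr,
        skewness = g1_hat, kurtosis = g2_hat), 3)
```

Note that `s2` uses the divisor `n - 1` while the moments `m2`, `m3`, and `m4` use `n`, so `s2` and `m2` differ slightly in any finite sample.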

Notes:

  1. Order Statistics: $y_{(1)}, y_{(2)}, \ldots, y_{(n)}$, where $y_{(1)} < y_{(2)} < \cdots < y_{(n)}$.

  2. Coefficient of Variation:

    • Defined as the standard deviation divided by the mean.
    • A stable, unitless statistic useful for comparison.
  3. Symmetry:

    • Symmetric distributions: Mean = Median; Skewness = 0.
    • Skewed Right: Skewness > 0; the mean typically exceeds the median.
    • Skewed Left: Skewness < 0; the mean is typically below the median.
  4. Central Moments:

    • $\mu = E(Y)$
    • $\mu_2 = \sigma^2 = E[(Y-\mu)^2]$
    • $\mu_3 = E[(Y-\mu)^3]$
    • $\mu_4 = E[(Y-\mu)^4]$
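
The symmetry note can be illustrated with three tiny data sets. This is a sketch in base R; the helper names `sample_skewness` and `describe_shape` and the three vectors are hypothetical, introduced only for illustration.

```r
# Sample skewness from central moments with divisor n (m3 / m2^(3/2))
sample_skewness <- function(y) {
  d <- y - mean(y)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Collect the three quantities the symmetry note compares
describe_shape <- function(y) {
  c(mean = mean(y), median = median(y), skewness = sample_skewness(y))
}

# Hypothetical data: one symmetric sample, one with a long right tail,
# and its mirror image with a long left tail
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 10)
left_skewed  <- -right_skewed

shapes <- rbind(
  symmetric = describe_shape(symmetric),
  right     = describe_shape(right_skewed),
  left      = describe_shape(left_skewed)
)
shapes
```

For these particular samples the mean sits above the median when the skewness is positive and below it when the skewness is negative; real data do not always follow this pattern as cleanly.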

Skewness ($\hat{g}_1$)

  1. Sampling Distribution:
    For samples drawn from a normal population:
    • $\hat{g}_1$ is approximately distributed as $N\left(0, \frac{6}{n}\right)$ when $n > 150$.
  2. Inference:
    • Large Samples: Inference on skewness can be based on the standard normal distribution.
      The 95% confidence interval for $g_1$ is given by: $\hat{g}_1 \pm 1.96\sqrt{\frac{6}{n}}$
    • Small Samples: Consult special tables such as:
      • Snedecor and Cochran (1989), Table A 19(i)
      • Monte Carlo test results
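
The large-sample interval above can be sketched in base R as follows. The function name `skewness_ci`, the seed, and the simulated sample are assumptions made for illustration; the standard error $\sqrt{6/n}$ is the normal-population approximation stated in the text and is only appropriate for large $n$.

```r
# Large-sample confidence interval for skewness, assuming a normal population
skewness_ci <- function(y, level = 0.95) {
  n <- length(y)
  d <- y - mean(y)
  g1_hat <- mean(d^3) / mean(d^2)^(3 / 2)  # sample skewness, m3 / m2^(3/2)
  se <- sqrt(6 / n)                        # approximate standard error
  z  <- qnorm(1 - (1 - level) / 2)         # about 1.96 for a 95% interval
  c(estimate = g1_hat, lower = g1_hat - z * se, upper = g1_hat + z * se)
}

set.seed(123)    # arbitrary seed, for reproducibility only
y <- rnorm(500)  # n > 150, so the normal approximation is reasonable
skewness_ci(y)
```

If the interval contains 0, the sample gives no evidence against symmetry at the chosen level.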

Kurtosis ($\hat{g}_2$)

  1. Definitions and Relationships:
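    • A normal distribution has kurtosis $g_2 = 3$ under the raw definition $g_2 = \frac{E[(Y-\mu)^4]}{\sigma^4}$.
      Kurtosis is therefore often redefined as excess kurtosis, $g_2 = \frac{E[(Y-\mu)^4]}{\sigma^4} - 3$, so that a normal distribution has $g_2 = 0$. The 4th central moment is estimated by: $m_4 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^4$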
  2. Sampling Distribution:
    For large samples (n>1000):
    • $\hat{g}_2$ is approximately distributed as $N\left(0, \frac{24}{n}\right)$.
  3. Inference:
    • Large Samples: Inference for kurtosis can use standard normal tables.
    • Small Samples: Refer to specialized tables such as:
      • Snedecor and Cochran (1989), Table A 19(ii)
      • Geary (1936)
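
As a rough illustration of the large-sample inference for kurtosis, the base R sketch below standardizes the excess kurtosis by $\sqrt{24/n}$ and compares it with the standard normal distribution. The function name `kurtosis_z`, the seed, and the simulated data are assumptions for illustration, and the approximation should only be trusted for large $n$.

```r
# Large-sample z statistic for excess kurtosis, assuming a normal population
kurtosis_z <- function(y) {
  n <- length(y)
  d <- y - mean(y)
  g2_hat <- mean(d^4) / mean(d^2)^2 - 3  # sample excess kurtosis, m4 / m2^2 - 3
  se <- sqrt(24 / n)                     # approximate standard error
  z  <- g2_hat / se                      # compare with N(0, 1)
  c(estimate = g2_hat, se = se, z = z, p_value = 2 * pnorm(-abs(z)))
}

set.seed(42)      # arbitrary seed, for reproducibility only
y <- rnorm(2000)  # n > 1000, in line with the large-sample condition
kurtosis_z(y)
```

A small two-sided p-value would suggest tails heavier or lighter than those of a normal distribution.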

| Kurtosis Value | Tail Behavior | Comparison to Normal Distribution |
|---|---|---|
| $g_2 > 0$ (Leptokurtic) | Heavier Tails | Examples: t-distributions |
| $g_2 < 0$ (Platykurtic) | Lighter Tails | Examples: Uniform or certain bounded distributions |
| $g_2 = 0$ (Mesokurtic) | Normal Tails | Exactly matches the normal distribution |

# Generate random data from a normal distribution
data <- rnorm(100)

# Load the e1071 package for skewness and kurtosis functions
library(e1071)

# Calculate skewness
skewness_value <- skewness(data)
cat("Skewness:", skewness_value, "\n")
#> Skewness: 0.362615

# Calculate kurtosis (e1071::kurtosis returns excess kurtosis, so values near 0 are expected for normal data)
kurtosis_value <- kurtosis(data)
cat("Kurtosis:", kurtosis_value, "\n")
#> Kurtosis: -0.3066409
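
The three tail-behavior categories can also be seen by simulation. The sketch below uses base R only; the helper `excess_kurtosis`, the seed, the sample size, and the choice of a t-distribution with 10 degrees of freedom are assumptions made for illustration. In theory the excess kurtosis is 1 for a t-distribution with 10 degrees of freedom, 0 for the normal, and -1.2 for the uniform, so the estimates should land near those values.

```r
# Sample excess kurtosis from central moments with divisor n
excess_kurtosis <- function(y) {
  d <- y - mean(y)
  mean(d^4) / mean(d^2)^2 - 3
}

set.seed(2024)  # arbitrary seed, for reproducibility only
n <- 1e5        # large n so the estimates are stable

sim <- c(
  t_df10  = excess_kurtosis(rt(n, df = 10)),  # heavier tails than normal
  normal  = excess_kurtosis(rnorm(n)),        # reference case
  uniform = excess_kurtosis(runif(n))         # lighter tails than normal
)
round(sim, 2)
```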

References

Geary, R. C. 1936. “Moments of the Ratio of the Mean Deviation to the Standard Deviation for Normal Samples.” Biometrika, 295–307.
Snedecor, George W., and William G. Cochran. 1989. Statistical Methods.