3.1 Numerical Measures
A population and a sample are described by different quantities:
| Measures of | Category | Population | Sample |
|---|---|---|---|
| What is it? | | Reality | A small fraction of reality (inference) |
| Characteristics described by | | Parameters | Statistics |
| Central Tendency | Mean | $\mu = E(Y)$ | $\hat{\mu} = \bar{y}$ |
| Central Tendency | Median | 50th percentile | $y_{\left(\frac{n+1}{2}\right)}$ |
| Dispersion | Variance | $\sigma^2 = \operatorname{var}(Y) = E[(Y-\mu)^2]$ | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ |
| Dispersion | Coefficient of Variation | $\frac{\sigma}{\mu}$ | $\frac{s}{\bar{y}}$ |
| Dispersion | Interquartile Range | Difference between 25th and 75th percentiles; robust to outliers | |
| Shape | Skewness: standardized 3rd central moment (unitless) | $g_1 = \frac{\mu_3}{\sigma^3}$ | $\hat{g}_1 = \frac{m_3}{m_2^{3/2}}$ |
| Shape | Central moments | $\mu = E(Y)$, $\mu_2 = \sigma^2 = E[(Y-\mu)^2]$, $\mu_3 = E[(Y-\mu)^3]$, $\mu_4 = E[(Y-\mu)^4]$ | $m_2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$, $m_3 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^3$ |
| Shape | Kurtosis (peakedness and tail thickness): standardized 4th central moment | $g_2^* = \frac{E[(Y-\mu)^4]}{\sigma^4}$ | $\hat{g}_2 = \frac{m_4}{m_2^2} - 3$ |
Notes:
Order Statistics: $y_{(1)}, y_{(2)}, \dots, y_{(n)}$, where $y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$.
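As a small illustration of order statistics, the sketch below sorts a made-up sample (the values in `y` are purely illustrative) and reads off the median as the middle order statistic $y_{((n+1)/2)}$, which applies directly when $n$ is odd. It also computes the interquartile range with base R's `IQR()`, which relies on R's default quantile definition; other quantile definitions can give slightly different quartiles.

```r
# Hypothetical sample with an odd number of observations
y <- c(7, 2, 9, 4, 5)
n <- length(y)

# Order statistics: the sample sorted from smallest to largest
y_sorted <- sort(y)

# For odd n, the sample median is the middle order statistic y_((n+1)/2)
median_by_order <- y_sorted[(n + 1) / 2]

# Base R equivalents
median_builtin <- median(y)
iqr_value <- IQR(y)  # 75th percentile minus 25th percentile

cat("Order statistics:", y_sorted, "\n")
cat("Median:", median_by_order, "\n")
cat("IQR:", iqr_value, "\n")
```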
Coefficient of Variation:
- Defined as the standard deviation divided by the mean.
- A stable, unitless statistic, useful for comparing variability across variables measured in different units or on different scales.
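To see why a unitless measure helps with comparison, the following sketch computes the coefficient of variation for the same hypothetical heights recorded in centimeters and in meters. The helper name `cv` and the data are assumptions made for illustration only.

```r
# Hypothetical helper: sample coefficient of variation, s / y-bar
cv <- function(y) sd(y) / mean(y)

# Made-up heights, recorded in two different units
height_cm <- c(160, 172, 181, 168, 175)
height_m <- height_cm / 100

# The standard deviation depends on the unit of measurement ...
cat("SD (cm):", sd(height_cm), " SD (m):", sd(height_m), "\n")

# ... but the coefficient of variation does not
cat("CV (cm):", cv(height_cm), " CV (m):", cv(height_m), "\n")
```

Because both the standard deviation and the mean scale by the same factor, the ratio is unchanged by the change of units.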
Symmetry:
- Symmetric distributions: Mean = Median; Skewness = 0.
- Skewed Right: typically Mean > Median; Skewness > 0.
- Skewed Left: typically Mean < Median; Skewness < 0.
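The relationship between the mean, the median, and the sign of the skewness can be checked on small made-up samples. The sketch below uses the moment-based estimator $\hat{g}_1 = m_3 / m_2^{3/2}$; the helper name `sample_skewness` and the data are hypothetical.

```r
# Moment-based sample skewness, g1-hat = m3 / m2^(3/2) (hypothetical helper)
sample_skewness <- function(y) {
  d <- y - mean(y)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Made-up samples: symmetric, long right tail, and the mirror image of the latter
samples <- list(
  symmetric    = c(1, 2, 3, 4, 5),
  right_skewed = c(1, 2, 2, 3, 3, 4, 15),
  left_skewed  = -c(1, 2, 2, 3, 3, 4, 15)
)

for (name in names(samples)) {
  y <- samples[[name]]
  cat(name, ": mean =", mean(y), " median =", median(y),
      " skewness =", sample_skewness(y), "\n")
}
```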
Central Moments:
- $\mu = E(Y)$
- $\mu_2 = \sigma^2 = E[(Y-\mu)^2]$
- $\mu_3 = E[(Y-\mu)^3]$
- $\mu_4 = E[(Y-\mu)^4]$
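The sample counterparts of these moments replace the expectation with an average over the observations. As a minimal sketch in base R (the helper name `central_moment` and the data are assumptions for illustration), the block below computes $m_2$, $m_3$, and $m_4$ and plugs them into the sample skewness and kurtosis formulas.

```r
# k-th sample central moment, m_k = (1/n) * sum((y_i - y-bar)^k)
# (hypothetical helper name)
central_moment <- function(y, k) mean((y - mean(y))^k)

# Made-up sample for illustration
y <- c(2, 4, 4, 5, 7, 9, 12)
n <- length(y)

m2 <- central_moment(y, 2)
m3 <- central_moment(y, 3)
m4 <- central_moment(y, 4)

# m2 divides by n, while the sample variance s^2 divides by n - 1
s2 <- var(y)

g1_hat <- m3 / m2^(3 / 2)  # sample skewness
g2_hat <- m4 / m2^2 - 3    # sample excess kurtosis

cat("m2 =", m2, " s^2 =", s2, "\n")
cat("g1-hat =", g1_hat, " g2-hat =", g2_hat, "\n")
```

Note that $m_2 = \frac{n-1}{n} s^2$, so the two measures of spread differ noticeably only in small samples.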
Skewness ($\hat{g}_1$)
- Sampling Distribution:
  - For samples drawn from a normal population, $\hat{g}_1$ is approximately distributed as $N\left(0, \frac{6}{n}\right)$ when $n > 150$.
- Inference:
  - Large Samples: Inference on skewness can be based on the standard normal distribution. The 95% confidence interval for $g_1$ is given by $\hat{g}_1 \pm 1.96\sqrt{\frac{6}{n}}$.
  - Small Samples: Consult special tables such as:
    - Snedecor and Cochran (1989), Table A 19(i)
    - Monte Carlo test results
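The large-sample interval for skewness can be sketched in a few lines of base R. The example assumes a simulated normal sample of size 500 (so that $n > 150$); the seed and sample size are arbitrary choices, and the moment-based estimator $\hat{g}_1 = m_3 / m_2^{3/2}$ is computed by hand rather than with a package.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Simulated sample from a normal population; n > 150, so the
# large-sample approximation g1-hat ~ N(0, 6/n) is used
n <- 500
y <- rnorm(n)

# Moment-based sample skewness, g1-hat = m3 / m2^(3/2)
d <- y - mean(y)
g1_hat <- mean(d^3) / mean(d^2)^(3 / 2)

# Approximate standard error and 95% confidence interval
se_g1 <- sqrt(6 / n)
ci <- g1_hat + c(-1, 1) * 1.96 * se_g1

# z statistic for H0: g1 = 0
z <- g1_hat / se_g1

cat("g1-hat:", g1_hat, "\n")
cat("95% CI:", ci[1], "to", ci[2], "\n")
cat("z:", z, "\n")
```

If the interval excludes 0 (equivalently, if $|z| > 1.96$), the sample skewness is larger than would be expected from a normal population at the 5% level.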
Kurtosis ($\hat{g}_2$)
- Definitions and Relationships:
  - A normal distribution has kurtosis $g_2^* = 3$.
  - Kurtosis is often redefined as excess kurtosis, $g_2 = \frac{E[(Y-\mu)^4]}{\sigma^4} - 3$, where the 4th central moment is estimated by $m_4 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^4$.
- Sampling Distribution:
  - For large samples ($n > 1000$) drawn from a normal population, $\hat{g}_2$ is approximately distributed as $N\left(0, \frac{24}{n}\right)$.
- Inference:
| Kurtosis Value | Tail Behavior | Comparison to Normal Distribution |
|---|---|---|
| $g_2 > 0$ (Leptokurtic) | Heavier tails | Examples: $t$-distributions |
| $g_2 < 0$ (Platykurtic) | Lighter tails | Examples: uniform or certain bounded distributions |
| $g_2 = 0$ (Mesokurtic) | Normal tails | Same kurtosis as the normal distribution |
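The three cases in the table can be illustrated by simulation. The sketch below draws samples of size 2000 (so that $n > 1000$) from a normal, a $t$ with 5 degrees of freedom, and a uniform distribution, then reports $\hat{g}_2 = m_4 / m_2^2 - 3$ together with the statistic $z = \hat{g}_2 / \sqrt{24/n}$. The helper name `excess_kurtosis`, the seed, the sample size, and the choice of distributions are all assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Sample excess kurtosis, g2-hat = m4 / m2^2 - 3 (hypothetical helper)
excess_kurtosis <- function(y) {
  d <- y - mean(y)
  mean(d^4) / mean(d^2)^2 - 3
}

n <- 2000  # n > 1000, so g2-hat ~ N(0, 24/n) under normality
samples <- list(
  normal  = rnorm(n),
  t_df5   = rt(n, df = 5),  # heavier tails than the normal
  uniform = runif(n)        # lighter tails than the normal
)

# Approximate standard error of g2-hat for a normal population
se_g2 <- sqrt(24 / n)

for (name in names(samples)) {
  g2_hat <- excess_kurtosis(samples[[name]])
  cat(sprintf("%-8s g2-hat = %6.3f   z = %7.2f\n", name, g2_hat, g2_hat / se_g2))
}
```

A value of $|z|$ well beyond 1.96 suggests tails that differ from those of a normal distribution; the sign of $\hat{g}_2$ indicates whether they are heavier or lighter.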
```r
# Load the e1071 package for skewness and kurtosis functions
library(e1071)

# Generate random data from a normal distribution
# (no seed is set, so the values shown below change from run to run)
data <- rnorm(100)

# Calculate skewness (default type = 3; type = 1 gives m3 / m2^(3/2))
skewness_value <- skewness(data)
cat("Skewness:", skewness_value, "\n")
#> Skewness: 0.362615

# Calculate kurtosis (e1071 reports excess kurtosis, so normal data are near 0)
kurtosis_value <- kurtosis(data)
cat("Kurtosis:", kurtosis_value, "\n")
#> Kurtosis: -0.3066409
```
References
Geary, R. C. 1936. “Moments of the Ratio of the Mean Deviation to the Standard Deviation for Normal Samples.” Biometrika, 295–307.
Snedecor, George W., and William G. Cochran. 1989. “Statistical Methods.”