Chapter 6 Stratified Sampling
Stratified sampling is more common than simple random sampling because, for the same sample size, it is usually at least as precise and can cost less. A stratified sample is drawn by taking an independent random sample within each pre-defined group (stratum).
The Horvitz-Thompson estimator of the total is the sum of the estimated stratum totals. Because the strata are sampled independently, its variance is the sum of the stratum variances.
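Written out, the statement above takes the following form. This is a sketch that assumes simple random sampling without replacement within each stratum; the notation is introduced here and is not from the original text: \(N_h\) and \(n_h\) are the population and sample sizes in stratum \(h\), and \(\bar{y}_h\) and \(s_h^2\) are the sample mean and sample variance in that stratum.

\[
\hat{T} = \sum_{h} \hat{T}_h = \sum_{h} N_h \bar{y}_h,
\qquad
\widehat{V}(\hat{T}) = \sum_{h} \widehat{V}(\hat{T}_h) = \sum_{h} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h}.
\]

The factor \(1 - n_h / N_h\) is the finite population correction for stratum \(h\).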
Example
Data set apistrat is a stratified sample of the Academic Performance Index (API) of n = 200 of the N = 6,194 schools in California. The strata are \(n_E\) = 100 elementary schools, \(n_M\) = 50 middle schools, and \(n_H\) = 50 high schools. The fpc column in this data set is the population size for each stratum.
library(survey)  # provides svydesign(), svytotal(), svymean()
data(api)        # loads apistrat
strata_design <- svydesign(id = ~1, strata = ~stype, fpc = ~fpc, data = apistrat)
svytotal(~stype, strata_design)
##        total SE
## stypeE  4421  0
## stypeH   755  0
## stypeM  1018  0
The standard errors are zero because stype defines the strata: the number of schools in each stratum is known from fpc, not estimated from the sample.
Since high schools tend to be larger, fixing the number of sampled high schools should reduce the variance in mean and total enrollment relative to simple random sampling (compare to apisrs in Chapter 5).
svytotal(~enroll, strata_design)
##          total     SE
## enroll 3687178 114642
svymean(~enroll, strata_design)
##          mean     SE
## enroll 595.28 18.509
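To see where these numbers come from, the estimates can be rebuilt by hand from stratum-level summaries. The following is a minimal sketch, assuming simple random sampling without replacement within each stratum; the variable names (N_h, n_h, ybar_h, s2_h and so on) are illustrative and do not appear in the original analysis.

```r
library(survey)  # used here only for the apistrat data
data(api)

# Stratum-level pieces; each vector is ordered by the levels of stype (E, H, M)
N_h    <- tapply(apistrat$fpc,    apistrat$stype, function(x) x[1])  # population sizes (fpc is constant within a stratum)
n_h    <- tapply(apistrat$enroll, apistrat$stype, length)            # sample sizes
ybar_h <- tapply(apistrat$enroll, apistrat$stype, mean)              # sample means
s2_h   <- tapply(apistrat$enroll, apistrat$stype, var)               # sample variances

# Estimated total: sum of the estimated stratum totals N_h * ybar_h
total_hat <- sum(N_h * ybar_h)

# Standard error: square root of the summed stratum variances,
# each with its finite population correction (1 - n_h / N_h)
se_total <- sqrt(sum(N_h^2 * (1 - n_h / N_h) * s2_h / n_h))

# The mean and its standard error divide the total and its SE by N
mean_hat <- total_hat / sum(N_h)
se_mean  <- se_total / sum(N_h)

c(total = total_hat, SE = se_total)
c(mean = mean_hat, SE = se_mean)
```

If the assumptions hold, the two printed vectors should agree with the svytotal() and svymean() output above up to rounding.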