Chapter 10 The survey commands in STATA
STATA provides dedicated survey commands for analysing survey data while accounting for complex design features such as unequal weighting, stratification, clustering, reweighting for unit non-response, and calibration to external data sources. Using them is fairly straightforward: first the design features are declared, then the estimation commands are run with the svy: prefix.
10.1 Declare the design features
A one-stage sampling design is declared with the svyset command:
\[ \mathbf{svyset}~su~[\mathbf{pweight}=weight],~\mathbf{strata}(strata)~\mathbf{fpc}(fpc) \]
where
- su is the variable identifying the sampling units
- weight is the variable containing the sampling weights, declared with the pweight keyword
- strata is the stratum variable
- fpc is the finite population correction variable: the stratum population size, or the sampling fraction if its values are at most 1
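As a minimal sketch of such a declaration, the following builds a small made-up dataset and declares a stratified one-stage design. The variable names (id, stratum, wt, popsize, income) and all values are assumptions chosen purely for illustration.

```stata
* Toy data (hypothetical): 2 strata, 3 sampled units in each.
* popsize is the stratum population size, wt = popsize / 3.
clear
input id stratum wt popsize income
1 1 10 30 200
2 1 10 30 250
3 1 10 30 300
4 2 20 60 150
5 2 20 60 100
6 2 20 60 170
end

* Declare the design: sampling unit, weight, strata and fpc
svyset id [pweight=wt], strata(stratum) fpc(popsize)

* Typing svyset without arguments redisplays the current settings
svyset
```

The settings are stored with the data, so saving the dataset after svyset keeps the declaration for later sessions.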
In the case of a multistage design, each sampling stage must be declared:
\[\begin{aligned} &\mathbf{svyset}~su_1~[\mathbf{pweight}=weight],~\mathbf{strata}(strata_1)~\mathbf{fpc}(fpc_1) \\ &||~su_2,~\mathbf{strata}(strata_2)~\mathbf{fpc}(fpc_2) \\ &\dots \\ &||~su_n,~\mathbf{strata}(strata_n)~\mathbf{fpc}(fpc_n) \end{aligned}\]
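The multistage syntax can be sketched on a made-up two-stage sample: schools are drawn within strata at the first stage, then students within each sampled school. All names (school, student, stratum, wt, nschools, nstudents, score) and values are hypothetical; here only the first stage is stratified, so no strata() option appears at the second stage.

```stata
* Toy two-stage sample (hypothetical): 2 strata, 2 schools per stratum,
* 2 students per school. nschools = schools in the stratum population,
* nstudents = students in the school, wt = product of inverse fractions.
clear
input school student stratum wt nschools nstudents score
1 1 1 50 10 20 12
1 2 1 50 10 20 15
2 3 1 50 10 20 11
2 4 1 50 10 20 14
3 5 2 60 8 30 9
3 6 2 60 8 30 13
4 7 2 60 8 30 16
4 8 2 60 8 30 10
end

* Stage 1: schools within strata; stage 2: students within schools
svyset school [pweight=wt], strata(stratum) fpc(nschools) || student, fpc(nstudents)

* Any svy: command now uses both stages for variance estimation
svy: mean score
```

The weight is declared once, at the first stage, and should be the overall sampling weight of the final unit.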
10.2 Compute descriptive statistics
Once the sample design is specified, descriptive statistics are computed by prefixing the estimation command with the keyword svy:
- svy : mean meanvar
- svy : total totalvar
- svy : ratio [rationame :] numerator / denominator
- svy : proportion propvar
- svy : tabulate var
- svy : tabulate var1 var2
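The commands listed above could be run as follows on a small made-up dataset. The variables (income, spend, female, employed), the ratio name spendshare and all values are assumptions used only to make the sketch self-contained.

```stata
* Toy one-stage stratified sample (hypothetical names and values)
clear
input id stratum wt popsize income spend female employed
1 1 10 30 200 150 1 1
2 1 10 30 250 180 0 1
3 1 10 30 300 210 1 0
4 2 20 60 150 120 0 1
5 2 20 60 100  90 1 0
6 2 20 60 170 130 0 0
end
svyset id [pweight=wt], strata(stratum) fpc(popsize)

svy: mean income                      // weighted mean
svy: total income                     // estimated population total
svy: ratio spendshare: spend/income   // named ratio of two totals
svy: proportion female                // proportions of each category
svy: tabulate female                  // one-way table
svy: tabulate female employed         // two-way table
```

Each command reports design-based standard errors, so the output differs from what the unprefixed commands would give on the same data.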
In addition, unconditional analysis for subpopulations can be obtained using the over() or subpop() options:
- svy, subpop(if condition) : mean meanvar
- svy : mean meanvar, over(listvars)
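A short sketch of both options on made-up data follows. The variable names and values are hypothetical; subpop() is shown once with a 0/1 indicator variable and once with an if condition.

```stata
* Toy one-stage stratified sample (hypothetical names and values)
clear
input id stratum wt popsize income female
1 1 10 30 200 1
2 1 10 30 250 0
3 1 10 30 300 1
4 2 20 60 150 0
5 2 20 60 100 1
6 2 20 60 170 0
end
svyset id [pweight=wt], strata(stratum) fpc(popsize)

* Subpopulation defined by a 0/1 indicator variable
svy, subpop(female): mean income

* Subpopulation defined by a condition
svy, subpop(if income > 120): mean income

* One estimate per category of the grouping variable
svy: mean income, over(female)
```

Both options keep every observation in the variance computation, whereas dropping the other observations beforehand would discard part of the design information.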
These commands compute survey-based estimates that take into account the design features declared with svyset. Confidence intervals are based on a t distribution with \(n-H\) degrees of freedom, where \(n\) is the number of sampled primary sampling units and \(H\) is the total number of strata (\(H=1\) if there is no stratification). The confidence level (95% by default) can be changed with the level() option.
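To illustrate the degrees of freedom and the level() option, the sketch below uses a made-up sample of 6 sampling units in 2 strata, so that \(n-H = 6-2 = 4\). Variable names and values are hypothetical; e(N_psu), e(N_strata) and e(df_r) are results stored by the svy estimation commands.

```stata
* Toy one-stage stratified sample (hypothetical names and values)
clear
input id stratum wt popsize income
1 1 10 30 200
2 1 10 30 250
3 1 10 30 300
4 2 20 60 150
5 2 20 60 100
6 2 20 60 170
end
svyset id [pweight=wt], strata(stratum) fpc(popsize)

* Default 95% confidence interval
svy: mean income

* Design degrees of freedom = sampled PSUs minus strata
display "PSUs = " e(N_psu) ", strata = " e(N_strata) ", df = " e(df_r)

* Same estimate with a 90% confidence interval
svy, level(90): mean income
```

Only the interval changes between the two calls; the point estimate and its standard error are the same.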
Further commands are available to inspect the design and to report additional statistics related to the estimation of standard errors and confidence intervals. The estat commands are post-estimation commands, run after a svy: estimation command:
- svydescribe examines the design structure of the dataset. It can also be used to see the number of missing and nonmissing observations per stratum (or optionally per stage) for one or more variables.
- estat size reports the number of observations in each subpopulation and the estimated subpopulation size.
- estat cv calculates the coefficient of variation of the estimator.
- estat effects estimates the design effects DEFF and DEFT. DEFF is the ratio of the estimated variance under the actual design to the hypothetical variance that would be obtained under simple random sampling. DEFT is the square root of DEFF, so it compares standard errors rather than variances under the two designs.
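A possible sequence using these commands is sketched below on made-up data. Variable names and values are hypothetical; the estat calls refer to the most recent svy: estimation.

```stata
* Toy one-stage stratified sample (hypothetical names and values)
clear
input id stratum wt popsize income female
1 1 10 30 200 1
2 1 10 30 250 0
3 1 10 30 300 1
4 2 20 60 150 0
5 2 20 60 100 1
6 2 20 60 170 0
end
svyset id [pweight=wt], strata(stratum) fpc(popsize)

* Design structure: strata, sampling units, observations per stratum
svydescribe

* Same, restricted to observations with nonmissing income
svydescribe income

* Estimation, followed by post-estimation reports
svy: mean income, over(female)
estat size                  // observations and estimated subpopulation sizes
estat cv                    // coefficient of variation of each estimate
estat effects, deff deft    // design effects DEFF and DEFT
```

Because svydescribe does not depend on an estimation result, it can be run right after svyset to check the declaration before any analysis.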