Chapter 10 The survey commands in STATA

To analyse survey data with complex design features, such as unequal weighting, stratification, clustering, and weight adjustment to external data sources, STATA provides dedicated survey commands. Their implementation is fairly straightforward. First, the design features must be declared. Then, the survey estimation commands can be invoked by prefixing them with the keyword svy:

10.1 Declare the design features

To declare a sampling design in the case of a one-stage design, the command svyset must be used:

\[ \mathbf{svyset}~su~[\mathbf{pweight}=weight],~\mathbf{strata}(strata)~\mathbf{fpc}(fpc) \]

where

  • su is the identification code for the sampling units
  • pweight is the sampling weight
  • strata is the stratum variable
  • fpc is the stratum population size (or the sampling fraction if fpc<1)
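As an illustration, the declaration of a stratified one-stage design can be sketched as follows. The example assumes the nhanes2 example dataset distributed through webuse, with its variables psu, finalwgt and strata; these names belong to that dataset and are not part of the syntax itself. This dataset does not contain a finite population correction variable, so the fpc() option is omitted here.

```stata
* Load an example survey dataset (assumes internet access for webuse)
webuse nhanes2, clear

* Declare the design: psu identifies the sampling units,
* finalwgt is the sampling weight, strata identifies the strata
svyset psu [pweight=finalwgt], strata(strata)

* Typing svyset without arguments displays the current design settings
svyset
```

The declaration is stored with the data, so it only needs to be made once before the svy: commands are run.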

In the case of a multistage design, each sampling stage must be declared:

\[ \begin{array}{l} \mathbf{svyset}~su_1~[\mathbf{pweight}=weight],~\mathbf{strata}(strata_1)~\mathbf{fpc}(fpc_1) \\ \quad ||~su_2,~\mathbf{strata}(strata_2)~\mathbf{fpc}(fpc_2) \\ \quad \dots \\ \quad ||~su_n,~\mathbf{strata}(strata_n)~\mathbf{fpc}(fpc_n) \end{array} \]
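A multistage declaration might look like the sketch below. It assumes the stage5a example dataset available through webuse and the variable names su1, su2, su3, fpc1, fpc2, fpc3, pw and strata, which should be checked with describe before use. Only the first stage is stratified in this sketch, which is an assumption of the example and not a requirement of the syntax.

```stata
* Load a multistage example dataset (assumed variable names, see lead-in)
webuse stage5a, clear

* Check that the stage identifiers and fpc variables exist
describe su1 su2 su3 fpc1 fpc2 fpc3 pw strata

* Declare three sampling stages, separated by ||
* Stage 1 is stratified; each stage has its own fpc variable
svyset su1 [pweight=pw], strata(strata) fpc(fpc1) ///
    || su2, fpc(fpc2)                              ///
    || su3, fpc(fpc3)
```

The /// at the end of a line continues the command on the next line in a do-file.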

10.2 Compute descriptive statistics

Once the sample design is specified, descriptive statistics are computed by prefixing the command with the keyword svy:

  • svy : mean meanvar
  • svy : total totalvar
  • svy : ratio [rationame :] numerator / denominator
  • svy : proportion propvar
  • svy : tabulate var
  • svy : tabulate var1 var2
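The commands listed above can be tried on an example dataset. The sketch below again assumes the nhanes2 dataset from webuse and its variables bpsystol, diabetes, weight, height and race; the ratio name wh is a label chosen for this illustration.

```stata
* Load the data and declare the design (assumed example dataset)
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Estimated population mean of systolic blood pressure
svy: mean bpsystol

* Estimated population total of a 0/1 indicator (number of people with diabetes)
svy: total diabetes

* Estimated ratio of two totals, labelled wh
svy: ratio (wh: weight/height)

* Estimated proportions for each category of race
svy: proportion race

* One-way and two-way weighted tabulations
svy: tabulate race
svy: tabulate race diabetes
```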

In addition, unconditional analysis for subpopulations can be obtained using the subpop or over options:

  • svy, subpop (condition) : mean meanvar
  • svy : mean meanvar, over (listvars)
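A short sketch of both options follows, assuming the nhanes2 example dataset and its variables age, sex and bpsystol. The indicator older is a hypothetical variable created only for this illustration.

```stata
* Load the data and declare the design (assumed example dataset)
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Build a 0/1 indicator for the subpopulation of interest
generate byte older = (age >= 60) if !missing(age)

* Mean for the subpopulation, identified by the indicator variable
svy, subpop(older): mean bpsystol

* Equivalent form using an if expression inside subpop()
svy, subpop(if age >= 60): mean bpsystol

* Means for every category of sex in a single call
svy: mean bpsystol, over(sex)
```

Using subpop() keeps all observations in the variance computation. Dropping the other observations beforehand, or restricting the sample in some other way, can give incorrect standard errors because the design information of the excluded units is lost.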

These commands compute survey-based statistics taking into account the design features specified by the svyset command. In addition, confidence intervals are generated using a t distribution with \(n-H\) degrees of freedom, where \(n\) is the number of sampled primary sampling units and \(H\) is the total number of strata (\(H=1\) if there is no stratification). The confidence level (95% by default) can be changed with the level option.
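For instance, a 90% confidence interval could be requested as in the sketch below, which assumes the nhanes2 example dataset. The stored result e(df_r) holds the design degrees of freedom used for the t-based interval.

```stata
* Load the data and declare the design (assumed example dataset)
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Default 95% confidence interval
svy: mean bpsystol

* Same estimate with a 90% confidence interval
svy, level(90): mean bpsystol

* Design degrees of freedom used for the t distribution
display e(df_r)
```

The header of the estimation output also reports the number of strata, the number of PSUs and the design degrees of freedom, which makes it easy to check the declaration.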

Figure 10.1: Example of a survey command

Further commands, most of them post-estimation commands, are also available to describe the design and to generate additional statistics related to the estimation of standard errors and confidence intervals:

  • svydescribe ==> Examine the design structure of the dataset. It can also be used to see the number of missing and nonmissing observations per stratum (or optionally per stage) for one or more variables.

  • estat size ==> Calculate the number of observations in each subpopulation and estimate the subpopulation size

  • estat cv ==> Calculate the coefficient of variation of the estimator

  • estat effects ==> Estimate the Design Effect factors (DEFF and DEFT)
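The commands above can be combined as in the following sketch, which assumes the nhanes2 example dataset and its variables bpsystol and sex. The estat commands refer to the most recent svy: estimation, so they must be run directly after it.

```stata
* Load the data and declare the design (assumed example dataset)
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Describe the design; adding a variable reports missing values per stratum
svydescribe
svydescribe bpsystol

* Estimate a mean by subpopulation, then request post-estimation statistics
svy: mean bpsystol, over(sex)

* Number of observations and estimated size of each subpopulation
estat size

* Coefficient of variation of each estimate
estat cv

* Design effects DEFF and DEFT
estat effects, deff deft
```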