1 Introduction
1.1 Aim of this guide and further readings
This is a short introductory guide that shows the basic procedures for weighting a survey. It is intended as a practical, step-by-step walkthrough. It provides R code for all actions: from reading, manipulating and presenting data to modelling and calibration. This should allow readers to reproduce procedures and results as well as to inspect objects at any given time. The source code, in ‘R notebook’ format, is publicly available on the author’s GitHub page. It is important to note that this is not an ‘R tutorial’, so the guide does not comment in detail on the general functions used in the walkthrough. Some intermediate R skills might be required to follow all steps of the procedure. Readers with basic or no knowledge of R can still benefit from this guide, as it explains the steps and principles behind weighting.
This text intentionally avoids explaining complex or advanced methods. Instead, it aims to provide users with a basic, standard way of weighting and a limited number of variations.
The next section gives a very broad overview of the main survey weighting steps. The second section deals with importing data into R, data manipulation and a brief presentation of the dataset used for this guide. Readers who are interested in a specific step, are familiar with the 7th round of the European Social Survey, or want to jump directly to the weighting procedures can skip this part of the guide. The following three sections are the main components of this guide and show how to compute design weights, non-response weights and calibration weights. Two more sections will be added to this guide in the future. These will cover the analysis of weight variability and the computation of weighted estimates.
For more information you can check the following introductory texts:
- Valliant et al. (2013) Practical Tools for Designing and Weighting Survey Samples. New York: Springer Science Business Media.
- Lohr, S.L. (2009) Sampling: Design and Analysis. 2nd Edition. Boston: Brooks/Cole.
- Blair, E. & Blair, J. (2015) Applied Survey Sampling. London: SAGE Publications Inc.
And the book accompanying the R ‘survey’ package:
- Lumley, T. (2010) Complex Surveys: A Guide to Analysis Using R. New Jersey: John Wiley & Sons Inc.
It might also be worth keeping an eye on the (still incipient) R package srvyr, developed and maintained by Greg Freedman Ellis.
Note: This guide focuses on surveys based on probability samples. These are surveys where all units in our statistical population have a non-zero chance of being selected and the probability of selection is known to the researcher. A brief note on how to weight non-probability samples is included at the end of the guide.
1.2 Basic steps in weighting a survey
Weights are applied to reduce survey bias. In plain words, weighting consists of making our sample of survey respondents (more) representative of our statistical population. By statistical population we mean all those units for which we want to compute estimates.
There are four basic steps in weighting. These are:
- Base/design weights
- Non-response weights
- Use of auxiliary data/calibration
- Analysis of weight variability/trimming
The first step consists of computing weights that account for differences between units in the probability of being sampled. ‘Being sampled’ means being selected from the survey frame (i.e. the list of all units) to be approached for a survey response. This step can be skipped if all units in the survey frame have the same probability of being sampled. This happens, for example:
- when all units in the survey frame are approached for the sample; or
- with certain sampling designs (such as ‘simple random sampling without replacement’ or ‘stratified random sampling without replacement’ where sampled units are distributed across strata in proportion to the number of units in each stratum).
Surveys of this kind are usually called ‘self-weighted’ surveys.
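As a minimal sketch of this first step, the design weight of a unit is the inverse of its probability of being sampled. The example below uses base R only and entirely made-up numbers: the strata names (`urban`, `rural`) and the frame and sample sizes are assumptions for illustration, not data from the guide.

```r
# Hypothetical survey frame with two strata sampled at different rates
frame_sizes  <- c(urban = 8000, rural = 2000)  # units in the frame per stratum (assumed)
sample_sizes <- c(urban = 400,  rural = 400)   # units sampled per stratum (assumed)

# Probability of being sampled within each stratum
prob_selection <- sample_sizes / frame_sizes

# Design (base) weight: inverse of the probability of being sampled
design_weight <- 1 / prob_selection
design_weight
# urban = 20, rural = 5

# Sanity check: the weighted sample reproduces the size of the frame
sum(design_weight * sample_sizes)
# 10000
```

With proportional allocation (for example 400 urban and 100 rural units) both strata would get the same weight, which is the ‘self-weighted’ case described above.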
In the second step we need to adjust for differences in the probability that sampled units reply to our survey. Our estimates would be biased if some profiles of sampled units had a higher propensity to reply than others and these profiles differed in the dependent variables (i.e. our variables of interest). In this step, we need to estimate the probability of response using information available for both respondents and non-respondents. Non-response adjustment is not needed if all sampled units responded to the survey (i.e. in probability sampling surveys with 100% response rates).
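One common way to estimate the probability of response is a logistic regression fitted on all sampled units. The sketch below is only an illustration under stated assumptions: the data are simulated, the variables `age_group` and `urban` are hypothetical, and the model is not necessarily the one used later in this guide.

```r
set.seed(1)

# Simulated sampled units (respondents and non-respondents); all values are made up
n <- 1000
sampled <- data.frame(
  age_group = sample(c("young", "old"), n, replace = TRUE),
  urban     = sample(c(0, 1), n, replace = TRUE)
)

# Assumed response behaviour: older units are more likely to reply
true_propensity   <- ifelse(sampled$age_group == "old", 0.7, 0.4)
sampled$responded <- rbinom(n, 1, true_propensity)

# Response propensity model, using information known for every sampled unit
fit <- glm(responded ~ age_group + urban, family = binomial, data = sampled)
sampled$propensity <- predict(fit, type = "response")

# Non-response weight for respondents: inverse of the estimated propensity
respondents <- sampled[sampled$responded == 1, ]
respondents$nr_weight <- 1 / respondents$propensity

# Groups with a lower propensity to reply receive larger weights
tapply(respondents$nr_weight, respondents$age_group, mean)
```

In practice this factor would usually be multiplied by the design weight from the first step.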
The third step consists of adjusting our weights using available information about population totals. Note that this requires data different from that needed for non-response adjustment (second step). Here we need auxiliary data that gives us information (i.e. estimates such as proportions, means, sums, counts) about the statistical population. The same variables should be available for our respondents, but here we don’t need information about non-respondents.
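The simplest form of this adjustment is post-stratification on a single variable: weights are scaled so that weighted counts match known population counts. The sketch below uses base R and made-up values; the variable `sex`, the input weights and the population totals are all assumptions for illustration, and more general calibration methods are not shown.

```r
# Hypothetical respondents with weights from the earlier steps (values are made up)
respondents <- data.frame(
  sex    = c("f", "f", "f", "m", "m"),
  weight = c(10, 12, 8, 15, 15)
)

# Assumed auxiliary data: known population counts for the same variable
pop_totals <- c(f = 50, m = 45)

# Weighted counts currently implied by the sample
weighted_totals <- sapply(split(respondents$weight, respondents$sex), sum)

# Adjustment factor per category: population count / weighted sample count
adj_factor <- pop_totals[names(weighted_totals)] / weighted_totals

# Calibrated weight: previous weight times the factor of the unit's category
respondents$cal_weight <- respondents$weight * unname(adj_factor[respondents$sex])

# Weighted counts now match the population counts
sapply(split(respondents$cal_weight, respondents$sex), sum)
# f = 50, m = 45
```

Note that only respondents and population figures are involved here; nothing is needed about non-respondents.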
The last step is to check the variability of our computed weights. High variation in weights can lead to some observations having too much influence in our sample. Even if weights reduce bias, they might substantially inflate the variance of estimates. For this reason, some survey practitioners examine highly unequal weights and consider trimming them.
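A widely used summary of weight variability is Kish's approximate design effect due to unequal weighting, n * sum(w^2) / sum(w)^2. The sketch below computes it and applies a very simple trimming rule. The weights and the cap of 6 are made-up values, and this rule is only one of several possible approaches, not necessarily the one recommended in this guide.

```r
# Hypothetical final weights with one very large value (made up)
weights <- c(1, 1, 1, 1, 2, 2, 3, 3, 4, 12)
n <- length(weights)

# Kish's approximate design effect due to unequal weighting
kish_deff <- function(w) length(w) * sum(w^2) / sum(w)^2
deff_before <- kish_deff(weights)

# Approximate effective sample size
n_eff <- n / deff_before

# Simple trimming: cap weights at an assumed threshold, then rescale
# so that the sum of weights is preserved
cap <- 6
trimmed <- pmin(weights, cap)
trimmed <- trimmed * sum(weights) / sum(trimmed)
# Note: after rescaling, the largest weight can end up above the cap

deff_after <- kish_deff(trimmed)
c(before = deff_before, after = deff_after)
```

Trimming lowers the design effect at the cost of possibly reintroducing some bias, which is the trade-off described above.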