5 Step 3: Use of auxiliary data for weight calibration

In certain cases we might have information about our statistical population (e.g. census counts or proportions from official statistics). We can then use these to ‘correct’ our weights. This adjustment is called calibration and consists on finding a new set of weights that are as near as possible the input (‘final’) weights but reproduce the population information exactly. Valliant et al (2013) explain that using the previous ‘final’ weights as input for the calibration step allows us to ‘borrow’ good estiamtion properties from those.

An important difference between this step and the one on non-response adjustment is that non-response adjustment requieres having information for both sampled respondents and non-respondents. Calibration only requieres information for respondents and population in general.

Here we will calibrate weights using a ‘raking’ procedure (explained in Valliant el al 2013 page 358). Unlike other calibration methods, ‘raking’ does not requiere information on cross-classification categories but just marginal population counts. In other words, we do not need the information from crossing several variables (although we can use it if available). As explained by Lumley (2010, page 139), the process is very much iterative. It involves post-stratifying on each set of variables in turn, and repeating the process until the weights stop changing.

The 7th ESS used cross-classifications of age group and gender plus region (separately) to calibrate UK data. Here we will try to reproduce their calibration. For more information on ESS post-stratification weights see their document: Documentation of ESS Post-Stratification Weights

5.1 Preparing data for calibration

First we will compute the interaction between gender and age with the categories used for calibration in the ESS.

data %<>%
  mutate(agea_rec = cut(agea %>% as.numeric(), breaks = c(14, 34, 54, 99))) %>%
         unite(col = gender_age_rec, gndr, agea_rec, remove = F) %>%
  mutate(gender_age_rec = replace(x = gender_age_rec, 
                                  list = gender_age_rec %in% c("Female_NA", "Male_NA"),
                                  values = NA ))

data %<>%
  filter(!is.na(gender_age_rec), !is.na(region)) # see footnote in text below.

The total number of weighted units in each of the categories we will use for calibration can be seen in tables below6.

data %>%
  group_by(gender_age_rec) %>%
  summarise(total_n_sample = sum(final.weight))
## # A tibble: 6 x 2
##   gender_age_rec total_n_sample
##            <chr>          <dbl>
## 1 Female_(14,34]       278.3145
## 2 Female_(34,54]       441.9940
## 3 Female_(54,99]       465.4844
## 4   Male_(14,34]       256.5388
## 5   Male_(34,54]       357.8829
## 6   Male_(54,99]       426.7719
data %>%
  group_by(region) %>%
  summarise(total_n_sample = sum(final.weight))
## # A tibble: 12 x 2
##                      region total_n_sample
##                      <fctr>          <dbl>
##  1     North East (England)       108.7733
##  2     North West (England)       255.8543
##  3 Yorkshire and the Humber       165.7948
##  4  East Midlands (England)       175.9855
##  5  West Midlands (England)       177.8547
##  6          East of England       182.0182
##  7                   London       239.2660
##  8     South East (England)       320.6673
##  9     South West (England)       187.9519
## 10                    Wales       127.3771
## 11                 Scotland       222.2071
## 12         Northern Ireland        63.2364

Now we import Eurostat data7 and recode it into ESS adjustment/post-stratification categories.

The data used in this guide are hosted in the author’s github page and the raw version can be checked in the following links:

age.gender.eurostat <- read.csv(text=getURL("https://raw.githubusercontent.com/JosepER/PPMI_how_to_weight_a_survey/master/data/Eurostat/Agebygender_demo_pjangroup.csv"), header=T)

age.gender.eurostat %<>%
  spread(key = Age, value = Value)

age.gender.eurostat %<>%
  mutate(`15to34` = `From 15 to 19 years` + `From 20 to 24 years` +  `From 25 to 29 years` +
           `From 30 to 34 years`,
         `35to54` = `From 35 to 39 years` + `From 40 to 44 years` + `From 45 to 49 years` + 
           `From 50 to 54 years`,
         `55to99` =  `From 55 to 59 years` + `From 60 to 64 years` + `From 65 to 69 years` +
           `From 70 to 74 years` + `75 years or over`) %>%
  select(SEX, `15to34`:`55to99`)
region.eurostat <- read.csv(text=getURL("https://raw.githubusercontent.com/JosepER/PPMI_how_to_weight_a_survey/master/data/Eurostat/Nuts2byage.csv"), header=T, stringsAsFactors = F)
  
region.eurostat %<>%
  gather(key = age, value = population, -Country) 

region.eurostat %<>%
  group_by(Country) %>%
  summarise(pop_sum = sum(population) )

We will now scale the Eurostat data to our sample size. The idea is to obtain the weights that make our sample proportions look like those in Eurostat. For these, we will calculate how many respondents in our sample should pertain to each category if we had Eurostat proportions. We will use our sample size based only on (weighted) completed responses in our post-stratification adjustment variables (age, gender and region).

First we compute the total (weighted) observations in our sample of respondents.

data.calibration <- data

weighted.pop <- sum(data.calibration$final.weight)

weighted.pop
## [1] 2226.987

In the next chunk of code we will scale Eurostat population by age an gender data to the size of our sample. We will do this by dividing Eurostat absolute population numbers by the sum of population in all categories and multiplying it by the previously computed sum of weighted respondents. This will provide us a dataframe (‘age.gender.eurostat’) with the number of respondents our survey sample should have in each gender and age crossed category if it was to resemble the proportions found in reality (i.e. official statistics).

age.gender.eurostat %<>%
  gather(key = age, value = population, -SEX) %>%
  unite(col = gender_age_rec, SEX, age) 

total.population <- age.gender.eurostat$population %>% 
  sum()

age.gender.eurostat %<>%
  mutate(Freq = round(population/total.population * weighted.pop, 0) ) %>%
  select(-population)

age.gender.eurostat$gender_age_rec <- c("Female_(14,34]", "Male_(14,34]",
                                        "Female_(34,54]", "Male_(34,54]",
                                        "Female_(54,99]", "Male_(54,99]")

age.gender.eurostat
##   gender_age_rec Freq
## 1 Female_(14,34]  352
## 2   Male_(14,34]  358
## 3 Female_(34,54]  372
## 4   Male_(34,54]  364
## 5 Female_(54,99]  417
## 6   Male_(54,99]  364
rm(total.population)

Next we will do the same for Eurostat’s population data on NUTS 2 regions.

total.population <- region.eurostat$pop_sum %>%
  sum()

region.eurostat %<>%
  mutate(Freq = round(pop_sum/total.population * weighted.pop, 0) ) %>%
  select(-pop_sum)

names(region.eurostat)[[1]] <- "region" 

region.eurostat$region[region.eurostat$region == "East Midlands (UK)"] <- "East Midlands (England)"
region.eurostat$region[region.eurostat$region == "North East (UK)"] <- "North East (England)"
region.eurostat$region[region.eurostat$region == "North West (UK)"] <- "North West (England)"
region.eurostat$region[region.eurostat$region == "Northern Ireland (UK)"] <- "Northern Ireland"
region.eurostat$region[region.eurostat$region == "South East (UK)"] <- "South East (England)"
region.eurostat$region[region.eurostat$region == "South West (UK)"] <- "South West (England)"
region.eurostat$region[region.eurostat$region == "West Midlands (UK)"] <- "West Midlands (England)"
region.eurostat$region[region.eurostat$region == "Yorkshire and The Humber"] <- "Yorkshire and the Humber"

Here we will briefly test that we have the same categories in calibration variables of both survey and Eurostat datasets.

if( identical(region.eurostat$region %>% unique %>% sort, data.calibration$region %>% as.character() %>% unique %>% sort) != T) {
    stop("Levels in region variable have to be the the same in the calibration and dataset used for population frequencies")
}

if( identical(age.gender.eurostat$gender_age_rec %>% unique %>% sort, data.calibration$gender_age_rec %>% as.character() %>% unique %>% sort) != T) {
    stop("Levels in age by gender categories variable have to be the the same in the calibration and dataset used for population frequencies")
}

5.2 Implementing calibration

Now we will use the R ‘survey’ package (Lumley,T., 2010) to calibrate weights using the raking procedure. We will do this twice. First time we will compute the raked weighs using our ‘final.weight’ as an input. These contain information from both the base weights and our adjustment for non-response. In the second computation we will repeat the ESS design and use (only) the design/base weights as an input (variable ‘base.weight’). This will allow us to compare our output weights with those computed by the experts behind the weighting procedure of the 7th ESS.

The R ‘survey’ package works in a rather particular way8. It first requieres to specify the design of the survey with the ‘svydesign’ function. This creates an object of an adhoc class ‘survey.design’ that is passed to further procedures such as weights raking.

The ‘svydesign’ function requieres an ‘ids’ argument with cluster ids. For the specific case of the 7th ESS in the UK, this would be the postal codes, which are the Primary Sampling Units (PSU). The procedures of the ‘survey’ package would then take into account these related responses when computing variances. However, here we will ignore this fact and pretend that all responses where independent of each other (we can do this by passing ~ 0 to the ‘ids’ argument). This should not affect the raking procedure. Next, we neet to specify the input weights in the ‘weights’ argument. Here we will pass the computed final weights and the design/base weights respectively to create both survey designs. Last, we need to specify the data with the survey respondses in the ‘data’ argument.

In this next chunk of code we will create these two objects which will correspond to the two explained computations of raked weights.

our.svydesign <- svydesign(ids = ~ 0, weights = ~final.weight, data = data.calibration)

ess.svydesign <- svydesign(ids = ~ 0, weights = ~base.weight, data = data.calibration)

The raking procedure can be done with the ‘rake’ function form the ‘survey’ package. This function requieres to pass the previously computed survey design object as its first object (‘design’). The second argument (‘sample.margins’) is a list of formulas describing which variables are going to be used for calibration. Here we will pass a list with two formulas. The first one using the ‘region’ variable and the second one the ‘gender_age_rec’ variable which corresponds to the interaction/crosstabulation of gender and age categories. The third argument (‘population.margins’) are the population counts for our calibration variables. Here we will pass our dataframes with the number of people that should be in each region/gender and age category if our sample followed population proportions.

our.raked <- rake(our.svydesign, sample.margins = list(~region, ~gender_age_rec), 
     population = list(region.eurostat, age.gender.eurostat))

ess.raked <- rake(ess.svydesign, sample.margins = list(~region, ~gender_age_rec), 
     population = list(region.eurostat, age.gender.eurostat))

Then we collect the weights from the ‘our.raked’ and ‘ess.raked’ objects we just computed and we add them to our dataset for this section (the ‘data.calibration’ dataframe).

raked.weight <- our.raked$postStrata[[1]][[1]] %>% attributes() %>% .[["weights"]]

ess.raked.weight <- ess.raked$postStrata[[1]][[1]] %>% attributes() %>% .[["weights"]]

data.calibration$ess.raked.weight <- ess.raked.weight

data.calibration$raked.weight <- raked.weight

rm(raked.weight, ess.raked.weight)

5.3 Testing results from raked weights

Next we compare the frequencies of weighted observations in our sample with those we would obtain if our dataset had the same shares of age, gender and region categories as official data (i.e. the data inputed to our calibration procedure). If our calibration is successful, the frequencies should be almost the same. Here we will do this test using our calibration procedure which included design and non-response weights as an input (i.e. calibration object ‘raked.weight’). The following chunk of code shows the results for age and gender categories.

left_join(
data.calibration %>%
  group_by(gender_age_rec) %>%
  summarise(calibrated.sample.pop = sum(raked.weight) %>% round(1)),
age.gender.eurostat)
## # A tibble: 6 x 3
##   gender_age_rec calibrated.sample.pop  Freq
##            <chr>                 <dbl> <dbl>
## 1 Female_(14,34]                 351.7   352
## 2 Female_(34,54]                 371.7   372
## 3 Female_(54,99]                 416.6   417
## 4   Male_(14,34]                 357.7   358
## 5   Male_(34,54]                 363.7   364
## 6   Male_(54,99]                 363.7   364

Here we show the compared frequencies for regions. Both comparisons allow us to see that calibration performed as expected and weighted frequencies reflect population proportions from official statistics.

left_join(
data.calibration %>%
  group_by(region) %>%
  summarise(calibrated.sample.pop = sum(raked.weight) %>% round(1)),
region.eurostat)
## # A tibble: 12 x 3
##                      region calibrated.sample.pop  Freq
##                       <chr>                 <dbl> <dbl>
##  1     North East (England)                    85    85
##  2     North West (England)                   235   235
##  3 Yorkshire and the Humber                   189   189
##  4  East Midlands (England)                   169   169
##  5  West Midlands (England)                   194   194
##  6          East of England                   220   220
##  7                   London                   286   286
##  8     South East (England)                   312   312
##  9     South West (England)                   183   183
## 10                    Wales                   114   114
## 11                 Scotland                   176   176
## 12         Northern Ireland                    62    62

Now we can compare our computed final calibrated weights with those computed by the experts in charge of the weighting procedure of the 7th ESS. We will first join the calibrated weights with our original dataset of respondents.

data %<>%
  left_join(data.calibration %>% select(idno, raked.weight, ess.raked.weight), by = "idno")

rm(data.calibration, age.gender.eurostat, region.eurostat)

Here we compare our raked weight using the same methodology as they used in the 7th ESS (‘ess.raked.weight’) and the weight included in the 7th ESS dataset (‘pspwght’). Looking at the first 15 observations we see that they are, in most cases, very close. This means that we successfully reproduced the weights computed in the 7th ESS!

For our estimations, we could also use the calibrated/raked weights we computed ourselves using the additional input of non-response weights (variable ‘raked.weight’). Using this input, our raking procedure would also ‘borrow’ good estimation properties from the non-response adjustment9.

left_join(
data %>% select(idno, raked.weight, ess.raked.weight),
original.weights %>% select(idno, pspwght)
) %>% head(15)
##         idno raked.weight ess.raked.weight   pspwght
## 1  100000003    0.6033962        0.4567495 0.4674183
## 2  100000005    0.3324131        0.3403316 0.3895551
## 3  100000008    0.4118283        0.4077717 0.4023832
## 4  100000009    1.7442021        2.1873476 2.1463098
## 5  100000010    1.0769384        1.0082670 1.0230092
## 6  100000012    1.0538387        1.2975135 1.3370290
## 7  100000015    1.5882982        1.6647140 1.6514970
## 8  100000016    0.5202722        0.5131737 0.4886227
## 9  100000017    2.0233932        1.5814927 1.5864217
## 10 100000020    0.9267053        0.8367614 0.8877654
## 11 100000022    0.8641030        1.0493359 1.0878961
## 12 100000023    2.3115622        2.3722391 2.3796326
## 13 100000025    0.5406270        0.4424130 0.4768728
## 14 100000030    0.9304049        1.0735789 1.0272274
## 15 100000031    1.2475568        0.9753477 0.9355472

Here we try to compute a mesure of distance of our computed raked weights to those included in the original 7th ESS datafile. We can do this by calculating the sum of squared differences. As we would expect, we find that those computed using the ESS methodology tend to fall much closer to the ESS original weights than the weights which also include input from non-response estimations. It is important to stress again that this does not mean that closer weights are better than those including properties from non-response estimations.

left_join(
data %>% select(idno, raked.weight, ess.raked.weight),
original.weights %>% select(idno, pspwght)
) %>%
  mutate(diff.myraked = pspwght-raked.weight,
         diff.essraked = pspwght-ess.raked.weight) %>%
  summarise(ssdiff.myraked = sum(diff.myraked ^ 2),
            ssdiff.essraked = sum(diff.essraked ^ 2))
##   ssdiff.myraked ssdiff.essraked
## 1        107.686        49.53199

  1. There is a small problem with 20 respondents for which we do not have information about their age. In order to keep thing simple this example we will ignore this issue.

  2. Eurostat data corresponds to tables and which give information on population by age and gender (‘demo_pjangroup’) and age and NUTS2 region (‘demo_r_d2jan’) on the 1st of January of 2014. Last updated on the 29th of April 2017.

  3. It has its own requierements in terms of coding ‘grammar’. This is may be because it was programmed in 2003, almost 15 years ago! Hopefully other packages such as ‘srvyr’, which at the time of writing this guide is still in version 0.2.1, will be able to take the content of the ‘survey’ package and adapt it to current scripting

  4. This idea of calibration weights ‘borrowing’ properties from input weights comes from Valliant et al. (2013, pag 231)