26 Surveys

Chandon, Morwitz, and Reinartz (2005) posit that “self-generated validity” can obscure the link between purchase intentions and purchase behavior. On average, the link between latent intentions and purchase behavior is 58% stronger among surveyed consumers than among nonsurveyed consumers.

Sheppard, Hartwick, and Warshaw (1988) found that the average correlation between intentions and behavior is about 0.53.

Predictive power improves somewhat if we segment consumers before forecasting sales from historical purchases and purchase intentions (Morwitz and Schmittlein 1992).

SICSS Summer 2020: Survey Research in the Digital Age, by Professor Matthew Salganik
| Era | Sampling | Interviews | Data environment |
|---|---|---|---|
| 1st era | Area probability | Face-to-face | Stand-alone |
| 2nd era | Random-digit-dial probability | Telephone | Stand-alone |
| 3rd era | Non-probability | Computer-administered | Linked |

Total survey error framework (Groves and Lyberg 2010)

Insight:

  1. Errors can come from bias or variance
  2. Total survey error = Measurement error + representation error

(Groves and Lyberg 2010, Fig. 3)
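The first insight, that error comes from bias or variance, can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true mean of 10, the bias of +0.5, and the sample size of 50 are hypothetical values, not figures from Groves and Lyberg (2010). It shows that the mean squared error of an estimator splits into squared bias plus variance.

```r
set.seed(42)
true_mean <- 10  # hypothetical population quantity

# 5000 replicate surveys from a hypothetical procedure that
# systematically overshoots by 0.5 (bias) and is noisy (variance)
estimates <- replicate(5000, mean(rnorm(50, mean = true_mean + 0.5, sd = 2)))

bias     <- mean(estimates) - true_mean
variance <- mean((estimates - mean(estimates))^2)
mse      <- mean((estimates - true_mean)^2)

# Mean squared error = bias^2 + variance
c(bias_sq = bias^2, variance = variance, mse = mse)
```

Reducing only one of the two components (for example, drawing a larger sample from a biased frame) leaves the other untouched.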


Probability and Non-probability Sampling

  • Probability sample: every unit from a frame population has a known and non-zero probability of inclusion
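  • With weighting, we can correct for bias introduced by the sampling design.
  • Nonresponse problem: in practice not everyone sampled responds, so the realized inclusion probabilities are no longer known.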

Horvitz-Thompson estimator (an unbiased estimator when the inclusion probabilities are known):

\[ \hat{\bar{y}} = \frac{\sum_{i \in s}y_i / \pi_i}{N} \]

where \(\pi_i\) is person i’s probability of inclusion (which we have to estimate), \(s\) is the sample, and \(N\) is the population size.
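A minimal sketch of this estimator in base R follows. The function name `ht_mean` and the two-stratum population are hypothetical, chosen only to show how weighting by \(1/\pi_i\) undoes unequal inclusion probabilities.

```r
# Horvitz-Thompson estimate of a population mean
# y:  observed values for sampled units
# pi: inclusion probability of each sampled unit
# N:  population size
ht_mean <- function(y, pi, N) {
  stopifnot(length(y) == length(pi), all(pi > 0 & pi <= 1))
  sum(y / pi) / N
}

# Hypothetical population of N = 1000:
#   stratum A: 800 people, all with y = 10, 40 sampled (pi = 40 / 800)
#   stratum B: 200 people, all with y = 20, 40 sampled (pi = 40 / 200)
y  <- c(rep(10, 40), rep(20, 40))
pi <- c(rep(40 / 800, 40), rep(40 / 200, 40))

naive_mean    <- mean(y)              # over-represents stratum B
weighted_mean <- ht_mean(y, pi, 1000) # matches the population mean of 12
c(naive = naive_mean, horvitz_thompson = weighted_mean)
```

In this toy population the true mean is (800 × 10 + 200 × 20) / 1000 = 12, so the unweighted sample mean of 15 is biased while the weighted estimate is not.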

Wiki Survey

  • Create a survey that leverages the power of people: respondents can contribute new items as well as respond to existing ones

Mass Collaboration

  • Human Computation: Train People -> Train Lots of People -> Train Machine

    • Cleaning

    • De-biasing

    • Combining

  • Open Call:

    • Solutions are easier to check than to generate

    • Requires specialized skills

  • Distributed Data Collection:

    • People go out and collect data

    • quality check

Fragile Families Challenge


26.1 Anchoring Vignettes

  • Problem of interpersonal incomparability

  • Resources:

  • Helps with two questions:

    • Different respondents understand the same question differently: incomparability in survey responses (“DIF”). Agreement on a theoretical concept is nearly impossible.

    • How can we measure concepts that can only be defined by examples?

  • Measure as usual, then subtract the incomparable portion (i.e., use the same respondent’s assessment of a particular example or case to adjust the self-assessment).

  • Varying vignette assessments give us DIF (i.e., differential item functioning)

  • Since we created the anchors (i.e., examples), we know the true vignette assessments are fixed over respondents


26.1.1 Nonparametric method

Code the rank of each respondent’s self-assessment relative to that respondent’s own vignette assessments.

Inconsistencies are treated as ties.
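The relative-rank coding can be sketched in base R as follows. This is an illustration of the idea only, not the code used inside the anchors package: the helper `c_rank` is hypothetical, and it assumes the vignettes are passed in the researcher’s intended order from lowest to highest. With J vignettes the scale runs from 1 (below the first vignette) to 2J + 1 (above the last one); ties and inconsistent orderings yield an interval rather than a single value.

```r
# Relative rank of a self-assessment against J ordered vignette ratings.
# Returns the lowest (Cs) and highest (Ce) rank consistent with the answers;
# Cs == Ce means a scalar rank, Cs < Ce means an interval.
c_rank <- function(self, vign) {
  J <- length(vign)
  cond <- logical(2 * J + 1)
  cond[1] <- self < vign[1]                    # below the lowest vignette
  for (j in seq_len(J)) {
    cond[2 * j] <- self == vign[j]             # tied with vignette j
    if (j < J) {
      # strictly between vignette j and vignette j + 1
      cond[2 * j + 1] <- self > vign[j] && self < vign[j + 1]
    }
  }
  cond[2 * J + 1] <- self > vign[J]            # above the highest vignette
  hits <- which(cond)
  c(Cs = min(hits), Ce = max(hits))
}

c_rank(3, c(2, 4))  # between the two vignettes: Cs = 3, Ce = 3
c_rank(3, c(3, 3))  # tied with both vignettes:  Cs = 2, Ce = 4
c_rank(3, c(4, 2))  # vignettes rated in reverse order: Cs = 1, Ce = 5
```

The last two calls show why ties and inconsistencies end up as interval-valued cases rather than being dropped.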

Measurement Assumptions:

  • Response consistency: Each respondent uses the response categories in the same way for the self-assessment and for the vignettes.

  • Vignette equivalence: For every vignette, the actual level described is the same for all respondents.

Estimation uses an ordered probit model.

26.1.2 Parametric method

The more vignettes we have, the better the identification, but each additional vignette also introduces measurement error.

  • Also uses an ordered probit model, with varying thresholds and a random effect.
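To make “varying thresholds” concrete, here is a toy simulation in base R. It sketches only the data-generating idea and is not the estimator implemented in the anchors package; the covariate `group`, the cut points, and the shift of 0.8 are made-up values, and the random effect is left out for brevity.

```r
set.seed(1)
n <- 2000

# Hypothetical covariate that changes how respondents use the scale
group <- rbinom(n, 1, 0.5)

# Vignette equivalence: the true level of the vignette is the same for everyone
theta  <- 0.5
z_star <- theta + rnorm(n)  # perceived level = true level + noise

# Varying thresholds: group 1 uses cut points 0.8 higher than group 0
base_tau <- c(-1, 0, 1)
tau <- outer(0.8 * group, base_tau, "+")  # n x 3 matrix of cut points

# Observed ordinal response (1 to 4) = 1 + number of thresholds exceeded
z <- 1 + rowSums(z_star > tau)

# Same true level, yet group 1 reports lower categories on average
mean_by_group <- tapply(z, group, mean)
mean_by_group
```

Because the vignette’s true level is fixed, the gap between the two group means reflects threshold differences alone, which is the information the parametric model uses to adjust self-assessments.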

# install.packages("anchors")
library(anchors)
## Warning: package 'anchors' was built under R version 4.0.5
## Loading required package: rgenoud
## Warning: package 'rgenoud' was built under R version 4.0.5
## ##  rgenoud (Version 5.8-3.0, Build Date: 2019-01-22)
## ##  See http://sekhon.berkeley.edu/rgenoud for additional documentation.
## ##  Please cite software as:
## ##   Walter Mebane, Jr. and Jasjeet S. Sekhon. 2011.
## ##   ``Genetic Optimization Using Derivatives: The rgenoud package for R.''
## ##   Journal of Statistical Software, 42(11): 1-26. 
## ##
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 4.0.5
## 
## ##  anchors (Version 3.0-8, Build Date: 2014-02-24)
## ##  See http://wand.stanford.edu/anchors for additional documentation and support.
# Example from the package's authors
data("freedom")
head(freedom)
##        sex age educ  country self vign1 vign2 vign3 vign4 vign5 vign6
## 109276   0  20    4  Eurasia    1     4     3     3     5     3     4
## 25117    1  55    5  Oceania    2     3     3     4     4     4     5
## 106877   0  27    2 Eastasia    1     2     1     4     4     5     5
## 69437    1  30    1 Eastasia    1     2     2     4     5     5     5
## 88178    1  25    4  Oceania    2     3     3     5     5     5     5
## 111063   1  56    2 Eastasia    2     3     2     4     5     5     4
a1 <-
    anchors(self ~ vign2 + vign3 + vign4 + vign5 + vign6, freedom, method = "C")
summary(a1)
## 
## ANCHORS: SUMMARY OF RELATIVE RANK ANALYSIS:
## 
## Overview of C-ranks
## 
## Number of cases: 1763 with interval value, 1737 with scalar value
## 
## Maximum possible C-rank value: 11
## 
## Interval on C-scale: Frequency and proportions Cs to Ce
##            N  Prop MinEnt
##  1 to  1 387 0.111      1
##  2 to  2 279 0.080      2
##  3 to  3 336 0.096      3
##  4 to  4  81 0.023      4
##  5 to  5  59 0.017      5
##  6 to  6  28 0.008      6
##  7 to  7  11 0.003      7
##  8 to  8  31 0.009      8
##  9 to  9  22 0.006      9
## 10 to 10 164 0.047     10
## 11 to 11 339 0.097     11
##  1 to  4  16 0.005      1
##  1 to  5  12 0.003      1
##  1 to  6  25 0.007      6
##  1 to  7   5 0.001      6
##  1 to  8  31 0.009      6
##  1 to  9   5 0.001      6
##  1 to 10  32 0.009      6
##  1 to 11  19 0.005      6
##  2 to  4  15 0.004      3
##  2 to  5  11 0.003      3
##  2 to  6  22 0.006      6
##  2 to  7   4 0.001      6
##  2 to  8  51 0.015      6
##  2 to  9  19 0.005      6
##  2 to 10 177 0.051      6
##  2 to 11  91 0.026      6
##  3 to  6  31 0.009      6
##  3 to  7   3 0.001      6
##  3 to  8  93 0.027      6
##  3 to  9  29 0.008      6
##  3 to 10  16 0.005      6
##  3 to 11  11 0.003      6
##  4 to  6  16 0.005      6
##  4 to  7   2 0.001      6
##  4 to  8  94 0.027      6
##  4 to  9  39 0.011      6
##  4 to 10 175 0.050      6
##  4 to 11  39 0.011      6
##  5 to  8  80 0.023      6
##  5 to  9  38 0.011      6
##  5 to 10   9 0.003      6
##  5 to 11   6 0.002      6
##  6 to  8 107 0.031      6
##  6 to  9  61 0.017      6
##  6 to 10 242 0.069      6
##  6 to 11  52 0.015      6
##  7 to 10   1 0.000     10
##  7 to 11   1 0.000     11
##  8 to 10  44 0.013     10
##  8 to 11  39 0.011     11
## 
## Note: MinEnt is the rank for the interval that minimizes entropy
## 
## Summary of C-ranks with ties/intervals broken:
## 
## Distribution of ranks omiting interval cases
##      1     2     3     4     5     6     7     8     9    10    11
##  0.223 0.161 0.193 0.047 0.034 0.016 0.006 0.018 0.013 0.094 0.195
## 
## Distribution of ranks allocating interval cases uniformly
##      1   2     3    4    5    6     7     8    9   10    11
##  0.116 0.1 0.125 0.07 0.07 0.09 0.079 0.091 0.06 0.09 0.107
## 
## Distribution of ranks allocating interval cases via cpolr
## and conditioning on observed ranks
##     1     2     3     4     5     6     7     8     9    10    11 
## 0.118 0.103 0.142 0.051 0.045 0.138 0.025 0.155 0.017 0.095 0.110 
## 
## Allocating cases to their MinEnt values produces
##     1     2     3     4     5     6     7     8     9    10    11 
## 0.119 0.080 0.103 0.023 0.017 0.472 0.003 0.009 0.006 0.060 0.108