# 26 Surveys

Chandon, Morwitz, and Reinartz (2005) posit that “self-generated validity” can obscure the link between purchase intentions and purchase behavior. On average, the link between latent intentions and purchase behavior is 58% stronger among surveyed consumers than among nonsurveyed consumers.

Sheppard, Hartwick, and Warshaw (1988) found that the average correlation between intentions and behavior is about 0.53.

Predictive power improves slightly if we segment consumers before forecasting sales from historical purchases and purchase intentions (Morwitz and Schmittlein 1992).

SICSS Summer 2020: Survey Research in the Digital Age, by Professor Matthew Salganik

| Era | Sampling | Interviews | Data environment |
|-----|----------|------------|------------------|
| 1st era | Area probability | Face-to-face | Stand-alone |
| 2nd era | Random-digit-dial probability | Telephone | Stand-alone |

Total survey error framework

Insight:

1. Errors can come from bias or variance
2. Total survey error = Measurement error + representation error

(Groves 2010, Fig. 3)
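The first insight can be made concrete with the standard decomposition of mean squared error into squared bias plus variance. The simulation below is only a sketch with made-up numbers: the population mean of 50, the measurement bias of 2, and all variable names are assumptions chosen for illustration.

```r
set.seed(1)

# Hypothetical setup: the true population mean is 50, but the survey
# instrument overstates every answer by 2 units (a measurement bias).
true_mean <- 50
n         <- 100    # respondents per simulated survey
n_surveys <- 5000   # number of simulated surveys

estimates <- replicate(n_surveys, {
  y <- rnorm(n, mean = true_mean, sd = 10) + 2  # biased measurements
  mean(y)
})

bias     <- mean(estimates) - true_mean
variance <- mean((estimates - mean(estimates))^2)
mse      <- mean((estimates - true_mean)^2)

# The mean squared error equals squared bias plus variance
c(bias = bias, variance = variance, mse = mse,
  bias2_plus_var = bias^2 + variance)
```

A larger sample shrinks the variance term but leaves the bias term untouched, which is why both sources of error need separate attention.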

Probability and Non-probability Sampling

• Probability sample: every unit from a frame population has a known and non-zero probability of inclusion
• With weighting, we can correct for bias in the sample.
• Non-response problem

Horvitz-Thompson estimator (unbiased when the inclusion probabilities are known):

$\hat{\bar{y}} = \frac{\sum_{i \in s}y_i / \pi_i}{N}$

where $\pi_i$ is person $i$’s probability of inclusion (which we have to estimate), $s$ is the sample, and $N$ is the population size.
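A minimal sketch of this estimator on simulated data follows. The population, the flag `x`, and the inclusion probabilities are all hypothetical, and the sketch assumes the $\pi_i$ are known exactly, whereas in practice they usually have to be estimated.

```r
set.seed(42)

# Hypothetical population of N = 10,000 people in which the outcome y is
# more common among people who are also more likely to be sampled.
N    <- 10000
x    <- rbinom(N, 1, 0.3)                       # e.g., a "heavy user" flag
y    <- rbinom(N, 1, ifelse(x == 1, 0.8, 0.4))  # outcome depends on x
pi_i <- ifelse(x == 1, 0.5, 0.1)                # assumed-known inclusion probabilities

# Poisson sampling: each person is included independently with probability pi_i
in_sample <- rbinom(N, 1, pi_i) == 1

naive_mean <- mean(y[in_sample])                       # ignores unequal inclusion
ht_mean    <- sum(y[in_sample] / pi_i[in_sample]) / N  # Horvitz-Thompson estimate

c(population = mean(y), naive = naive_mean, horvitz_thompson = ht_mean)
```

Each sampled person is weighted by $1/\pi_i$, so people who were unlikely to be included stand in for more of the population. The unweighted mean over-represents the easily sampled group.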

Wiki Survey

• Create a survey that leverages the power of people

Mass Collaboration

• Human Computation: Train People -> Train Lots of People -> Train Machine

  • Cleaning

  • De-biasing

  • Combining

• Open Call:

  • Solutions are easier to check than to generate

  • Requires specialized skills

• Distributed Data Collection:

  • People go out and collect data

  • Requires quality checks

Fragile Families Challenge

## 26.1 Anchoring Vignettes

• Problem of interpersonal incomparability

• Resources:

• Help with 2 questions:

• Different respondents understand the same question differently: incomparability in survey responses (“DIF”). Agreement on a theoretical concept is nearly impossible.

• How can we measure concepts that can only be defined by examples?

• Measure as usual, then subtract the incomparable portion (i.e., use each respondent's assessment of a particular example/case to adjust that respondent's self-assessment).

• Varying vignette assessments give us DIF (i.e., differential item functioning)

• Since we created the anchors (i.e., examples), we know the true vignette assessments are fixed over respondents

### 26.1.1 Nonparametric method

Code each self-assessment by its rank relative to the same respondent's vignette ratings.

Inconsistencies would be considered ties.
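The recoding can be sketched in base R as follows. The helper `c_rank` is hypothetical and written only for illustration; the `anchors` package computes these ranks itself, and its exact handling of ties, inconsistencies, and missing values may differ. The sketch assumes the vignettes in `z` are listed from the one intended to be lowest to the one intended to be highest.

```r
# Hypothetical helper: rank of a self-assessment `y` relative to vignette
# ratings `z`. With J vignettes the scale runs from 1 to 2J + 1.
c_rank <- function(y, z) {
  J <- length(z)
  holds <- logical(2 * J + 1)
  holds[1] <- y < z[1]                    # below the lowest vignette
  for (j in seq_len(J)) {
    holds[2 * j] <- y == z[j]             # tied with vignette j
    if (j < J) {
      # strictly between vignette j and vignette j + 1
      holds[2 * j + 1] <- z[j] < y && y < z[j + 1]
    }
  }
  holds[2 * J + 1] <- y > z[J]            # above the highest vignette
  # One TRUE condition gives a scalar rank; several give an interval (from, to)
  range(which(holds))
}

c_rank(y = 3, z = c(1, 2, 4))  # consistent ordering: rank 5 to 5
c_rank(y = 2, z = c(2, 2, 4))  # self tied with two vignettes: rank 2 to 4
c_rank(y = 2, z = c(3, 1, 4))  # vignettes rated out of order: rank 1 to 5
```

Tied or out-of-order vignette ratings produce an interval rather than a single rank, which is why such cases are treated like ties.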

Measurement Assumptions:

• Response consistency: Each respondent uses the self-assessment and vignette response categories in the same manner across questions.

• Vignette Equivalence: For every vignette, the real level is the same for all respondents.

Use an ordered probit model to estimate.

### 26.1.2 Parametric method

More vignettes give better identification, but they also introduce more measurement error.

• Also uses an ordered probit model, but with varying thresholds and a random effect.
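To see what varying thresholds mean, the sketch below simulates a simple data-generating process. It does not fit the model and it omits the random effect. The two groups, the size of the threshold shift, and all names (`base_tau`, `categorize`, and so on) are assumptions made up for illustration.

```r
set.seed(7)

# Hypothetical setup: two groups share the same latent distribution, but
# group "B" uses thresholds shifted upward by 1, so its members need a
# higher latent level before choosing a higher response category.
n        <- 2000
group    <- rep(c("A", "B"), each = n / 2)
shift    <- ifelse(group == "B", 1, 0)
base_tau <- c(-1, 0, 1, 2)   # thresholds for a 5-category scale

# Map latent levels to categories 1-5 using respondent-specific thresholds
categorize <- function(latent, shift) {
  1 + rowSums(outer(latent - shift, base_tau, ">"))
}

self_latent <- rnorm(n, mean = 0.5, sd = 1)  # same distribution in both groups
vign_latent <- 0.5 + rnorm(n, sd = 0.5)      # fixed true level plus perception noise

self <- categorize(self_latent, shift)
vign <- categorize(vign_latent, shift)

# Raw self-assessments: group B reports lower categories despite equal latent levels
tapply(self, group, mean)

# The vignette's true level is fixed, so the gap in its ratings reflects DIF
tapply(vign, group, mean)

# Comparing self with the same respondent's vignette rating removes most of the shift
tapply(self > vign, group, mean)
```

Because the same thresholds generate both the self-assessment and the vignette rating (response consistency), the vignette carries the information needed to separate threshold differences from real differences.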

# install.packages("anchors")
library(anchors)
## Warning: package 'anchors' was built under R version 4.0.5
## Loading required package: rgenoud
## Warning: package 'rgenoud' was built under R version 4.0.5
## ##  rgenoud (Version 5.8-3.0, Build Date: 2019-01-22)
## ##  See http://sekhon.berkeley.edu/rgenoud for additional documentation.
## ##  Please cite software as:
## ##   Walter Mebane, Jr. and Jasjeet S. Sekhon. 2011.
## ##   Genetic Optimization Using Derivatives: The rgenoud package for R.''
## ##   Journal of Statistical Software, 42(11): 1-26.
## ##
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 4.0.5
##
## ##  anchors (Version 3.0-8, Build Date: 2014-02-24)
## ##  See http://wand.stanford.edu/anchors for additional documentation and support.
# Example from the package's authors
data("freedom")
head(freedom)
##        sex age educ  country self vign1 vign2 vign3 vign4 vign5 vign6
## 109276   0  20    4  Eurasia    1     4     3     3     5     3     4
## 25117    1  55    5  Oceania    2     3     3     4     4     4     5
## 106877   0  27    2 Eastasia    1     2     1     4     4     5     5
## 69437    1  30    1 Eastasia    1     2     2     4     5     5     5
## 88178    1  25    4  Oceania    2     3     3     5     5     5     5
## 111063   1  56    2 Eastasia    2     3     2     4     5     5     4
# Nonparametric analysis: rank each self-assessment relative to vignettes 2-6
a1 <- anchors(self ~ vign2 + vign3 + vign4 + vign5 + vign6,
              data = freedom, method = "C")
summary(a1)
##
## ANCHORS: SUMMARY OF RELATIVE RANK ANALYSIS:
##
## Overview of C-ranks
##
## Number of cases: 1763 with interval value, 1737 with scalar value
##
## Maximum possible C-rank value: 11
##
## Interval on C-scale: Frequency and proportions Cs to Ce
##            N  Prop MinEnt
##  1 to  1 387 0.111      1
##  2 to  2 279 0.080      2
##  3 to  3 336 0.096      3
##  4 to  4  81 0.023      4
##  5 to  5  59 0.017      5
##  6 to  6  28 0.008      6
##  7 to  7  11 0.003      7
##  8 to  8  31 0.009      8
##  9 to  9  22 0.006      9
## 10 to 10 164 0.047     10
## 11 to 11 339 0.097     11
##  1 to  4  16 0.005      1
##  1 to  5  12 0.003      1
##  1 to  6  25 0.007      6
##  1 to  7   5 0.001      6
##  1 to  8  31 0.009      6
##  1 to  9   5 0.001      6
##  1 to 10  32 0.009      6
##  1 to 11  19 0.005      6
##  2 to  4  15 0.004      3
##  2 to  5  11 0.003      3
##  2 to  6  22 0.006      6
##  2 to  7   4 0.001      6
##  2 to  8  51 0.015      6
##  2 to  9  19 0.005      6
##  2 to 10 177 0.051      6
##  2 to 11  91 0.026      6
##  3 to  6  31 0.009      6
##  3 to  7   3 0.001      6
##  3 to  8  93 0.027      6
##  3 to  9  29 0.008      6
##  3 to 10  16 0.005      6
##  3 to 11  11 0.003      6
##  4 to  6  16 0.005      6
##  4 to  7   2 0.001      6
##  4 to  8  94 0.027      6
##  4 to  9  39 0.011      6
##  4 to 10 175 0.050      6
##  4 to 11  39 0.011      6
##  5 to  8  80 0.023      6
##  5 to  9  38 0.011      6
##  5 to 10   9 0.003      6
##  5 to 11   6 0.002      6
##  6 to  8 107 0.031      6
##  6 to  9  61 0.017      6
##  6 to 10 242 0.069      6
##  6 to 11  52 0.015      6
##  7 to 10   1 0.000     10
##  7 to 11   1 0.000     11
##  8 to 10  44 0.013     10
##  8 to 11  39 0.011     11
##
## Note: MinEnt is the rank for the interval that minimizes entropy
##
## Summary of C-ranks with ties/intervals broken:
##
## Distribution of ranks omiting interval cases
##      1     2     3     4     5     6     7     8     9    10    11
##  0.223 0.161 0.193 0.047 0.034 0.016 0.006 0.018 0.013 0.094 0.195
##
## Distribution of ranks allocating interval cases uniformly
##      1   2     3    4    5    6     7     8    9   10    11
##  0.116 0.1 0.125 0.07 0.07 0.09 0.079 0.091 0.06 0.09 0.107
##
## Distribution of ranks allocating interval cases via cpolr
## and conditioning on observed ranks
##     1     2     3     4     5     6     7     8     9    10    11
## 0.118 0.103 0.142 0.051 0.045 0.138 0.025 0.155 0.017 0.095 0.110
##
## Allocating cases to their MinEnt values produces
##     1     2     3     4     5     6     7     8     9    10    11
## 0.119 0.080 0.103 0.023 0.017 0.472 0.003 0.009 0.006 0.060 0.108