Chapter 7 Best-Worst Scaling
Best–worst scaling (BWS) is a method for measuring preferences among pre-defined items.6 BWS presents subsets of the items for evaluation, asking the respondent to identify the best and worst (or most important and least important) from the subset. BWS studies usually use a balanced incomplete block design (BIBD) to construct choice sets.
BIBD is a category of designs where trt
treatments (items) are presented to the survey respondent in b
blocks (survey questions) of k
treatments (items). For example, seven items might be presented in seven survey questions with four of the items show in each question.
<- c("origin", "variety", "price", "taste", "safety", "washfree", "milling")
items
set.seed(8041)
<- find.BIB(
bibd trt = length(items),
b = length(items),
k = 4
)
The documentation for isGYD()
explains design balance. A balanced block design has three qualities: i) Each treatment appears equally often. ii) Each treatment appears in each block either n or n+1 times (usually 0 or 1 times). iii) The concurrences of i and j are the same for all pairs of treatments (i, j). A design balanced with respect to both rows and columns is called a generalized Youden design (GYD). A design with less columns (rows) than treatments is incomplete with respect to rows (columns). That’s what you will have in a BWS. A design in which each treatment occurs once per row and column is a latin square. If each treatment occurs the same number of times per row and column, it is a generalized latin square.
isGYD(bibd)
##
## [1] The design is a balanced incomplete block design w.r.t. rows.
There are other design with a BIBD:
find.BIB(6, 10, 3) %>% isGYD() # 6 items shown in 10 blocks of 3
find.BIB(7, 7, 3) %>% isGYD() # 7 items shown in 7 blocks of 3
find.BIB(9, 12, 6) %>% isGYD() # 9 items shown in 12 blocks of 6
find.BIB(11, 11, 5) %>% isGYD() # 11 items shown in 11 blocks of 5
find.BIB(13, 13, 4) %>% isGYD() # 13 items shown in 13 blocks of 4
find.BIB(16, 16, 6) %>% isGYD() # 16 items shown in 16 blocks of 6
The resulting questionaire might look like this.
bws.questionnaire(bibd, design.type = 2, item.names = items)
##
## Q1
## Best Items Worst
## [ ] variety [ ]
## [ ] price [ ]
## [ ] taste [ ]
## [ ] milling [ ]
##
## Q2
## Best Items Worst
## [ ] origin [ ]
## [ ] variety [ ]
## [ ] price [ ]
## [ ] washfree [ ]
##
## Q3
## Best Items Worst
## [ ] origin [ ]
## [ ] variety [ ]
## [ ] safety [ ]
## [ ] milling [ ]
##
## Q4
## Best Items Worst
## [ ] variety [ ]
## [ ] taste [ ]
## [ ] safety [ ]
## [ ] washfree [ ]
##
## Q5
## Best Items Worst
## [ ] origin [ ]
## [ ] price [ ]
## [ ] taste [ ]
## [ ] safety [ ]
##
## Q6
## Best Items Worst
## [ ] price [ ]
## [ ] safety [ ]
## [ ] washfree [ ]
## [ ] milling [ ]
##
## Q7
## Best Items Worst
## [ ] origin [ ]
## [ ] taste [ ]
## [ ] washfree [ ]
## [ ] milling [ ]
Consider the following response data from a BIBD with 7 items shown in 7 blocks of 3. b1
is the item selected as best in question 1, w1
is the item selected as worst in question 1, etc. age
, hp
, and chem
are respondent covariates.
data("ricebws1", package = "support.BWS")
<- ricebws1
dat glimpse(dat)
## Rows: 90
## Columns: 18
## $ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19…
## $ b1 <int> 4, 2, 3, 2, 3, 2, 2, 3, 2, 3, 2, 2, 2, 2, 3, 2, 2, 3, 2, 3, 4, 2,…
## $ w1 <int> 1, 4, 4, 1, 4, 4, 1, 4, 1, 4, 4, 4, 4, 4, 1, 1, 4, 1, 3, 1, 1, 1,…
## $ b2 <int> 1, 3, 3, 3, 2, 3, 3, 2, 4, 3, 3, 1, 3, 3, 2, 3, 3, 1, 1, 1, 2, 3,…
## $ w2 <int> 3, 4, 4, 4, 4, 4, 2, 4, 2, 4, 4, 4, 1, 4, 4, 4, 1, 4, 4, 4, 4, 4,…
## $ b3 <int> 1, 3, 1, 3, 2, 1, 4, 1, 4, 1, 2, 3, 3, 1, 2, 3, 3, 3, 1, 1, 4, 4,…
## $ w3 <int> 3, 4, 2, 2, 3, 2, 3, 4, 2, 4, 4, 2, 1, 4, 1, 2, 1, 2, 3, 2, 1, 2,…
## $ b4 <int> 3, 3, 2, 3, 1, 2, 1, 3, 4, 2, 2, 2, 4, 3, 2, 3, 3, 2, 1, 2, 2, 2,…
## $ w4 <int> 4, 4, 4, 4, 4, 4, 2, 4, 1, 4, 4, 4, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ b5 <int> 4, 2, 3, 2, 3, 2, 2, 4, 2, 3, 2, 4, 2, 2, 3, 2, 4, 3, 1, 1, 3, 2,…
## $ w5 <int> 3, 1, 1, 3, 1, 1, 3, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 2, 4, 4, 1, 4,…
## $ b6 <int> 2, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1,…
## $ w6 <int> 3, 3, 3, 3, 3, 3, 4, 3, 2, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ b7 <int> 1, 2, 2, 1, 2, 2, 4, 2, 4, 2, 2, 2, 3, 2, 2, 2, 2, 1, 1, 2, 2, 4,…
## $ w7 <int> 3, 3, 3, 3, 3, 3, 2, 3, 1, 3, 4, 3, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ age <int> 3, 1, 3, 1, 3, 1, 1, 3, 3, 2, 2, 2, 2, 3, 1, 2, 3, 1, 2, 2, 1, 2,…
## $ hp <int> 2, 2, 2, 3, 2, 3, 1, 2, 2, 2, 3, 3, 2, 3, 3, 2, 1, 2, 3, 3, 3, 2,…
## $ chem <int> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0,…
Convert each response into one row per possible best-worse pair. There are k(k-1)
possible pairs, in this case 4x3=12 pairs. bws.dataset()
lengthens the 90 rows to 90 x 7 questions x 12 possible pairs per question = 7,560 rows.
<- bws.dataset(
bws data = dat,
response.type = 1, # format of response variables: 1 = row number format
choice.sets = bibd,
design.type = 2, # BIBD
item.names = items,
id = "id", # respondent id variable
response = colnames(dat)[2:15], # response variables
model = "maxdiff" # type of dataset to create
)
# 90 respondents x 7 questions x 12 possible pairs per question
dim(bws)
## [1] 7560 19
Question 1 presented items [variety, price, taste, milling]. Respondent id
= 1 selected choice 4 (items[bibd[1, 4]]
= milling) for the Best (b1
) and choice 1 (items[bibd[1, 1]]
= variety) for Worst (w1
).
1, ] dat[
## id b1 w1 b2 w2 b3 w3 b4 w4 b5 w5 b6 w6 b7 w7 age hp chem
## 1 1 4 1 1 3 1 3 3 4 4 3 2 3 1 3 3 2 1
The converted data set is easier to translate. Column RES = TRUE
in row 10 indicates the pair the respondent selected. Best was item 7 (items[7]
= milling) and the Worst was item 2 (items[2]
= variety). You can also get that from the +1 and -1 indicators for milling
and variety
used for modeling.
%>% filter(id == 1, Q == 1) bws
## id Q PAIR BEST WORST RES.B RES.W RES origin variety price taste safety
## 1 1 1 1 2 3 7 2 FALSE 0 1 -1 0 0
## 2 1 1 2 2 4 7 2 FALSE 0 1 0 -1 0
## 3 1 1 3 2 7 7 2 FALSE 0 1 0 0 0
## 4 1 1 4 3 2 7 2 FALSE 0 -1 1 0 0
## 5 1 1 5 3 4 7 2 FALSE 0 0 1 -1 0
## 6 1 1 6 3 7 7 2 FALSE 0 0 1 0 0
## 7 1 1 7 4 2 7 2 FALSE 0 -1 0 1 0
## 8 1 1 8 4 3 7 2 FALSE 0 0 -1 1 0
## 9 1 1 9 4 7 7 2 FALSE 0 0 0 1 0
## 10 1 1 10 7 2 7 2 TRUE 0 -1 0 0 0
## 11 1 1 11 7 3 7 2 FALSE 0 0 -1 0 0
## 12 1 1 12 7 4 7 2 FALSE 0 0 0 -1 0
## washfree milling STR age hp chem
## 1 0 0 101 3 2 1
## 2 0 0 101 3 2 1
## 3 0 -1 101 3 2 1
## 4 0 0 101 3 2 1
## 5 0 0 101 3 2 1
## 6 0 -1 101 3 2 1
## 7 0 0 101 3 2 1
## 8 0 0 101 3 2 1
## 9 0 -1 101 3 2 1
## 10 0 1 101 3 2 1
## 11 0 1 101 3 2 1
## 12 0 1 101 3 2 1
7.1 Count Analysis
You can see below that respondent 1 selected item 1 (origin) Best three times, item 5 (safety) Best three times, and item 7 (milling) Best once. Respondent 1 selected items 2 (variety), 3 (price), 4 (taste), and 5 (safety) Worst once and item 6 (washfree) worst three times.
%>%
bws filter(id == 1, RES == TRUE) %>%
::tbl_cross(BEST, WORST) gtsummary
WORST | Total | |||||
---|---|---|---|---|---|---|
2 | 3 | 4 | 5 | 6 | ||
BEST | ||||||
1 | 0 | 1 | 0 | 1 | 1 | 3 |
5 | 0 | 0 | 1 | 0 | 2 | 3 |
7 | 1 | 0 | 0 | 0 | 0 | 1 |
Total | 1 | 1 | 1 | 1 | 3 | 7 |
bws.count()
calculates counts for (b)est, (w)orst, best-minus-worst (bw), and standardized bw (sbw = bw / number of levels) for each item.
<- bws.count(bws, cl = 2)
bws_count dim(bws_count)
## [1] 90 32
%>% filter(id == 1) %>% glimpse() bws_count
## Rows: 1
## Columns: 32
## $ id <dbl> 1
## $ b.origin <dbl> 3
## $ b.variety <dbl> 0
## $ b.price <dbl> 0
## $ b.taste <dbl> 0
## $ b.safety <dbl> 3
## $ b.washfree <dbl> 0
## $ b.milling <dbl> 1
## $ w.origin <dbl> 0
## $ w.variety <dbl> 1
## $ w.price <dbl> 1
## $ w.taste <dbl> 1
## $ w.safety <dbl> 1
## $ w.washfree <dbl> 3
## $ w.milling <dbl> 0
## $ bw.origin <dbl> 3
## $ bw.variety <dbl> -1
## $ bw.price <dbl> -1
## $ bw.taste <dbl> -1
## $ bw.safety <dbl> 2
## $ bw.washfree <dbl> -3
## $ bw.milling <dbl> 1
## $ sbw.origin <dbl> 0.75
## $ sbw.variety <dbl> -0.25
## $ sbw.price <dbl> -0.25
## $ sbw.taste <dbl> -0.25
## $ sbw.safety <dbl> 0.5
## $ sbw.washfree <dbl> -0.75
## $ sbw.milling <dbl> 0.25
## $ age <int> 3
## $ hp <int> 2
## $ chem <int> 1
plot()
shows the relationship between the level means and standard deviations. Price, taste, and safety are similarly important, but price has a higher standard deviation, meaning its importance varies.
plot(bws_count, score = "bw")
The column plot shows the item ranks.
%>%
bws_count select(id, starts_with("sbw")) %>%
pivot_longer(cols = -id) %>%
group_by(name) %>%
summarize(.groups = "drop", M = mean(value)) %>%
arrange(M) %>%
ggplot(aes(y = fct_inorder(name), x = M)) +
geom_col()
7.2 Model
Fit a conditional logit model. A simple model uses the dummy vars, excluding one (washfree) to avoid singularity. The last term “- 1” means that the model has no alternative-specific constants. Use dfidx()
to convert the data into a format appropriate for the model.
<- RES ~ origin + variety + price + taste + safety + milling - 1
fmla
<- dfidx(bws, idx = list(c("STR", "id"), "PAIR"), choice = "RES")
bws_dfidx
<- mlogit(formula = fmla, data = bws_dfidx)
mlogit_fit
summary(mlogit_fit)
##
## Call:
## mlogit(formula = RES ~ origin + variety + price + taste + safety +
## milling - 1, data = bws_dfidx, method = "nr")
##
## Frequencies of alternatives:choice
## 1 2 3 4 5 6 7 8
## 0.036508 0.095238 0.082540 0.119048 0.123810 0.122222 0.098413 0.053968
## 9 10 11 12
## 0.147619 0.053968 0.030159 0.036508
##
## nr method
## 5 iterations, 0h:0m:0s
## g'(-H)^-1g = 8.32E-06
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## origin 1.13096 0.11785 9.5969 < 2.2e-16 ***
## variety 1.10765 0.11615 9.5364 < 2.2e-16 ***
## price 2.01292 0.12565 16.0205 < 2.2e-16 ***
## taste 1.84700 0.12378 14.9213 < 2.2e-16 ***
## safety 2.07194 0.12622 16.4156 < 2.2e-16 ***
## milling 0.96028 0.11494 8.3546 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1318
bws.sp()
shows the shares of preference.
# Specify the name of the base since it isn't in model.
<- bws.sp(mlogit_fit, base = "washfree")) (bws_sp
## origin variety price taste safety milling washfree
## 0.09835520 0.09608950 0.23759085 0.20126594 0.25203440 0.08292247 0.03174165
Safety was most important and was 0.252 / 0.238
= 1.0607917 times as important as the second place price.
This model isn’t a great fit, unfortunately. You cannot pull the McFadden’s R-squared easily, but the calculation is straight-forward.
<- -90 * 7 * log(12) # log-likelihood at zero
ll0 <- as.numeric(mlogit_fit$logLik)
llb 1 - (llb/ll0) # McFadden's R-squared
## [1] 0.1580881
1 - ((llb-6)/ll0) # Adjusted McFadden's R-squared
## [1] 0.1542554
A possible improvement is the random parameters logit model.
<- RES ~ origin + variety + price + taste + safety + milling - 1 | 0
fmla_rp
<- mlogit(
mlogit_rp_fit
fmla_rp,
bws_dfidx,rpar = c(origin = "n", variety = "n", price = "n", taste = "n", safety = "n", milling = "n"),
R = 100,
halton = NA,
panel = TRUE
)
summary(mlogit_rp_fit)
##
## Call:
## mlogit(formula = RES ~ origin + variety + price + taste + safety +
## milling - 1 | 0, data = bws_dfidx, rpar = c(origin = "n",
## variety = "n", price = "n", taste = "n", safety = "n", milling = "n"),
## R = 100, halton = NA, panel = TRUE)
##
## Frequencies of alternatives:choice
## 1 2 3 4 5 6 7 8
## 0.036508 0.095238 0.082540 0.119048 0.123810 0.122222 0.098413 0.053968
## 9 10 11 12
## 0.147619 0.053968 0.030159 0.036508
##
## bfgs method
## 24 iterations, 0h:0m:14s
## g'(-H)^-1g = 9.32E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## origin 1.74079 0.15489 11.2386 < 2.2e-16 ***
## variety 1.78710 0.15324 11.6624 < 2.2e-16 ***
## price 3.98883 0.24281 16.4277 < 2.2e-16 ***
## taste 3.09265 0.19980 15.4787 < 2.2e-16 ***
## safety 3.62784 0.19816 18.3078 < 2.2e-16 ***
## milling 1.63816 0.16366 10.0093 < 2.2e-16 ***
## sd.origin 1.66733 0.17842 9.3451 < 2.2e-16 ***
## sd.variety 1.86579 0.17869 10.4414 < 2.2e-16 ***
## sd.price 2.68927 0.23760 11.3186 < 2.2e-16 ***
## sd.taste 2.00738 0.18984 10.5740 < 2.2e-16 ***
## sd.safety 1.95473 0.21113 9.2584 < 2.2e-16 ***
## sd.milling 1.60534 0.18671 8.5983 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1093.1
##
## random coefficients
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## origin -Inf 0.6161939 1.740794 1.740794 2.865394 Inf
## variety -Inf 0.5286426 1.787098 1.787098 3.045554 Inf
## price -Inf 2.1749395 3.988827 3.988827 5.802715 Inf
## taste -Inf 1.7386952 3.092652 3.092652 4.446608 Inf
## safety -Inf 2.3093963 3.627839 3.627839 4.946282 Inf
## milling -Inf 0.5553767 1.638164 1.638164 2.720951 Inf
McFadden’s R-squared increased substantially.
<- as.numeric(mlogit_rp_fit$logLik)
llb_rp 1 - (llb_rp/ll0) # McFadden's R-squared
## [1] 0.3017665
1 - ((llb_rp-6)/ll0) # Adjusted McFadden's R-squared
## [1] 0.2979339
# Specify the name of the base since it isn't in model.
<- bws.sp(mlogit_rp_fit, base = "washfree", coef = items[-6])) (bws_rp_sp
## origin variety price taste safety milling
## 0.043367459 0.045422777 0.410650578 0.167597772 0.286218165 0.039137418
## washfree
## 0.007605832
Now (surprisingly?), price is most important and was 0.411 / 0.286
= 1.4347467 times as important as the second place safety.
Notes are from http://lab.agr.hokudai.ac.jp/nmvr/03-bws1.html.↩︎