8.1 Data
8.1.1 Import
We will analyze data from a survey in which 15 consumers were asked to rate ten ice creams. Each ice cream had a different ‘profile,’ i.e., a different combination of levels of four attributes: flavor (raspberry, chocolate, strawberry, mango, vanilla), packaging (homemade waffle, cone, pint), light (low fat or not), and organic (organic or not). All 15 respondents rated the ten profiles by providing a score between 1 and 10.
We use data provided by www.xlstat.com that are described in their tutorial on doing conjoint analysis in Excel. Download the data here.
library(tidyverse)
library(readxl)
icecream <- read_excel("icecream.xlsx") # No need to include the sheet argument when there's only one sheet in the Excel file
8.1.2 Manipulate
icecream
## # A tibble: 10 x 20
## Observations Flavor Packaging Light Organic `Individual 1` `Individual 2`
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Profile 1 Raspb~ Homemade~ No l~ Not or~ 1 6
## 2 Profile 2 Choco~ Cone No l~ Organic 4 7
## 3 Profile 3 Raspb~ Pint Low ~ Organic 2 1
## 4 Profile 4 Straw~ Pint No l~ Organic 7 5
## 5 Profile 5 Straw~ Cone Low ~ Not or~ 9 8
## 6 Profile 6 Choco~ Homemade~ No l~ Not or~ 3 2
## 7 Profile 7 Vanil~ Pint Low ~ Not or~ 5 9
## 8 Profile 8 Mango Homemade~ Low ~ Organic 10 10
## 9 Profile 9 Mango Pint No l~ Not or~ 6 4
## 10 Profile 10 Vanil~ Homemade~ No l~ Organic 8 3
## # ... with 13 more variables: `Individual 3` <dbl>, `Individual 4` <dbl>,
## # `Individual 5` <dbl>, `Individual 6` <dbl>, `Individual 7` <dbl>,
## # `Individual 8` <dbl>, `Individual 9` <dbl>, `Individual 10` <dbl>,
## # `Individual 11` <dbl>, `Individual 12` <dbl>, `Individual 13` <dbl>,
## # `Individual 14` <dbl>, `Individual 15` <dbl>
When we inspect the data, we see that we have a column for every respondent. This is an unusual way of storing data (normally respondents are stored in rows, not in columns), so let’s restructure our dataset with the pivot_longer function (as we’ve done before):
icecream <- icecream %>%
  pivot_longer(cols = starts_with("Individual"), names_to = "respondent", values_to = "rating") %>% # respondent keeps track of the respondent, rating will store the respondent's ratings, and we want to stack every variable that starts with Individual
  rename("profile" = "Observations") %>% # rename Observations to profile
  mutate(profile = factor(profile), respondent = factor(respondent), # factorize identifiers
         Flavor = factor(Flavor), Packaging = factor(Packaging), Light = factor(Light), Organic = factor(Organic)) # factorize the ice cream attributes
# Wide dataset: one row per unit of observation (here: profile) and a number of columns for the different observations (here: respondents)
# Long dataset: one row per observation (here: profile x respondent combination)
# Converting from wide to long means that we're stacking a number of columns on top of each other.
# The pivot_longer function converts datasets from wide to long and takes three arguments:
# 1. The cols argument: here we tell R which columns we want to stack. The original dataset had 10 rows with 15 columns for 15 individuals. The long dataset will have 150 rows with 150 values for 15 individuals. This means we need to keep track of which individual we're dealing with.
# 2. The names_to argument: here you define the name of the variable that keeps track of which individual we're dealing with.
# 3. The values_to argument: here you define the name of the variable that stores the actual values.
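To make the wide-to-long conversion concrete, here is a minimal sketch on a toy dataset. The object names toy_wide and toy_long and the ratings in them are made up for illustration; only the column naming follows the ice cream data. Two profiles rated by three individuals should stack into 2 x 3 = 6 rows:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (values made up): 2 profiles rated by 3 individuals
toy_wide <- tribble(
  ~Observations, ~`Individual 1`, ~`Individual 2`, ~`Individual 3`,
  "Profile 1",   1,               6,               5,
  "Profile 2",   4,               7,               3
)

# Stack the three Individual columns on top of each other
toy_long <- toy_wide %>%
  pivot_longer(cols = starts_with("Individual"), # the columns to stack
               names_to = "respondent",          # keeps track of which individual gave the rating
               values_to = "rating")             # stores the rating itself

toy_long # one row per profile x respondent combination
```

Each row of toy_wide is spread over three rows of toy_long, one per individual, which is the same pattern as the 10 x 15 = 150 rows in the ice cream data.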
icecream
## # A tibble: 150 x 7
## profile Flavor Packaging Light Organic respondent rating
## <fct> <fct> <fct> <fct> <fct> <fct> <dbl>
## 1 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 1 1
## 2 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 2 6
## 3 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 3 5
## 4 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 4 1
## 5 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 5 2
## 6 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 6 7
## 7 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 7 7
## 8 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 8 5
## 9 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 9 1
## 10 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual ~ 10
## # ... with 140 more rows
It’s better to use the Viewer here (double-click on the icecream object in the Environment pane or do View(icecream)) to see that there are ten rows (10 profiles) per respondent now.
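If you prefer a check in code over the Viewer, counting the rows per respondent is one option. The sketch below uses a hypothetical stand-in, toy_long, that only mimics the shape of the long dataset (10 profiles crossed with 15 respondents, no ratings). On the real data you could apply the same count() call to icecream, assuming it has been restructured as above.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in with the same shape as the long icecream data:
# 10 profiles crossed with 15 respondents (ratings omitted)
toy_long <- expand_grid(
  profile = paste("Profile", 1:10),
  respondent = paste("Individual", 1:15)
)

# Count the number of rows for each respondent
rows_per_respondent <- toy_long %>%
  count(respondent, name = "n_profiles")

rows_per_respondent # one row per respondent, n_profiles should be 10 everywhere
```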
The remaining variables are:
profile is an identifier for the different ice creams
Flavor, Packaging, Light, Organic are the four attributes that make up the profile of an ice cream
8.1.3 Recap: importing & manipulating
Here’s what we’ve done so far, in one orderly sequence of piped operations (download the data here):
library(tidyverse)
library(readxl)
icecream <- read_excel("icecream.xlsx") %>%
  pivot_longer(cols = starts_with("Individual"), names_to = "respondent", values_to = "rating") %>% # respondent keeps track of the respondent, rating will store the respondent's ratings, and we want to stack every variable that starts with Individual
  rename("profile" = "Observations") %>%
  mutate(profile = factor(profile), respondent = factor(respondent),
         Flavor = factor(Flavor), Packaging = factor(Packaging), Light = factor(Light), Organic = factor(Organic))