8.1 Data

8.1.1 Import

We will analyze data from a survey in which 15 consumers were asked to rate ten ice creams. Each ice cream had a different ‘profile’, i.e., a different combination of levels of four attributes: flavor (raspberry, chocolate, strawberry, mango, vanilla), packaging (homemade waffle, cone, pint), light (low fat or not), and organic (organic or not). All 15 respondents rated the ten profiles by providing a score between 1 and 10.

We use data provided by www.xlstat.com that are described in their tutorial on doing conjoint analysis in Excel. Download the data here.

library(tidyverse)
library(readxl)

icecream <- read_excel("icecream.xlsx") # No need to include the sheet argument when there's only one sheet in the Excel file

8.1.2 Manipulate

icecream

## # A tibble: 10 x 20
##    Observations Flavor Packaging Light Organic `Individual 1` `Individual 2`
##    <chr>        <chr>  <chr>     <chr> <chr>            <dbl>          <dbl>
##  1 Profile 1    Raspb~ Homemade~ No l~ Not or~              1              6
##  2 Profile 2    Choco~ Cone      No l~ Organic              4              7
##  3 Profile 3    Raspb~ Pint      Low ~ Organic              2              1
##  4 Profile 4    Straw~ Pint      No l~ Organic              7              5
##  5 Profile 5    Straw~ Cone      Low ~ Not or~              9              8
##  6 Profile 6    Choco~ Homemade~ No l~ Not or~              3              2
##  7 Profile 7    Vanil~ Pint      Low ~ Not or~              5              9
##  8 Profile 8    Mango  Homemade~ Low ~ Organic             10             10
##  9 Profile 9    Mango  Pint      No l~ Not or~              6              4
## 10 Profile 10   Vanil~ Homemade~ No l~ Organic              8              3
## # ... with 13 more variables: `Individual 3` <dbl>, `Individual 4` <dbl>,
## #   `Individual 5` <dbl>, `Individual 6` <dbl>, `Individual 7` <dbl>,
## #   `Individual 8` <dbl>, `Individual 9` <dbl>, `Individual 10` <dbl>,
## #   `Individual 11` <dbl>, `Individual 12` <dbl>, `Individual 13` <dbl>,
## #   `Individual 14` <dbl>, `Individual 15` <dbl>

When we inspect the data, we see that we have a column for every respondent. This is an unusual way of storing data (normally we have one row per respondent), so let’s restructure our dataset with the pivot_longer function (as we’ve done before):

icecream <- icecream %>% 
  # Stack every column that starts with "Individual":
  # respondent keeps track of the respondent, rating stores that respondent's ratings
  pivot_longer(cols = starts_with("Individual"), names_to = "respondent", values_to = "rating") %>%
  rename("profile" = "Observations") %>% # rename Observations to profile
  mutate(profile = factor(profile), respondent = factor(respondent),  # factorize identifiers
         Flavor = factor(Flavor), Packaging = factor(Packaging), Light = factor(Light), Organic = factor(Organic)) # factorize the ice cream attributes


# Wide dataset: one row per unit of observation (here: profile) and a number of columns for the different observations (here: respondents)
# Long dataset: one row per observation (here: profile x respondent combination)

# Converting from wide to long means that we're stacking a number of columns on top of each other.
# The pivot_longer function converts datasets from wide to long. We use three of its arguments:
# 1. The cols argument: here we tell R which columns we want to stack. The original dataset had 10 rows (profiles) with 15 rating columns, one per individual. The long dataset will have 10 x 15 = 150 rows, each holding one rating. This means we need to keep track of which individual each rating belongs to.
# 2. The names_to argument: here you define the name of the variable that keeps track of which individual we're dealing with.
# 3. The values_to argument: here you define the name of the variable that stores the actual values.

icecream

## # A tibble: 150 x 7
##    profile   Flavor    Packaging       Light      Organic    respondent   rating
##    <fct>     <fct>     <fct>           <fct>      <fct>      <fct>         <dbl>
##  1 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 1      1
##  2 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 2      6
##  3 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 3      5
##  4 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 4      1
##  5 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 5      2
##  6 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 6      7
##  7 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 7      7
##  8 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 8      5
##  9 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual 9      1
## 10 Profile 1 Raspberry Homemade waffle No low fat Not organ~ Individual ~     10
## # ... with 140 more rows

It’s better to use the Viewer here (double-click on the icecream object in the Environment pane or do View(icecream)) to see that there are ten rows (10 profiles) per respondent now.
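As a complement to the Viewer, the reshaping can also be checked in code. The following is a minimal sketch on a made-up miniature dataset (the object names toy_wide and toy_long and the ratings in them are assumptions for illustration, not part of the ice cream data). It reshapes two profiles rated by three respondents and then counts the rows per respondent:

```r
library(tidyr)  # pivot_longer
library(dplyr)  # count, tibble, %>%

# Hypothetical wide data: 2 profiles (rows) rated by 3 respondents (columns)
toy_wide <- tibble(
  Observations   = c("Profile 1", "Profile 2"),
  `Individual 1` = c(1, 4),
  `Individual 2` = c(6, 7),
  `Individual 3` = c(5, 2)
)

# Stack the three rating columns on top of each other
toy_long <- toy_wide %>%
  pivot_longer(cols = starts_with("Individual"),
               names_to = "respondent",
               values_to = "rating")

# 2 profiles x 3 respondents should give 6 rows
nrow(toy_long)

# Every respondent should appear once per profile, i.e. twice here
toy_long %>% count(respondent)
```

Running the same count(respondent) on the reshaped icecream data should give 10 rows for each of the 15 respondents.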

Besides respondent and rating, the remaining variables are:

profile is an identifier for the different ice creams
Flavor, Packaging, Light, Organic are the four attributes that make up the profile of an ice cream
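
Because we converted the attributes to factors, each attribute now has a fixed set of levels. The sketch below uses a made-up character vector (flavor is a hypothetical stand-in for the Flavor column, not the actual data) to show what factor() does: by default it orders the levels alphabetically, not by order of appearance.

```r
# Hypothetical flavor values, listed in an arbitrary order
flavor <- c("Raspberry", "Chocolate", "Raspberry", "Strawberry", "Mango", "Vanilla")

# Convert the character vector to a factor
flavor_f <- factor(flavor)

# Levels are sorted alphabetically by default
levels(flavor_f)

# Number of distinct levels
nlevels(flavor_f)

# Frequency of each level
table(flavor_f)
```

The same functions can be applied to the real columns, for example levels(icecream$Flavor), to check which levels each attribute has.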

8.1.3 Recap: importing & manipulating

Here’s what we’ve done so far, in one orderly sequence of piped operations (download the data here):

library(tidyverse)
library(readxl)

icecream <- read_excel("icecream.xlsx") %>% 
  pivot_longer(cols = starts_with("Individual"), names_to = "respondent", values_to = "rating") %>% # respondent keeps track of the respondent, rating will store the respondent's ratings, and we want to stack every variable that starts with Individual
  rename("profile" = "Observations") %>% 
  mutate(profile = factor(profile), respondent = factor(respondent),
         Flavor = factor(Flavor), Packaging = factor(Packaging), Light = factor(Light), Organic = factor(Organic))