8.1 Data

8.1.1 Import

We will analyze data from a survey in which 15 consumers were asked to rate ten ice creams. Each ice cream had a different ‘profile’, i.e., a different combination of levels of four attributes: flavor (raspberry, chocolate, strawberry, mango, vanilla), packaging (homemade waffle, cone, pint), light (low fat or not), and organic (organic or not). All 15 respondents rated the ten profiles by providing a score between 1 and 10.

We use data provided by www.xlstat.com that are described in their tutorial on doing conjoint analysis in Excel. Download the data here.

library(tidyverse)
library(readxl)

icecream <- read_excel("icecream.xlsx") # No need to include the sheet argument when there's only one sheet in the Excel file

8.1.2 Manipulate

icecream
## # A tibble: 10 x 20
##    Observations Flavor Packaging Light Organic `Individual 1`
##    <chr>        <chr>  <chr>     <chr> <chr>            <dbl>
##  1 Profile 1    Raspb~ Homemade~ No l~ Not or~              1
##  2 Profile 2    Choco~ Cone      No l~ Organic              4
##  3 Profile 3    Raspb~ Pint      Low ~ Organic              2
##  4 Profile 4    Straw~ Pint      No l~ Organic              7
##  5 Profile 5    Straw~ Cone      Low ~ Not or~              9
##  6 Profile 6    Choco~ Homemade~ No l~ Not or~              3
##  7 Profile 7    Vanil~ Pint      Low ~ Not or~              5
##  8 Profile 8    Mango  Homemade~ Low ~ Organic             10
##  9 Profile 9    Mango  Pint      No l~ Not or~              6
## 10 Profile 10   Vanil~ Homemade~ No l~ Organic              8
## # ... with 14 more variables: `Individual 2` <dbl>, `Individual 3` <dbl>,
## #   `Individual 4` <dbl>, `Individual 5` <dbl>, `Individual 6` <dbl>,
## #   `Individual 7` <dbl>, `Individual 8` <dbl>, `Individual 9` <dbl>,
## #   `Individual 10` <dbl>, `Individual 11` <dbl>, `Individual 12` <dbl>,
## #   `Individual 13` <dbl>, `Individual 14` <dbl>, `Individual 15` <dbl>

When we inspect the data, we see that we have a column for every respondent. This is an unusual way of storing data (normally we have one row per respondent), so let’s restructure our dataset with the gather function (as we’ve done before):

icecream <- icecream %>% 
  gather(respondent, rating, starts_with("Individual")) %>% # respondent keeps track of the respondent, rating will store the respondent's ratings, and we want to stack every variable that starts with Individual
  rename("profile" = "Observations") %>% # rename Observations to profile
  mutate(profile = factor(profile), respondent = factor(respondent),  # factorize identifiers
         Flavor = factor(Flavor), Packaging = factor(Packaging), Light = factor(Light), Organic = factor(Organic)) # factorize the ice cream attributes


# Wide dataset: one row per unit of observation (here: profile) and a number of columns for the different observations (here: respondents)
# Long dataset: one row per observation (here: profile x respondent combination)

# Converting from wide to long means that we're stacking a number of columns on top of each other. For this, we need an extra variable to keep track of which column we are dealing with.
# The gather function converts datasets from wide to long.
# The first argument (respondent) will tell us which column we are dealing with. This is the variable that will store the names of the columns that we are stacking.
# The second argument (rating) will store the actual columns stacked on top of each other.
# The following arguments (all variables with names that start with Individual) are the columns that we want to stack.

icecream
## # A tibble: 150 x 7
##    profile   Flavor    Packaging     Light    Organic   respondent  rating
##    <fct>     <fct>     <fct>         <fct>    <fct>     <fct>        <dbl>
##  1 Profile 1 Raspberry Homemade waf~ No low ~ Not orga~ Individual~      1
##  2 Profile 2 Chocolate Cone          No low ~ Organic   Individual~      4
##  3 Profile 3 Raspberry Pint          Low fat  Organic   Individual~      2
##  4 Profile 4 Strawber~ Pint          No low ~ Organic   Individual~      7
##  5 Profile 5 Strawber~ Cone          Low fat  Not orga~ Individual~      9
##  6 Profile 6 Chocolate Homemade waf~ No low ~ Not orga~ Individual~      3
##  7 Profile 7 Vanilla   Pint          Low fat  Not orga~ Individual~      5
##  8 Profile 8 Mango     Homemade waf~ Low fat  Organic   Individual~     10
##  9 Profile 9 Mango     Pint          No low ~ Not orga~ Individual~      6
## 10 Profile ~ Vanilla   Homemade waf~ No low ~ Organic   Individual~      8
## # ... with 140 more rows

The console only shows the first ten rows, so it’s better to use the Viewer here (double-click on the icecream object in the Environment pane or run View(icecream)) to see that there are now ten rows (one per profile) for each respondent.
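If you prefer to check this in the console rather than in the Viewer, counting the rows per respondent works as well. The sketch below uses a small made-up dataset (toy_wide, with two profiles and three respondents, not the ice cream data) so that it runs without the Excel file. With the real data, the same count() call on icecream should return ten rows for each respondent.

```r
library(tidyverse)

# Hypothetical toy data in the same wide layout:
# one row per profile, one column per respondent
toy_wide <- tibble(
  Observations = c("Profile 1", "Profile 2"),
  Flavor = c("Mango", "Vanilla"),
  `Individual 1` = c(1, 4),
  `Individual 2` = c(3, 5),
  `Individual 3` = c(2, 6)
)

# Stack the respondent columns on top of each other (wide to long)
toy_long <- toy_wide %>%
  gather(respondent, rating, starts_with("Individual"))

# Number of rows per respondent: should equal the number of profiles (2 here)
rows_per_respondent <- toy_long %>% count(respondent)
rows_per_respondent
```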

Besides respondent and rating, the remaining variables are:

  • profile is an identifier for the different ice creams

  • Flavor, Packaging, Light, Organic are the four attributes that make up the profile of an ice cream
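Because the attributes have been converted to factors, their levels can be listed with levels(). A minimal sketch, using the five flavors named in the introduction rather than the imported data (the flavor object is made up for illustration):

```r
# Hypothetical stand-alone factor with the five flavors from the survey description
flavor <- factor(c("Raspberry", "Chocolate", "Strawberry", "Mango", "Vanilla"))

levels(flavor)  # by default, factor() sorts the levels alphabetically
nlevels(flavor) # number of distinct levels
```

With the real data, levels(icecream$Flavor) should list the levels of the Flavor attribute in the same way.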

8.1.3 Recap: importing & manipulating

Here’s what we’ve done so far, in one orderly sequence of piped operations (download the data here):

library(tidyverse)
library(readxl)

icecream <- read_excel("icecream.xlsx") %>% 
  gather(respondent, rating, starts_with("Individual")) %>% # respondent keeps track of the respondent, rating will store the respondent's ratings, and we want to stack every variable that starts with Individual
  rename("profile" = "Observations") %>% 
  mutate(profile = factor(profile), respondent = factor(respondent),
         Flavor = factor(Flavor), Packaging = factor(Packaging), Light = factor(Light), Organic = factor(Organic))
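The gather function still works, but the tidyr documentation marks it as superseded by pivot_longer. As a sketch, and assuming tidyr 1.0.0 or later is installed, the same wide-to-long conversion could be written with pivot_longer. The toy data (toy_wide) are made up for illustration; only the column names mirror those of the Excel file.

```r
library(tidyverse)

# Hypothetical toy data: two profiles rated by two respondents
toy_wide <- tibble(
  Observations = c("Profile 1", "Profile 2"),
  `Individual 1` = c(1, 4),
  `Individual 2` = c(3, 5)
)

# The approach used in this chapter
with_gather <- toy_wide %>%
  gather(respondent, rating, starts_with("Individual"))

# The same reshaping with pivot_longer
with_pivot <- toy_wide %>%
  pivot_longer(cols = starts_with("Individual"),
               names_to = "respondent",
               values_to = "rating")

# Both results contain the same rows, but in a different order,
# so sort them before comparing
same_result <- all.equal(
  as.data.frame(arrange(with_gather, respondent, Observations)),
  as.data.frame(arrange(with_pivot, respondent, Observations))
)
same_result
```

The two functions order the rows differently: gather stacks whole columns, whereas pivot_longer works through the data row by row. This does not matter for the analysis, but it explains why the printed output may look different.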