7.1 Data

7.1.1 Import

We will analyze data from a survey in which 40 respondents were asked to rate the importance of a number of store attributes when buying equipment. Download the data here and import them into R:
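The import step might look like the sketch below. The file name `equipment.csv`, the CSV format, the helper `import_survey()`, and the object name `equipment` are all assumptions made for illustration; adjust them to match the file you actually downloaded.

```r
# Hypothetical helper: reads the survey file into a data frame.
# Assumes a comma-separated file with a header row.
import_survey <- function(path) {
  survey <- read.csv(path, stringsAsFactors = FALSE)
  # Convert to a tibble when the tibble package is installed
  if (requireNamespace("tibble", quietly = TRUE)) {
    survey <- tibble::as_tibble(survey)
  }
  survey
}

path <- "equipment.csv"  # hypothetical file name
if (file.exists(path)) {
  equipment <- import_survey(path)
  print(equipment)
}
```

If the download is an Excel workbook rather than a CSV file, a reader such as `readxl::read_excel()` would take the place of `read.csv()`.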

7.1.2 Manipulate

## # A tibble: 40 x 10
##    respondent_id variety_of_choi~ electronics furniture quality_of_serv~
##            <dbl>            <dbl>       <dbl>     <dbl>            <dbl>
##  1             1                8           6         6                3
##  2             2                6           3         1                4
##  3             3                6           1         2                4
##  4             4                8           3         3                4
##  5             5                4           6         3                9
##  6             6                8           4         3                5
##  7             7                7           2         2                2
##  8             8                7           5         7                2
##  9             9                7           7         5                1
## 10            10                8           4         0                4
## # ... with 30 more rows, and 5 more variables: low_prices <dbl>,
## #   return_policy <dbl>, professional <dbl>, income <dbl>, age <dbl>

We have 10 columns or variables in our data:

  • respondent_id: an identifier for each respondent

  • variety_of_choice, electronics, furniture, quality_of_service, low_prices, return_policy: the importance of each store attribute, rated on a 1-10 scale

  • professional: 1 for professionals, 0 for non-professionals

  • income: expressed in thousands of dollars

  • age
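Before clustering, it can be worth checking that the ratings stay within the stated 1-10 scale. The sketch below uses only the ten rows and four rating columns visible in the printed output above; the object names `ratings_head`, `rating_ranges`, and `outside_scale` are placeholders. With the full data, the same check would apply to all six rating columns.

```r
# First ten rows of the four rating columns printed above
ratings_head <- data.frame(
  variety_of_choice  = c(8, 6, 6, 8, 4, 8, 7, 7, 7, 8),
  electronics        = c(6, 3, 1, 3, 6, 4, 2, 5, 7, 4),
  furniture          = c(6, 1, 2, 3, 3, 3, 2, 7, 5, 0),
  quality_of_service = c(3, 4, 4, 4, 9, 5, 2, 2, 1, 4)
)

# Minimum and maximum per column: a 2 x 4 matrix with rows "min" and "max"
rating_ranges <- sapply(ratings_head, range)
rownames(rating_ranges) <- c("min", "max")
print(rating_ranges)

# Names of columns with any value outside the 1-10 scale
outside_scale <- names(which(sapply(
  ratings_head,
  function(x) any(x < 1 | x > 10)
)))
print(outside_scale)
```

In these ten rows, furniture contains a 0 (respondent 10), so at least one rating falls below the stated minimum of 1.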

Cluster analysis will try to group respondents with similar patterns of ratings. Linear discriminant analysis will then predict cluster membership from the segmentation variables (professional, income, and age).

As always, let’s factorize the variables that should be treated as categorical:
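A minimal sketch of this step, assuming that respondent_id and professional are the two variables to treat as categorical. The small data frame is made up for illustration so that the block runs on its own; with the imported data, skip the first statement. The factor labels are also an assumption, based on the 0/1 coding described in the variable list.

```r
# Illustrative rows only: these values are made up, not taken from the survey
equipment <- data.frame(
  respondent_id = c(1, 2, 3, 4),
  professional  = c(1, 0, 0, 1),
  income        = c(40, 20, 20, 30),
  age           = c(45, 41, 31, 64)
)

# Identifiers are labels, not quantities
equipment$respondent_id <- factor(equipment$respondent_id)

# 0/1 indicator with readable labels (0 = non-professional, 1 = professional)
equipment$professional <- factor(
  equipment$professional,
  levels = c(0, 1),
  labels = c("non-professional", "professional")
)

# income and age stay numeric
str(equipment)
```

Setting `levels` explicitly ties each label to its code, so the labels cannot be swapped by accident if the data happen to be sorted differently.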