7.1 Data

7.1.1 Import

We will analyze data from a survey in which 40 respondents were asked to rate the importance of a number of store attributes when buying equipment. Download the data here and import them into R:

library(tidyverse) 
library(readxl)

equipment <- read_excel("segmentation_office.xlsx", sheet = "SegmentationData") # Import the "SegmentationData" sheet of the Excel file
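
If the import fails, the usual causes are a wrong working directory or a misspelled sheet name. The following is a minimal sketch of a defensive import, not part of the original analysis: `import_sheet()` is a hypothetical helper name, and it relies only on `excel_sheets()` and `read_excel()` from readxl.

```r
library(readxl)

# Hypothetical helper: import one sheet, failing early with a clear message
import_sheet <- function(path, sheet) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  available <- excel_sheets(path) # names of all sheets in the workbook
  if (!sheet %in% available) {
    stop("Sheet '", sheet, "' not found. Available sheets: ",
         paste(available, collapse = ", "))
  }
  read_excel(path, sheet = sheet)
}

# Only attempt the import when the file is in the working directory
if (file.exists("segmentation_office.xlsx")) {
  equipment <- import_sheet("segmentation_office.xlsx", "SegmentationData")
}
```

The helper returns the same tibble as a direct call to `read_excel()`; it only adds the two checks in front.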

7.1.2 Manipulate

equipment # Check out the data
## # A tibble: 40 x 10
##    respondent_id variety_of_choice electronics furniture quality_of_service
##            <dbl>             <dbl>       <dbl>     <dbl>              <dbl>
##  1             1                 8           6         6                  3
##  2             2                 6           3         1                  4
##  3             3                 6           1         2                  4
##  4             4                 8           3         3                  4
##  5             5                 4           6         3                  9
##  6             6                 8           4         3                  5
##  7             7                 7           2         2                  2
##  8             8                 7           5         7                  2
##  9             9                 7           7         5                  1
## 10            10                 8           4         0                  4
## # ... with 30 more rows, and 5 more variables: low_prices <dbl>,
## #   return_policy <dbl>, professional <dbl>, income <dbl>, age <dbl>

We have 10 columns or variables in our data:

  • respondent_id is an identifier for our observations

  • variety_of_choice, electronics, furniture, quality_of_service, low_prices, return_policy: the importance of each store attribute, rated on a 1-10 scale

  • professional: 1 for professionals, 0 for non-professionals

  • income: expressed in thousands of dollars

  • age
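
Before clustering on the ratings, it can help to confirm that they stay within the stated 1-10 scale. The sketch below is an illustration only: it rebuilds the ten rows and four rating columns visible in the printed tibble (the other 30 rows and remaining columns are not shown, so they are left out) and lists any rating outside the scale. The object names `preview` and `out_of_range` are made up for this example.

```r
library(tidyverse)

# First ten rows of the four rating columns, copied from the printed tibble
preview <- tibble(
  respondent_id      = 1:10,
  variety_of_choice  = c(8, 6, 6, 8, 4, 8, 7, 7, 7, 8),
  electronics        = c(6, 3, 1, 3, 6, 4, 2, 5, 7, 4),
  furniture          = c(6, 1, 2, 3, 3, 3, 2, 7, 5, 0),
  quality_of_service = c(3, 4, 4, 4, 9, 5, 2, 2, 1, 4)
)

# Reshape to long format and keep ratings outside the stated 1-10 scale
out_of_range <- preview %>%
  pivot_longer(-respondent_id, names_to = "attribute", values_to = "rating") %>%
  filter(rating < 1 | rating > 10)

out_of_range
```

For these ten rows, the check flags a single value: respondent 10 gave furniture a rating of 0. The source does not say whether 0 is a valid answer or a data-entry issue, so it is worth keeping in mind when interpreting the clusters. The same pipeline can be run on the full `equipment` data by selecting the six rating columns first.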

The cluster analysis will try to identify clusters of respondents with similar patterns of ratings. Linear discriminant analysis will then predict cluster membership from the segmentation variables (professional, income, and age).

As always, let’s convert the variables that should be treated as categorical into factors:

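equipment <- equipment %>% 
  mutate(respondent_id = factor(respondent_id), # identifier, not a number to compute with
         professional = factor(professional, levels = c(0, 1), # explicit levels: 0 = non-professional, 1 = professional
                               labels = c("non-professional","professional")))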
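
The `labels` argument of `factor()` is matched to the levels in order, and by default the levels are the sorted unique values, so 0 becomes "non-professional" and 1 becomes "professional". The sketch below shows the mapping on a toy tibble with hypothetical values (not taken from the survey); the explicit `levels = c(0, 1)` is an addition here that spells the mapping out.

```r
library(tidyverse)

# Toy data (hypothetical values, not taken from the survey)
toy <- tibble(professional = c(1, 0, 0, 1))

toy <- toy %>%
  mutate(professional = factor(professional,
                               levels = c(0, 1), # 0 first, then 1
                               labels = c("non-professional", "professional")))

levels(toy$professional) # order of the factor levels
table(toy$professional)  # counts per level
```

A quick `table()` or `count()` after the conversion is a cheap way to confirm that the labels ended up on the intended codes.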

7.1.3 Recap: importing & manipulating

Here’s what we’ve done so far, in one orderly sequence of piped operations (download the data here):

library(tidyverse) 
library(readxl)

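equipment <- read_excel("segmentation_office.xlsx", sheet = "SegmentationData") %>% # import the sheet
  mutate(respondent_id = factor(respondent_id), # identifier, not a number to compute with
         professional = factor(professional, levels = c(0, 1), # explicit levels: 0 = non-professional, 1 = professional
                               labels = c("non-professional","professional")))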