7.1 Data
7.1.1 Import
We will analyze data from a survey in which 40 respondents were asked to rate the importance of a number of store attributes when buying equipment. Download the data (segmentation_office.xlsx) and import them into R:
library(tidyverse)
library(readxl)
equipment <- read_excel("segmentation_office.xlsx", sheet = "SegmentationData") # Import the SegmentationData sheet of the Excel file
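The second argument of read_excel() is the sheet, given either by name (as with "SegmentationData" above) or by position. If you are unsure of the sheet names, excel_sheets() lists them. The sketch below uses the example workbook bundled with readxl as a stand-in for segmentation_office.xlsx, so it runs without the survey file.

```r
library(readxl)

# Example workbook shipped with readxl; a stand-in for segmentation_office.xlsx
path <- readxl_example("datasets.xlsx")

# List the sheet names before choosing one
print(excel_sheets(path))

# Read one sheet by name; naming the `sheet` argument makes the call self-documenting
mtcars_sheet <- read_excel(path, sheet = "mtcars")
print(dim(mtcars_sheet))
```

With the survey file, excel_sheets("segmentation_office.xlsx") should list the SegmentationData sheet among its sheets.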
7.1.2 Manipulate
equipment # Check out the data
## # A tibble: 40 x 10
## respondent_id variety_of_choi~ electronics furniture quality_of_serv~
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 8 6 6 3
## 2 2 6 3 1 4
## 3 3 6 1 2 4
## 4 4 8 3 3 4
## 5 5 4 6 3 9
## 6 6 8 4 3 5
## 7 7 7 2 2 2
## 8 8 7 5 7 2
## 9 9 7 7 5 1
## 10 10 8 4 0 4
## # ... with 30 more rows, and 5 more variables: low_prices <dbl>,
## # return_policy <dbl>, professional <dbl>, income <dbl>, age <dbl>
We have 10 columns or variables in our data:
- respondent_id: an identifier for our observations
- variety_of_choice, electronics, furniture, quality_of_service, low_prices, and return_policy: respondents rated the importance of each of these attributes on a 1-10 scale
- professional: 1 for professionals, 0 for non-professionals
- income: expressed in thousands of dollars
- age
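Because the six ratings are meant to lie on a 1-10 scale, a quick range check is worthwhile before clustering. The sketch below hardcodes only the ten rows and five columns printed above (a stand-in for the full equipment tibble). Note that the printed rows already include a furniture rating of 0 for respondent 10, so the check flags one value.

```r
# First 10 rows of the first five columns, copied from the printed tibble above
preview <- data.frame(
  respondent_id      = 1:10,
  variety_of_choice  = c(8, 6, 6, 8, 4, 8, 7, 7, 7, 8),
  electronics        = c(6, 3, 1, 3, 6, 4, 2, 5, 7, 4),
  furniture          = c(6, 1, 2, 3, 3, 3, 2, 7, 5, 0),
  quality_of_service = c(3, 4, 4, 4, 9, 5, 2, 2, 1, 4)
)

ratings <- preview[, -1]  # drop the identifier, keep the rating columns

# Observed minimum and maximum of each rating column
rating_range <- sapply(ratings, range)
rownames(rating_range) <- c("min", "max")
print(rating_range)

# Row/column positions of any rating outside the stated 1-10 scale
out_of_scale <- which(ratings < 1 | ratings > 10, arr.ind = TRUE)
print(out_of_scale)
```

On the full data, the equivalent summary would be sapply(select(equipment, variety_of_choice:return_policy), range), since the six rating columns sit next to each other.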
The cluster analysis will try to identify groups of respondents with similar patterns of ratings. Linear discriminant analysis will then predict cluster membership from the segmentation variables (professional, income, and age).
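The two-step plan (cluster on the ratings, then use linear discriminant analysis to predict cluster membership from professional, income, and age) can be sketched on simulated data. Everything here is an assumption for illustration: the data are randomly generated, only two of the six rating columns are used, and hierarchical clustering with Ward's method (hclust with method = "ward.D2") is just one common choice of clustering technique, not necessarily the one used later. LDA comes from lda() in the MASS package, which ships with R.

```r
set.seed(42)
# Hypothetical simulated stand-in for the survey: two latent groups of 20 respondents
# with different rating profiles (two of the six rating columns, for brevity)
sim <- data.frame(
  variety_of_choice = c(rnorm(20, mean = 8, sd = 1), rnorm(20, mean = 4, sd = 1)),
  low_prices        = c(rnorm(20, mean = 3, sd = 1), rnorm(20, mean = 8, sd = 1)),
  income            = c(rnorm(20, mean = 60, sd = 10), rnorm(20, mean = 35, sd = 10)),
  age               = c(rnorm(20, mean = 45, sd = 8), rnorm(20, mean = 30, sd = 8))
)
# Professional status only loosely tied to the latent group
sim$professional <- factor(
  rbinom(40, size = 1, prob = rep(c(0.7, 0.3), each = 20)),
  levels = c(0, 1), labels = c("non-professional", "professional")
)

# Step 1: cluster on the ratings only (Ward's method on Euclidean distances, assumed)
hc <- hclust(dist(sim[, c("variety_of_choice", "low_prices")]), method = "ward.D2")
sim$cluster <- factor(cutree(hc, k = 2))

# Step 2: predict cluster membership from the segmentation variables
fit <- MASS::lda(cluster ~ professional + income + age, data = sim)
pred <- predict(fit)$class
print(table(predicted = pred, cluster = sim$cluster))

# Share of respondents whose cluster is predicted correctly (in-sample)
hit_rate <- mean(pred == sim$cluster)
print(hit_rate)
```

Calling MASS::lda() instead of attaching MASS with library(MASS) avoids masking dplyr's select() when the tidyverse is loaded.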
As always, let’s factorize the variables that should be treated as categorical:
equipment <- equipment %>%
  mutate(respondent_id = factor(respondent_id), # identifiers are labels, not quantities
         professional = factor(professional, labels = c("non-professional", "professional"))) # 0/1 codes to named categories
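When only labels are supplied, factor() sorts the distinct values it finds (0, then 1) and attaches the labels in that order, which is why 0 becomes "non-professional" and 1 becomes "professional". A small base-R check with hypothetical codes:

```r
# Hypothetical 0/1 codes, as they would arrive from the spreadsheet
professional_raw <- c(1, 0, 0, 1)

# Without an explicit `levels` argument, factor() uses the sorted unique values (0, 1)
# and attaches the labels in that order: 0 -> "non-professional", 1 -> "professional"
professional <- factor(professional_raw, labels = c("non-professional", "professional"))
print(professional)

# Cross-tabulate raw codes against labels to confirm the mapping
print(table(professional_raw, professional))

# Safer variant: spell out the levels so the mapping does not depend on which codes occur
professional_explicit <- factor(professional_raw, levels = c(0, 1),
                                labels = c("non-professional", "professional"))
```

The explicit variant matters if a subset of the data contains only one of the two codes: factor() would then find a single level and reject a two-element labels vector with an error.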
7.1.3 Recap: importing & manipulating
Here's what we've done so far, in one orderly sequence of piped operations (using the same segmentation_office.xlsx file):
library(tidyverse)
library(readxl)
equipment <- read_excel("segmentation_office.xlsx", sheet = "SegmentationData") %>% # Import the Excel file
  mutate(respondent_id = factor(respondent_id), # Treat categorical variables as factors
         professional = factor(professional, labels = c("non-professional", "professional")))