7.3 Hierarchical clustering

Then, we create a new data set that only includes the input variables, i.e., the ratings:
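
A minimal sketch of this step, assuming a hypothetical data frame `survey` that holds a respondent identifier plus three rating columns. The data are simulated, and every name used here (`survey`, `cluster.data`, the `rating_*` columns) is illustrative rather than taken from the original data set.

```r
# Simulated stand-in for the survey data (hypothetical names and values).
set.seed(123)
survey <- data.frame(
  respondent     = 1:90,
  rating_price   = sample(1:7, 90, replace = TRUE),
  rating_quality = sample(1:7, 90, replace = TRUE),
  rating_service = sample(1:7, 90, replace = TRUE)
)

# Keep only the rating columns, so that the identifier
# does not enter the distance calculation later on.
rating_columns <- c("rating_price", "rating_quality", "rating_service")
cluster.data <- survey[, rating_columns]

head(cluster.data)
```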

We can now proceed with hierarchical clustering to determine the optimal number of clusters:
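
As a sketch, the clustering could be run with base R's `dist()` and `hclust()`. The text does not state which distance or linkage is used, so Euclidean distance and Ward's method are assumptions here, and the rating-like scores in `cluster.data` are simulated for illustration.

```r
# Simulated rating-like scores for 90 respondents in three latent groups
# (illustrative only; not the original data).
set.seed(123)
centers <- rbind(c(2, 2, 6), c(4, 6, 3), c(6, 3, 2))
scores <- centers[rep(1:3, each = 30), ] + matrix(rnorm(90 * 3, sd = 0.7), ncol = 3)
cluster.data <- as.data.frame(scores)
names(cluster.data) <- c("rating_price", "rating_quality", "rating_service")

# Pairwise Euclidean distances between respondents.
distances <- dist(cluster.data, method = "euclidean")

# Agglomerative clustering; Ward's method is an assumption.
hierarchical.clustering <- hclust(distances, method = "ward.D2")

print(hierarchical.clustering)
```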

The cluster analysis is stored in the hierarchical.clustering object and can easily be visualized with a dendrogram:
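
A sketch of the plotting step, using `plot()` and `rect.hclust()` from base R. The data are simulated, Ward's linkage is an assumption, and the three-cluster cut is chosen only to illustrate how a candidate solution can be outlined on the tree.

```r
# Simulated rating-like scores (illustrative only; not the original data).
set.seed(123)
centers <- rbind(c(2, 2, 6), c(4, 6, 3), c(6, 3, 2))
scores <- centers[rep(1:3, each = 30), ] + matrix(rnorm(90 * 3, sd = 0.7), ncol = 3)
cluster.data <- as.data.frame(scores)
names(cluster.data) <- c("rating_price", "rating_quality", "rating_service")

hierarchical.clustering <- hclust(dist(cluster.data), method = "ward.D2")

# Draw the dendrogram; observation labels are suppressed to keep the plot readable.
plot(hierarchical.clustering, labels = FALSE, hang = -1,
     main = "Dendrogram", xlab = "Observations", sub = "")

# Outline a three-cluster solution (k = 3 is chosen for illustration).
rect.hclust(hierarchical.clustering, k = 3, border = "red")

# Cluster membership for the same cut.
groups <- cutree(hierarchical.clustering, k = 3)
table(groups)
```

Tall vertical branches in the plot indicate merges between dissimilar groups, so cutting the tree just below such a branch yields a candidate number of clusters.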

From this dendrogram, it seems that we can split the observations into two, three, or six groups. Let’s carry out a formal test, the Duda-Hart stopping rule, to see how many clusters we should retain. For this, we need to (install and) load the NbClust package:
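
One way to do this, as a sketch: install the package only when it is missing, then attach it. The CRAN mirror URL is an assumption; any mirror works.

```r
# Install NbClust only if it is not already available.
if (!requireNamespace("NbClust", quietly = TRUE)) {
  install.packages("NbClust", repos = "https://cloud.r-project.org")
}

# Attach the package so that NbClust() can be called directly.
library(NbClust)
```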

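The Duda-Hart stopping rule table can be obtained as follows (the first row of values is the Duda index and the second is the pseudo-T2 statistic, each for two to nine clusters):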

##      2      3      4      5      6      7      8      9 
## 0.2997 0.7389 0.7540 0.5820 0.4229 0.7534 0.5899 0.7036
##       2       3       4       5       6       7       8       9 
## 46.7352  5.6545  3.9145  4.3091  5.4591  3.2728  3.4757  2.9490
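
Output of this shape matches the `All.index` element returned by `NbClust()` when a single index is requested. A sketch of such calls follows, assuming Euclidean distance, Ward's linkage, and a range of two to nine clusters. The data are simulated, so the numbers it prints will differ from those shown above.

```r
library(NbClust)

# Simulated rating-like scores (illustrative only; not the original data).
set.seed(123)
centers <- rbind(c(2, 2, 6), c(4, 6, 3), c(6, 3, 2))
scores <- centers[rep(1:3, each = 30), ] + matrix(rnorm(90 * 3, sd = 0.7), ncol = 3)
cluster.data <- as.data.frame(scores)
names(cluster.data) <- c("rating_price", "rating_quality", "rating_service")

# Duda index for solutions with 2 to 9 clusters
# (distance and linkage are assumptions).
duda <- NbClust(data = cluster.data, distance = "euclidean", method = "ward.D2",
                min.nc = 2, max.nc = 9, index = "duda")

# Pseudo-T2 statistic for the same range of solutions.
pseudot2 <- NbClust(data = cluster.data, distance = "euclidean", method = "ward.D2",
                    min.nc = 2, max.nc = 9, index = "pseudot2")

# One value per candidate number of clusters.
duda$All.index
pseudot2$All.index
```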

The conventional wisdom for deciding the number of groups based on the Duda-Hart stopping rule is to find one of the largest Duda values that corresponds to a low pseudo-T2 value. However, you can also request the optimal number of clusters as suggested by the stopping rule:

## Number_clusters     Value_Index 
##          3.0000          0.7389

In this case, the optimal number is three.
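
The recommendation can be read from the `Best.nc` element of the object returned by `NbClust()`. The sketch below assumes the same settings as before (Euclidean distance, Ward's linkage, two to nine clusters) and uses simulated data, so the suggested number need not be three.

```r
library(NbClust)

# Simulated rating-like scores (illustrative only; not the original data).
set.seed(123)
centers <- rbind(c(2, 2, 6), c(4, 6, 3), c(6, 3, 2))
scores <- centers[rep(1:3, each = 30), ] + matrix(rnorm(90 * 3, sd = 0.7), ncol = 3)
cluster.data <- as.data.frame(scores)
names(cluster.data) <- c("rating_price", "rating_quality", "rating_service")

duda <- NbClust(data = cluster.data, distance = "euclidean", method = "ward.D2",
                min.nc = 2, max.nc = 9, index = "duda")

# Named vector holding the suggested number of clusters and the index value there.
duda$Best.nc

# Extract the suggested number of clusters on its own.
best_k <- unname(duda$Best.nc["Number_clusters"])
best_k
```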