Chapter 18 K-Means

Now that we have our data, let us proceed with the k-means analysis! First, we’ll set the seed for our analysis. K-means starts from randomly chosen cluster centers, so setting the seed makes it possible to replicate the results in the future. Then, we’ll use the kmeans() function. kmeans() takes at least 2 arguments: the data and the number of clusters (centers).
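To see why the seed matters, here is a minimal sketch using simulated toy data (the `toy` data frame and its columns are made up for illustration and are not part of the survey). Resetting the same seed before each call should give identical cluster assignments, because kmeans() then draws the same random starting centers.

```r
# Toy data standing in for survey_data (simulated values, not from the survey)
set.seed(1)
toy <- data.frame(a = rnorm(60), b = rnorm(60))

# Same seed before each call, so both fits start from the same random centers
set.seed(381)
fit_a <- kmeans(toy, centers = 3)

set.seed(381)
fit_b <- kmeans(toy, centers = 3)

# Check that the two runs assign every observation to the same cluster
identical(fit_a$cluster, fit_b$cluster)
```

Without the second set.seed() call, the two fits may label or even split the clusters differently.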

The goal of a k-means analysis is to minimize intra-cluster variation while maximizing inter-cluster variation. In other words, you want the observations in a cluster to be more similar to one another than they are to observations in other clusters.

set.seed(381)

k3 <- kmeans(survey_data, centers = 3)
print(k3) # this also prints the cluster assignment for each observation

## K-means clustering with 3 clusters of sizes 400, 545, 61
## 
## Cluster means:
##   issue_econ issue_race issue_covid trumpapprove
## 1   3.822500   2.747500    2.650000     3.822500
## 2   3.552294   3.825688    3.858716     1.157798
## 3   3.491803   1.442623    2.737705     1.786885
## 
## Clustering vector:
##    1    2    3    4    6    7    8    9   10   11   12   13   14   16   18   19 
##    2    2    2    2    2    2    1    1    2    2    1    2    1    1    1    2 
##   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36 
##    1    2    1    2    2    2    2    1    2    2    1    1    1    3    2    1 
##   37   38   39   40   41   42   43   44   46   47   48   49   50   51   53   55 
##    1    2    1    2    1    2    2    1    1    2    2    2    1    2    2    1 
##   56   57   58   59   60   61   62   63   64   66   67   69   72   73   74   75 
##    2    2    2    1    2    2    1    2    1    2    2    2    2    1    2    2 
##   76   77   78   79   80   81   82   83   86   87   88   89   90   91   92   93 
##    3    3    2    1    3    2    2    2    2    1    1    2    1    1    1    1 
##   94   95   96   97   98   99  100  101  102  103  104  106  107  108  109  110 
##    2    2    1    1    2    1    1    1    1    2    2    1    2    1    1    1 
##  111  112  113  115  116  117  118  119  120  122  123  125  126  127  128  129 
##    2    1    2    3    3    2    2    1    2    2    1    2    2    1    2    2 
##  130  131  132  133  134  136  137  139  140  142  143  144  145  146  150  151 
##    2    1    2    2    2    1    1    1    1    2    2    2    1    1    1    2 
##  152  153  154  155  156  157  158  159  160  161  162  163  164  165  166  167 
##    1    1    2    2    2    1    1    1    2    2    1    1    2    1    1    2 
##  168  169  170  171  172  173  174  175  176  177  178  179  180  181  182  183 
##    2    1    2    3    1    1    1    2    2    1    1    1    2    1    2    2 
##  184  185  186  187  188  189  190  191  192  193  194  195  196  197  198  199 
##    2    2    3    2    1    1    1    1    1    2    1    2    2    2    2    2 
##  200  201  202  203  204  205  206  207  208  209  210  211  212  213  214  215 
##    1    1    1    2    1    2    1    2    3    3    2    1    1    1    1    2 
##  216  217  218  219  220  221  222  223  224  225  226  227  228  229  230  231 
##    2    2    2    2    1    2    2    1    2    2    2    1    2    2    2    1 
##  232  233  234  235  236  237  238  239  240  241  242  243  244  245  246  247 
##    2    2    2    2    1    2    1    2    2    2    1    2    2    1    2    2 
##  248  249  250  251  252  253  254  255  256  258  259  260  261  262  263  264 
##    1    2    1    1    2    2    2    1    2    2    2    2    1    1    2    1 
##  265  266  267  270  271  274  275  276  278  280  281  282  283  284  285  287 
##    1    1    2    2    2    2    1    1    2    3    2    1    2    2    2    2 
##  289  290  291  294  296  298  300  301  302  303  304  305  306  307  308  309 
##    2    1    1    3    3    2    2    2    2    1    2    2    1    2    1    2 
##  311  312  313  314  315  316  317  318  319  320  321  323  324  325  326  327 
##    1    2    1    2    1    1    2    2    2    1    1    2    2    2    2    2 
##  328  329  330  331  332  333  334  335  336  337  338  339  340  341  344  345 
##    2    3    2    2    2    1    2    2    1    1    2    1    2    1    1    2 
##  346  347  348  349  350  351  352  353  354  355  356  357  358  359  360  362 
##    2    2    3    2    2    3    1    2    2    3    2    1    1    2    1    1 
##  363  364  365  366  367  368  369  370  371  372  373  376  377  378  379  380 
##    2    2    2    1    1    1    1    3    2    2    1    2    1    1    2    2 
##  381  382  384  385  386  387  388  389  390  391  392  393  394  395  396  397 
##    2    2    2    1    1    1    3    1    1    1    1    1    2    2    1    2 
##  398  399  400  401  402  403  404  405  406  407  408  409  410  411  412  413 
##    2    2    2    2    1    3    1    2    1    1    2    2    1    2    1    1 
##  414  415  416  417  418  419  420  421  422  423  424  425  426  427  428  429 
##    3    2    1    1    2    2    2    2    1    1    2    2    1    1    1    1 
##  430  431  432  433  435  437  438  440  441  442  443  444  445  447  448  449 
##    2    2    1    2    1    1    1    1    1    1    3    1    2    2    2    1 
##  450  453  454  455  456  457  459  462  463  465  466  468  469  470  471  472 
##    3    1    1    2    1    2    1    2    2    2    1    1    2    2    1    1 
##  474  475  476  477  478  479  480  481  482  483  485  486  487  488  489  490 
##    3    1    1    2    2    1    1    2    1    1    2    1    1    2    2    1 
##  491  493  494  495  499  501  503  504  505  506  507  508  509  510  511  512 
##    2    1    1    1    2    2    2    1    2    1    1    2    2    2    2    2 
##  513  514  515  516  518  519  523  524  526  527  528  529  530  532  533  534 
##    2    3    2    2    1    1    2    1    2    1    2    1    2    1    2    2 
##  535  536  537  539  541  542  543  545  546  547  548  549  550  551  552  553 
##    2    1    2    3    2    2    1    1    3    1    3    2    2    2    1    1 
##  554  555  557  558  560  561  562  563  564  565  566  567  568  569  570  572 
##    2    2    2    1    2    2    1    3    1    3    1    1    2    2    2    1 
##  573  574  575  576  577  578  579  580  581  582  583  584  585  586  587  588 
##    3    2    2    1    2    2    1    1    2    1    1    1    1    1    1    2 
##  589  590  593  594  597  600  603  605  606  607  608  610  612  613  615  617 
##    2    1    1    2    2    1    1    2    1    1    2    2    2    3    2    3 
##  618  619  620  621  622  623  624  625  626  627  629  630  631  633  635  637 
##    2    1    2    1    2    1    1    1    2    2    2    2    1    3    1    2 
##  639  640  641  643  644  645  646  648  649  650  651  652  653  654  655  657 
##    3    1    2    2    2    2    2    1    2    2    2    1    1    1    2    3 
##  658  659  660  662  663  664  665  666  667  668  669  671  672  673  674  675 
##    2    2    1    1    1    3    2    1    2    1    2    2    2    2    1    1 
##  677  678  680  681  682  683  684  686  687  688  689  691  692  693  694  695 
##    3    1    3    2    1    3    1    1    1    2    2    1    2    2    1    2 
##  696  697  698  699  700  701  702  703  704  705  707  708  709  711  712  713 
##    2    1    2    2    2    2    2    2    3    1    2    2    2    1    2    2 
##  714  715  716  717  718  719  720  722  723  724  725  726  727  728  729  730 
##    2    2    2    2    3    1    1    2    1    1    2    2    2    2    2    3 
##  731  732  734  735  736  737  738  739  740  741  742  743  744  745  746  747 
##    2    2    1    1    2    2    1    2    2    2    1    1    1    2    1    2 
##  748  749  750  751  752  754  755  756  757  758  761  762  764  765  766  767 
##    1    2    2    1    2    2    1    2    2    2    2    1    1    2    2    2 
##  768  769  771  772  773  775  776  778  779  780  783  784  787  788  789  790 
##    2    3    2    2    2    3    2    1    2    1    1    2    2    2    1    2 
##  791  792  793  794  795  798  799  800  804  805  806  807  808  809  810  811 
##    1    2    1    2    1    1    2    2    2    1    1    2    2    2    1    1 
##  812  813  815  816  817  818  819  820  821  822  823  824  825  826  827  828 
##    2    1    1    2    2    2    2    2    2    2    1    1    2    2    2    2 
##  829  830  832  834  835  836  837  840  841  843  844  845  846  847  849  850 
##    2    2    2    1    1    1    2    2    2    2    1    1    2    2    2    2 
##  851  852  856  857  858  859  860  862  863  864  866  867  868  869  870  871 
##    3    2    1    2    1    2    1    2    2    2    2    2    2    2    2    2 
##  872  873  874  876  877  879  880  881  882  883  884  885  888  889  890  891 
##    3    2    1    2    1    2    1    1    1    2    2    2    2    2    2    2 
##  892  894  895  896  897  898  900  902  903  904  905  906  907  908  909  910 
##    2    2    2    1    1    2    2    2    1    1    2    1    1    1    2    1 
##  912  913  914  916  917  918  919  920  921  922  923  924  925  926  928  930 
##    2    1    1    1    2    2    2    1    2    1    1    1    1    1    2    3 
##  931  932  933  934  935  936  937  939  941  943  945  946  947  948  949  950 
##    2    2    3    2    2    2    3    2    1    1    2    3    1    1    1    2 
##  951  952  953  954  955  956  957  958  959  960  961  962  963  964  966  968 
##    2    1    1    1    1    2    2    1    2    1    2    2    1    2    2    2 
##  969  971  972  974  975  976  977  978  982  983  984  986  988  990  991  994 
##    2    2    2    2    2    1    2    2    3    1    1    2    1    1    2    2 
##  996  997  998 1000 1001 1002 1003 1004 1005 1007 1008 1009 1010 1011 1012 1014 
##    3    1    1    2    2    1    2    1    1    1    3    2    2    2    1    1 
## 1015 1018 1021 1022 1023 1024 1025 1026 1028 1029 1030 1031 1032 1033 1034 1038 
##    1    1    1    1    1    2    2    1    2    2    2    2    2    2    2    1 
## 1039 1040 1041 1042 1044 1045 1046 1047 1048 1049 1050 1051 1054 1055 1056 1057 
##    1    2    1    2    2    2    1    1    2    2    1    2    1    2    1    1 
## 1058 1059 1061 1063 1064 1065 1066 1068 1069 1070 1072 1075 1076 1077 1078 1079 
##    1    2    3    2    2    1    3    3    3    2    1    2    2    2    2    1 
## 1080 1081 1082 1083 1084 1085 1087 1088 1089 1090 1091 1093 1094 1095 1096 1097 
##    2    2    1    2    1    2    2    2    2    2    2    2    3    1    1    3 
## 1098 1099 1100 1103 1104 1106 1107 1108 1110 1111 1112 1113 1114 1115 1116 1118 
##    1    2    2    1    2    1    2    2    2    2    2    2    2    2    1    1 
## 1119 1120 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 
##    1    2    1    2    1    2    2    1    2    2    2    2    1    2    1    2 
## 1136 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1152 1153 
##    1    1    2    2    1    1    1    2    1    2    2    1    1    1    1    2 
## 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 
##    2    2    1    2    1    2    1    2    2    2    2    2    2    1    2    1 
## 1170 1171 1172 1173 1174 1175 1176 1177 1178 1180 1183 1185 1187 1188 1190 1191 
##    2    2    2    1    1    1    2    2    2    2    1    1    1    2    2    2 
## 1192 1193 1194 1195 1196 1198 1199 1201 1202 1203 1204 1205 1206 1207 
##    2    3    1    2    2    2    1    1    1    2    1    1    2    2 
## 
## Within cluster sum of squares by cluster:
## [1] 1049.2925  497.7505  214.3279
##  (between_SS / total_SS =  58.8 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
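The sums of squares in this printout are related to each other, and the relationship can be checked from the listed components. The sketch below uses simulated toy data (the `toy` data frame is hypothetical) rather than the survey; the same expressions should apply to k3.

```r
# Toy data with two well-separated groups (simulated, for illustration only)
set.seed(1)
toy <- data.frame(a = c(rnorm(30, 0), rnorm(30, 4)),
                  b = c(rnorm(30, 0), rnorm(30, 4)))

set.seed(381)
fit <- kmeans(toy, centers = 2, nstart = 25)

# Total variation splits into a within-cluster and a between-cluster part
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)

# tot.withinss is the sum of the per-cluster values in withinss
all.equal(fit$tot.withinss, sum(fit$withinss))

# Recompute withinss by hand: squared distances of members to their own center
manual_wss <- sapply(seq_len(nrow(fit$centers)), function(k) {
  members <- as.matrix(toy[fit$cluster == k, ])
  sum(sweep(members, 2, fit$centers[k, ])^2)
})
all.equal(manual_wss, fit$withinss)

# The percentage reported as between_SS / total_SS
round(100 * fit$betweenss / fit$totss, 1)
```

A higher between_SS / total_SS percentage means more of the total variation is explained by cluster membership, although it always increases as more clusters are added.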

K-means is a hard clustering method, so each observation is automatically assigned to exactly one cluster. You can extract the per-observation cluster assignments from the fitted object through its $cluster component, an integer vector with one entry per observation.

Below, we extract the cluster assignment for each observation and attach it to the survey data as a new column.

survey_data$cluster <- k3$cluster
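Once the assignments are attached, they can be used like any other grouping variable. The following sketch, again on simulated toy data with made-up column names, counts the observations per cluster and recomputes the cluster means; these should agree with the `size` and `centers` components of the fitted object.

```r
# Toy data standing in for survey_data (simulated, for illustration only)
set.seed(1)
toy <- data.frame(a = c(rnorm(30, 0), rnorm(30, 4)),
                  b = c(rnorm(30, 0), rnorm(30, 4)))

set.seed(381)
fit <- kmeans(toy, centers = 2, nstart = 25)

# Attach the assignments, as done for survey_data above
toy$cluster <- fit$cluster

# Number of observations per cluster (compare with fit$size)
cluster_counts <- table(toy$cluster)
cluster_counts

# Mean of each variable within each cluster (compare with fit$centers)
cluster_means <- aggregate(cbind(a, b) ~ cluster, data = toy, FUN = mean)
cluster_means
```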

18.1 Visualization

Another useful thing to do is to visualize the results of the k-means analysis. To do so, we can use the fviz_cluster() function from the factoextra package. Let us do this now with our 3-cluster analysis.

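# Plot only the variables used for clustering, not the cluster column added above
fviz_cluster(k3, data = survey_data[, colnames(k3$centers)],
             #palette = c("#2E9FDF", "#00AFBB", "#E7B800"), #change the colors of the clusters
             geom = "point", ggtheme = theme_minimal())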
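When the data have more than two variables, fviz_cluster() draws the observations on principal components rather than on the original variables. The sketch below is a rough base-R approximation of that idea, not a reproduction of the function itself: it assumes standardized variables and uses prcomp() on simulated toy data with hypothetical column names.

```r
# Toy data with four variables (simulated, for illustration only)
set.seed(1)
toy <- data.frame(a = rnorm(90), b = rnorm(90), c = rnorm(90), d = rnorm(90))
toy$a[1:45] <- toy$a[1:45] + 4  # shift half the rows so two groups exist

set.seed(381)
fit <- kmeans(toy, centers = 2, nstart = 25)

# Project the standardized data onto its first two principal components
pca <- prcomp(toy, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Scatterplot of the projection, colored by cluster assignment
plot(scores, col = fit$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2", main = "Clusters on the first two PCs")
```

Because the plot is a two-dimensional projection, clusters that overlap in the figure may still be separated along dimensions that are not shown.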

We can do the same for analyses with other numbers of clusters. Comparing the resulting plots can help when deciding how many clusters to use.

# Cluster only on the original variables; survey_data now also holds the cluster column from k3
cluster_vars <- survey_data[, colnames(k3$centers)]

k2 <- kmeans(cluster_vars, centers = 2, nstart = 25)
fviz_cluster(k2, data = cluster_vars,
             geom = "point", ggtheme = theme_minimal())

k4 <- kmeans(cluster_vars, centers = 4, nstart = 25)
fviz_cluster(k4, data = cluster_vars,
             geom = "point", ggtheme = theme_minimal())

k5 <- kmeans(cluster_vars, centers = 5, nstart = 25)
fviz_cluster(k5, data = cluster_vars,
             geom = "point", ggtheme = theme_minimal())

k9 <- kmeans(cluster_vars, centers = 9, nstart = 25)
fviz_cluster(k9, data = cluster_vars,
             geom = "point", ggtheme = theme_minimal())
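Comparing plots by eye can be complemented with a numerical summary. One common option, which is not part of the analysis above, is an "elbow" plot of the total within-cluster sum of squares against the number of clusters. The sketch below uses simulated toy data with three built-in groups (the data and column names are hypothetical).

```r
# Toy data with three groups (simulated, for illustration only)
set.seed(1)
toy <- data.frame(a = c(rnorm(40, 0), rnorm(40, 4), rnorm(40, 8)),
                  b = c(rnorm(40, 0), rnorm(40, 4), rnorm(40, 0)))

# Fit k-means for k = 1, ..., 9 and keep the total within-cluster sum of squares
ks <- 1:9
set.seed(381)
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# Look for the point where adding clusters stops reducing wss by much
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

The bend in the curve is only a heuristic, so it is best read alongside the cluster plots and what the clusters mean substantively.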