8.9 Unsupervised machine learning techniques

  • (Molina and Garip 2019, 36)
  • Principal component analysis: discovers a small number of linear combinations of the inputs that are uncorrelated with one another and capture most of the variability in the data. These linear combinations (principal components) can be used as inputs in subsequent analysis (e.g., in regression to predict some output)
  • Factor analysis: discovers latent (unobserved) factors that account for the correlation in inputs; returns factor loadings for each input that can be used to interpret the factors
  • Cluster analysis: groups observations into a given number of clusters so that observations in a cluster are more similar to one another than to observations in other clusters; returns cluster membership for each observation
  • Latent class analysis: discovers latent classes of observations that can account for the correlations in observed categorical inputs; returns probability of class membership for each observation
  • Sequence analysis: compares sequences (ordered elements or events) with optimal matching to discover groups of observations with similar patterns (typically with cluster analysis)
  • Topic modeling: discovers latent topics in text data based on co-occurrence of words across documents
  • Community detection: identifies communities in networks (graphs) based on structural position of nodes
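
To make the cluster analysis entry concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The source does not name a particular clustering algorithm, so k-means is only one possible choice. The function name `kmeans`, the toy data, and the rule of starting the centroids at the first k observations are all assumptions made for illustration. The function takes a given number of clusters and returns a cluster membership for each observation, as the entry describes.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: returns (labels, centroids)."""
    # Assumption for reproducibility: start centroids at the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy data (made up for illustration): two well-separated groups in 2-D.
data = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.8, 1.1), (9.1, 8.8), (8.9, 9.2)]
labels, centroids = kmeans(data, k=2)
print(labels)  # cluster membership for each observation
```

In practice the number of clusters has to be chosen by the analyst, and results can depend on the starting centroids, so running the procedure from several random starts is common.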

References

Molina, Mario, and Filiz Garip. 2019. “Machine Learning for Sociology.” Annu. Rev. Sociol., July.