Preface

Course description

Supervised and unsupervised machine learning, also known as classification and clustering, are important statistical techniques commonly applied to research problems in the social and behavioral sciences. Both seek to understand social phenomena through the identification of naturally occurring homogeneous groupings within a population. Supervised learning techniques sort new observations into pre-existing or known groupings, while unsupervised learning techniques sort the population under study into natural, homogeneous groupings based on observed characteristics. Both help to reveal hidden structure that may be used in further analyses. This course will compare and contrast these techniques, including many of their variations, with an emphasis on applications.

This is an advanced master's-level statistics course with substantial computing requirements. Prior coursework covering the General Linear Model, Maximum Likelihood Estimation, and Probability (including multivariate distributions), as well as statistical computing, is required. Note: we do not use specific course numbers as prerequisites in Albert, as there are many reasonable alternatives. However, APSTA-GE 2003 (2004 recommended), 2351, and 2352 is a common pathway to success in this class. If you wish to take this class and believe that you have equivalent prerequisites, you must email me precise descriptions of the classes you have taken that meet these requirements before registering.

The book, and all its content, are Copyright © 2022 by New York University. All rights reserved.

Syllabus

  • Read this syllabus closely before the first class.