Chapter 5 Jun 13–19: Unsupervised machine learning introduction

This week, our goals are to…

  1. Articulate a research question that addresses an unmet need in HPEd and can be answered by predictive analytics (PA) methods.

  2. Recognize uses of unsupervised machine learning (ML) techniques.

  3. Interpret results of clustering and PCA analyses.

5.1 Unsupervised compared to supervised machine learning

In this course, we will encounter only two types of machine learning:

  • Unsupervised machine learning (UML): For our purposes, this involves using the computer to explore our data without a specific outcome of interest in mind. In other words, there is no dependent variable that we are trying to predict. UML can be used to identify groups, also called clusters, within data, which is the main way we will use it in this class. Taking a dataset of 100 students and identifying groups of students who are similar to each other is an example of UML.

  • Supervised machine learning (SML): This involves using the computer to predict a future value of a specific outcome of interest for specific people or units in our data. This is the type of machine learning that we have spent, and will continue to spend, most of our time on. Using data from last year’s students to predict the final exam grades of this year’s students (before this year’s students have taken the final exam) is an example of SML.
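The contrast between the two types can be sketched in a few lines of Python (standard library only). Everything here is an illustrative assumption: the student numbers are invented toy values, not data from any study, and the helper names `k_means` and `fit_line` are hypothetical. The UML half groups students with k-means clustering, using only their features and no outcome. The SML half fits a least-squares line on last year’s students, whose final exam grades are known, and uses it to predict grades for this year’s students.

```python
import random

# Hypothetical toy data, invented for illustration only.
# UML input: each student is (average quiz score, weekly study hours); no outcome column.
students = [
    (55.0, 2.0), (60.0, 3.0), (58.0, 2.5),
    (88.0, 9.0), (92.0, 10.0), (85.0, 8.5),
]

# SML input: last year's students as (average quiz score, final exam grade); outcome known.
last_year = [(55.0, 60.0), (60.0, 66.0), (70.0, 74.0), (80.0, 85.0), (90.0, 93.0)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))


def k_means(points, k, max_iter=100, seed=0):
    """UML: split points into k groups of similar points (Lloyd's k-means)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return [nearest(p, centers) for p in points], centers


def fit_line(pairs):
    """SML: least-squares fit of outcome y on one predictor x; returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


labels, centers = k_means(students, k=2)
print("UML cluster labels:", labels)

slope, intercept = fit_line(last_year)
this_year_quiz = [65.0, 85.0]  # this year's students, before their final exam
predictions = [intercept + slope * q for q in this_year_quiz]
print("SML predicted final grades:", [round(p, 1) for p in predictions])
```

Notice that `k_means` never sees a final grade, while `fit_line` cannot run without one. In practice, variables are usually standardized before k-means, because Euclidean distance otherwise lets the variable with the largest scale dominate the grouping.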

To learn more about UML and SML, please read this article:

Now that you have read a bit about the differences between UML and SML, continue reading below to see how some UML methods work.

5.2 Unsupervised machine learning basics

Take note of the following list of situations in which you can use UML:

  1. Identifying groups or clusters within your data (the main way we will use UML in this class).
  2. Exploratory analysis before you have measured/collected your dependent variable.
  3. Comparing exploratory results to supervised learning results.
  4. Dimensionality reduction (turning many variables into fewer variables).

The list above is not exhaustive; UML has many other uses as well.
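Dimensionality reduction, the fourth use listed above, is often done with principal component analysis (PCA). A minimal sketch follows, again in standard-library Python and assuming invented scores for five hypothetical students on three assessments. It finds only the first principal component (PC1), by power iteration on the covariance matrix, and reports the share of total variance that PC1 explains. Statistical software (for example R’s `prcomp()`) does this in one call; this version only makes the steps visible.

```python
import math

# Hypothetical toy data: each row is one student's (quiz, midterm, project) scores.
rows = [
    [55.0, 58.0, 60.0],
    [62.0, 65.0, 63.0],
    [70.0, 72.0, 75.0],
    [81.0, 79.0, 84.0],
    [90.0, 93.0, 88.0],
]


def center(data):
    """Subtract each column's mean so every variable has mean zero."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiplying by cov and renormalizing converges to
    the unit eigenvector with the largest eigenvalue (the first principal component),
    assuming the start vector is not orthogonal to it."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


centered = center(rows)
cov = covariance(centered)
pc1 = first_component(cov)

# Variance along PC1 (its eigenvalue) divided by total variance (trace of cov).
d = len(cov)
pc1_variance = sum(pc1[i] * cov[i][j] * pc1[j] for i in range(d) for j in range(d))
explained = pc1_variance / sum(cov[i][i] for i in range(d))

# Each student's three scores collapse into one number: the projection onto PC1.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

print("PC1 loadings:", [round(w, 3) for w in pc1])
print("Share of variance explained by PC1:", round(explained, 3))
print("One-number summary per student:", [round(s, 1) for s in scores])
```

Because the three toy scores rise and fall together, PC1 explains almost all of their variance, so one number per student can stand in for three. The sign of a principal component is arbitrary; here the loadings come out positive only because the iteration starts from a vector of ones and all covariances are positive.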

Please watch the following videos about unsupervised machine learning:

Great Learning. Learn Cluster Analysis | Cluster Analysis Tutorial | Introduction to Cluster Analysis. https://youtu.be/3MnVCX94jJM. Watch at the YouTube link.

Josh Starmer. StatQuest: K-means clustering. https://youtu.be/4b5d3muPQmA. Watch at the YouTube link.

Please also read the following article:

We will now turn our attention to some examples in which UML has been applied.

5.3 Scholarship involving unsupervised machine learning

Below, we will see some examples in which UML has been used to identify groups within datasets.

Please read the following articles:

  • Akasaki M, Ploubidis GB, Dodgeon B, Bonell CP. The clustering of risk behaviours in adolescence and health consequences in middle age. Journal of Adolescence. 2019 Dec 1;77:188-97. https://doi.org/10.1016/j.adolescence.2019.11.003.

  • Akçapınar G, Altun A, Cosgun E. Investigating students’ interaction profile in an online learning environment with clustering. In 2014 IEEE 14th International Conference on Advanced Learning Technologies 2014 Jul 7 (pp. 109-111). IEEE. https://doi.org/10.1109/ICALT.2014.40.

This article is optional (not required):

  • Escolar-Jimenez C, Matsuzaki K, Okada K, Gustilo R. Enhancing organizational performance through employee training and development using k-means cluster analysis. International Journal of Advanced Trends in Computer Science and Engineering. 2019;8(4):1576-82. https://doi.org/10.30534/ijatcse/2019/82842019.

5.4 Assignment

In this week’s assignment, you will respond to the unsupervised machine learning (UML) examples that you read about earlier in this chapter. You will also brainstorm about possible plans for your final project in this class.

5.4.1 Discussion post, part A – Response to readings

Below, you will prepare a new discussion post for the discussion board in D2L. The first part of your discussion post relates to some of the articles you read.

Task 1: Identify the research questions (RQs) in the Akasaki et al. (2019) and Akçapınar et al. (2014) articles.

Task 2: Summarize the most important conclusions of the Akasaki et al. (2019) and Akçapınar et al. (2014) articles.

Task 3: Give each study a letter grade. How clear are they? How well do they answer their own RQs? How acceptable/unacceptable are their limitations? How useful (or not) are their contributions to scholarship on their respective RQs?

Task 4: Would supervised learning methods have been better suited to answering either of these two studies’ RQs?

Task 5: Write any questions you have.

5.4.2 Discussion post, part B – Final project brainstorming

Now you will continue writing your discussion post, turning your attention to your plans for your final project in this class.

Task 6: Earlier in the course, you were asked to think about possible uses of predictive analytics (PA) methods in your own work. Now, please identify a research question that PA methods could help you answer. This question could be the basis of your final project, if you want.

Task 7: To answer the research question you wrote, what data would you need? How would this data be generated or collected?

Task 8: How would unsupervised machine learning techniques help answer or start to answer this question? Which techniques would be most useful and why?

You have reached the end of this week’s assignment. Please be sure to submit your responses to all tasks to the appropriate places (you can submit parts A and B above in a single discussion post).

