Chapter 9 Jul 11–17: Classification methods and ethics

This week, our goals are to…

  1. View examples of research questions that can be answered with classification methods.

  2. View results from classification examples.

  3. Evaluate ethics of PA research.

  4. Continue to develop your final project plans.

We have already studied what classification means and what types of outcomes classification can help us predict. This week, we will learn about some of the specific methods used for classification and how they make predictions for us. We will also explore the ethics of applying machine learning to healthcare and educational contexts.

In this chapter, some content might mention metrics that we have not learned before, like F1, precision, and recall. Please ignore these and focus on metrics you do know, such as accuracy, sensitivity, and specificity.
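As a refresher, the three metrics named above can all be computed from the counts of correct and incorrect predictions. The following is a minimal sketch written in Python for illustration only (you will use R in this course). The labels are invented, and the function names `confusion_counts` and `classification_metrics` are hypothetical, not part of any library.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class outcome."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, truly positive
        elif p == positive:
            fp += 1  # predicted positive, truly negative
        elif a == positive:
            fn += 1  # predicted negative, truly positive
        else:
            tn += 1  # predicted negative, truly negative
    return tp, fp, tn, fn


def classification_metrics(actual, predicted, positive=1):
    """Return accuracy, sensitivity, and specificity as a dictionary.

    Assumes the data contain at least one positive and one negative case;
    otherwise a division by zero would occur.
    """
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # share of all cases predicted correctly
        "sensitivity": tp / (tp + fn),  # share of true positives that were caught
        "specificity": tn / (tn + fp),  # share of true negatives that were caught
    }


# Invented example: 1 = positive class, 0 = negative class
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

In this invented example, 7 of 10 predictions are correct (accuracy 0.7), 3 of the 4 true positives are identified (sensitivity 0.75), and 4 of the 6 true negatives are identified (specificity of about 0.67).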

9.1 K-nearest neighbors

One of the most basic and useful classification algorithms is k-nearest neighbors (KNN). Please watch the following videos to learn more about KNN:

Starmer, J. StatQuest: K-nearest neighbors, Clearly Explained. https://www.youtube.com/watch?v=HVXime0nQeI. Watch at the YouTube link or embedded below.

Warner, K. KNN Iris Dataset R Tutorial. https://www.youtube.com/watch?v=H0P-ZoTFhI0. Watch at the YouTube link or embedded below.
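The basic idea behind KNN can also be sketched in a few lines of code. The sketch below is in Python for illustration only (the second video above shows KNN in R). It assumes straight-line (Euclidean) distance and a simple majority vote among the k closest training cases. The data, the function name `knn_predict`, and the choice of k = 3 are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, new_point, k=3):
    """Predict a label for new_point by majority vote of its k nearest neighbors."""
    # Distance from the new case to every training case, paired with that case's label
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort so that the closest training cases come first
    distances.sort(key=lambda pair: pair[0])
    # Keep the labels of the k closest cases and return the most common one
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented training data: (hours studied per week, practice exam score)
train_points = [(2, 55), (3, 60), (4, 58), (10, 85), (12, 90), (11, 80)]
train_labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

print(knn_predict(train_points, train_labels, (9, 82)))
print(knn_predict(train_points, train_labels, (3, 57)))
```

With two classes, an odd value of k avoids tied votes. This sketch also leaves the two features on their original scales; in practice, features are often rescaled first so that one feature with large values does not dominate the distance calculation.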

9.2 Decision tree and random forest (optional)

In this section, we will learn about two related machine learning algorithms: decision tree and random forest. Learning the content in this section is optional (not required) for students in HE-930. This section introduces the technical details of how decision tree and random forest work. You will be running these two algorithms in R later in the course.

The following video shows how decision tree works:

Normalized Nerd. Decision Tree Classification Clearly Explained! https://youtu.be/ZVR2Way4nwQ. Watch at the YouTube link or embedded below.

And this video shows how random forest works:

Normalized Nerd. Random Forest Algorithm Clearly Explained! https://youtu.be/v6VJ2RO66Ag. Watch at the YouTube link or embedded below.

For this class (HE-930), you are not required to learn exactly how these two algorithms work. Nevertheless, you will learn how to run these algorithms in R and use them for both classification and regression in practice.

9.3 Scholarship involving classification

Now, we will see an example in which classification has been applied to an educational goal.

Please read the following article:

Black EW, Buchs SR, Garbas B. Using data mining for the early identification of struggling learners in physician assistant education. The Journal of Physician Assistant Education. 2021 Mar 1;32(1):38-42. https://doi.org/10.1097/JPA.0000000000000347.

The following article is optional (not required). It contains a lot of useful information about how many common machine learning algorithms work:

  • Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Medical Informatics and Decision Making. 2019 Dec;19(1):1-6. https://doi.org/10.1186/s12911-019-1004-8.

9.4 Ethics of machine learning

In our study of predictive analytics and machine learning, we have spent a lot of time examining the situations in which machine learning might be useful and how to run the analytics and generate results. However, ethical considerations of how exactly machine learning should be applied (or not) are equally important.

Please read the following chapter related to ethics in machine learning:

  • Prinsloo, P & Slade, S. Chapter 4 (pp. 49–57): “Ethics and Learning Analytics: Charting the (Un)Charted” in Handbook of Learning Analytics. Information on how to access this reading is available in D2L.

There are many aspects of ethics in machine learning that are important to consider, both within and beyond the context of the reading above. Here are some of these considerations:

  • How do we balance a) the need for privacy and protection of the people whose data we are collecting with b) the benefits of collecting this data through surveillance or other means and then leveraging it to make predictions about the future?

  • What are the implications of making incorrect predictions using machine learning, from the perspectives of stakeholders at multiple levels? Examples of stakeholder levels include individuals (arguably the most important), educators and supervisors, institutions and organizations, and entire industries or fields.

  • There are many other important considerations, beyond the few above.

9.5 Assignment

This week’s assignment is a bit different from other weeks. You will prepare and deliver a presentation in which you a) continue to develop your final project in the class, b) explore the use of classification algorithms, and c) confront ethical concerns related to machine learning.

9.5.1 Short presentation

The only item you have to submit this week is a short presentation, which is described below. This presentation will have six slides¹ and be six minutes long. This is just a warm-up to help you further develop your project. It is not a full presentation.

Reminders:

  • Later in the course (not this week), we will do peer reviews of the short presentations. As you prepare your presentation, please keep in mind that a classmate will be reviewing your work. Others in the class might also watch your presentation.

  • This presentation assignment is a continuation of the discussion post you did in Week 7, where you were asked to propose ideas for your final project in this course. Please see any feedback from instructors on your Week 7 discussion post before you prepare your presentation this week.

Task 1: Download the PowerPoint file called “Week 9 brief presentation template” from D2L and rename it to include your name in the file name. This template file already contains instructions and placeholders for you to follow and fill in. Feel free to modify the provided format however you would like, as long as you accomplish the goal of each slide.

Task 2: There are six substantive slides in the presentation template. Please fill these out according to the topics and guidance on each slide.

Task 3: Record yourself presenting the slides. Suggestions for how to do this are within the presentation template file itself. You should present each slide for one minute or less.

Please upload your finished slides in the appropriate dropbox in D2L. If your audio is separate from your slides, you can also upload your audio to this dropbox or send it separately if that is easier.

9.5.2 Schedule oral exam

This section of the assignment relates to the oral exam that you are required to do in this course.

Task 4: If you have not done so already, review the oral exam details in the introductory chapter of this course book. Since the oral exam contains material from the first ten weeks of the course, you can do your oral exam any time during or after Week 11.

Here are some key reminders about the oral exam:

  • Your oral exam will be conducted in a one-on-one meeting on Zoom between you and an instructor.
  • It is your responsibility to schedule your oral exam by emailing all instructors a few times at which you are available for 60–90 minutes.
  • We expect you to prepare for the exam by answering the example questions in the introductory chapter on your own, prior to your scheduled exam.

Task 5: Please email all course instructors with one or more times during Week 11 or later when you could do your oral exam. Remember that it is your responsibility to schedule your exam by emailing us. We will not contact you to schedule your exam. While we recommend that you do your oral exam during Week 11 or 12, you are allowed to do your oral exam at any time before the end of the course.

You have reached the end of this week’s assignment. Please be sure that you have uploaded your presentation materials to D2L and emailed the instructors to schedule your oral exam.


  1. You can include more than six slides if you wish.