25 Lab 8 - Hypothesis Testing

25.0.1 Introduction

In the tutorial, we explored the influence, or lack thereof, that weekend alcohol consumption and extra paid off-campus classes had on the final math grades of two classes of Portuguese high school students in the 2005-2006 school year.

In this lab, we will consider one more factor: romantic relationships. Information about the relationship status of students in this dataset is stored in the variable `romantic`. This is a factor variable with two levels, `yes` and `no`, which indicate that a given student was or was not in a romantic relationship at the time this data was collected, respectively.

This is worth considering because it is widely believed, especially in China, that romantic relationships are detrimental to the academic success of high school students. Traditionally, Chinese high school students are strongly discouraged from developing romantic relationships with their classmates. Teachers and parents have been known to intervene and split young couples up when they suspect that two students may be acting on their feelings for each other. This is not something that is done in our program, of course, and I have not heard anything about this happening to regular track students either, but when I worked in Harbin what I just described was the norm.

Of course, Portugal is not China, and these two countries have very different education systems and cultures. Still, it is worth seeing whether we can find some empirical evidence to support this belief. It is reasonable to believe that a romantic relationship would in fact lead to worse academic performance for just about anybody because it drains time and energy away from one’s studies. That is an effect that, if it exists, likely transcends cultures.

1. Load the `student_mat.csv` file and filter out the data for which final math scores are missing, as in the tutorial. Save the result as `math_filtered`.
2. Draw a bar plot of `romantic` and label it.
3. Split `math_filtered` into two smaller datasets. The first should be called `single` and it should contain all rows from `math_filtered` for which `romantic == 'no'`. The second should be called `not_single` and it should contain all rows from `math_filtered` for which `romantic == 'yes'`. Use this data to plot side-by-side histograms of the distributions of final math scores for these two groups. When you are done, your plot should look like the one below. Make sure you include the correct sample size in the subtitle of each histogram.
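
As a rough guide to the mechanics of the steps above, here is a minimal base R sketch run on a small made-up data frame rather than the real file. Several things in it are assumptions and not taken from the lab: the final math score column is called `G3`, missing scores appear as `NA`, and base graphics are used for the plots. If the tutorial used a different column name, a different rule for "missing", or a different plotting package, follow the tutorial instead.

```r
# Toy stand-in for the lab data (values invented for illustration only).
# In the lab you would start from something like:
#   math <- read.csv("student_mat.csv")
math <- data.frame(
  romantic = factor(c("no", "yes", "no", "no", "yes", "no")),
  G3       = c(10, 12, NA, 8, 11, 15)  # G3 is an assumed column name
)

# Step 1: drop rows with a missing final score (assumed to be NA here).
math_filtered <- math[!is.na(math$G3), ]

# Step 2: labelled bar plot of relationship status.
barplot(table(math_filtered$romantic),
        main = "Relationship status of students",
        xlab = "In a romantic relationship?",
        ylab = "Number of students")

# Step 3: split the filtered data by relationship status.
single     <- subset(math_filtered, romantic == "no")
not_single <- subset(math_filtered, romantic == "yes")

# Side-by-side histograms, with each group's sample size in its subtitle.
old_par <- par(mfrow = c(1, 2))
hist(single$G3,
     main = "Single",
     sub  = paste("n =", nrow(single)),
     xlab = "Final math score")
hist(not_single$G3,
     main = "In a relationship",
     sub  = paste("n =", nrow(not_single)),
     xlab = "Final math score")
par(old_par)  # restore the previous plotting layout
```

Computing the subtitles from `nrow()` rather than typing the numbers by hand keeps the reported sample sizes in sync with whatever filtering was applied in step 1.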