Chapter 4 MIDTERM – EXERCISE & SLEEP

This exam is testing my ability to showcase what I have learned so far during the first half of the semester.

4.1 SETUP & DATA IMPORT

library(readxl)
participant_info_midterm <- read_excel("data/midterm_sleep_exercise.xlsx", sheet = 1)
head(participant_info_midterm)

## # A tibble: 6 × 4
##   ID    Exercise_Group Sex      Age
##   <chr> <chr>          <chr>  <dbl>
## 1 P001  NONE           Male      35
## 2 P002  Nonee          Malee     57
## 3 P003  None           Female    26
## 4 P004  None           Female    29
## 5 P005  None           Male      33
## 6 P006  None           Female    33

library(readxl)
sleep_data_midterm <- read_excel("data/midterm_sleep_exercise.xlsx", sheet = 2)
head(sleep_data_midterm)

## # A tibble: 6 × 4
##   ID    Pre_Sleep Post_Sleep Sleep_Efficiency
##   <chr> <chr>          <dbl>            <dbl>
## 1 P001  zzz-5.8          4.7             81.6
## 2 P002  Sleep-6.6        7.4             75.7
## 3 P003  <NA>             6.2             82.9
## 4 P004  SLEEP-7.2        7.3             83.6
## 5 P005  score-7.4        7.4             83.5
## 6 P006  Sleep-6.6        7.1             88.5

head(sleep_data_midterm)

## # A tibble: 6 × 4
##   ID    Pre_Sleep Post_Sleep Sleep_Efficiency
##   <chr> <chr>          <dbl>            <dbl>
## 1 P001  zzz-5.8          4.7             81.6
## 2 P002  Sleep-6.6        7.4             75.7
## 3 P003  <NA>             6.2             82.9
## 4 P004  SLEEP-7.2        7.3             83.6
## 5 P005  score-7.4        7.4             83.5
## 6 P006  Sleep-6.6        7.1             88.5

4.2 MERGE & BASE CLEANING

library(tidyverse)
participant_info_midterm <- participant_info_midterm %>%
  mutate(Exercise_Group = recode(Exercise_Group,
                                  "NONE" = "None",
                                  "Nonee" = "None",
                                  "N" = "None",
                                  "C" = "Cardio",
                                  "weights" = "Weights",
                                  "WEIGHTZ" = "Weights",
                                  "WEIGHTS" = "weights",
                                  "WEIGHTSSS" = "Weights",
                                  "CW" = "Cardio+Weights",
                                  "C+W" = "Cardio+Weights",
                                  ))
head(participant_info_midterm)

## # A tibble: 6 × 4
##   ID    Exercise_Group Sex      Age
##   <chr> <chr>          <chr>  <dbl>
## 1 P001  None           Male      35
## 2 P002  None           Malee     57
## 3 P003  None           Female    26
## 4 P004  None           Female    29
## 5 P005  None           Male      33
## 6 P006  None           Female    33

library(tidyverse)
participant_info_midterm <- participant_info_midterm %>%
  mutate(Sex = recode(Sex,
                      "Malee" = "Male",
                      "Femalee" = "Female",
                      "F" = "Female",
                      "M" = "Male",
                      "Fem" = "Female",
                      "MALE" = "Male",
                      "Mal" = "Male",
                      ))
head(participant_info_midterm)

## # A tibble: 6 × 4
##   ID    Exercise_Group Sex      Age
##   <chr> <chr>          <chr>  <dbl>
## 1 P001  None           Male      35
## 2 P002  None           Male      57
## 3 P003  None           Female    26
## 4 P004  None           Female    29
## 5 P005  None           Male      33
## 6 P006  None           Female    33

participant_sleep_merge <- merge(participant_info_midterm, sleep_data_midterm, by="ID")
head(participant_sleep_merge)

##     ID Exercise_Group    Sex Age Pre_Sleep Post_Sleep Sleep_Efficiency
## 1 P001           None   Male  35   zzz-5.8        4.7             81.6
## 2 P002           None   Male  57 Sleep-6.6        7.4             75.7
## 3 P003           None Female  26      <NA>        6.2             82.9
## 4 P004           None Female  29 SLEEP-7.2        7.3             83.6
## 5 P005           None   Male  33 score-7.4        7.4             83.5
## 6 P006           None Female  33 Sleep-6.6        7.1             88.5

4.3 CREATE DERIVED VARIABLES

participant_sleep_merge <- participant_sleep_merge %>%
  separate(Pre_Sleep, into = c("Text", "Pre_Sleep_New"), sep = "(?<=\\D)(?=\\d)", remove = FALSE)

## Warning: Expected 2 pieces. Additional pieces discarded in 77 rows [1, 2, 4, 5, 6,
## 8, 9, 10, 12, 15, 16, 17, 18, 20, 21, 22, 25, 26, 27, 28, ...].

participant_sleep_merge$Pre_Sleep_New <- as.numeric(participant_sleep_merge$Pre_Sleep_New)
participant_sleep_merge <- participant_sleep_merge %>% dplyr::select(-Pre_Sleep, -Text)
head(participant_sleep_merge)

##     ID Exercise_Group    Sex Age Pre_Sleep_New Post_Sleep Sleep_Efficiency
## 1 P001           None   Male  35             5        4.7             81.6
## 2 P002           None   Male  57             6        7.4             75.7
## 3 P003           None Female  26            NA        6.2             82.9
## 4 P004           None Female  29             7        7.3             83.6
## 5 P005           None   Male  33             7        7.4             83.5
## 6 P006           None Female  33             6        7.1             88.5

participant_sleep_merge <- participant_sleep_merge %>%
  mutate(Sleep_Difference = Post_Sleep - Pre_Sleep_New)
head(participant_sleep_merge)

##     ID Exercise_Group    Sex Age Pre_Sleep_New Post_Sleep Sleep_Efficiency
## 1 P001           None   Male  35             5        4.7             81.6
## 2 P002           None   Male  57             6        7.4             75.7
## 3 P003           None Female  26            NA        6.2             82.9
## 4 P004           None Female  29             7        7.3             83.6
## 5 P005           None   Male  33             7        7.4             83.5
## 6 P006           None Female  33             6        7.1             88.5
##   Sleep_Difference
## 1             -0.3
## 2              1.4
## 3               NA
## 4              0.3
## 5              0.4
## 6              1.1

participant_sleep_merge <- participant_sleep_merge %>% 
  mutate(AgeGroup2 = case_when(
  Age < 40 ~ "<40",
  Age >= 40 ~ ">=40",
  TRUE ~ NA_character_
))
head(participant_sleep_merge)

##     ID Exercise_Group    Sex Age Pre_Sleep_New Post_Sleep Sleep_Efficiency
## 1 P001           None   Male  35             5        4.7             81.6
## 2 P002           None   Male  57             6        7.4             75.7
## 3 P003           None Female  26            NA        6.2             82.9
## 4 P004           None Female  29             7        7.3             83.6
## 5 P005           None   Male  33             7        7.4             83.5
## 6 P006           None Female  33             6        7.1             88.5
##   Sleep_Difference AgeGroup2
## 1             -0.3       <40
## 2              1.4      >=40
## 3               NA       <40
## 4              0.3       <40
## 5              0.4       <40
## 6              1.1       <40

participant_sleep_merge <- participant_sleep_merge %>%
  filter(!is.na(Sleep_Difference))
head(participant_sleep_merge)

##     ID Exercise_Group    Sex Age Pre_Sleep_New Post_Sleep Sleep_Efficiency
## 1 P001           None   Male  35             5        4.7             81.6
## 2 P002           None   Male  57             6        7.4             75.7
## 3 P004           None Female  29             7        7.3             83.6
## 4 P005           None   Male  33             7        7.4             83.5
## 5 P006           None Female  33             6        7.1             88.5
## 6 P007           None   Male  32             6        6.7             83.6
##   Sleep_Difference AgeGroup2
## 1             -0.3       <40
## 2              1.4      >=40
## 3              0.3       <40
## 4              0.4       <40
## 5              1.1       <40
## 6              0.7       <40

4.4 DESCRIPTIVE STATISTICS

library(dplyr)
stat_1 <- participant_sleep_merge %>%
  dplyr::summarize(
    mean_sleep_diff = mean(Sleep_Difference, na.rm = TRUE),
    sd_sleep_diff = sd(Sleep_Difference, na.rm = TRUE),
    min_sleep_diff = min(Sleep_Difference, na.rm = TRUE),
    max_sleep_diff = max(Sleep_Difference, na.rm = TRUE),
    mean_sleep_eff = mean(Sleep_Efficiency, na.rm = TRUE),
    sd_sleep_eff = sd(Sleep_Efficiency, na.rm = TRUE),
    min_sleep_eff = min(Sleep_Efficiency, na.rm = TRUE),
    max_sleep_eff = max(Sleep_Efficiency, na.rm = TRUE)
  )
print(stat_1)

##   mean_sleep_diff sd_sleep_diff min_sleep_diff max_sleep_diff
## 1        1.115116     0.7411466           -0.8            2.4
##   mean_sleep_eff sd_sleep_eff min_sleep_eff max_sleep_eff
## 1       83.77558     5.973804          71.7         101.5

4.4.1 Reporting statistics

library(dplyr)
stat_2 <- participant_sleep_merge %>%
  group_by(Exercise_Group) %>%
  dplyr::summarize(
    mean_sleep_diff = mean(Sleep_Difference, na.rm = TRUE),
    sd_sleep_diff = sd(Sleep_Difference, na.rm = TRUE),
    min_sleep_diff = min(Sleep_Difference, na.rm = TRUE),
    max_sleep_diff = max(Sleep_Difference, na.rm = TRUE),
    mean_sleep_eff = mean(Sleep_Efficiency, na.rm = TRUE),
    sd_sleep_eff = sd(Sleep_Efficiency, na.rm = TRUE),
    min_sleep_eff = min(Sleep_Efficiency, na.rm = TRUE),
    max_sleep_eff = max(Sleep_Efficiency, na.rm = TRUE)
  )
print(stat_2)

## # A tibble: 5 × 9
##   Exercise_Group mean_sleep_diff sd_sleep_diff min_sleep_diff
##   <chr>                    <dbl>         <dbl>          <dbl>
## 1 Cardio                   1.63          0.563         0.300 
## 2 Cardio+Weights           1.33          0.538         0.400 
## 3 None                     0.448         0.739        -0.8   
## 4 Weights                  0.55          0.354         0.300 
## 5 weights                  1.08          0.593         0.1000
## # ℹ 5 more variables: max_sleep_diff <dbl>, mean_sleep_eff <dbl>,
## #   sd_sleep_eff <dbl>, min_sleep_eff <dbl>, max_sleep_eff <dbl>

4.5 VISUALIZATIONS (3 plots)

ggplot(participant_sleep_merge, aes(x = Exercise_Group, y = Sleep_Difference)) +
  geom_boxplot(aes(fill = Exercise_Group), color = "black") +
  labs(title = "Boxplot of Sleep_Difference by Exercise Group",
       x = "Exercise Group",
       y = "Sleep Difference") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Figure 4.1: This graph shows sleep difference when categorized by exercise group.

ggplot(participant_sleep_merge, aes(x = Exercise_Group, y = Sleep_Efficiency)) +
  geom_boxplot(aes(fill = Exercise_Group), color = "black") +
  labs(title = "Boxplot of Sleep_Efficiency by Exercise Group",
       x = "Exercise Group",
       y = "Sleep Efficiency") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Figure 4.2: This graph shows sleep efficiency when categorized by exercise group.

ggplot(participant_sleep_merge, aes(x = Sleep_Efficiency, y = Sleep_Difference)) +
  geom_point(aes(color = Exercise_Group), alpha = 0.7) +
  geom_smooth(method = "lm", color = "blue", se = FALSE) +
  labs(title = "Scatterplot of Sleep_Difference vs Sleep_Efficiency with Trend Line",
       x = "Sleep Efficiency",
       y = "Sleep Difference") +
  theme_minimal() +
  theme(legend.position = "bottom")

## `geom_smooth()` using formula = 'y ~ x'

Figure 4.3: This graph shows sleep efficiency and sleep difference when merging participants.

4.6 T-TESTS (TWO)

4.6.1 T-test 1: Sleep_Difference ~ Sex

t_test_sex <- t.test(Sleep_Difference ~ Sex, data = participant_sleep_merge)
t_test_sex

## 
##  Welch Two Sample t-test
## 
## data:  Sleep_Difference by Sex
## t = 1.4956, df = 82.199, p-value = 0.1386
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -0.0776382  0.5481291
## sample estimates:
## mean in group Female   mean in group Male 
##            1.2163265            0.9810811

There is no significant difference between males (mean = 1.00) and females (mean = 1.22) during sleep when the p-value is 0.14.

4.6.2 T-test 2: Sleep_Difference ~ AgeGroup2

t_test_agegroup <- t.test(Sleep_Difference ~ AgeGroup2, data = participant_sleep_merge)
t_test_agegroup

## 
##  Welch Two Sample t-test
## 
## data:  Sleep_Difference by AgeGroup2
## t = -1.6961, df = 33.044, p-value = 0.09927
## alternative hypothesis: true difference in means between group <40 and group >=40 is not equal to 0
## 95 percent confidence interval:
##  -0.65568817  0.05945879
## sample estimates:
##  mean in group <40 mean in group >=40 
##           1.049254           1.347368

There is no significant difference between peope under 40 (mean = 1.05) and people equal to or over 40 (mean = 1.35) during sleep when the p-value is 0.10.

4.7 ANOVAS (TWO) + POST-HOCS

anova_A <- aov(Sleep_Difference ~ Exercise_Group, data = participant_sleep_merge)
summary(anova_A)

##                Df Sum Sq Mean Sq F value   Pr(>F)    
## Exercise_Group  4  16.62   4.154   11.19 2.83e-07 ***
## Residuals      81  30.07   0.371                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

library(supernova)
supernova(anova_A)

##  Analysis of Variance Table (Type III SS)
##  Model: Sleep_Difference ~ Exercise_Group
## 
##                              SS df    MS      F   PRE     p
##  ----- --------------- | ------ -- ----- ------ ----- -----
##  Model (error reduced) | 16.616  4 4.154 11.188 .3559 .0000
##  Error (from model)    | 30.074 81 0.371                   
##  ----- --------------- | ------ -- ----- ------ ----- -----
##  Total (empty model)   | 46.690 85 0.549

The F value is 11.188, the p-value is 0.00, with degree of freedom being 85. Based on those results, there is significant difference effect of Exercise_Group on Sleep_Difference.

tukey_result_a <- TukeyHSD(anova_A)
tukey_result_a

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Sleep_Difference ~ Exercise_Group, data = participant_sleep_merge)
## 
## $Exercise_Group
##                              diff         lwr          upr     p adj
## Cardio+Weights-Cardio  -0.2981366 -0.81128047  0.215007179 0.4884868
## None-Cardio            -1.1809524 -1.70562897 -0.656275794 0.0000002
## weights-Cardio         -0.5443609 -1.08266772 -0.006054083 0.0461359
## Weights-Cardio         -1.0785714 -2.33670169  0.179558828 0.1280022
## None-Cardio+Weights    -0.8828157 -1.39595956 -0.369671910 0.0000688
## weights-Cardio+Weights -0.2462243 -0.77329664  0.280848126 0.6898500
## Weights-Cardio+Weights -0.7804348 -2.03379938  0.472929812 0.4172765
## weights-None            0.6365915  0.09828466  1.174898298 0.0122126
## Weights-None            0.1023810 -1.15574930  1.360511209 0.9993984
## Weights-weights        -0.5342105 -1.79808570  0.729664647 0.7630899

It seems like Exercise_Group has a significant influence on Sleep_Difference(Cardio, Cardio+Weights, and Weights) compare to no exercise at all. Although, the Weights and Cardio+Weights groups did not show significant differences when compared to each other, nor with the None group in some instances. Carddio seems to show the least amount of difference from the other groups for Sleep_Difference.

anova_B <- aov(Sleep_Efficiency ~ Exercise_Group, data = participant_sleep_merge)
summary(anova_B)

##                Df Sum Sq Mean Sq F value  Pr(>F)   
## Exercise_Group  4  544.7  136.17   4.432 0.00273 **
## Residuals      81 2488.7   30.72                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

library(supernova)
supernova(anova_B)

##  Analysis of Variance Table (Type III SS)
##  Model: Sleep_Efficiency ~ Exercise_Group
## 
##                                SS df      MS     F   PRE     p
##  ----- --------------- | -------- -- ------- ----- ----- -----
##  Model (error reduced) |  544.688  4 136.172 4.432 .1796 .0027
##  Error (from model)    | 2488.650 81  30.724                  
##  ----- --------------- | -------- -- ------- ----- ----- -----
##  Total (empty model)   | 3033.339 85  35.686

The F value is 4.438, the p-value is 0.00, with degree of freedom being 85. Based on those results, there is significant difference effect of Exercise_Group on Sleep_Difference.

tukey_result_b <- TukeyHSD(anova_B)
tukey_result_b

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Sleep_Efficiency ~ Exercise_Group, data = participant_sleep_merge)
## 
## $Exercise_Group
##                              diff        lwr        upr     p adj
## Cardio+Weights-Cardio   1.3871636  -3.280763  6.0550902 0.9208267
## None-Cardio            -4.3761905  -9.149027  0.3966465 0.0880681
## weights-Cardio         -4.1370927  -9.033920  0.7597347 0.1378924
## Weights-Cardio         -2.5976190 -14.042480  8.8472420 0.9692105
## None-Cardio+Weights    -5.7633540 -10.431281 -1.0954274 0.0078663
## weights-Cardio+Weights -5.5242563 -10.318887 -0.7296254 0.0156720
## Weights-Cardio+Weights -3.9847826 -15.386292  7.4167266 0.8656163
## weights-None            0.2390977  -4.657730  5.1359252 0.9999207
## Weights-None            1.7785714  -9.666290 13.2234325 0.9925097
## Weights-weights         1.5394737  -9.957647 13.0365947 0.9957739

It seems like Exercise_Group or type has a significant influence on Slee_Efficiency because the exercise type can at least explain a portion of the variation in Sleep_Effiency.

4.8 SYNTHESIS & RECOMMENDATION

Looking at Sleep_Difference and Sleep_Efficiency, I think that Cardio+Weights is a good candidate for improving sleep. The results showed that Exercise_Group significantly improved sleep quality in Sleep_Difference compared Sleep_Efficiency (11.188 and p = 0.00 for Sleep_Difference and 4.432 and p = 0.00 for Sleep_Efficiency). That means, Exercise_Group had a significant on sleep overall. The Tukey provides more in depth analysis on how effective different type of exercises are.

4.9 REFLECTION

I found cleaning the columns somewhat difficult. I would create or delete columns that I totally need while the ones that I do not remained. I guess that is a an error on my part. I also found the ANOVA part somewhat challenging. With so many numbers in a table, I could not keep track on what I should pay attention to the most and reporting the F-value for different exercise type was not exactly easy as the table was not really intuitive in letting you know what the numbers really mean.

4.9.1 CHECKLIST

[y] Imported both sheets and merged cleanly on ID
[y] Cleaned Pre_Sleep to numeric and created Sleep_Difference
[y] Created AgeGroup2 (two-level) via case_when
[y] Produced descriptive statistics for both outcomes
[y] Made 3 clear plots with labels/titles
[y] Ran TWO t-tests (Sex, AgeGroup2) on Sleep_Difference
[y] Ran TWO ANOVAs (Exercise_Group) on Sleep_Difference and Sleep_Efficiency
[y] Performed Tukey post-hoc tests and interpreted pairwise differences
[y] Verified that all figures and tables display meaningful, readable labels and legends.
[y] Wrote a concise recommendation supported by stats and visuals
[y] Converted this work into an R Markdown report with narrative + code