## 4.4 Further exercises

We read in the original bab dataset, replicating the command given in the first chunk, and create a new variable that divides birthweight into three groups: 2500g or less, between 2500g and 3000g, and 3000g or more. We do this with the **tidyverse** function *case_when()*. This function uses the formula syntax we have seen before: we explicitly set out the condition on the left-hand side of the tilde and the value we want on the right (e.g. the first line of *case_when()* says that when birthweight is less than or equal to 2500, set the value of lbw3 to 1). To combine multiple conditions, we use the & operator to stand for ‘AND’. We can also use | to stand for ‘OR’ and ! to stand for ‘NOT’. Note that after applying *case_when()* we convert this variable to a factor.

We can also use a nested ifelse approach: if the bweight variable is less than or equal to 2500, give it the value 1, else continue to the next statement, and so on. In general, *case_when()* is clearer than *ifelse()* because it explicitly sets out each new level and generates an NA when a value does not meet any of the conditions, whereas *ifelse()* can lead to every value not otherwise specified being lumped into the final *else* condition. In this case, we have only three possible categories and thus the ifelse approach is relatively straightforward, but nesting many statements can rapidly become confusing.

Exercise 10.7: What percentage of birthweights were 3000g or more?

```
#--- Read in the bab file
bab <- read.dta("./BAB.dta", convert.factors = TRUE)
#--- Create the low birthweight categorical variable
bab <- bab %>%
  mutate(lbw3 = case_when(bweight <= 2500 ~ 1,
                          bweight > 2500 & bweight < 3000 ~ 2,
                          bweight >= 3000 ~ 3)) %>%
  mutate(lbw3 = as.factor(lbw3))
summary(bab$lbw3)
```

```
## 1 2 3
## 82 136 423
```

```
#--- Same as above using a nested ifelse approach
# bab <- bab %>% mutate(lbw3 = ifelse(bweight <= 2500, 1, ifelse(bweight < 3000, 2, 3))) %>%
# mutate(lbw3 = as.factor(lbw3))
```
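To make the difference between the two approaches concrete, the following minimal sketch applies both to a small made-up vector of birthweights. The vector `w` and the deliberately incomplete conditions are illustrative assumptions, not part of the bab data: the conditions leave out the value 2500 exactly, so it matches nothing.

```r
library(dplyr)

# Hypothetical birthweights in grams (not taken from the bab dataset)
w <- c(2400, 2500, 2750, 3000)

# Conditions deliberately leave a gap at exactly 2500
cw <- case_when(w < 2500 ~ 1,
                w > 2500 & w < 3000 ~ 2,
                w >= 3000 ~ 3)

# The same gap with nested ifelse(): the unmatched value falls into the final else
ie <- ifelse(w < 2500, 1, ifelse(w > 2500 & w < 3000, 2, 3))

cw  # 1 NA 2 3
ie  # 1 3 2 3
```

With *case_when()* the gap shows up as an NA that can be spotted in a summary, while the nested *ifelse()* silently assigns the value to category 3.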

Now we conduct some further analyses.

Exercise 10.8: Analyse whether there is any association between sex of infant and birthweight (using lbw3). How many degrees of freedom is the chi-square test based on?

```
#--- Get a 2x3 table of sex by birthweight group with a chi-square test
bab %$% cc(sex, lbw3, graph = FALSE)
```

```
##
## lbw3
## sex 1 2 3
## male 36 63 227
## female 46 73 196
##
## Odds ratio 1 0.91 0.68
## lower 95% CI 0.5 0.41
## upper 95% CI 1.63 1.12
##
## Chi-squared = 4.04 , 2 d.f., P value = 0.133
## Fisher's exact test (2-sided) P value = 0.131
```

Now we categorise maternal age into four groups (<30, 30-34, 35-39, 40+) to investigate the relationship with hypertension. We could do this with three nested ifelse statements or with *case_when()*, but here we demonstrate a more direct way to categorise a continuous variable using the *cut()* function.

Exercise 10.9: Is there any evidence of a linear association between hypertension and maternal age?

```
#--- Create the new variable and extract a summary
bab <- bab %>%
  mutate(matagegp = cut(matage,
                        breaks = c(0, 29, 34, 39, +Inf),
                        labels = c("<30", "30-34", "35-39", "40+")))
summary(bab$matagegp)
```

```
## <30 30-34 35-39 40+
## 92 251 258 40
```
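Because *cut()* builds intervals that are closed on the right by default, the breaks above correspond to (0, 29], (29, 34], (34, 39] and (39, Inf]. The sketch below checks the boundary ages on a made-up vector (`ages` is hypothetical). It assumes age is recorded in whole years, since a value such as 29.5 would be placed in the "30-34" group.

```r
# Hypothetical maternal ages at the group boundaries
ages <- c(29, 30, 34, 35, 39, 40)

grp <- cut(ages,
           breaks = c(0, 29, 34, 39, Inf),
           labels = c("<30", "30-34", "35-39", "40+"))

# Show which group each boundary age falls into
data.frame(ages, grp)
```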

```
#--- Investigate the relationship between maternal age group and hypertension
bab %$% cc(ht, matagegp, graph = FALSE)
```

```
##
## matagegp
## ht <30 30-34 35-39 40+
## no 76 211 232 33
## yes 16 40 26 7
##
## Odds ratio 1 0.9 0.53 1.01
## lower 95% CI 0.46 0.26 0.32
## upper 95% CI 1.83 1.12 2.89
##
## Chi-squared = 5.39 , 3 d.f., P value = 0.145
## Fisher's exact test (2-sided) P value = 0.121
```

We create a new variable that codes which quarter of gestational age each baby falls into. This uses exactly the same code as presented above and the same *ntile()* function.

Exercise 10.10: Find the values of gestational age which define the quartiles.

Exercise 10.11: What proportion of infants in the lowest quarter of gestational age were males?

```
#--- Get the quartiles
bab$gest4 <- as.factor(ntile(bab$gestwks, 4))
summary(bab$gest4)
```

```
## 1 2 3 4
## 161 160 160 160
```

```
#--- Extract the top of each quartile (and thus the cut points)
bab %>% group_by(gest4) %>% dplyr::summarise(max(gestwks))
```

```
## # A tibble: 4 x 2
## gest4 `max(gestwks)`
## <fct> <dbl>
## 1 1 38.0
## 2 2 39.2
## 3 3 40.2
## 4 4 42.3
```
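How *ntile()* assigns groups can be seen on a small example. The sketch below uses eight made-up gestational ages (the vector `gw` is hypothetical) so that each quarter holds exactly two values, then takes the maximum within each group in the same way as the summary above.

```r
library(dplyr)

# Hypothetical gestational ages in weeks, deliberately unsorted
gw <- c(36, 37, 38, 39, 40, 41, 42, 35)

# ntile() ranks the values and splits the ranks into 4 equal-sized groups
q <- ntile(gw, 4)
q

# Largest value in each group, i.e. the upper cut point of each quarter
tapply(gw, q, max)
```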

```
#--- Get the 2x4 table of sex and gestational quartile
bab %$% tabpct(sex, gest4, graph = FALSE)
```

```
##
## Original table
## gest4
## sex 1 2 3 4 Total
## male 80 85 86 75 326
## female 81 75 74 85 315
## Total 161 160 160 160 641
##
## Row percent
## gest4
## sex 1 2 3 4 Total
## male 80 85 86 75 326
## (24.5) (26.1) (26.4) (23) (100)
## female 81 75 74 85 315
## (25.7) (23.8) (23.5) (27) (100)
##
## Column percent
## gest4
## sex 1 % 2 % 3 % 4 %
## male 80 (49.7) 85 (53.1) 86 (53.8) 75 (46.9)
## female 81 (50.3) 75 (46.9) 74 (46.2) 85 (53.1)
## Total 161 (100) 160 (100) 160 (100) 160 (100)
```
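The column percentages answer Exercise 10.11 directly. As a cross-check, this sketch re-enters the counts from the original table above and uses base R's *prop.table()* to compute the percentages within each quarter; `tab4` and `col_pct` are our own names.

```r
# Counts copied from the tabpct() output above (sex by gest4)
tab4 <- matrix(c(80, 85, 86, 75,
                 81, 75, 74, 85),
               nrow = 2, byrow = TRUE,
               dimnames = list(sex = c("male", "female"),
                               gest4 = c("1", "2", "3", "4")))

# Percentages within each column (margin = 2)
col_pct <- round(100 * prop.table(tab4, margin = 2), 1)
col_pct["male", "1"]   # percentage of males in the lowest quarter
```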