## 4.1 Two-way frequency tables

We start by asking whether birthweight (the outcome) is associated with hypertension in the mother (the exposure). To do this, we group birthweight into two categories. Categorisation is a choice made by the researcher: collapsing a continuous variable into categories discards information, so it should be done deliberately. In this case, however, there is a clinical definition of low birthweight, <2500g, that has real-world relevance. We use **dplyr** commands from the **tidyverse** to generate this new variable, specifically mutate().

```
#--- Generate low birth weight variable
bab9 <- bab9 %>% mutate(lbw = ifelse(bweight < 2500, "low", "normal"))
```

Let’s briefly unpack this code. We pipe bab9 into mutate() to indicate that we are working with the bab9 dataset. mutate() creates a new variable, which we name lbw (low birthweight) and define with an ifelse() statement: *if* birthweight is less than 2500g, lbw takes the value “low”, *else* it takes the value “normal”. A baby with a missing birthweight gets a missing lbw. Note that we do not need to generate a numeric variable and then label it as we do in Stata. Strictly speaking, ifelse() returns a character variable rather than a factor, but the tabulation commands used below treat it as a categorical variable with two levels; wrap it in factor() if you need to control the order of the levels. Finally, we assign the result back to bab9, overwriting the original data frame with one that includes the new lbw variable. Note what has changed in the Environment pane.
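A quick way to see what kind of variable this ifelse() rule produces is to try it on a few made-up values. The sketch below uses base R only, and the data frame `toy` and its birthweights are hypothetical (they are not taken from bab9); the factor() step is an optional extra, not something the practical requires.

```r
# Toy birthweights in grams (made-up values, not taken from bab9)
toy <- data.frame(bweight = c(1800, 2499, 2500, 3400, NA))

# Same rule as in the text: strictly below 2500g is "low", otherwise "normal"
toy$lbw <- ifelse(toy$bweight < 2500, "low", "normal")

# ifelse() with character values gives a character vector; NA stays NA
class(toy$lbw)

# Optional: convert to a factor with an explicit level order
toy$lbw_f <- factor(toy$lbw, levels = c("normal", "low"))
levels(toy$lbw_f)

# Tabulate, showing the missing value as its own category
table(toy$lbw_f, useNA = "ifany")
```

Note that a birthweight of exactly 2500g is classified as “normal”, because the rule uses a strict less-than comparison.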

The 2x2 frequency table and odds ratio (together with a chi-squared test and a Fisher’s exact test of the null hypothesis that there is no association between the two variables) can be generated with the command cc() in the **epiDisplay** package. By default cc() also draws a graph; we add the option graph = F (shorthand for FALSE) to suppress it.

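The **epiDisplay** command tabpct() provides both row and column percentages for the same 2x2 table. Note that in both commands below the exposure variable (ht) is given first and the outcome variable (lbw) second, so the rows of each table are hypertension status and the columns are birthweight category.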

```
#--- 2x2 table with OR
# %$% is the magrittr exposition pipe: it makes the columns of bab9 visible to cc()
bab9 %$% cc(ht, lbw, graph = F)
```

```
##
##        lbw
## ht      low normal Total
##   no     53    499   552
##   yes    27     62    89
##   Total  80    561   641
##
## OR = 0.24
## 95% CI = 0.14, 0.42
## Chi-squared = 30.17, 1 d.f., P value = 0
## Fisher's exact test (2-sided) P value = 0
```
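The figures in the cc() output above can be reproduced by hand from the four cell counts, which helps when deciding which way round the odds ratio should be read. The sketch below uses base R only and copies the counts from the printed table; the object names (`tab`, `or_table`, `or_low`, `chi`) are illustrative, and this is a check on the reported numbers rather than a description of how cc() works internally.

```r
# Cell counts copied from the cc() output (rows: ht, columns: lbw)
tab <- matrix(c(53, 499,
                27,  62),
              nrow = 2, byrow = TRUE,
              dimnames = list(ht = c("no", "yes"), lbw = c("low", "normal")))

# Cross-product odds ratio in the orientation of the printed table
or_table <- (tab["no", "low"] * tab["yes", "normal"]) /
            (tab["no", "normal"] * tab["yes", "low"])
round(or_table, 2)   # 0.24, matching the cc() output

# Reciprocal: odds of LOW birthweight, hypertensive vs non-hypertensive mothers
or_low <- 1 / or_table
round(or_low, 2)     # 4.1

# Pearson chi-squared test without continuity correction
chi <- chisq.test(tab, correct = FALSE)
round(unname(chi$statistic), 2)   # 30.17
chi$p.value                       # very small; "P value = 0" is a rounded display

# Fisher's exact test on the same table
fisher.test(tab)$p.value
```

Because “low” sorts before “normal”, the printed 0.24 is the odds ratio for normal birthweight among hypertensive mothers relative to non-hypertensive mothers. Its reciprocal, about 4.1, is the odds ratio for low birthweight.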

```
#--- Reminder of same code, no pipe
# cc(bab9$ht, bab9$lbw, graph = F)

#--- 2x2 table with row and column percentages
bab9 %$% tabpct(ht, lbw, graph = F)
```

```
##
## Original table
##        lbw
## ht      low normal Total
##   no     53    499   552
##   yes    27     62    89
##   Total  80    561   641
##
## Row percent
##      lbw
## ht       low  normal  Total
##   no      53     499    552
##        (9.6)  (90.4)  (100)
##   yes     27      62     89
##       (30.3)  (69.7)  (100)
##
## Column percent
##        lbw
## ht      low      % normal      %
##   no     53 (66.2)    499 (88.9)
##   yes    27 (33.8)     62 (11.1)
##   Total  80  (100)    561  (100)
```
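The row and column percentages in the tabpct() output above are simple proportions of the cell counts, and it can be instructive to recompute them. The following is a minimal base R sketch using prop.table() on the counts copied from the printed table; the object names (`tab`, `row_pct`, `col_pct`) are illustrative.

```r
# Cell counts copied from the tabpct() output (rows: ht, columns: lbw)
tab <- matrix(c(53, 499,
                27,  62),
              nrow = 2, byrow = TRUE,
              dimnames = list(ht = c("no", "yes"), lbw = c("low", "normal")))

# Row percentages: within each hypertension group, the share of low and
# normal birthweight babies (each row sums to 100)
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
row_pct

# Column percentages: within each birthweight group, the share of mothers
# without and with hypertension (each column sums to 100)
col_pct <- round(100 * prop.table(tab, margin = 2), 1)
col_pct
```

Row percentages answer questions about the outcome within each exposure group, which is usually the comparison of interest here; column percentages describe the exposure within each outcome group.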

Exercise 10.1: From the cc() table, how many low birthweight babies were born to women with hypertension? How many low birthweight babies were born to women with no hypertension?

Exercise 10.2: From the tabpct() table, what proportion of babies born to women who were hypertensive were of low birthweight? How does this compare with the proportion of low birthweight babies born to women who were not hypertensive?

Exercise 10.3: From the cc() table, what is your interpretation of the chi-squared test?