4.1 Two-way frequency tables

We start by asking whether birthweight (the outcome) is associated with hypertension in the mother (the exposure). To do this, we group birthweight into two categories. Categorisation is a choice made by the researcher, and you should be aware that it discards information about the variable. In this case, however, there is a clinical definition of low birthweight (below 2500 g) that may have real-world relevance. We create the new variable with mutate() from the dplyr package, part of the tidyverse.

#--- Generate low birth weight variable
bab9 <- bab9 %>% mutate(lbw = ifelse(bweight < 2500, "low", "normal"))

Let’s briefly unpack this code. We pipe bab9 into mutate() with %>%, so mutate() works on the bab9 dataset. The mutate() command adds a new variable (or modifies an existing one). Here the new variable is called lbw (low birthweight), and we generate it with an ifelse() statement. The ifelse() statement says that if birthweight is less than 2500 g, lbw takes the value “low”; otherwise it takes the value “normal”. Unlike in Stata, we do not need to create a numeric variable and then attach value labels. ifelse() returns a character variable that holds the labels themselves, and functions such as cc() and tabpct() treat it as categorical. Strictly, lbw is a character vector, not a factor. Wrap it in factor() if you need a factor, for example to set the order of the categories. Finally, we overwrite the original bab9 data frame with the new version that includes lbw. Note what has changed in the Environment pane.
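The minimal base-R sketch below applies the same rule to a few made-up birthweights. The data frame `toy` is hypothetical and stands in for bab9, and the base-R assignment mirrors the mutate() call. It shows three details:

- The cut-off is strict, so exactly 2500 g counts as “normal”.
- A missing birthweight gives a missing lbw.
- The result is a character vector, which can be converted to a factor when the order of categories matters.

```r
# Hypothetical toy data standing in for bab9 (values invented for illustration)
toy <- data.frame(bweight = c(2100, 2499, 2500, 3400, NA))

# Same rule as the mutate() call above, written in base R
toy$lbw <- ifelse(toy$bweight < 2500, "low", "normal")

class(toy$lbw)    # "character": ifelse() does not create a factor
toy$lbw           # "low" "low" "normal" "normal" NA

# Convert to a factor with an explicit level order, if one is needed
toy$lbw_f <- factor(toy$lbw, levels = c("normal", "low"))
levels(toy$lbw_f) # "normal" "low"
```

Tables built from `lbw_f` list the categories in the order given by `levels`, whereas a character variable is sorted alphabetically.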

The cc() command in the epiDisplay package generates the 2x2 frequency table and the odds ratio. It also reports a chi-squared test and Fisher’s exact test of the null hypothesis that there is no association between the two variables. By default cc() also draws a graph; we add the option graph = F (F is shorthand for FALSE) to suppress it.

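The epiDisplay command tabpct() prints the same table together with row and column percentages. In both commands, the first variable (ht, the exposure) forms the rows and the second (lbw, the outcome) forms the columns. Note that cc() formally expects the outcome first and the exposure second. Entering them the other way round gives the same odds ratio, because the odds ratio of a 2x2 table does not change when its rows and columns are swapped.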

#--- 2x2 table with OR
bab9 %$% cc(ht, lbw, graph = F)
## 
##        lbw
## ht      low normal Total
##   no     53    499   552
##   yes    27     62    89
##   Total  80    561   641
## 
## OR =  0.24 
## 95% CI =  0.14, 0.42  
## Chi-squared = 30.17, 1 d.f., P value = 0
## Fisher's exact test (2-sided) P value = 0
#--- Reminder of same code, no pipe
# cc(bab9$ht, bab9$lbw, graph = F)
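To see where cc()'s numbers come from, the base-R sketch below recomputes them from the counts in the table above. The matrix is typed in by hand, and names such as `tab` and `or_normal` are illustrative only.

- **Odds ratio.** The arithmetic shows that 0.24 is the odds of a normal birthweight among hypertensive mothers divided by the same odds among non-hypertensive mothers. Its reciprocal, about 4.1, is the corresponding odds ratio for low birthweight.
- **Confidence interval.** The sketch uses the Woolf (log) method, which agrees with the printed interval to two decimal places. cc() may compute its interval by a different method, such as an exact one.
- **Chi-squared test.** Pearson's chi-squared test without continuity correction gives a statistic of about 30.17, matching the value cc() reports.

```r
# Counts copied from the cc() output above (rows: ht, columns: lbw)
tab <- matrix(c(53, 27, 499, 62), nrow = 2,
              dimnames = list(ht = c("no", "yes"), lbw = c("low", "normal")))

# Odds of a normal birthweight within each exposure group
odds_normal_ht   <- tab["yes", "normal"] / tab["yes", "low"]  # 62 / 27
odds_normal_noht <- tab["no", "normal"] / tab["no", "low"]    # 499 / 53

or_normal <- odds_normal_ht / odds_normal_noht  # about 0.24, as printed by cc()
or_low    <- 1 / or_normal                      # about 4.1: OR for low birthweight

# Woolf (log) 95% confidence interval for or_normal
se_log_or <- sqrt(sum(1 / tab))
ci <- exp(log(or_normal) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(c(or_normal = or_normal, or_low = or_low), 2)  # 0.24 and 4.10
round(ci, 2)                                         # 0.14 0.42

# Pearson chi-squared test without continuity correction (X-squared about 30.17)
chisq.test(tab, correct = FALSE)
```

Whichever way the table is oriented, the odds ratio is a cross-product ratio (53 × 62) / (499 × 27). The choice of reference categories decides only whether you read it as 0.24 or as its reciprocal.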

#--- 2x2 table with OR and percentages
bab9 %$% tabpct(ht, lbw, graph = F)
## 
## Original table 
##        lbw
## ht       low  normal  Total
##   no      53     499    552
##   yes     27      62     89
##   Total   80     561    641
## 
## Row percent 
##      lbw
## ht        low  normal  Total
##   no       53     499    552
##         (9.6)  (90.4)  (100)
##   yes      27      62     89
##        (30.3)  (69.7)  (100)
## 
## Column percent 
##        lbw
## ht       low       %  normal       %
##   no      53  (66.2)     499  (88.9)
##   yes     27  (33.8)      62  (11.1)
##   Total   80   (100)     561   (100)
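The percentages that tabpct() prints can be reproduced in base R with prop.table(), as in the sketch below. The counts are copied from the output above, and the object names are illustrative. `margin = 1` divides each cell by its row total, and `margin = 2` by its column total. addmargins() appends totals, labelled "Sum" rather than "Total".

```r
# Counts copied from the tabpct() output above (rows: ht, columns: lbw)
tab <- as.table(matrix(c(53, 27, 499, 62), nrow = 2,
                       dimnames = list(ht = c("no", "yes"),
                                       lbw = c("low", "normal"))))

addmargins(tab)  # counts with row and column totals ("Sum")

# Row percentages: each row sums to 100
row_pct <- round(100 * prop.table(tab, margin = 1), 1)

# Column percentages: each column sums to 100
col_pct <- round(100 * prop.table(tab, margin = 2), 1)

row_pct
col_pct
```

Values exactly on a rounding boundary (such as 66.25) may round differently from tabpct()'s display, so compare the two to one decimal place at most.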

Exercise 10.1: From the cc() table, how many low birthweight babies were born to women with hypertension? How many low birthweight babies were born to women with no hypertension?

Exercise 10.2: From the tabpct() table, what proportion of babies born to women who were hypertensive were of low birthweight? How does this compare with the proportion of low birthweight babies born to women who were not hypertensive?

Exercise 10.3: From the cc() table, what is your interpretation of the chi-squared test?