3.5 Confounding & Mediation

Returning to SEP and education, draw a causal diagram for how you think the SEP asset index, education, and HIV status are related.

Which way did you draw the arrow between education and SEP? What feature of the population under study leads you to draw the arrow this way?

Is the SEP asset index a confounder or a mediator between education and HIV status? Is education a confounder or a mediator between the SEP asset index and HIV status?

Which of the analyses below is the correct one to investigate the association between the SEP asset index and HIV status?

#--- Analysis unadjusted for education: OR for High SEP vs Low SEP is 0.40 (0.17, 0.98), p = 0.046
glm(serostat ~ pc1.q, family = binomial, data = tz) %>% logistic.display() 
## 
## Logistic regression predicting serostat : hiv  positive vs hiv negative 
##  
##                     OR(95%CI)         P(Wald's test) P(LR-test)
## pc1.q: ref.=Low SEP                                  0.156     
##    Low-Middle SEP   0.88 (0.44,1.78)  0.724                    
##    Middle SEP       0.64 (0.3,1.38)   0.256                    
##    High-Middle SEP  0.46 (0.2,1.08)   0.076                    
##    High SEP         0.4 (0.17,0.98)   0.046                    
##                                                                
## Log-likelihood = -277.6879
## No. of observations = 2741
## AIC value = 565.3759
#--- Analysis adjusted for education: OR for High SEP vs Low SEP is 0.70 (0.28, 1.78), p = 0.455
glm(serostat ~ pc1.q + educat, family = binomial, data = tz) %>% logistic.display() 
## 
## Logistic regression predicting serostat : hiv  positive vs hiv negative 
##  
##                                      crude OR(95%CI)   adj. OR(95%CI)   
## pc1.q: ref.=Low SEP                                                     
##    Low-Middle SEP                    0.88 (0.44,1.78)  0.92 (0.45,1.87) 
##    Middle SEP                        0.64 (0.3,1.38)   0.73 (0.34,1.58) 
##    High-Middle SEP                   0.46 (0.2,1.08)   0.59 (0.25,1.41) 
##    High SEP                          0.4 (0.17,0.98)   0.7 (0.28,1.78)  
##                                                                         
## educat: ref.=no education, preschool                                    
##    primary                           0.56 (0.32,0.99)  0.61 (0.34,1.1)  
##    secondary                         0.1 (0.02,0.45)   0.12 (0.03,0.57) 
##                                                                         
##                                      P(Wald's test) P(LR-test)
## pc1.q: ref.=Low SEP                                 0.754     
##    Low-Middle SEP                    0.822                    
##    Middle SEP                        0.426                    
##    High-Middle SEP                   0.237                    
##    High SEP                          0.455                    
##                                                               
## educat: ref.=no education, preschool                0.004     
##    primary                           0.098                    
##    secondary                         0.007                    
##                                                               
## Log-likelihood = -272.2064
## No. of observations = 2741
## AIC value = 558.4129

You should have concluded that the unadjusted analysis was correct. Education mediates the relationship between SEP and HIV, so adjusting for education removes the part of the SEP association that operates through education; this is consistent with the OR for High SEP vs Low SEP moving from 0.40 to 0.70 after adjustment. You should have drawn the arrow from SEP to education: given that our population is adolescent girls and young women and the study is cross-sectional, it is more likely that the SEP of the household influences education than vice versa. If we were to take a life-course approach and had more years of data, we could account for cyclical effects between education and SEP.
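
To see why adjusting for a mediator changes the estimate, the small simulation below may help. It is a sketch on made-up data, not on the tz dataset: the variable names (sep, educ, hiv) and all effect sizes are assumptions chosen only so that SEP affects HIV entirely through education.

```r
# Simulated data only: names and effect sizes are illustrative assumptions,
# not estimates from the tz dataset.
set.seed(2024)
n <- 20000

# Household SEP (1 = high, 0 = low) is the exposure.
sep <- rbinom(n, 1, 0.5)

# Higher SEP raises the chance of secondary education (SEP -> education).
educ <- rbinom(n, 1, plogis(-1 + 2 * sep))

# HIV risk depends on education only, so the whole effect of SEP is mediated
# (SEP -> education -> HIV, with no direct SEP -> HIV arrow).
hiv <- rbinom(n, 1, plogis(-2 - 1.5 * educ))

# Total effect of SEP: do not adjust for the mediator.
fit_total <- glm(hiv ~ sep, family = binomial)

# Adjusting for the mediator blocks the pathway through education.
fit_adjusted <- glm(hiv ~ sep + educ, family = binomial)

or_total <- exp(coef(fit_total)[["sep"]])
or_adjusted <- exp(coef(fit_adjusted)[["sep"]])

round(c(unadjusted = or_total, adjusted_for_education = or_adjusted), 2)
```

In this set-up the unadjusted OR should sit clearly below 1, while the education-adjusted OR should be pulled towards 1, even though SEP really does lower HIV risk. The adjusted model answers a different question (what is left of the SEP effect once education is held fixed), which is not the one asked here.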

Note that if you were using conventional measures of “statistical significance” (which hopefully you never would…), you would have arrived at different conclusions depending on whether you adjusted or not, so always draw the causal diagram!

Now let us examine another possible mediation pathway in this dataset. Let us hypothesise that education impacts sexual behaviour (as measured by number of partners), and that in turn, sexual behaviour impacts HIV risk. Let’s also assume that age influences education, sexual behaviour, and HIV risk. Quickly sketch the causal diagram corresponding to this statement.
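
If you would like to check your sketch on a computer, one option is the dagitty package, which is not otherwise used in this practical and is assumed to be installed. The node names below (age, education, partners, hiv) are labels chosen for this sketch, and the arrows encode only the hypothesis stated above.

```r
# Requires the dagitty package: install.packages("dagitty")
library(dagitty)

# Arrows taken from the stated hypothesis; node names are illustrative labels.
dag <- dagitty("dag {
  age -> education
  age -> partners
  age -> hiv
  education -> partners
  partners -> hiv
}")

# Minimal sets of variables to adjust for when estimating the total effect
# of education on HIV status under this diagram.
total_sets <- adjustmentSets(dag, exposure = "education", outcome = "hiv",
                             effect = "total")
print(total_sets)
```

Under this diagram the only adjustment set for the total effect is age. Number of partners is left out because it lies on the causal pathway from education to HIV.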

We will conduct a Baron & Kenny-style assessment of mediation for this hypothesis, as we did in the example above, treating sexual behaviour as a mediator and age as a confounder. Create a table to display the odds ratios for primary vs. no education and for secondary vs. no education in three scenarios: unadjusted, adjusted for age, and adjusted for age plus the potential mediator.

#--- Analysis without adjusting: ORs for primary and secondary vs no education are 0.57 and 0.10
glm(serostat ~ educat, family = binomial, data = tz) %>% logistic.display() 
## 
## Logistic regression predicting serostat : hiv  positive vs hiv negative 
##  
##                                      OR(95%CI)         P(Wald's test)
## educat: ref.=no education, preschool                                 
##    primary                           0.57 (0.32,1.01)  0.054         
##    secondary                         0.1 (0.02,0.44)   0.002         
##                                                                      
##                                      P(LR-test)
## educat: ref.=no education, preschool < 0.001   
##    primary                                     
##    secondary                                   
##                                                
## Log-likelihood = -277.3553
## No. of observations = 2762
## AIC value = 560.7106
#--- Analysis adjusting for age group: ORs for primary and secondary vs no education are 0.77 and 0.13
glm(serostat ~ educat + age.group, family = binomial, data = tz) %>% logistic.display() 
## 
## Logistic regression predicting serostat : hiv  positive vs hiv negative 
##  
##                                      crude OR(95%CI)    adj. OR(95%CI)    
## educat: ref.=no education, preschool                                      
##    primary                           0.57 (0.32,1.01)   0.77 (0.43,1.37)  
##    secondary                         0.1 (0.02,0.44)    0.13 (0.03,0.56)  
##                                                                           
## age.group: 20-24 vs 14-19            6.78 (3.42,13.44)  6.56 (3.29,13.08) 
##                                                                           
##                                      P(Wald's test) P(LR-test)
## educat: ref.=no education, preschool                0.001     
##    primary                           0.38                     
##    secondary                         0.006                    
##                                                               
## age.group: 20-24 vs 14-19            < 0.001        < 0.001   
##                                                               
## Log-likelihood = -258.0038
## No. of observations = 2762
## AIC value = 524.0077
#--- Analysis adjusting for age group + potential mediation from number of partners: ORs for primary and secondary vs no education are 0.85 and 0.30
glm(serostat ~ educat + age.group + partners.cat, family = binomial, data = tz) %>% logistic.display() 
## 
## Logistic regression predicting serostat : hiv  positive vs hiv negative 
##  
##                                      crude OR(95%CI)     
## educat: ref.=no education, preschool                     
##    primary                           0.57 (0.32,1.01)    
##    secondary                         0.1 (0.02,0.44)     
##                                                          
## age.group: 20-24 vs 14-19            6.78 (3.42,13.44)   
##                                                          
## partners.cat: ref.=0                                     
##    1                                 12.51 (2.9,54.06)   
##    2+                                36.82 (8.86,152.99) 
##                                                          
##                                      adj. OR(95%CI)      P(Wald's test)
## educat: ref.=no education, preschool                                   
##    primary                           0.85 (0.48,1.52)    0.592         
##    secondary                         0.3 (0.07,1.33)     0.113         
##                                                                        
## age.group: 20-24 vs 14-19            3.08 (1.5,6.32)     0.002         
##                                                                        
## partners.cat: ref.=0                                                   
##    1                                 5.71 (1.23,26.43)   0.026         
##    2+                                15.14 (3.35,68.37)  < 0.001       
##                                                                        
##                                      P(LR-test)
## educat: ref.=no education, preschool 0.187     
##    primary                                     
##    secondary                                   
##                                                
## age.group: 20-24 vs 14-19            < 0.001   
##                                                
## partners.cat: ref.=0                 < 0.001   
##    1                                           
##    2+                                          
##                                                
## Log-likelihood = -244.6309
## No. of observations = 2760
## AIC value = 501.2619
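
The odds ratios from the three outputs above can be collected into the requested table by hand, for example as below. The attenuation() helper is a hypothetical convenience function written for this sketch: it reports how much of the age-adjusted log odds ratio disappears once number of partners is added. Treat it as an informal summary only, since odds ratios from nested logistic models are not strictly comparable and the confidence intervals here are wide.

```r
# Odds ratios copied from the three model outputs above
# (reference category: no education, preschool).
or_table <- data.frame(
  model = c("Unadjusted", "Adjusted for age", "Adjusted for age + partners"),
  primary = c(0.57, 0.77, 0.85),
  secondary = c(0.10, 0.13, 0.30)
)

# Hypothetical helper: share of the age-adjusted log odds ratio that is
# lost after adding the potential mediator.
attenuation <- function(or_adjusted, or_mediated) {
  1 - log(or_mediated) / log(or_adjusted)
}

attenuated <- c(
  primary = attenuation(0.77, 0.85),
  secondary = attenuation(0.13, 0.30)
)

print(or_table)
print(round(attenuated, 2))
```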

Is there a large difference in the ORs for education when you add the potential mediating variable, number of partners? Would you choose to investigate number of partners further as a mediator?

What do you think the overarching causal diagram looks like for this question? Can you combine the two diagrams you have drawn into one? Think about how you might ascertain which variables you need to control for (i.e. which are confounders). This will be addressed in the DAGs lecture.