5  Causes and Risks

Learning Objectives

  • Overview of the causes and risks
  • Identify the objective of the research question
  • Causal inference

“…fear is the most pervasive emotion of modern society…” (Joanna Bourke)1

What qualifies as a risk is subject to dynamic social change,2 and the perception of risk has evolved over time, influenced by factors such as media coverage and sociopolitical dynamics.

Historically, major risks included starvation, infections, and violent conflicts, while modern risks are often associated with lifestyle choices and chronic diseases such as obesity, cardiovascular disease, and cancer. Despite advances in healthcare and increasing life expectancy in post-industrial countries, attention often shifts to perceived threats such as terrorism, global pandemics like COVID-19, and environmental catastrophes. This shift has been documented by quantitative analyses that track changes in risk-related discourse and identify key risk topics over time.

Furthermore, tools like topic modelling and sentiment analysis help identify how the public perceives various risks and how these perceptions evolve3 over time.

In the field of public health, the latest Global Burden of Disease (GBD) results provide important insights into the causes and risk factors behind health metrics and infectious diseases. The primary risk categories identified are behavioral, environmental and occupational, and metabolic.

5.1 Conditions and Injuries

Conditions and injuries associated with the burden of disease and injury vary according to specific causes and risks. In this book, causes and risk factors include:

  • Lifestyle choices: Poor diet, physical inactivity, tobacco use, and excessive alcohol consumption are major risk factors for many chronic diseases and injuries, including heart disease, stroke, cancer, and liver disease.
  • Environmental factors: Exposure to pollutants, such as air pollution and toxic chemicals, can increase the risk of certain diseases and injuries.
  • Infections: Many diseases, such as tuberculosis, HIV/AIDS, and malaria, are caused by infectious agents.
  • Poverty: People living in poverty are often more susceptible to health problems due to limited access to healthcare, healthy food, and safe living conditions.
  • Aging: As people get older, they are at an increased risk of many health problems, including chronic diseases and disabilities.
  • Genetics: Some diseases and injuries are caused by genetic factors, such as a genetic predisposition to certain cancers.
  • Injuries: Injuries, such as falls, road traffic accidents, and violence, can also contribute to the burden of diseases and injuries.
Figure 5.1: Causal relationships leading to fueling DALYs value

A particular health condition can have multiple causes and risk factors, and it often co-occurs with other conditions (comorbidities). For instance, poverty and lack of access to healthcare facilities are known to increase the risk of infectious diseases, while poor diet and physical inactivity increase the risk of chronic diseases. Addressing the underlying causes and risk factors for diseases and injuries is crucial for timely public health interventions and can help reduce the overall burden of disease.

5.2 Risk Measures

Four key measures are used to assess risks and their impact on health outcomes. Together, they provide a framework for assessing the burden of different risk factors on population health and for guiding public health strategies to mitigate these risks:

  • Risk-specific exposures
  • Relative risks (RRs)
  • Theoretical minimum-risk exposure levels (TMRELs)
  • Population attributable fractions (PAFs)

5.2.1 Risk-specific exposures

The quantification of risks and causes involves evaluating a set of behavioral, environmental and occupational, and metabolic risks. Risk-outcome pairs are investigated based on observational and statistical evidence. Convincing evidence consists of plausible associations between exposure and disease in terms of size, duration, and effect.

Risk factors can combine additively (the joint effect equals the sum of the individual effects), multiplicatively (the joint relative risk equals the product of the individual relative risks), or interactively, where one factor modifies the effect of another. A factor that influences both the exposure and the outcome is a possible confounder, and it must be distinguished from a mediator, which lies on the causal pathway between exposure and outcome.
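In epidemiology, whether two risk factors combine additively or multiplicatively is usually judged on the relative-risk scale. The sketch below uses hypothetical relative risks (`rr_a`, `rr_b`, and an observed joint value `rr_ab`, all invented for illustration). It shows the joint relative risk expected under each model. It also computes the relative excess risk due to interaction (RERI), a standard measure of departure from additivity.

```r
# Hypothetical relative risks, each relative to people unexposed to both A and B
rr_a  <- 2    # exposed to A only
rr_b  <- 3    # exposed to B only
rr_ab <- 5    # exposed to both (hypothetical observed value)

# Expected joint RR with no interaction on the additive scale:
# the excess risks (RR - 1) simply add up
rr_joint_additive <- rr_a + rr_b - 1

# Expected joint RR with no interaction on the multiplicative scale:
# the relative risks multiply
rr_joint_multiplicative <- rr_a * rr_b

# Relative excess risk due to interaction: > 0 means more than additive
reri <- rr_ab - rr_a - rr_b + 1

# Observed joint RR over the multiplicative expectation: < 1 means less than multiplicative
ratio_multiplicative <- rr_ab / rr_joint_multiplicative

c(additive = rr_joint_additive, multiplicative = rr_joint_multiplicative,
  reri = reri, ratio = ratio_multiplicative)
```

With these numbers the joint effect is larger than additive (RERI = 1) but smaller than multiplicative (ratio of about 0.83). Whether an interaction is present therefore depends on the scale chosen.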

To gauge the impact of different risk factors on a cause of illness, the Socio-demographic Index (SDI) provides insight into the potential magnitude of social, cultural, and demographic factors. It does so by relating risk exposures to the level of development and by pointing to possible paths for policy interventions. The SDI is a composite indicator based on average income per person, educational attainment, and the total fertility rate (TFR), and life expectancy is closely correlated with it. Higher SDI values typically indicate better socio-economic conditions, including improved access to healthcare, education, and sanitation, which can mitigate various health risks. Conversely, lower SDI values are associated with higher risk exposure due to limited access to healthcare, poorer living conditions, and other socio-economic challenges. An application of the SDI to time series is presented in Chapter 9.
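GBD computes the SDI as the geometric mean of 0-to-1 indices of its three components. The function below is a minimal sketch of that idea, with these assumptions:

- The rescaling bounds (`income_bounds`, `edu_bounds`, `tfr_bounds`) are hypothetical placeholders, not the values GBD uses.
- Income is rescaled on the log scale.
- The fertility index is inverted, because higher fertility corresponds to lower development.

```r
# Rescale x linearly to [0, 1] between lo and hi, truncating values outside the bounds
scale_01 <- function(x, lo, hi) {
  pmin(pmax((x - lo) / (hi - lo), 0), 1)
}

# Minimal SDI sketch: geometric mean of three 0-1 component indices.
# All bounds are hypothetical placeholders, not official GBD values.
sdi_sketch <- function(income, education, tfr,
                       income_bounds = c(250, 60000),  # income per person
                       edu_bounds    = c(0, 17),       # mean years of education
                       tfr_bounds    = c(0, 3)) {      # total fertility rate
  # Income is rescaled on the log scale (an assumption of this sketch)
  i_income <- scale_01(log(income), log(income_bounds[1]), log(income_bounds[2]))
  i_edu    <- scale_01(education, edu_bounds[1], edu_bounds[2])
  # Higher fertility corresponds to lower development, so this index is inverted
  i_tfr    <- 1 - scale_01(tfr, tfr_bounds[1], tfr_bounds[2])
  (i_income * i_edu * i_tfr)^(1 / 3)
}

# Two hypothetical locations: one high-development, one low-development
sdi_sketch(income = c(45000, 1500), education = c(13, 4), tfr = c(0.4, 2.5))
```

Because the components are combined with a geometric mean, a very low value in any single component pulls the SDI down strongly. A simple average would not do this.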

Another element to consider is the Comparative Risk Assessment (CRA) framework,4 which distinguishes between attributable and avoidable burden. When the objective is the potential reduction of future disease burden, four types of minimum-risk exposure distributions are identified:

  • theoretical
  • plausible
  • feasible
  • cost-effective

The following sections provide a high-level overview of quantifying attributable burden using the theoretical minimum risk.

5.2.2 Relative Risks (RRs)

Relative risks (or risk ratios) are estimated for each risk-outcome pair. The mortality and morbidity attributable to a risk are then calculated by decomposing the burden according to these relative risks. The combined effect of all risk exposures is estimated by location, age, sex, and cause, accounting for mediation. As noted above, the all-risk exposure for a disease is split into metabolic, behavioral, and environmental and occupational risks, each of which contributes to the overall change.

The relative risk is the ratio between the proportions of individuals who develop the outcome in the exposed and unexposed groups:

RR = \frac{p_1}{p_0} \tag{5.1}

where p_1 and p_0 are the proportions with the outcome in the exposed and unexposed groups, respectively. In terms of counts, these sample proportions approximate the corresponding values in the real population:

RR = \frac{p_1}{p_0}=\frac{d_1/n_1}{d_0/n_0} \tag{5.2}

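where d_1 and d_0 are the numbers of individuals who develop the disease, and n_1 and n_0 are the total numbers of individuals, in the exposed and unexposed groups respectively. Thus d_1/n_1 and d_0/n_0 are the proportions of each group with the disease.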

For example, let’s say we are studying the association between smoking (exposure) and lung cancer (outcome). We want to calculate the relative risk of lung cancer among smokers compared to non-smokers. If the relative risk is 2, it means that smokers are twice as likely to develop lung cancer compared to non-smokers.

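# Create a 2x2 table of exposure and outcome, stored as c(cases, non-cases)
exposed <- c(50, 10)    # 50 cases and 10 non-cases among the exposed
unexposed <- c(20, 5)   # 20 cases and 5 non-cases among the unexposed

# Calculate the relative risk: risk (cases / total) in the exposed
# divided by risk in the unexposed
relative_risk <- function(exposed, unexposed) {
  (exposed[1] / sum(exposed)) / (unexposed[1] / sum(unexposed))
}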

# Print the relative risk
relative_risk(exposed, unexposed)
#> [1] 1.041667

Define the number of events and person-time at risk for exposed and unexposed groups:

d1 <- 50 # Number of events in the exposed group
n1 <- 10 # Person-time at risk in the exposed group
d0 <- 20 # Number of events in the unexposed group
n0 <- 5 # Person-time at risk in the unexposed group

# Calculate the relative risk as a ratio of incidence rates (rate ratio)
relative_risk_d <- (d1 / n1) / (d0 / n0)

# Print the relative risk
relative_risk_d
#> [1] 1.25
  • The first formula, RR=p_1/p_0, calculates the relative risk directly from the proportion of events in the exposed group (p_1) compared with the unexposed group (p_0). It gives a simple view of the relative risk based solely on event proportions.
  • The second formula, RR=\frac{d_1/n_1}{d_0/n_0}, uses the number of events (d_1, d_0) and the denominator (n_1, n_0) for each group. When n_1 and n_0 are measured as person-time at risk, as in the second code example, the ratio compares incidence rates (a rate ratio). It therefore accounts for differences in the duration of exposure or follow-up between the groups.

5.2.3 Relative Risks and Network Analysis

In some cases, relative risks can be modeled using network analysis, a specialized approach within statistical modelling that extends the concept of mixed effects to compare multiple treatments while accounting for various factors and dependencies.

In a network, variables are represented by nodes and their relationships by edges, which capture potential interactions or dependencies between them. This approach is particularly useful when exploring complex relationships among multiple variables.

To represent the network, we can use a Directed Acyclic Graph (DAG). A DAG draws the causal relationships between variables, such as those between health risks and diseases, as directed edges with no cycles.

Figure 5.2 shows a network graph representing the causal pathways between different variables, such as smoking, high blood pressure, diabetes, obesity, and heart disease. Visualizing the relationships between these variables helps identify potential targets for treatment or intervention.

Figure 5.2: DAG of the causal pathways between smoking, high blood pressure, diabetes, obesity, and heart disease.
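The code behind Figure 5.2 is not shown, but a DAG of this kind can be stored as a simple edge list. The sketch below uses base R only. Its edges (for example `Obesity -> Diabetes`) are hypothetical and may not match the figure. It applies Kahn's topological sort, which orders the nodes so that every cause precedes its effects. The function returns `NULL` if the graph contains a cycle, that is, if it is not a valid DAG.

```r
# Hypothetical edge list for a DAG like Figure 5.2 (edges are assumptions)
edges <- data.frame(
  from = c("Smoking", "Obesity", "Obesity",  "HighBP",       "Diabetes",     "Smoking"),
  to   = c("HighBP",  "HighBP",  "Diabetes", "HeartDisease", "HeartDisease", "HeartDisease"),
  stringsAsFactors = FALSE
)

# Kahn's algorithm: returns the nodes in causal (topological) order,
# or NULL if the graph contains a cycle and is therefore not a DAG
topological_order <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  indegree <- setNames(integer(length(nodes)), nodes)
  for (v in edges$to) indegree[v] <- indegree[v] + 1L  # count incoming edges
  queue <- names(indegree)[indegree == 0]             # start from nodes with no causes
  ordered <- character(0)
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    ordered <- c(ordered, node)
    # Removing `node` lowers the in-degree of each of its direct effects
    for (child in edges$to[edges$from == node]) {
      indegree[child] <- indegree[child] - 1L
      if (indegree[child] == 0) queue <- c(queue, child)
    }
  }
  if (length(ordered) < length(nodes)) NULL else ordered
}

topological_order(edges)
```

For these assumed edges the function returns Smoking, Obesity, HighBP, Diabetes, HeartDisease, in that order. Adding an edge back from HeartDisease to Smoking would create a cycle, and the function would return `NULL`.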

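Another example of a network graph, useful for identifying the relationships between an outcome (O), an exposure (E), and different risk factors, can be made with the ggdag package and its dagify() function. In the code, y plays the role of the outcome, x the exposure, and c1, c2, and c3 are risk factors that influence the exposure.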

# Load the library
library(ggdag)
set.seed(555)
# Define the DAG structure: x causes y; c1, c2, c3 cause x; c2 and c3 also cause c1
dag <- dagify(
  y ~ x,
  x ~ c1 + c2 + c3,
  c1 ~ c2 + c3)

# Plot the DAG
ggdag(dag) + theme_dag_grid()
Figure 5.3: DAG between outcome (O), exposure (E) and different risk factors.

The following example uses the dagitty package to simulate binary data from a logistic model with a DAG structure, and then fits a logistic regression to the simulated data. The model estimates the probability of an outcome (O) given exposure (E) and three other risk factors (C1, C2, and C3). In this DAG, C1, C2, and C3 point only to O, not to E, so they act as additional causes of the outcome rather than as confounders. The relative risk is calculated from two predicted probabilities of the outcome: one with every individual set to exposed, and one with every individual set to unexposed.

# Load necessary libraries
library(dagitty)
library(tidyverse)

# Create DAG structure: E and C1-C3 each point directly to O
dag <- dagitty("dag { E -> O
                      C1 -> O
                      C2 -> O
                      C3 -> O }")

# Generate binary data (coded -1/1) from a logistic model on the DAG
set.seed(123)
data <- simulateLogistic(dag)

head(data)
#>   C1 C2 C3  E  O
#> 1  1 -1  1  1 -1
#> 2 -1 -1 -1  1  1
#> 3  1  1  1  1  1
#> 4  1  1 -1 -1 -1
#> 5  1  1 -1 -1 -1
#> 6 -1 -1 -1  1  1
# Fit logistic regression model
model <- glm(O ~ E + C1 + C2 + C3,
             data = data,
             family = "binomial")

# Predicted probabilities with every individual set to exposed (E = 1) ...
pr_outcome_exp <- predict(model,
                          newdata = mutate(data, E = "1"),
                          type = "response")
# ... and with every individual set to unexposed (E = -1)
pr_outcome_no_exp <- predict(model,
                             newdata = mutate(data, E = "-1"),
                             type = "response")

# Individual-level relative risk of the outcome, exposed vs unexposed
relative_risk <- pr_outcome_exp / pr_outcome_no_exp
Figure 5.4: Histogram (a) and density distribution (b) of Relative Risk.

5.2.4 Theoretical Minimum-Risk Exposure Levels (TMRELs)

Risk factors associated with a particular health condition are assessed relative to the theoretical minimum-risk exposure level (TMREL), the exposure level at which risk is lowest. They are also assessed as a function of the risk exposure and its relative risk (RR). Not every variable thought to be a risk factor for a condition is actually a driving cause of it. For this reason, a minimum level of risk exposure is established, and only exposure above this level is counted as contributing to the outcome.
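As a sketch of how relative risk can be expressed as a function of exposure relative to the TMREL, the function below assumes a log-linear dose-response. Under this assumption, exposure at or below the TMREL carries no excess risk, and each additional 10 units above it multiplies the risk by `rr_per_10`. The TMREL of 115 and the RR of 1.5 are illustrative values chosen for this example, not GBD estimates.

```r
# Relative risk at exposure x, compared with exposure at the TMREL.
# Assumes a log-linear dose-response above the TMREL (illustrative only).
rr_exposure <- function(x, tmrel, rr_per_10) {
  excess <- pmax(x - tmrel, 0)  # exposure at or below the TMREL adds no excess risk
  rr_per_10^(excess / 10)
}

# Hypothetical exposure levels (e.g., a blood-pressure-like measure)
rr_exposure(c(100, 115, 125, 135), tmrel = 115, rr_per_10 = 1.5)
#> [1] 1.00 1.00 1.50 2.25
```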

Moreover, the disease burden attributable to a particular risk factor or combination of risk factors must be ascertained by investigating the risk-outcome relationship. Risk factors can also act on the outcome indirectly via intermediate risks. For example, the association between low fruit consumption and heart disease is influenced by systolic blood pressure, which acts as a mediator between the two.

Particulate matter air pollution, high systolic blood pressure, and smoking are examples of risk factors whose exposures exceed the TMREL in many populations. They are expected to be leading contributors to the overall burden of disease. This is not always the case for a more specific disease, however: the leading risks vary by location, sex, and age.

In terms of DALYs, the burden of disability-adjusted life years is strongly influenced by behavioral, environmental, and occupational risks.

5.2.5 Population Attributable Fractions (PAFs)

The Population Attributable Fraction (PAF) is a measure used to quantify the proportion of disease incidence in a population that can be attributed to a specific risk factor. It represents the proportion of risk that would be reduced in a given year if the exposure to a risk factor in the past were reduced to an ideal exposure scenario.

PAF is calculated based on the prevalence of the risk factor in the population and the relative risk associated with that risk factor. The formula for calculating PAF is as follows:

PAF = \frac{P_e \times (RR - 1)}{1 + P_e \times (RR - 1)} \tag{5.3}

Where:

  • P_e is the prevalence of the risk factor in the population.
  • RR is the relative risk associated with the risk factor, representing the increased risk of disease among individuals exposed to the risk factor compared to those who are not exposed.

For a harmful exposure (RR ≥ 1), the PAF ranges from 0% to 100%. A PAF of 0% indicates that the risk factor has no impact on the incidence of the disease, while a PAF of 100% indicates that all cases of the disease in the population can be attributed to the risk factor. For a protective exposure (RR < 1), equation 5.3 yields a negative value.
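Equation 5.3 translates directly into R. The sketch below uses hypothetical inputs: 30% prevalence, RR = 2, and 10,000 total DALYs. It also shows two common extensions:

- The attributable burden, obtained by multiplying the PAF by the total burden.
- A joint PAF for several risks. This rests on the assumption, commonly made when aggregating risks, that the risks act independently and multiplicatively.

```r
# Population attributable fraction (Levin's formula, equation 5.3)
paf <- function(p_e, rr) {
  excess <- p_e * (rr - 1)
  excess / (1 + excess)
}

# Hypothetical inputs: 30% of the population exposed, relative risk of 2
paf_single <- paf(p_e = 0.3, rr = 2)
paf_single
#> [1] 0.2307692

# Attributable burden: PAF multiplied by a hypothetical total of 10,000 DALYs
attributable_dalys <- paf_single * 10000

# Joint PAF for several risks, assuming independent, multiplicative effects
paf_joint <- function(pafs) 1 - prod(1 - pafs)
paf_joint(c(paf(0.3, 2), paf(0.2, 1.5)))
#> [1] 0.3006993
```

The joint PAF (about 30%) is smaller than the sum of the two individual PAFs (about 32%). This is because, under multiplicative effects, some of the cases attributable to each risk overlap.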

PAF is useful for public health interventions as it provides insight into the potential impact of reducing or eliminating a specific risk factor on the incidence of disease in the population. By targeting interventions to reduce exposure to the risk factor, public health efforts can effectively reduce the burden of disease in the population and improve overall health outcomes.

5.3 Causal Inference

Causality concerns the relationship between two variables, where one variable (the cause) influences or determines the other variable (the effect). For example, regular exercise leads to improved cardiovascular health, or adequate sleep contributes to better cognitive function. However, it’s essential to distinguish causality from correlation, which denotes the statistical association between two variables. For instance, the correlation between watching television and obesity rates does not imply a causal link. To establish causality, one must systematically consider alternative explanations or confounding factors that may influence both the cause and the effect.
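A small simulation can make the role of confounding concrete. The sketch below uses base R, and its data-generating process and all probabilities are invented for illustration. It creates a binary confounder `c_conf` that raises both the chance of exposure and the chance of the outcome, while the exposure itself has no effect. The crude risk ratio suggests harm, whereas standardizing over the confounder distribution recovers a ratio close to 1.

```r
set.seed(42)
n <- 100000

# Hypothetical data-generating process: C affects both E and O; E has no effect on O
c_conf <- rbinom(n, 1, 0.5)                           # confounder
e      <- rbinom(n, 1, ifelse(c_conf == 1, 0.7, 0.2)) # exposure more likely when C = 1
o      <- rbinom(n, 1, ifelse(c_conf == 1, 0.3, 0.1)) # outcome depends on C only

# Crude risk ratio: ignores the confounder
crude_rr <- mean(o[e == 1]) / mean(o[e == 0])

# Standardized risk in each exposure group, weighting strata by the distribution of C
p_c <- mean(c_conf)
std_risk <- function(level) {
  (1 - p_c) * mean(o[e == level & c_conf == 0]) +
    p_c * mean(o[e == level & c_conf == 1])
}
adjusted_rr <- std_risk(1) / std_risk(0)

round(c(crude = crude_rr, adjusted = adjusted_rr), 2)
```

With these settings the crude ratio should come out near 1.65. The standardized ratio should be close to 1, the true (null) effect of the exposure.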

Causal inference, which underpins reliable prediction of the effects of interventions, investigates the underlying causes of a condition or phenomenon. These causes may not be immediately visible and must be uncovered through successive steps of analysis of the observed data.

Performing causal inference requires setting up an experiment, or an analysis that emulates one, with a treatment and an outcome. The treatment is the action whose effect is being studied. Take the statement that regular exercise leads to improved cardiovascular health. To test whether it is true, an intervention is applied to the data, simulating how the relationship changes when another factor intervenes. The formal name for such an intervention is “treatment”. In the study of exercise and cardiovascular health, a treatment could be eating regular portions of fruit, and the analysis would examine how the data change as a result.

The factors under investigation are therefore exercise and portions of fruit, and the goal is to see whether their combination improves cardiovascular health. The intervention is first analysed by looking at how the response variable (cardiovascular health) changed under treatment. The next step is to apply a control procedure, comparing the treated group with a control group that did not receive the treatment.

Especially when the result of an intervention is uncertain, it is useful to consider a counterfactual scenario: what the outcome would have been for the same individuals had they not received the treatment.
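The counterfactual idea can be sketched with potential outcomes. Each individual has an outcome with treatment (`y1`) and an outcome without it (`y0`), but only one of the two is ever observed. In the base-R simulation below, the outcome scale, the true effect of +5, and the sample size are all hypothetical. Because treatment is assigned at random, the simple difference in observed means recovers the average treatment effect (ATE).

```r
set.seed(7)
n <- 10000

# Hypothetical potential outcomes, e.g. a cardiovascular health score
y0 <- rnorm(n, mean = 50, sd = 10)  # outcome each person would have without treatment
y1 <- y0 + 5                        # outcome with treatment: true effect is +5 for everyone
true_ate <- mean(y1 - y0)           # average treatment effect, knowable only in simulation

# Random assignment: each person reveals only one of their two potential outcomes
treated <- rbinom(n, 1, 0.5) == 1
y_obs <- ifelse(treated, y1, y0)

# Difference in observed means between treated and control groups estimates the ATE
est_ate <- mean(y_obs[treated]) - mean(y_obs[!treated])
c(true = true_ate, estimated = est_ate)
```

In real data only `y_obs` is available. Randomization is what allows the untreated group to stand in for the missing counterfactual outcomes of the treated group.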

5.4 Summarizing the Relationship Between Risk and Outcome

The relationship between risk and outcome in epidemiology is fundamental for understanding the causes and prevention of disease. It involves assessing how exposure to certain risk factors influences the likelihood of developing a particular health outcome. Epidemiological studies examine the association between risk factors and health outcomes, quantifying the strength of this association through measures like relative risk (RR). Relative risk compares the risk of developing the outcome among individuals exposed to the risk factor with the risk among those who are not exposed. An RR greater than 1 indicates an increased risk associated with exposure. The Population Attributable Fraction (PAF) measures the proportion of disease incidence in a population that can be attributed to a specific risk factor. It helps estimate the potential impact of reducing exposure to the risk factor on overall disease burden. Establishing causality requires demonstrating consistent associations, dose-response relationships, and temporal precedence, and ruling out alternative explanations. Ultimately, understanding the relationship between risk factors and outcomes informs preventive strategies and public health interventions, and improves population health through evidence-based decision-making.

In conclusion, the study of risk and outcome extends beyond traditional epidemiological methods to incorporate advanced techniques such as transfer learning. This interdisciplinary approach allows insights gained from epidemiological research to be applied to other fields and vice versa, enhancing our understanding of the complex relationships between risk factors and health outcomes. By leveraging these learning techniques, we can identify patterns, extract meaningful insights, and develop predictive models that transcend traditional boundaries. These models offer new perspectives on population health and help guide targeted interventions. The integration of epidemiology with emerging technologies thus opens the door to innovative solutions and holistic approaches to public health challenges.


  1. Joanna Bourke, Fear: A Cultural History (Catapult, 2007).

  2. Ying Li, Thomas Hills, and Ralph Hertwig, “A Brief History of Risk,” Cognition 203 (October 2020): 104344, doi:10.1016/j.cognition.2020.104344.

  3. Ibid.

  4. Jeffrey D Stanaway et al., “Global, Regional, and National Comparative Risk Assessment of 84 Behavioural, Environmental and Occupational, and Metabolic Risks or Clusters of Risks for 195 Countries and Territories, 1990–2017: A Systematic Analysis for the Global Burden of Disease Study 2017,” The Lancet 392, no. 10159 (November 2018): 1923–94, doi:10.1016/s0140-6736(18)32225-6.