14  Introduction to Infectious Diseases

“Several infectious diseases are emerging and threatening human health worldwide. The burden of infectious diseases is undeniably a global issue, causing millions of deaths annually.”1

The advent of machine learning (ML) has revolutionized the field of infectious disease research by providing robust tools for predicting outbreaks and understanding the dynamics of spread. In this chapter, we deepen our understanding of these diseases and examine their effect on Disability-Adjusted Life Years (DALYs) by applying the machine learning and data visualization techniques learned in previous chapters (Chapter 6 and Chapter 10). We also explore how integrating ML models can improve the accuracy of disease burden estimates and provide valuable insights into the impact of infectious diseases on public health.

14.1 Infectious Diseases: The Invisible Enemies

Emerging infectious diseases are a global concern, causing millions of deaths annually. Understanding their behavior and predicting outbreaks is fundamental not only for public health but also for advancing prediction techniques that can be applied in other fields.

Microorganisms adapt much more rapidly than humans. A bacterial generation can be as short as 20-30 minutes, and for viruses, it’s even shorter. This rapid adaptation allows pathogens to evolve quickly, developing resistance to treatments and evading the host’s immune system.

Once the infective agent begins to thrive and multiply throughout the body,2 the process of infection starts. The rate at which the pathogen proliferates varies significantly depending on the type of organism involved. Each infectious disease has a unique incubation period, which is the interval between the initial establishment of the pathogen in the host and the onset of symptoms.

The incubation period can range from a few hours to several months, influenced by factors such as the pathogen’s growth rate, the host’s immune response, and the route of transmission. For example, the incubation period for the influenza virus is typically 1 to 4 days, whereas for diseases like hepatitis B, it can be as long as 6 months. Understanding the incubation period helps in identifying the time frame for potential exposure.

Several factors influence infection, including:

  • The dose of the infection (quantity of invading germs)
  • The virulence of the infection
  • The condition of the body’s immune system
  • Contact with the source of infection for contagious diseases

Who adapts to whom?

Viruses, named from the Latin word for “poisonous substance,” range from 20 to 400 nm in diameter and can only be observed with an electron microscope. Outside of a living cell, a virus is a dormant particle of various shapes. Once inside a cell, it replicates, often killing the cell or altering its functions.

The following seven diseases are all caused by an infectious agent such as a virus, bacterium, or parasite. They generally cause acute symptoms, ranging from mild to severe, and require prompt medical attention to prevent complications and further spread:

  1. Acute Respiratory Infection (ARI)
  2. COVID-19
  3. Dengue
  4. Influenza/Influenza-Like Illness (ILI)
  5. Malaria
  6. West Nile Virus
  7. Zika

These diseases are transmitted through various means and can be grouped by transmission methods:

  • Vector-Borne: Dengue, Malaria, West Nile Virus, and Zika are primarily spread through mosquito bites.
  • Respiratory Droplets: ARI, COVID-19, and Influenza/ILI are transmitted via respiratory droplets when infected individuals cough or sneeze.

14.2 The SIR Model

The application of mathematical models to infectious diseases dates back over a century, with significant contributions from pioneers such as Kermack and McKendrick, who established the foundations of the subject3. Their work introduced the concept of categorizing individuals based on their epidemiological status: susceptible, infected, and recovered.

One of the simplest and most fundamental epidemiological models, the SIR model, which we looked at briefly in previous chapters (Chapter 6 and Chapter 7), is based on these three compartments. The model estimates two key parameters: the infection rate and the recovery rate. These parameters help predict the epidemic’s progression, showing how the number of susceptible individuals decreases as the number of infected individuals increases, eventually leading to recovery and a decline in new infections, as shown in Equation 6.1.
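
As a minimal sketch of these dynamics, the following Python code integrates a standard form of the SIR equations with a simple forward-Euler step. The function name `simulate_sir`, the parameter values, the step size, and the normalization by the population size N are illustrative assumptions and may differ from the formulation in Equation 6.1; `beta` stands for the infection rate and `gamma` for the recovery rate.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns a list of (t, S, I, R) tuples, one per time step.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory


# Hypothetical parameters: beta = 0.3 per day, gamma = 0.1 per day.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=999, i0=1, r0=0, days=160)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
print(f"Peak of about {peak_infected:.0f} infected around day {peak_time:.0f}")
```

Forward Euler is used only to keep the sketch dependency-free; a dedicated ODE solver would be the more usual choice in practice.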

The Reproduction Ratio (R0)

The epidemic potential of an infectious disease is summarized by the reproduction ratio (R0): the average number of secondary cases generated by a single infectious individual in a completely susceptible population. Defined as the ratio of the transmission rate to the recovery rate, R0 measures the disease’s ability to spread.

R0 = \frac{\beta}{\gamma} \tag{14.1}

where \beta is the transmission rate and \gamma is the recovery rate. The value of R0 determines the epidemic threshold:

\begin{cases} R0 > 1 & \text{Epidemic} \\ R0 < 1 & \text{End of infection transmission} \end{cases} \tag{14.2}
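
Equations 14.1 and 14.2 can be sketched in a few lines of Python. The function names and the rates used below are hypothetical and chosen only for illustration.

```python
def reproduction_ratio(beta, gamma):
    """Return R0 = beta / gamma (Equation 14.1)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return beta / gamma


def epidemic_outlook(r0):
    """Classify R0 against the threshold of 1 (Equation 14.2)."""
    if r0 > 1:
        return "epidemic"
    if r0 < 1:
        return "end of infection transmission"
    return "threshold"  # R0 == 1 is not covered by Equation 14.2


# Hypothetical rates, chosen only for illustration.
for beta, gamma in [(0.3, 0.1), (0.05, 0.1)]:
    r0 = reproduction_ratio(beta, gamma)
    print(f"beta={beta}, gamma={gamma}: R0={r0:.1f} -> {epidemic_outlook(r0)}")
```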

To account for changes in the population’s immunity, the effective reproduction number (R_{eff}) is calculated for a population that is only partially susceptible. The value of R_{eff} is lower than R0 because of the presence of immune individuals in the population.
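
Under the common assumption of homogeneous mixing (an assumption introduced here for illustration), this relationship can be sketched as:

R_{eff} = R0 \times \frac{S}{N}

where S is the number of susceptible individuals and N is the total population size. For instance, with R0 = 3 and 40% of the population still susceptible, R_{eff} = 3 \times 0.4 = 1.2, which is below R0 but still above the epidemic threshold of 1.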

Another critical concept in infectious disease epidemiology is herd immunity, which refers to the indirect protection from infectious diseases that occurs when a large percentage of a population becomes immune to the infection, either through vaccination or previous infections. Herd immunity is reached when the effective reproduction number falls below 1 and the disease stops spreading.
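
A common back-of-the-envelope consequence can be sketched as follows. It rests on two simplifying assumptions introduced here, homogeneous mixing and fully protective immunity: if a fraction p of the population is immune, the effective reproduction number is R0 × (1 − p), which falls below 1 once p exceeds 1 − 1/R0. The helper names below are hypothetical.

```python
def herd_immunity_threshold(r0):
    """Immune fraction p above which R0 * (1 - p) drops below 1."""
    if r0 <= 1:
        return 0.0  # the infection cannot sustain itself even with no immunity
    return 1.0 - 1.0 / r0


def effective_reproduction_number(r0, immune_fraction):
    """Effective reproduction number when a fraction of the population is immune."""
    return r0 * (1.0 - immune_fraction)


# Illustrative R0 values, not estimates for any specific disease.
for r0 in (1.5, 3.0, 10.0):
    p = herd_immunity_threshold(r0)
    print(f"R0={r0}: at least {p:.0%} of the population must be immune")
```

The higher the R0, the larger the immune fraction required, which is why highly transmissible diseases need very high vaccination coverage.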

The SIR model shows the dynamics of an epidemic by looking at how it grows and eventually declines. Initially, the number of cases rises exponentially toward a peak, but as the susceptible population starts to shrink due to various factors, the growth rate slows and the number of cases subsequently declines.

14.2.1 Advancements and Extensions

Mathematical modelling has evolved to include more complex factors such as age structure, stochasticity, and spatial dynamics. Age-structured models, for example, consider how different age groups interact and contribute to the spread of diseases, which is particularly important for diseases like measles or COVID-19. Stochastic models account for random events that can influence the course of an epidemic, such as the introduction of the disease into a new population.
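
To illustrate the role of stochasticity, here is a minimal sketch of a discrete-time stochastic SIR model with one-day steps. It is one of several possible stochastic formulations, and the function name `stochastic_sir`, the per-day probabilities, and the parameter values are assumptions made for this example.

```python
import math
import random


def stochastic_sir(beta, gamma, s0, i0, days, rng):
    """Discrete-time stochastic SIR with one-day steps.

    Each susceptible is infected with probability 1 - exp(-beta * I / N)
    and each infected individual recovers with probability 1 - exp(-gamma).
    Returns the daily list of (S, I, R) counts.
    """
    n = s0 + i0
    s, i, r = s0, i0, 0
    history = [(s, i, r)]
    for _ in range(days):
        p_infect = 1.0 - math.exp(-beta * i / n)
        p_recover = 1.0 - math.exp(-gamma)
        new_infections = sum(rng.random() < p_infect for _ in range(s))
        new_recoveries = sum(rng.random() < p_recover for _ in range(i))
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters; different seeds give different epidemic courses,
# and some runs may die out early by chance even though beta / gamma > 1.
for seed in range(3):
    run = stochastic_sir(0.3, 0.1, s0=499, i0=1, days=150, rng=random.Random(seed))
    print(f"seed={seed}: {run[-1][2]} recovered by day 150")
```

Passing an explicit `random.Random` instance keeps each run reproducible, which makes it easier to compare scenarios.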

Machine learning algorithms such as decision trees, random forests, and support vector machines, together with deep-learning networks such as long short-term memory (LSTM) models, can help identify patterns and trends that may not be obvious with mechanistic models. These models can improve prediction accuracy and work well with large datasets.

Combining models and data sources enhances prediction accuracy, and various models and techniques can be applied to reduce bias and the risk of overfitting. For instance, ensemble learning combines the predictions of multiple models to improve accuracy. In this context, we will explore how machine learning can predict infectious disease outbreaks and their impacts on human health, ultimately aiming to reduce the burden of disease.
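
The idea of ensemble learning can be sketched with a simple averaging ensemble. The three forecasters below are deliberately simple stand-ins for the machine learning models mentioned above, and the weekly case counts are made up for illustration; none of this is taken from the analyses in this book.

```python
from statistics import mean


def naive_forecast(series):
    """Predict that the next value equals the last observed value."""
    return series[-1]


def moving_average_forecast(series, window=3):
    """Predict the mean of the last `window` observations."""
    return mean(series[-window:])


def linear_trend_forecast(series):
    """Extrapolate a least-squares straight line one step ahead."""
    n = len(series)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
    return y_bar + (sxy / sxx) * (n - x_bar)


def ensemble_forecast(series, models):
    """Average the one-step-ahead predictions of several models."""
    return mean(model(series) for model in models)


# Made-up weekly case counts, for illustration only.
weekly_cases = [12, 15, 19, 24, 30, 37]
models = [naive_forecast, moving_average_forecast, linear_trend_forecast]
for model in models:
    print(f"{model.__name__}: {model(weekly_cases):.1f}")
print(f"ensemble: {ensemble_forecast(weekly_cases, models):.1f}")
```

Equal-weight averaging is the simplest combination rule; weighted averages or stacking are common alternatives when validation data are available.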

Another significant aspect to consider is the emerging use of transfer learning, which involves applying knowledge gained from one predictive task to another. This approach is especially useful when data is limited and models need to be adapted. Although relatively under-explored in infectious disease research, transfer learning holds significant promise for improving predictions in areas with scarce data. By leveraging information from related tasks, this technique can enhance model performance, leading to more accurate and reliable predictions in public health scenarios.4

14.3 The Impact on DALYs

To understand the magnitude of the impact of infectious diseases on DALYs, we can consider the rate of change in DALYs. The percentage change in total DALYs and in DALYs due to infectious diseases, in general or for a specific virus such as COVID-19, allows us to assess the impact on the overall burden of disease. In the case of COVID-19, for example, the percentage change in DALYs due to COVID-19 can explain how this virus affected global health and produced excess mortality and morbidity.

\text{Percentage of DALYs due to infectious diseases} = \frac{\text{DALYs due to Infectious Diseases}}{\text{Total DALYs}} \times 100 \tag{14.3}

where DALYs = \sum_{i=1}^{n}{(YLD_i + YLL_i)}, and YLD and YLL are the years lived with disability and the years of life lost, respectively.

This percentage provides a measure of the impact of infectious diseases on the overall burden of disease, and comparing it across years shows how that impact changes.
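
As a small worked example of the DALYs sum and the ratio in Equation 14.3, consider the sketch below. The function names are hypothetical and the figures are made up for illustration only.

```python
def dalys(yld, yll):
    """Total DALYs as the sum of YLD_i + YLL_i over all causes i."""
    if len(yld) != len(yll):
        raise ValueError("yld and yll must have the same length")
    return sum(d + l for d, l in zip(yld, yll))


def percent_of_total(dalys_infectious, dalys_total):
    """Equation 14.3: infectious-disease DALYs over total DALYs, times 100."""
    return dalys_infectious / dalys_total * 100


# Made-up figures (in thousands of years), for illustration only.
yld_infectious, yll_infectious = [40, 25], [160, 75]
yld_other, yll_other = [300, 200], [500, 200]

infectious = dalys(yld_infectious, yll_infectious)
total = infectious + dalys(yld_other, yll_other)
print(f"{percent_of_total(infectious, total):.1f}% of DALYs are due to infectious diseases")
```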

Furthermore, machine learning models are used to predict the variation in the number of DALYs due to infectious diseases over time. Two approaches can be considered:

  1. DALYs as a function of the socio-demographic index (SDI): a composite index of the average income per person, educational attainment, and total fertility rate. The model function can be expressed as: DALY_{id} = f(SDI) + \epsilon \tag{14.4} where the number of DALYs (DALY_{id}) is the response variable, SDI is the predictor, f(.) is the function that relates the number of DALYs to the socio-demographic index, and \epsilon is the error term.

  2. DALYs as a function of the human development index (HDI): a composite index of life expectancy, education, and per capita income indicators. The model function can be expressed as: DALY_{id} = f(HDI) + \epsilon \tag{14.5} where the number of DALYs (DALY_{id}) is the response variable, HDI is the predictor, f(.) is the function that relates the number of DALYs to the human development index, and \epsilon is the error term. Big data analytics combined with machine learning are used to classify the patterns of global disease burden by HDI, giving a better understanding of DALYs caused by infectious diseases such as COVID-19 at different levels of HDI.

This can help us to understand the trends and patterns of infectious diseases and their impact on global health.
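
To make the form of Equations 14.4 and 14.5 concrete, the sketch below takes f to be a straight line fitted by ordinary least squares. This is only one possible choice, since the text leaves f unspecified and a machine learning model could take its place. The function name `fit_line` and the data points are hypothetical and made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up (index, DALY rate per 100,000) pairs, for illustration only.
# The index could be either SDI or HDI.
index = [0.30, 0.45, 0.60, 0.75, 0.90]
daly_rate = [21000, 15500, 9800, 5200, 2500]

a, b = fit_line(index, daly_rate)
fitted = [a + b * xi for xi in index]
residuals = [yi - fi for yi, fi in zip(daly_rate, fitted)]  # estimates of epsilon
print(f"f(index) = {a:.0f} + ({b:.0f}) * index")
```

The residuals play the role of the error term \epsilon: they capture the part of the DALY rate that the chosen f does not explain.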


  1. Omar Enzo Santangelo et al., “Machine Learning and Prediction of Infectious Diseases: A Systematic Review,” Machine Learning and Knowledge Extraction 5, no. 1 (March 2023): 175–98, doi:10.3390/make5010013.

  2. Lyle D. Broemeling, Bayesian Analysis of Infectious Diseases: COVID-19 and Beyond (New York: Chapman and Hall/CRC, 2021), doi:10.1201/9781003125983.

  3. M. J. Keeling and L. Danon, “Mathematical Modelling of Infectious Diseases,” British Medical Bulletin 92, no. 1 (December 1, 2009): 33–42, doi:10.1093/bmb/ldp038.

  4. Kirstin Roster, Colm Connaughton, and Francisco A. Rodrigues, “Forecasting New Diseases in Low-Data Settings Using Transfer Learning,” Chaos, Solitons, and Fractals 161 (August 2022): 112306, doi:10.1016/j.chaos.2022.112306.