15  COVID-19 Outbreaks

In this chapter, we explore the dynamics of COVID-19 disease outbreaks in more details. We examine the results of various model’s applications that simulate the spread of the virus.

15.1 Epidemiology

COVID-19, short for “Coronavirus Disease 2019,” is an infectious disease caused by the novel coronavirus SARS-CoV-2, a type of virus that is part of the Coronaviridae1 family of viruses that can cause illness in animals and humans alike. The disease was first identified in December 2019 in Wuhan, Hubei province, China, and has since then spread globally, leading to a pandemic2.

The origin of the virus is still under investigation3, but it is believed to have zoonotic4 origins, with bats being the most likely reservoir. The virus has also been linked to pangolins5, mammals that are illegally trafficked for their scales and meat. The virus is thought to have jumped from animals to humans, possibly through a wet market in Wuhan where live animals were sold.

Examples of Zoonotic Viruses:

  1. Influenza (Flu) Viruses: Avian (bird) flu and swine (pig) flu.
  2. Coronavirus: SARS, MERS, and COVID-19 from fruit bats, pangolins.
  3. Ebola Virus: Likely originated from fruit bats.
  4. HIV/AIDS: Believed to have originated from non-human primates (chimpanzees).
  5. Rabies: Spread from infected animals like dogs, bats, and raccoons.

On January 30, 2020, The World Health Organization (WHO) declared the outbreak a Public Health Emergency of International Concern and a worldwide investigation started to understand more about the virus and its effects on humans. This challenge involved the unprecedented identification of this type of virus, requiring thorough testing and confirmation of the incubation period, recovery rate, and infection rate through the collection of cases globally.

The virus spreads primarily through respiratory droplets. Symptoms can include loss of taste and smell, fatigue, muscle aches, and gastrointestinal issues, with fever, cough, and shortness of breath. Severe cases can lead to pneumonia, acute respiratory distress syndrome (ARDS), and death. The emergency was primarily due to the high number of deaths occurring without a known pharmacological intervention to contain it.

Various preventive measures were established, but the presence of asymptomatic and symptomatic cases created challenges in identifying the incubation time, leading governments to impose restrictions through lockdowns of entire cities and regions in affected countries. These measures restricted movement and activities to curb the virus’s transmission.

Some notable examples of lockdown measures include:

  • China: The initial epicenter in Wuhan experienced strict lockdowns, with entire cities quarantined and movement heavily restricted. These measures were effective in curbing the initial spread of the virus.
  • Italy: One of the first European countries severely impacted, Italy imposed nationwide lockdowns in March 2020, closing schools, businesses, and restricting travel.
  • United States: Lockdown measures varied by state, with significant restrictions during the early waves of the pandemic. These included stay-at-home orders, business closures, and mask mandates.
  • India: Implemented one of the world’s largest lockdowns in March 2020, affecting millions of people and including strict travel bans and business closures.
  • Australia: Used localized lockdowns effectively, particularly in cities like Melbourne, which saw extended periods of strict restrictions.
  • United Kingdom: Imposed several lockdowns, including a strict nationwide lockdown in early 2021, along with tiered restrictions based on regional infection rates.

The pandemic has led to widespread healthcare challenges, economic disruptions, and significant changes to daily life, including social distancing measures, lockdowns, and travel restrictions.

Since its identification, COVID-19 has caused multiple outbreaks globally. The incubation period ranges from 2 to 14 days, during which an infected person may be asymptomatic but still contagious. On March 11, 2020, it was declared a pandemic. As of May 2024, there have been over 775 million confirmed cases and more than seven million deaths worldwide. The impact of the pandemic has varied across regions, influenced by factors such as virus variants, public health responses, and vaccination rates.

Public Health COVID-19 Prevention Measures:

  • Diagnostic Tests: PCR (polymerase chain reaction) tests and rapid antigen tests are used to detect active infections, while antibody tests determine past infections.
  • Treatment: While there is no specific antiviral treatment for COVID-19, supportive care, including oxygen therapy and mechanical ventilation, is used for severe cases. Some antiviral medications and therapies have been repurposed or developed specifically for COVID-19.
  • Vaccination: Vaccines have been developed and distributed globally to prevent COVID-19. These vaccines, including mRNA vaccines (Pfizer-BioNTech and Moderna), viral vector vaccines (AstraZeneca and Johnson & Johnson), and inactivated virus vaccines (Sinovac and Sinopharm), have shown high efficacy in preventing severe disease and reducing transmission.

COVID-19 has had a profound impact on global health, economies, and daily life. Efforts to combat the pandemic have involved unprecedented levels of international cooperation, scientific research, and public health initiatives. Ongoing vaccination campaigns and adherence to public health guidelines are critical in controlling the spread of the virus and preventing further outbreaks. However, vaccine hesitancy and resistance, driven by concerns about the speed of development and potential long-term effects, remain significant challenges in achieving widespread vaccination coverage.

15.2 Mapping COVID-19 Outbreaks

Mapping COVID-19 outbreaks helps identify infection hotspots, track the virus’s spread, and inform public health interventions. Geographic Information Systems (GIS) and spatial analysis techniques can visualize and analyze COVID-19 data, such as the number of cases, deaths, and recoveries, at local, regional, and global levels.

15.3 Example: Modelling the Spread of COVID-19

Spatiotemporal modelling of COVID-19 outbreaks can provide valuable insights into the pandemic’s dynamics and help guide public health responses. In this example, we’ll use R to model the spread of COVID-19 using a Susceptible-Infected-Recovered (SIR) model, a common compartmental model used in epidemiology to simulate the spread of infectious diseases. A Bayesian machine learning approach is used to predict the spread of the virus over time and across different regions. Bayesian methods handle uncertainties and allow for the integration of prior information into the model.

First, we use the SIR model function to simulate the spread of the disease. The model compartments are: the number of susceptible (S), the number of infected (I), and R represents the number of recovered individuals. The parameters of the model are the transmission rate (\beta) and the recovery rate (\gamma), and (N) is the total population.

library(deSolve)
library(tidyverse)
# Define parameters
N <- 1e6 # Total population
beta <- 0.5 # Transmission rate
gamma <- 0.1 # Recovery rate
t_exp <- 5 # Latent period

# Initial conditions
init <- c(S = N - 1, E = 1, I = 0, R = 0)

# Define the SEIR model
seir_model <- function(t, y, parameters) {
  with(as.list(y), {
    dS <- -beta * S * I / N
    dE <- beta * S * I / N - (1 / t_exp) * E
    dI <- (1 / t_exp) * E - gamma * I
    dR <- gamma * I
    return(list(c(dS, dE, dI, dR)))
  })
}

As seen previously, we’ll use the ode() function from the deSolve package to solve the ordinary differential equations (ODEs) and simulate the spread of the virus over a period of 365 days:

# Solve the ODEs
times <- seq(0, 365, by = 1)
result <- ode(y = init, times = times, func = seir_model)

# Plot the results
ggplot(as.data.frame(result), aes(time, I, color = "Infected")) +
  geom_line() +
  geom_line(aes(time, R, color = "Recovered")) +
  geom_line(aes(time, S, color = "Susceptible")) +
  geom_line(aes(time, E, color = "Exposed")) +
  scale_color_discrete(name = "Compartment") +
  labs(x = "Time (days)", y = "Number of individuals")
Simulation of the SEIR model for COVID-19
Figure 15.1: Simulation of the SEIR model for COVID-19

In reality, the spread of a virus is much more complex and influenced by many factors such as human behavior, government policies, healthcare systems, vaccination campaigns, and human contact.

15.3.1 Bayesian Analysis

The use of Bayesian analysis in modelling infectious diseases, such as COVID-19, offers several advantages in understanding and predicting the dynamics of the spread. We implement a Bayesian regression model in R using the brms package, which interfaces with Stan for Bayesian analysis.

The incorporation of prior knowledge allows for the integration of expert insights, which is particularly valuable for new viruses where historical data may be limited. Expert knowledge can improve model accuracy and reduce uncertainty in the estimates.

We can specify the prior distributions for beta and gamma using expert knowledge and available data. For example, we might specify a gamma distribution with a mean of 0.1 and a standard deviation of 0.05 for beta, and a gamma distribution with a mean of 0.05 and a standard deviation of 0.02 for gamma.

library(brms)
# Load the data
data <- as.data.frame(result)

# Convert simulate data to have the date column to Date type and add additional predictors
data <- data %>%
  mutate(date = as.Date(time, origin = "2022-01-01"),
         day_of_week = wday(date, label = TRUE),
         week_of_year = week(date),
         month = month(date),
         lag_1_cases = lag(I, 1),
         lag_7_cases = lag(I, 7)) %>%
  drop_na()
# Define the Bayesian regression model
bayesian_model <- brm(
  I ~ day_of_week + week_of_year + month + lag_1_cases + lag_7_cases,
  data = data,
  family = gaussian(),
  prior = c(set_prior("normal(0, 10)", class = "b"),
            set_prior("cauchy(0, 5)", class = "sigma")),
  iter = 2000,
  warmup = 1000,
  chains = 4,
  seed = 42
  )

After fitting the model, we can plot the model diagnostics to assess its performance:

# Plot the model diagnostics
plot(bayesian_model)
Figure 15.2: Bayesian Model diagnostics

To make predictions, we extract the posterior samples from the model and apply them to the original data:

# Make predictions
predictions <- predict(bayesian_model, newdata = data)
predictions <- as_tibble(predictions)
data <- data %>%
  mutate(predicted_cases = predictions$Estimate)

# Plot actual vs. predicted values
ggplot(data, aes(x = date)) +
  geom_point(aes(y = I, 
                 color = "Actual")) +
  geom_line(aes(y = predicted_cases,
                color = "Predicted")) +
  labs(title = "Actual vs. Predicted COVID-19 Cases",
       x = "Date", y = "Cases", color = "Legend") 
Actual vs. Predicted COVID-19 Cases
Figure 15.3: Actual vs. Predicted COVID-19 Cases

15.3.2 Ensemble Modelling - Combining Multiple Models

Ensemble modelling combines the predictions of multiple models to improve the overall accuracy and robustness of predictions. This technique is particularly useful in infectious disease modelling to account for uncertainties and variability in data, providing more reliable estimates of disease spread.

library(caret)
library(randomForest)
library(xgboost)
library(e1071)

# Scale the cases data
data <- data %>%
  mutate(cases_scaled = scale(I))

# Split the data into training and testing sets
set.seed(42)
train_index <- createDataPartition(data$cases_scaled, p = 0.8, list = FALSE)
train_data <- data[train_index, ]
test_data <- data[-train_index, ]

# Define the predictors for the model
predictors <- c("day_of_week", "week_of_year", 
                "month", "lag_1_cases", "lag_7_cases")

In summary, integrating machine learning models and advanced statistical techniques such as Bayesian analysis and ensemble modelling can significantly enhance our ability to predict and understand the spread of infectious diseases like COVID-19. These models provide critical insights that can inform public health responses and help mitigate the impact of pandemics.

15.4 COVID-19 & DALYs

The influence of COVID-19 on global health can be measured using the DALYs. We use the data from the JHU official GitHub repository CSSEGISandData6 to calculate the DALYs for COVID-19. The data includes the number of confirmed cases, deaths, and recovered cases for each country and region. We will calculate the DALYs for the US, China, United Kingdom, and Canada.

cases1 <- cases %>%
  pivot_longer(cols = starts_with("X"),
               names_to = "date",
               values_to = "cases") %>%
  mutate(date = gsub("X", "", date),
         date = as.Date(date, format = "%m.%d.%y")) %>%
  janitor::clean_names()

cases1 %>% head()
#> # A tibble: 6 × 6
#>   province_state country_region   lat  long date       cases
#>   <chr>          <chr>          <dbl> <dbl> <date>     <int>
#> 1 ""             Afghanistan     33.9  67.7 2020-01-22     0
#> 2 ""             Afghanistan     33.9  67.7 2020-01-23     0
#> 3 ""             Afghanistan     33.9  67.7 2020-01-24     0
#> 4 ""             Afghanistan     33.9  67.7 2020-01-25     0
#> 5 ""             Afghanistan     33.9  67.7 2020-01-26     0
#> 6 ""             Afghanistan     33.9  67.7 2020-01-27     0
deaths1 <- deaths %>%
  pivot_longer(cols = starts_with("X"),
               names_to = "date",
               values_to = "deaths") %>%
  mutate(date = gsub("X", "", date),
         date = as.Date(date, format = "%m.%d.%y")) %>%
  janitor::clean_names()

deaths1 %>% head()
#> # A tibble: 6 × 6
#>   province_state country_region   lat  long date       deaths
#>   <chr>          <chr>          <dbl> <dbl> <date>      <int>
#> 1 ""             Afghanistan     33.9  67.7 2020-01-22      0
#> 2 ""             Afghanistan     33.9  67.7 2020-01-23      0
#> 3 ""             Afghanistan     33.9  67.7 2020-01-24      0
#> 4 ""             Afghanistan     33.9  67.7 2020-01-25      0
#> 5 ""             Afghanistan     33.9  67.7 2020-01-26      0
#> 6 ""             Afghanistan     33.9  67.7 2020-01-27      0
COVID19 <- cases1 %>%
  left_join(deaths1, 
            by = c("province_state", 
                   "country_region", 
                   "lat", "long", "date")) %>%
  mutate(cases = ifelse(is.na(cases), 0, cases),
         deaths = ifelse(is.na(deaths), 0, deaths)) %>%
  group_by(country_region, date) %>%
  mutate(cases = sum(cases),
         deaths = sum(deaths),
         cfr = round(deaths / cases, 3)) %>%
  filter(cases > 0) %>%
  arrange(country_region, date)

COVID19 %>%
  arrange(desc(date)) %>%
  head()
#> # A tibble: 6 × 8
#> # Groups:   country_region, date [6]
#>   province_state country_region   lat  long date        cases deaths   cfr
#>   <chr>          <chr>          <dbl> <dbl> <date>      <int>  <int> <dbl>
#> 1 ""             Afghanistan     33.9 67.7  2023-03-09 209451   7896 0.038
#> 2 ""             Albania         41.2 20.2  2023-03-09 334457   3598 0.011
#> 3 ""             Algeria         28.0  1.66 2023-03-09 271496   6881 0.025
#> 4 ""             Andorra         42.5  1.52 2023-03-09  47890    165 0.003
#> 5 ""             Angola         -11.2 17.9  2023-03-09 105288   1933 0.018
#> 6 ""             Antarctica     -71.9 23.3  2023-03-09     11      0 0
COVID19_4countries <- COVID19 %>%
  filter(country_region %in%
    c("US", "China", "United Kingdom", "Canada"))

COVID19_4countries %>% head()
#> # A tibble: 6 × 8
#> # Groups:   country_region, date [1]
#>   province_state   country_region   lat   long date       cases deaths   cfr
#>   <chr>            <chr>          <dbl>  <dbl> <date>     <int>  <int> <dbl>
#> 1 Alberta          Canada          53.9 -117.  2020-01-23     2      0     0
#> 2 British Columbia Canada          53.7 -128.  2020-01-23     2      0     0
#> 3 Diamond Princess Canada           0      0   2020-01-23     2      0     0
#> 4 Grand Princess   Canada           0      0   2020-01-23     2      0     0
#> 5 Manitoba         Canada          53.8  -98.8 2020-01-23     2      0     0
#> 6 New Brunswick    Canada          46.6  -66.5 2020-01-23     2      0     0

Selected countries are the US, China, United Kingdom, and Canada.

COVID19 %>%
  ggplot(aes(x = date, y = cases,
  group = country_region)) +
  geom_line(color = "grey", 
            linewidth = 0.2) +
  geomtextpath::geom_textline(
  data = COVID19_4countries,
  aes(label = country_region),
      show.legend = F) +
  scale_y_log10() +
  labs(title = "COVID-19 Cases by Country",
       x = "Date",
       y = "Cases") 

COVID19 %>%
  ggplot(aes(x = date, y = deaths,
             group = country_region)) +
  geom_line(color = "grey", 
            linewidth = 0.2) +
  geomtextpath::geom_textline(
    data = COVID19_4countries,
    aes(label = country_region),
        show.legend = F) +
  scale_y_log10() +
  labs(title = "COVID-19 Deaths by Country",
       x = "Date",
       y = "Deaths") 
COVID-19 Cases and Deaths by Country
(a) COVID-19 Cases by Country
COVID-19 Cases and Deaths by Country
(b) COVID-19 Deaths by Country
Figure 15.4: COVID-19 Cases and Deaths by Country
worldmap <- map_data("world") %>%
  filter(!region == "Antarctica")

centroids <- COVID19 %>%
  group_by(country_region, lat, long) %>%
  reframe(tot_cases = sum(cases),
          tot_deaths = sum(deaths),
          avg_cfr = round(mean(cfr), 3)) %>%
  distinct()
options(scipen = 999)
ggplot() +
  geom_polygon(data = worldmap,
               aes(x = long, y = lat, group = group),
               fill = "grey",
               color = "white") +
  geom_point(data = centroids,
             aes(x = long, y = lat,
             size = tot_cases,
             color = log(avg_cfr)), 
             alpha = 0.5) +
  scale_size_continuous(
        labels = scales::unit_format(unit = "M", 
                                     scale = 1e-6)) +
  scale_color_viridis_c() +
  coord_quickmap() +
  labs(title = "COVID-19 Cases by Country",
       caption = "Circle size: total cases, Color: average CFR", 
      size= "Total cases (M)",
      color= "Average CFR (log scale)",
      x = "", y = "") +
  theme(axis.text = element_blank(),
        plot.subtitle = element_text(size = 8))
COVID-19 Cases by Country
Figure 15.5: COVID-19 Cases by Country

The same graph made with selected countries only.

ggplot() +
  geom_polygon(data = worldmap,
               aes(x = long, y = lat, group = group),
               fill = "grey",
               color = "white") +
  geom_point(data = centroids %>%
             filter(country_region %in%
             c("US", "China","United Kingdom", "Canada")),
             aes(x = long, y = lat,
                 size = tot_cases,
                 color = log(avg_cfr)), 
                 alpha = 0.5) +
  scale_size_continuous(
            labels = scales::unit_format(unit = "M", 
                                         scale = 1e-6)) +
  scale_color_viridis_c() +
  coord_quickmap() +
  labs(title = "COVID-19 Cases by Country",
       caption = "Circle size: total cases, Color: average CFR",
       subtitle = "US, China, United Kingdom, Canada",
       size= "Total cases (M)",
       color= "Average CFR (log scale)",
       x = "", y = "") +
  theme(axis.text = element_blank(),
        plot.subtitle = element_text(size = 8))
COVID-19 Cases by Country
Figure 15.6: COVID-19 Cases by Country

The difference in the number of cases and deaths between the US and China is striking. The US has the highest number of cases and deaths, while China has the lowest number of cases and deaths. The United Kingdom and Canada have a similar number of cases and deaths, but the United Kingdom has a higher case fatality rate (CFR) than Canada.

Let’s now calculate the DALYs for COVID-19, starting with the YLLs. We know that the simplified formula for YLLs is the number of deaths multiplied by the standard life expectancy at the age of death. The standard life expectancy for these countries varies, but we can use the global average of 72.6 years.

Also, we group the date by 4 months cycles to calculate the YLLs.

COVID19_yll_4months <- COVID19_4countries %>%
  mutate(month = as.numeric(format(date, "%m")),
         year = as.numeric(format(date, "%Y")),
         month_cycle = case_when(month %in% 1:4 ~ "Jan-Apr",
                                 month %in% 5:8 ~ "May-Aug",
                                 month %in% 9:12 ~ "Sep-Dec")) %>%
  group_by(country_region, year, month_cycle) %>%
  summarise(cases = sum(cases),
            deaths = sum(deaths),
            cfr = round(deaths / cases, 3)) %>%
  mutate(ylls = deaths * 72.6) %>%
  arrange(year, month_cycle, ylls)

COVID19_yll_4months %>%
  select(year, month_cycle, ylls) %>%
  head()
#> # A tibble: 6 × 4
#> # Groups:   country_region, year [4]
#>   country_region  year month_cycle        ylls
#>   <chr>          <dbl> <chr>             <dbl>
#> 1 Canada          2020 Jan-Apr       62225750.
#> 2 US              2020 Jan-Apr       80674499.
#> 3 China           2020 Jan-Apr      655246654.
#> 4 United Kingdom  2020 Jan-Apr      800881092 
#> 5 US              2020 May-Aug     1158612365.
#> 6 Canada          2020 May-Aug     1165061568

And then plot the YLLs by country and month cycle, with the y axis representing the YLLs formatted in millions.

COVID19_yll_4months %>%
  ggplot() +
  geom_col(aes(x = month_cycle, y = ylls, 
               fill = country_region)) +
  scale_y_continuous(
         labels = scales::unit_format(unit = "M", 
                                      scale = 1e-6)) +
  labs(title = "YLLs by Country and Month Cycle",
       fill = "Country",
       x = "4-Month cycle",
       y = "YLLs") +
  facet_wrap(~year) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
YLLs by Country and Month Cycle
Figure 15.7: YLLs by Country and Month Cycle

The majority of YLLs are in the United Kingdom, followed by the US, Canada, and China. The YLLs are highest in the first 4-month cycle (Jan-Apr) and decrease in the following cycles.

Let’s now calculate the YLDs for COVID-19. The YLDs are calculated by multiplying the number of cases by the disability weight. The Disability Weight applied is that for low respiratory diseases which is 0.133. As of 2020, the Global Burden of Disease Study 2019 estimated that the DALYs for lower respiratory diseases was 1,000,000. We can use the hmsidwR package to get the disability weight for lower respiratory diseases.

hmsidwR::disweights %>%
  filter(str_detect(sequela, "lower respiratory")) %>%
  select(sequela, dw)
#> # A tibble: 4 × 2
#>   sequela                                   dw
#>   <chr>                                  <dbl>
#> 1 Moderate lower respiratory infections 0.051 
#> 2 Severe lower respiratory infections   0.133 
#> 3 Moderate lower respiratory infections 0.0506
#> 4 Severe lower respiratory infections   0.133

We know that the simplified formula for YLDs is the prevalence of the disease multiplied by the disability weight. To calculate the prevalence, we need the population of each country. For this task we use the wpp2022 package, which provides the population data from the 2022 revision of the United Nations World Population Prospects (WPP 2022). It includes both historical estimates and projections of demographic indicators such as population counts and fertility rates, accommodating the challenges brought on by events like the COVID-19 pandemic and transitioning from five-year to one-year data intervals.

# library(devtools)
# options(timeout = 600)
# install_github("PPgp/wpp2022")
library(wpp2022)
# data(package = "wpp2022")
data(pop1dt)
pop_4countries <- pop1dt %>%
  filter(
    year %in% c(2020, 2021),
    name %in% c(
      "United States of America",
      "China",
      "United Kingdom",
      "Canada")) %>%
  mutate(name = ifelse(name == "United States of America", "US", name)) %>%
  select(name, year, pop) %>%
  arrange(year, name)

pop_4countries %>% head()
#>              name  year        pop
#>            <char> <int>      <num>
#> 1:         Canada  2020   38019.18
#> 2:          China  2020 1425861.54
#> 3:             US  2020  336495.77
#> 4: United Kingdom  2020   67167.77
#> 5:         Canada  2021   38290.85
#> 6:          China  2021 1425925.39

Calculate the prevalence of COVID-19 cases per 1,000,000 people.

prevalence = \frac{cases}{population} * 1,000,000

COVID19_dalys <- COVID19_yll_4months %>%
  full_join(pop_4countries, 
            by = c("country_region" = "name", "year")) %>%
  mutate(prevalence = cases / pop * 1000000,
         dw = 0.133,
         ylds = prevalence * dw,
         dalys = ylls + ylds) %>%
  arrange(year, month_cycle, prevalence) %>%
  select(country_region, year, 
         month_cycle, ylls, ylds, dalys)


COVID19_dalys %>%
  head()
#> # A tibble: 6 × 6
#> # Groups:   country_region, year [4]
#>   country_region  year month_cycle        ylls       ylds       dalys
#>   <chr>          <dbl> <chr>             <dbl>      <dbl>       <dbl>
#> 1 US              2020 Jan-Apr       80674499.   8342075.   89016575.
#> 2 China           2020 Jan-Apr      655246654.  21433596.  676680249.
#> 3 Canada          2020 Jan-Apr       62225750.  60966727.  123192477.
#> 4 United Kingdom  2020 Jan-Apr      800881092  107263600.  908144692.
#> 5 China           2020 May-Aug     1434752563.  34879873. 1469632436.
#> 6 US              2020 May-Aug     1158612365. 153456778. 1312069143.

If we calculate the YLDs with the incidence instead of prevalence, we get the same results, because the incidence is the number of new cases in a given period, while the prevalence is the number of existing cases in a given period. So, we should consider the cumulative number of cases to calculate the prevalence. In order to calculate the cumulative number of cases, we need to group the data by year and month cycle.

And then plot the YLDs by country and month cycle, with the y axis representing the YLDs formatted in millions.

COVID19_dalys %>%
  filter(year %in% c(2020, 2021)) %>%
  ggplot() +
  geom_col(aes(x = month_cycle, y = ylds, 
               fill = country_region)) +
  scale_y_continuous(
            labels = scales::unit_format(unit = "M", 
                                         scale = 1e-6)) +
  labs(title = "YLDs by Country and Month Cycle",
       fill = "Country",
       x = "4-Month Cycle", y = "YLDs") +
  facet_wrap(~year) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
YLDs by Country and Month Cycle
Figure 15.8: YLDs by Country and Month Cycle

We can see that the YLDs are highest in the United Kingdom, followed by the US, Canada, and China. The YLDs are highest in the first 4-month cycle (Jan-Apr) and decrease in the following cycles.

Finally, let’s plot the DALYs by country and month cycle, with the y axis representing the DALYs formatted in millions.

COVID19_dalys %>%
  filter(year %in% c(2020, 2021)) %>%
  ggplot() +
  geom_col(aes(x = month_cycle, y = dalys, 
               fill = country_region)) +
  scale_y_continuous(
            labels = scales::unit_format(unit = "M", 
                                         scale = 1e-6)) +
  labs(title = "DALYs by Country and Month Cycle",
       fill = "Country",
       x = "4-Month Cycle", y = "DALYs") +
  facet_wrap(~year) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
DALYs by Country and Month Cycle
Figure 15.9: DALYs by Country and Month Cycle

To explain the magnitude of difference between ylls and ylds,

prevalence as the proportion of cases at baseline among the total number of enrolled participants who completed baseline visits. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8992269/)

cumulative incidence as the number of incident cases divided by the total follow-up time per 100 person-years and assumed a uniform incidence distribution across the 6-month follow-up time. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8992269/)


  1. “Coronaviridae - Wikipedia,” n.d., https://en.wikipedia.org/wiki/Coronaviridae.↩︎

  2. “Pandemic - Wikipedia,” n.d., https://en.wikipedia.org/wiki/Pandemic.↩︎

  3. “WHO-Convened Global Study of Origins of SARS-CoV-2: China Part,” n.d., https://www.who.int/publications/i/item/who-convened-global-study-of-origins-of-sars-cov-2-china-part.↩︎

  4. “Zoonosis - Wikipedia,” n.d., https://en.wikipedia.org/wiki/Zoonosis.↩︎

  5. Xing-Yao Huang et al., “A Pangolin-Origin SARS-CoV-2-Related Coronavirus: Infectivity, Pathogenicity, and Cross-Protection by Preexisting Immunity,” Cell Discovery 9, no. 1 (June 17, 2023): 1–13, doi:10.1038/s41421-023-00557-9.↩︎

  6. cases <- read.csv(“https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv”)) deaths <- read.csv(”https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv”))↩︎