Main Focus

Not all of the available data on COVID-19 have the same degree of reliability.

Some are actually seriously biased:

  • the number of infected people: it depends heavily on the number of tests that a city / region / country is able to carry out, so it is largely underestimated;
  • the number of deaths from COVID-19: again underestimated, as not all of the deceased have been tested for COVID-19;
  • all of the indicators that rely on one or both of the above, for instance mortality rate.

On the other hand, some other figures are definitely less influenced by exogenous factors:

  • the number of patients in ICU (terapia intensiva), as long as there is room for all those in need, is a good indicator of people with really severe symptoms;
  • the number of admissions to ER (pronto soccorso) is another good indicator - but no open data are available as far as I know;
  • the overall trend in deaths over time, on large enough areas, is definitely the most reliable indicator of the impact of COVID-19 - the only downside being the long delay between the infection and the final reporting.

I’m focusing on this last point, thanks to the data first made available by ISTAT on 1st April, covering deaths up to 21st March 2020: https://www.istat.it/it/archivio/240401

I’m updating it regularly based on new data releases: last update published on 4th May, with data updated till 15th April 2020.

Data and approach

For 4426 Italian municipalities, the data on daily declared deaths (no matter the cause) is available with the following cross sections:

  • Day: every day for 6 years, from 2015 to 2020 (till 15th April);
  • Gender: male / female
  • Age group: 0, 1 to 5 years old, 6 to 10, 11 to 15, all the way to 96 to 100, and then 101 and more
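The age grouping above can be sketched as a small helper. This is only an illustration of the binning as described in the list: the function name, the label format and the sample records are my own assumptions, not ISTAT's field names or codes.

```python
from collections import Counter


def age_group(age: int) -> str:
    """Map an age in years to the age-group label described in the text."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age == 0:
        return "0"
    if age > 100:
        return "101+"
    lower = ((age - 1) // 5) * 5 + 1  # 1, 6, 11, ..., 96
    return f"{lower}-{lower + 4}"


# Hypothetical records (day, gender, age): illustrative only, not ISTAT data.
records = [
    ("2020-03-15", "M", 78),
    ("2020-03-15", "F", 83),
    ("2020-03-16", "M", 0),
    ("2020-03-16", "F", 104),
]

# Count deaths per age group, ignoring day and gender.
deaths_by_group = Counter(age_group(age) for _, _, age in records)
```

With this layout, any of the three cross sections (day, gender, age group) can be aggregated with a similar one-line count.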

Instead of relying on highly biased data, I make a simple assumption: in a large enough area (like Bergamo and province, more than 1 million inhabitants) the number of deaths is largely stable and huge anomalies have to be attributed to very specific events, like the COVID-19 epidemic.

Focus on Bergamo and province

I’m focusing on the province of Bergamo, heavily hit by COVID-19:

  • 243 municipalities;
  • 171 have provided full data;
  • These 171 include all of the top 10 municipalities by number of inhabitants, including the city of Bergamo itself, and cover 866’247 of the 1’114’590 inhabitants in the province of Bergamo (coverage: 78%)

A quick look at the largest municipalities included. Inhabitants taken from: http://demo.istat.it/pop2019/index3.html.

Municipalities in the province of Bergamo and availability of statistics on deaths in 2020
MUNICIPALITY INHABITANTS DATA_AVAILABLE
Bergamo 121’639 YES
Treviglio 30’092 YES
Seriate 25’385 YES
Dalmine 23’610 YES
Romano di Lombardia 20’625 YES
Albino 17’805 YES
Caravaggio 16’259 YES
Alzano Lombardo 13’655 YES
Stezzano 13’234 YES
Osio Sotto 12’555 YES
Ponte San Pietro 11’579 YES
Nembro 11’526 YES
Cologno al Serio 11’184 NO
Treviolo 10’890 NO
Martinengo 10’647 NO

Impact of COVID-19 over time

The province of Bergamo has been seriously hit by COVID-19, with the number of deceased rising since the beginning of March.

The graph below confirms my initial assumption: the registered deaths are largely constant from year to year, and the impact of COVID-19 is easily visible.

A better way to see this is to plot the ratio between the number of deaths in 2020, day by day, and the corresponding average between 2015 and 2019.

The shape is slightly different (there’s usually a decreasing trend in daily deaths when moving out of winter), but the overall message is confirmed: the effect of COVID-19 is clearly visible starting from the second week of March and it appears to have been slowing down since the beginning of April.
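The day-by-day ratio described above can be sketched as follows. The numbers are made up for illustration (they are not ISTAT figures), and the dictionary layout and function name are assumptions of mine rather than the structure of the original analysis.

```python
from statistics import mean
from typing import Mapping

# Illustrative numbers only (NOT ISTAT data): deaths per calendar day, by year.
daily_deaths = {
    "03-01": {2015: 24, 2016: 26, 2017: 25, 2018: 27, 2019: 23, 2020: 30},
    "03-15": {2015: 22, 2016: 25, 2017: 24, 2018: 26, 2019: 23, 2020: 192},
}

BASELINE_YEARS = range(2015, 2020)  # 2015..2019 inclusive


def ratio_vs_baseline(by_year: Mapping[int, int]) -> float:
    """Deaths in 2020 divided by the 2015-2019 average for the same day."""
    baseline = mean(by_year[year] for year in BASELINE_YEARS)
    return by_year[2020] / baseline


ratios = {day: ratio_vs_baseline(by_year) for day, by_year in daily_deaths.items()}
```

A ratio close to 1 means a day in line with previous years. The sketch does not guard against a baseline of zero, which can happen on small areas and is one more reason to work on a large enough population.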

Death toll of different age groups

I decided to focus on the two weeks from 15th to 28th of March, rather than on the whole month of March for instance, because death occurs on average roughly 3 weeks after the infection: around 5-7 days of incubation, then two weeks from the first symptoms to death (https://www.worldometers.info/coronavirus/coronavirus-death-rate/#days).

Given the spread of COVID-19 in late February, this appears to be the best time frame to analyse in order to understand the actual impact of the virus on the population during its peak.

During this period, the ratio of deaths in 2020 to the historical average (2015-2019), computed for each age group, shows that the impact is indeed larger for seniors, but it’s definitely present across all of the age groups.

The plot below focuses on the aforementioned weeks, calculating the ratio of deceased in 2020 vs. the average of 2015-2019.

Starting from people 51 and older, the impact of COVID-19 is huge and noticeable as a multiplier that ranges from 5 to 13 (depending on the different age groups) when compared to the average mortality of 2015-2019.

Details in the table below. In this specific area and for the analysed time frame, people in their 70s appear to be particularly hit by COVID-19: for people aged 71 to 80, deaths skyrocketed from an average of 63.4 in 2015-2019 to 774 in 2020. This is roughly 12x.

Deaths in 171 municipalities of Bergamo and province in 2015-2019 vs. 2020, during the two weeks from 15th to 28th of March
MACRO_CLASS AVG_2015-2019 2020 RATIO_2020_VS_HIST DELTA_2020_VS_HIST
0-45 6.2 11 1.77 4.8
46-50 5.0 13 2.60 8.0
51-55 5.4 31 5.74 25.6
56-60 6.6 54 8.18 47.4
61-65 11.4 89 7.81 77.6
66-70 21.6 171 7.92 149.4
71-75 25.6 300 11.72 274.4
76-80 37.8 474 12.54 436.2
81-85 54.4 559 10.28 504.6
86-90 73.2 603 8.24 529.8
91+ 69.8 569 8.15 499.2
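As a quick check, the two derived columns can be recomputed from the first two. This is a minimal sketch with the values copied from the table above; the variable and function names are mine.

```python
# (age class, average deaths 2015-2019, deaths in 2020), copied from the table.
ROWS = [
    ("0-45", 6.2, 11), ("46-50", 5.0, 13), ("51-55", 5.4, 31),
    ("56-60", 6.6, 54), ("61-65", 11.4, 89), ("66-70", 21.6, 171),
    ("71-75", 25.6, 300), ("76-80", 37.8, 474), ("81-85", 54.4, 559),
    ("86-90", 73.2, 603), ("91+", 69.8, 569),
]


def ratio_and_delta(avg: float, deaths_2020: int) -> tuple:
    """Return (RATIO_2020_VS_HIST, DELTA_2020_VS_HIST), rounded as in the table."""
    return round(deaths_2020 / avg, 2), round(deaths_2020 - avg, 1)


table = {label: ratio_and_delta(avg, deaths) for label, avg, deaths in ROWS}

# Merging the two classes 71-75 and 76-80 gives the "roughly 12x" for the 70s.
avg_70s = 25.6 + 37.8
deaths_70s = 300 + 474
ratio_70s = deaths_70s / avg_70s

# Totals over all age classes for the two weeks.
total_avg = sum(avg for _, avg, _ in ROWS)
total_2020 = sum(deaths for _, _, deaths in ROWS)
```

Summing all classes gives 2874 deaths in 2020 against a 2015-2019 average of 317 for the same two weeks.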

From a different perspective, let’s consider:

  • the age distribution of the deaths in 2015-2019;
  • the age distribution of the surplus in 2020, i.e. the deaths in 2020 minus the 2015-2019 baseline.

The plot below shows the distribution normalized within each of the two groups.

Comparing this surplus (likely due to the COVID-19 epidemic) with the baseline, we see some interesting insights. I’ll highlight just a couple:

  • Under 45 years old, the impact is negligible;
  • As seen before, people in their 70s seem much more impacted by COVID-19 than people over 80 years old.
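The two normalized distributions can be approximated from the age-group table above. This sketch assumes that the plot is built on the same two-week figures as that table, and that the surplus is the DELTA_2020_VS_HIST column (2020 minus the 2015-2019 average); the helper names are mine.

```python
# (age class, average deaths 2015-2019, deaths in 2020), from the age-group table.
ROWS = [
    ("0-45", 6.2, 11), ("46-50", 5.0, 13), ("51-55", 5.4, 31),
    ("56-60", 6.6, 54), ("61-65", 11.4, 89), ("66-70", 21.6, 171),
    ("71-75", 25.6, 300), ("76-80", 37.8, 474), ("81-85", 54.4, 559),
    ("86-90", 73.2, 603), ("91+", 69.8, 569),
]

total_baseline = sum(avg for _, avg, _ in ROWS)
total_surplus = sum(deaths - avg for _, avg, deaths in ROWS)

# Each distribution is normalized within its own group, so each sums to 1.
baseline_share = {label: avg / total_baseline for label, avg, _ in ROWS}
surplus_share = {label: (deaths - avg) / total_surplus for label, avg, deaths in ROWS}


def share(shares: dict, labels: tuple) -> float:
    """Sum the shares of several age classes."""
    return sum(shares[label] for label in labels)


SEVENTIES = ("71-75", "76-80")
OVER_80 = ("81-85", "86-90", "91+")

seventies_baseline = share(baseline_share, SEVENTIES)
seventies_surplus = share(surplus_share, SEVENTIES)
over_80_baseline = share(baseline_share, OVER_80)
over_80_surplus = share(surplus_share, OVER_80)
```

Under these assumptions, people in their 70s weigh more in the surplus than in the baseline, while people over 80 weigh slightly less.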

Gender differences

Taking a look at overall mortality (always in Bergamo and province, 15th to 28th of March), gender differences do not appear as large as suggested by the analysis of official COVID-19 deaths - but are present nevertheless.

Let’s focus on the largest age groups (older than 70).

Details in the table below, for all the classes.

For all age groups over 50, the ratio 2020 vs. 2015-2019 is larger for men than for women.

Increase of overall deaths across age groups, divided by gender
MACRO_CLASS GENDER 2020 AVG_2015-2019 RATIO_2020_VS_HIST DELTA_2020_VS_HIST
0-45 F 5 2.0 2.50 3.0
0-45 M 6 4.2 1.43 1.8
46-50 F 4 1.8 2.22 2.2
46-50 M 9 3.2 2.81 5.8
51-55 F 8 2.4 3.33 5.6
51-55 M 23 3.0 7.67 20.0
56-60 F 10 3.0 3.33 7.0
56-60 M 44 3.6 12.22 40.4
61-65 F 16 4.0 4.00 12.0
61-65 M 73 7.4 9.86 65.6
66-70 F 38 6.8 5.59 31.2
66-70 M 133 14.8 8.99 118.2
71-75 F 87 7.6 11.45 79.4
71-75 M 213 18.0 11.83 195.0
76-80 F 160 15.8 10.13 144.2
76-80 M 314 22.0 14.27 292.0
81-85 F 223 25.6 8.71 197.4
81-85 M 336 28.8 11.67 307.2
86-90 F 301 43.2 6.97 257.8
86-90 M 302 30.0 10.07 272.0
91+ F 410 53.0 7.74 357.0
91+ M 159 16.8 9.46 142.2

Taking a look at the overall picture, women appear indeed to be less affected by COVID-19 than men.

A quick calculation on the excess deaths (2020 minus the 2015-2019 average, summed over all age groups) gives the following split between men and women:

  • Men: 57.11%
  • Women: 42.89%
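These two percentages are consistent with each gender's share of the excess deaths in the table above. A minimal sketch to reproduce them, with the values copied from that table (variable names are mine):

```python
# (age class, gender, deaths in 2020, average deaths 2015-2019), from the table.
ROWS = [
    ("0-45", "F", 5, 2.0), ("0-45", "M", 6, 4.2),
    ("46-50", "F", 4, 1.8), ("46-50", "M", 9, 3.2),
    ("51-55", "F", 8, 2.4), ("51-55", "M", 23, 3.0),
    ("56-60", "F", 10, 3.0), ("56-60", "M", 44, 3.6),
    ("61-65", "F", 16, 4.0), ("61-65", "M", 73, 7.4),
    ("66-70", "F", 38, 6.8), ("66-70", "M", 133, 14.8),
    ("71-75", "F", 87, 7.6), ("71-75", "M", 213, 18.0),
    ("76-80", "F", 160, 15.8), ("76-80", "M", 314, 22.0),
    ("81-85", "F", 223, 25.6), ("81-85", "M", 336, 28.8),
    ("86-90", "F", 301, 43.2), ("86-90", "M", 302, 30.0),
    ("91+", "F", 410, 53.0), ("91+", "M", 159, 16.8),
]

# Excess deaths per gender: deaths in 2020 minus the 2015-2019 average.
excess = {"F": 0.0, "M": 0.0}
for _, gender, deaths_2020, avg in ROWS:
    excess[gender] += deaths_2020 - avg

total_excess = sum(excess.values())
share_men = 100 * excess["M"] / total_excess
share_women = 100 * excess["F"] / total_excess
```

Using excess deaths rather than raw 2020 deaths matters here: women are the majority in the oldest age groups, so part of their 2020 deaths would have occurred in any year.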

This is way more balanced than official figures, which suggest that 71% of the overall deaths from COVID-19 in Italy are men, as reported by many sources, for instance here: https://www.statista.com/chart/21345/coronavirus-deaths-by-gender/

A comparison across provinces

A quick comparison across different Italian provinces shows that Bergamo has indeed been by far the hardest-hit province. The multiplier with respect to the previous years touches 10x on some days in the 3rd and 4th week of March, while it’s definitely lower in the other provinces (reaching up to 6x).

It’s interesting to see how Lodi has definitely been the first province to be seriously hit (since the second half of February).

Focusing on provinces other than Bergamo.

A different point of view to interpret the same data. In order to smooth out possible reporting mistakes or delays, I’m working on a 7-day centered moving average. Here I highlight the magnitude of the COVID-19 impact (the maximum multiplier with respect to the deaths registered in the previous years) and I refer to 3 key days:

  • First signs is the first day on which the number of deaths in the province exceeded +50% (i.e. 1.5x) with respect to the 2015-2019 baseline - always on a 7-day moving average;
  • Worst day is the day on which the highest multiplier was reached;
  • Closer to normal is the day (if available) on which the multiplier dropped back below 1.5x.

Critical days for COVID-19 epidemic across different provinces. Centered moving average over 7 days.
NOME_PROVINCIA MAX_MAGNITUDE FIRST_SIGNS WORST_DAY CLOSER_TO_NORMAL
Bergamo 9.33x 02 mar 23 mar NA
Brescia 5.72x 06 mar 20 mar NA
Cremona 6.67x 01 mar 21 mar NA
Lodi 5.59x 24 feb 14 mar NA
Milano 2.72x 10 mar 28 mar 11 apr
Pavia 3.15x 07 mar 24 mar NA
Piacenza 6.12x 02 mar 20 mar NA
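The three critical days can be sketched as below. Several details are assumptions of mine, since the text does not spell them out: both series are smoothed before taking the ratio, days without a full 7-day window are skipped, and "closer to normal" is searched only after the worst day. The input series is synthetic, not ISTAT data, and days are returned as positions in the series.

```python
def centered_moving_average(values, window=7):
    """Centered moving average; positions without a full window are None."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i - half:i + half + 1]) / window)
    return smoothed


def critical_days(deaths_2020, baseline, threshold=1.5, window=7):
    """Return max multiplier, first signs, worst day and closer-to-normal day."""
    if len(deaths_2020) != len(baseline):
        raise ValueError("both series must cover the same days")
    smooth_2020 = centered_moving_average(deaths_2020, window)
    smooth_base = centered_moving_average(baseline, window)
    # Keep only days with a full window and a non-zero baseline.
    valid = [
        (i, a / b)
        for i, (a, b) in enumerate(zip(smooth_2020, smooth_base))
        if a is not None and b
    ]
    if not valid:
        return None
    first_signs = next((i for i, m in valid if m > threshold), None)
    worst_day, max_magnitude = max(valid, key=lambda pair: pair[1])
    closer_to_normal = next(
        (i for i, m in valid if i > worst_day and m < threshold), None
    )
    return {
        "max_magnitude": max_magnitude,
        "first_signs": first_signs,
        "worst_day": worst_day,
        "closer_to_normal": closer_to_normal,
    }


# Synthetic example: a flat baseline and a 10-day spike at 4x.
baseline = [10] * 30
deaths_2020 = [10] * 10 + [40] * 10 + [10] * 10
result = critical_days(deaths_2020, baseline)
```

When the multiplier never drops back below the threshold, `closer_to_normal` stays `None`, which corresponds to the NA values in the table.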

An overview on Lombardy

For each municipality in Lombardy that provided the data, you can see the usual ratio: overall deaths in 2020 vs. average in 2015-2019, for the two weeks 15th to 28th March.

Notes and Conclusions

The approach shown in this analysis doesn’t rely on highly biased data such as the number of infected or the number of official deaths from COVID-19 (i.e. people who have been tested).

Instead, it largely focuses on a specific area (Bergamo and province) in two specific weeks (15th to 28th of March), building on the idea that the huge increase of registered deaths in this area, compared to 2019 as well as to previous years, is undoubtedly related to COVID-19.

The main outcomes are the following:

  • The impact of COVID-19 is huge: counting all of the municipalities analysed, in the first three months of the previous years there had never been a day with more than 42 deaths. In 2020, we’ve had 11 days with more than 200 deaths a day, roughly 5 times that maximum - and during the two weeks from 15th to 28th of March, the total number of deceased is more than 9 times what was registered in 2019;
  • All age groups over 45 years old are affected by this epidemic to some degree: there’s an otherwise inexplicable increase in registered deaths that can be measured as a multiplier with respect to historical data
    • Older people have a higher overall mortality, but this is true in general: the distribution of excess deaths across age groups is not substantially different from what is measurable on historical data;
    • People in their 70s appear to be more affected than people in their 80s or 90s;
  • Gender differences, while definitely present (a larger percentage of men are dying), are probably not as large as they seem to be when looking at official data on deceased people who tested positive for COVID-19;
  • There’s still a long way to go. Four numbers to understand why:
    • Average number of deaths per day, registered in 2015-2019: 25
    • Max number of deaths per day, registered in 2015-2019: 42
    • Max number of deaths per day, registered in 2020 during COVID-19 epidemic: 245
    • Number of deaths on the last available day (15th April 2020): 36

It’s crucial to keep this data updated, in order to do a proper evaluation of this pandemic. And the way this data has been made available by ISTAT (the Italian National Institute of Statistics) could represent a sound approach, should we aim for a fair cross-national analysis of COVID-19.

Bonus track: an animated map

A map of Lombardy and its 12 provinces, animated over time, gives a clear picture of how COVID-19 affected this region, with huge differences across provinces.

While the situation is largely consistent with previous years till mid February across the whole region, the province of Lodi quickly shows the first signs of COVID-19 before the beginning of March.

From then on, Bergamo and the neighbouring provinces are devastated, with peripheral provinces (Varese in the North West, Mantova in the South East) definitely less impacted. The situation on the last day available is still much worse than at the beginning of the time series (in January).