Chapter 3 Missing data

Basic analysis of missing data

Missing data overview

The analysis of missing data reveals critical gaps for some indicators and years.

Number of countries/economies with collected data:

## [1] 194

Some countries have data only for basic economic indicators such as GDP per capita. What rule should we apply to decide which series will be imputed? What share of missing values is acceptable?

  • Countries with at least 3 data points available in every series:
## [1] 74
  • Countries with at least 2 data points available in every series:
## [1] 114
  • Countries with at least 1 data point available in every series:
## [1] 132
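
The counts above could be produced along the following lines. This is a minimal sketch, not the code used for the report: it assumes a long-format data frame with hypothetical columns `ISO3`, `series`, `year` and `value`, and it assumes the rule means that every series of a country has at least `k` observed values.

```r
# Toy long-format data; the column names and values are illustrative only
df <- data.frame(
  ISO3   = rep(c("AAA", "BBB", "CCC"), each = 4),
  series = rep(c("S1", "S1", "S2", "S2"), times = 3),
  year   = rep(c(2020, 2021), times = 6),
  value  = c(1, 2, 3, 4,   1, NA, 3, 4,   NA, NA, 5, 6)
)

# Number of observed (non-NA) data points per country and series
obs <- tapply(!is.na(df$value), list(df$ISO3, df$series), sum)
obs[is.na(obs)] <- 0  # a country-series pair with no rows counts as zero points

# A country qualifies if its sparsest series still has at least k points
min_obs <- apply(obs, 1, min)
count_countries <- function(k) sum(min_obs >= k)

sapply(c(3, 2, 1), count_countries)
```

Taking the minimum over series makes the rule strict: a single empty series is enough to exclude a country, which is why the choice of `k` changes the country count so sharply.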

Number of series fully missing by country:

##    ISO3 count_NA
## 1   FJI        1
## 2   GNB        1
## 3   JAM        1
## 4   KHM        1
## 5   MKD        1
## 6   MLT        1
## 7   NZL        1
## 8   PNG        1
## 9   QAT        1
## 10  SGP        1
## 11  STP        1
## 12  SWZ        1
## 13  SYR        1
## 14  TLS        1
## 15  AFG        2
## 16  BHR        2
## 17  CPV        2
## 18  DJI        2
## 19  GUY        2
## 20  HTI        2
## 21  KWT        2
## 22  OMN        2
## 23  SAU        2
## 24  SLB        2
## 25  SOM        2
## 26  TTO        2
## 27  YEM        2
## 28  ZMB        2
## 29  BHS        3
## 30  BLZ        3
## 31  BRB        3
## 32  GNQ        3
## 33  LCA        3
## 34  SUR        3
## 35  TON        3
## 36  VEN        3
## 37  VUT        3
## 38  WSM        3
## 39  BRN        4
## 40  LBY        4
## 41  SSD        4
## 42  TKM        4
## 43  VCT        4
## 44  CUB        5
## 45  PSE        5
## 46  HKG        6
## 47  ERI        8
## 48  MAC       10
## 49  SYC       13
## 50  DMA       14
## 51  KIR       14
## 52  GRD       15
## 53  MHL       17
## 54  PLW       17
## 55  TUV       18
## 56  ABW       20
## 57  GUM       20
## 58  AND       21
## 59  BMU       21
## 60  CYM       21
## 61  SMR       22
## 62  CUW       24
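
A table of this shape can be derived by flagging each country-series pair whose values are all missing and summing the flags per country. The sketch below is illustrative only; the data frame and its column names (`ISO3`, `series`, `value`) are assumptions, not the report's actual objects.

```r
# Toy long-format data; names and values are illustrative only
df <- data.frame(
  ISO3   = rep(c("AAA", "BBB", "CCC"), each = 4),
  series = rep(c("S1", "S1", "S2", "S2"), times = 3),
  value  = c(1, 2, 3, 4,   NA, NA, 3, 4,   NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# TRUE where every value of a country-series pair is NA
all_na <- tapply(is.na(df$value), list(df$ISO3, df$series), all)
all_na[is.na(all_na)] <- TRUE  # pairs with no rows at all are also fully missing

# Count fully missing series per country, keep countries with at least one
n_missing <- rowSums(all_na)
count_na <- data.frame(ISO3 = names(n_missing),
                       count_NA = as.integer(n_missing),
                       stringsAsFactors = FALSE)
count_na <- count_na[count_na$count_NA > 0, ]
count_na <- count_na[order(count_na$count_NA, count_na$ISO3), ]
rownames(count_na) <- NULL
count_na
```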

For countries with only one fully missing series, which series is it?

##    ISO3    Series_missing
## 1   FJI      FB_BNK_ACCSS
## 2   GNB      FB_BNK_ACCSS
## 3   JAM       MAR_AGE_MAL
## 4   KHM       SI.POV.LMIC
## 5   MKD       MAR_AGE_MAL
## 6   MLT NY.ADJ.NNTY.PC.KD
## 7   NZL       SI.POV.LMIC
## 8   PNG      FB_BNK_ACCSS
## 9   QAT             PALMA
## 10  SGP       SI.POV.LMIC
## 11  STP      FB_BNK_ACCSS
## 12  SWZ    LP.LPI.OVRL.XQ
## 13  SYR       EG_EGY_PRIM
## 14  TLS      FB_BNK_ACCSS
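
One way to obtain such a lookup is to start from a logical country-by-series matrix of "fully missing" flags and keep the single flagged series for each country that has exactly one. The matrix below is a made-up example and the object names are hypothetical.

```r
# Toy matrix: TRUE means the series is fully missing for that country
all_na <- matrix(
  c(FALSE, TRUE,  FALSE,
    FALSE, FALSE, FALSE,
    TRUE,  TRUE,  FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("AAA", "BBB", "CCC"), c("S1", "S2", "S3"))
)

# Countries with exactly one fully missing series
one_missing <- rowSums(all_na) == 1

# The row-length vector is recycled down each column, so the mask keeps
# only the flags that belong to those countries
idx <- which(all_na & one_missing, arr.ind = TRUE)

result <- data.frame(
  ISO3           = rownames(all_na)[idx[, "row"]],
  Series_missing = colnames(all_na)[idx[, "col"]],
  stringsAsFactors = FALSE
)
result
```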

By indicator

Indicators with lowest coverage include:

  • Indicator 2.7: Proportion of adults (15 years and older) with an account at a financial institution or with a mobile money-service company.
  • Indicator 3.8: Average marriage age by sex
  • Indicator 2.1: Logistics performance index
  • Indicator 2.6: Universal health coverage - only available for a few years.
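
Coverage by indicator can be ranked as the share of non-missing values per series. The following is a minimal sketch on made-up data; the column names `series` and `value` are assumptions.

```r
# Toy data; names and values are illustrative only
df <- data.frame(
  series = rep(c("S1", "S2", "S3"), each = 4),
  value  = c(1, 2, 3, 4,   NA, NA, NA, 4,   1, NA, 3, NA),
  stringsAsFactors = FALSE
)

# Share of observed values per series, lowest coverage first
observed <- split(!is.na(df$value), df$series)
coverage <- sort(vapply(observed, mean, numeric(1)))
coverage
```

Sorting in ascending order puts the indicators that most need attention at the top of the printout.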

By country

Large economies such as China and India were not included in previous editions because of missing data. Data availability and coverage have since improved enough for both countries to be included.

3.0.1 China

China was not included in the SDG Pulse analysis. What is the situation with missing data?

  • PALMA ratio available for 2 data points

3.0.2 India

India was not included in the SDG Pulse analysis. What is the situation with missing data?

  • PALMA ratio available for 3 data points

3.0.3 Other most populated countries (United States, Indonesia, Pakistan, Nigeria)

All included in the SDG Pulse 2022

To see other countries, data availability by country is available on SharePoint.

Comparison between collected and imputed data by country is available on SharePoint.

SDG Pulse 2022

Countries included in the SDG Pulse 2022

## [1] 97

Countries included in the SDG Pulse 2022 but not selected under the rule “at least 2 data points available”:

##  [1] "Algeria"            "Australia"          "Myanmar"            "El Salvador"       
##  [5] "Djibouti"           "Jordan"             "Nigeria"            "Paraguay"          
##  [9] "Russian Federation" "Singapore"          "Thailand"           "Türkiye"           
## [13] "Uruguay"

Countries included in the SDG Pulse 2022 but not selected under the rule “at least 1 data point available”:

## [1] "Djibouti"  "Singapore"

Proposed decision rule:

  • take countries with “at least 1 data point available”
  • check the selection against the countries included in the SDG Pulse 2022
  • all problematic countries are then resolved except Singapore
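
The check in the second step amounts to a set difference between the SDG Pulse 2022 country list and the countries that pass a given rule. The sketch below uses placeholder country codes and hypothetical vector names; it shows the comparison only, not the report's actual lists.

```r
# Placeholder country lists; all names here are hypothetical
pulse_2022  <- c("AAA", "BBB", "CCC", "DDD")  # countries in the earlier edition
selected_k2 <- c("AAA", "EEE")                # pass "at least 2 data points"
selected_k1 <- c("AAA", "BBB", "CCC", "EEE")  # pass "at least 1 data point"

# Pulse countries that each rule would drop
dropped_k2 <- setdiff(pulse_2022, selected_k2)
dropped_k1 <- setdiff(pulse_2022, selected_k1)

dropped_k2
dropped_k1
```

Because every country passing the stricter rule also passes the looser one, the countries dropped under the looser rule are always a subset of those dropped under the stricter rule.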

Countries to investigate

[Figure output: 13 scatter plots, list elements [[1]] to [[13]]; images not reproduced. Each plot triggered a `geom_point()` warning that between 151 and 287 rows containing missing values were removed.]