Chapter 3 Missing data

Basic analysis of missing data

Missing data overview

The analysis of missing data reveals critical gaps for some indicators and years.

Number of countries/economies with collected data:

## [1] 194

Some countries report only basic economic indicators, such as GDP per capita. Which rule should decide which series are imputed? How many missing values are acceptable?

  • Countries with at least 3 data points available for every series:
## [1] 67
  • Countries with at least 2 data points available for every series:
## [1] 107
  • Countries with at least 1 data point available for every series:
## [1] 128
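The threshold counts above can be reproduced with a small helper. The following is a minimal Python sketch. It assumes the rule means that every series for a country has at least k non-missing yearly observations. It also assumes a hypothetical nested-dictionary layout: ISO3 code, then series code, then a list of yearly values with None marking a missing year. The report's actual data structures and code are not shown, so all names here are illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical layout: ISO3 code -> series code -> yearly values (None = missing).
Panel = Dict[str, Dict[str, List[Optional[float]]]]


def n_available(values: List[Optional[float]]) -> int:
    """Count the non-missing observations of one country-series."""
    return sum(v is not None for v in values)


def countries_meeting_threshold(panel: Panel, k: int) -> List[str]:
    """Return the countries whose every series has at least k data points."""
    return sorted(
        iso3
        for iso3, series in panel.items()
        if all(n_available(values) >= k for values in series.values())
    )


# Toy data for illustration only; these are not the report's figures.
panel: Panel = {
    "AAA": {"S1": [1.0, 2.0, 3.0], "S2": [4.0, None, 5.0]},
    "BBB": {"S1": [1.0, None, None], "S2": [None, 2.0, None]},
    "CCC": {"S1": [1.0, 2.0, 3.0], "S2": [4.0, 5.0, 6.0]},
}

for k in (3, 2, 1):
    print(k, countries_meeting_threshold(panel, k))
```

Under this reading, the 1-point threshold keeps exactly the countries with no fully missing series. This matches the counts in this section: 194 countries minus the 66 countries that have at least one fully missing series gives 128.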

Number of fully missing series by country (countries with no fully missing series are not listed):

##    ISO3 count_NA
## 1   FJI        1
## 2   GNB        1
## 3   JAM        1
## 4   KHM        1
## 5   LBR        1
## 6   MKD        1
## 7   MLT        1
## 8   MWI        1
## 9   NZL        1
## 10  SGP        1
## 11  SWZ        1
## 12  TLS        1
## 13  BHR        2
## 14  BIH        2
## 15  CPV        2
## 16  HTI        2
## 17  KWT        2
## 18  LBN        2
## 19  OMN        2
## 20  PNG        2
## 21  QAT        2
## 22  SAU        2
## 23  SLB        2
## 24  SOM        2
## 25  STP        2
## 26  SYR        2
## 27  VNM        2
## 28  YEM        2
## 29  ZMB        2
## 30  AFG        3
## 31  BHS        3
## 32  BRB        3
## 33  GNQ        3
## 34  GUY        3
## 35  TON        3
## 36  VEN        3
## 37  VUT        3
## 38  WSM        3
## 39  BLZ        4
## 40  BRN        4
## 41  LBY        4
## 42  LCA        4
## 43  SUR        4
## 44  TTO        4
## 45  CUB        5
## 46  PSE        5
## 47  TKM        5
## 48  VCT        5
## 49  HKG        6
## 50  SSD        6
## 51  ERI        8
## 52  MAC       10
## 53  SYC       13
## 54  DMA       14
## 55  KIR       14
## 56  GRD       17
## 57  MHL       17
## 58  PLW       18
## 59  GUM       19
## 60  TUV       19
## 61  ABW       20
## 62  BMU       21
## 63  CYM       21
## 64  SMR       21
## 65  CUW       24
## 66  AND       25
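A table like this one can be built in three steps. First, count for each country the series that have no observation in any year. Then drop countries with a zero count. Finally, order the rows by count and then by ISO3 code. The following is a minimal Python sketch. It reuses the same hypothetical nested layout (ISO3 code, series code, yearly values with None for missing). The function names are illustrative and are not the report's code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: ISO3 code -> series code -> yearly values (None = missing).
Panel = Dict[str, Dict[str, List[Optional[float]]]]


def is_fully_missing(values: List[Optional[float]]) -> bool:
    """True when a country-series has no observation in any year."""
    return all(v is None for v in values)


def count_fully_missing(panel: Panel) -> List[Tuple[str, int]]:
    """(ISO3, count_NA) pairs for countries with at least one fully missing
    series, sorted by count and then by ISO3 code."""
    counts = (
        (iso3, sum(is_fully_missing(values) for values in series.values()))
        for iso3, series in panel.items()
    )
    return sorted(
        ((iso3, n) for iso3, n in counts if n > 0),
        key=lambda row: (row[1], row[0]),
    )


# Toy data for illustration only.
panel: Panel = {
    "CCC": {"S1": [None, None], "S2": [None, None], "S3": [1.0, None]},
    "AAA": {"S1": [1.0, 2.0], "S2": [None, None], "S3": [3.0, 4.0]},
    "BBB": {"S1": [1.0, 2.0], "S2": [3.0, None], "S3": [5.0, 6.0]},
}

for iso3, count_na in count_fully_missing(panel):
    print(iso3, count_na)
```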

Countries with exactly one fully missing series, and the series that is missing:

##    ISO3    Series_missing
## 1   FJI      FB_BNK_ACCSS
## 2   GNB      FB_BNK_ACCSS
## 3   JAM       MAR_AGE_MAL
## 4   KHM       SI.POV.LMIC
## 5   LBR    NE.EXP.GNFS.ZS
## 6   MKD       MAR_AGE_MAL
## 7   MLT NY.ADJ.NNTY.PC.KD
## 8   MWI    NE.EXP.GNFS.ZS
## 9   NZL       SI.POV.LMIC
## 10  SGP       SI.POV.LMIC
## 11  SWZ    LP.LPI.OVRL.XQ
## 12  TLS      FB_BNK_ACCSS
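Identifying the single missing series is a filter on the same per-country check. The following is a minimal Python sketch, assuming the hypothetical nested layout used for the other sketches in this section. It also tallies how often each series is the one missing.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical layout: ISO3 code -> series code -> yearly values (None = missing).
Panel = Dict[str, Dict[str, List[Optional[float]]]]


def single_missing_series(panel: Panel) -> Dict[str, str]:
    """Map ISO3 code to series code for countries with exactly one fully missing series."""
    result: Dict[str, str] = {}
    for iso3 in sorted(panel):
        missing = [
            code
            for code, values in panel[iso3].items()
            if all(v is None for v in values)
        ]
        if len(missing) == 1:
            result[iso3] = missing[0]
    return result


# Toy data for illustration only.
panel: Panel = {
    "AAA": {"S1": [None, None], "S2": [1.0, 2.0]},
    "BBB": {"S1": [None, None], "S2": [None, None]},
    "CCC": {"S1": [1.0, None], "S2": [2.0, 3.0]},
    "DDD": {"S1": [1.0, 2.0], "S2": [None, None]},
}

single = single_missing_series(panel)
print(single)
# Tally which series cause the single gap most often.
print(Counter(single.values()))
```

Applied to the table above, this tally shows that FB_BNK_ACCSS and SI.POV.LMIC each account for three of the twelve countries.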

By indicator

The indicators with the lowest coverage include:

  • Indicator 2.7: Proportion of adults (15 years and older) with an account at a financial institution or with a mobile-money-service provider.
  • Indicator 3.8: Average marriage age, by sex.
  • Indicator 2.1: Logistics Performance Index.
  • Indicator 2.6: Universal health coverage (available for only a few years).
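The report does not define coverage precisely. One simple reading, assumed here, is the share of country-year cells that contain an observation. The following is a minimal Python sketch under that assumption. It uses the same hypothetical nested layout and ranks series from lowest to highest coverage.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: ISO3 code -> series code -> yearly values (None = missing).
Panel = Dict[str, Dict[str, List[Optional[float]]]]


def coverage_by_series(panel: Panel) -> List[Tuple[str, float]]:
    """Share of country-year cells with an observation, lowest coverage first."""
    observed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for series in panel.values():
        for code, values in series.items():
            observed[code] = observed.get(code, 0) + sum(v is not None for v in values)
            total[code] = total.get(code, 0) + len(values)
    return sorted(
        ((code, observed[code] / total[code]) for code in total if total[code] > 0),
        key=lambda row: (row[1], row[0]),
    )


# Toy data for illustration only.
panel: Panel = {
    "AAA": {"S1": [1.0, 2.0, None, 4.0], "S2": [None, None, None, 1.0]},
    "BBB": {"S1": [1.0, 2.0, 3.0, 4.0], "S2": [None, None, 2.0, None]},
}

for code, share in coverage_by_series(panel):
    print(f"{code}: {share:.0%}")
```

Counting cells rather than countries means that a series reported by many countries but only in a few years still scores low. This is the situation described for universal health coverage.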

By country

Large economies such as China and India were excluded from previous editions because of missing data. Since then, data availability and coverage have improved enough for both countries to be included.

3.0.1 China

China was not included in the SDG Pulse analysis. What is the situation with missing data?

  • Palma ratio: 2 data points available

3.0.2 India

India was not included in the SDG Pulse analysis. What is the situation with missing data?

  • Palma ratio: 3 data points available

3.0.3 Other most populated countries (United States, Indonesia, Pakistan, Nigeria)

All four are included in SDG Pulse 2022.

Data availability for all other countries is documented on SharePoint.

A country-by-country comparison of collected and imputed data is also available on SharePoint.

SDG Pulse 2022

Number of countries included in SDG Pulse 2022:

## [1] 97

Countries included in SDG Pulse 2022 but not selected under the “at least 2 data points available” rule:

##  [1] "Algeria"            "Australia"          "Bhutan"            
##  [4] "Comoros"            "Djibouti"           "El Salvador"       
##  [7] "Iceland"            "Jordan"             "Panama"            
## [10] "Paraguay"           "Russian Federation" "Singapore"         
## [13] "Thailand"           "Turkey"             "Uruguay"

Countries included in SDG Pulse 2022 but not selected under the “at least 1 data point available” rule:

## [1] "Singapore"

Proposed decision rule:

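  • select countries with “at least 1 data point available” for every series
  • compare the selection with the countries included in SDG Pulse 2022
  • all previously problematic countries are resolved except Singapore, which has one fully missing series (SI.POV.LMIC)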
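Lowering the threshold can only enlarge the selection. A check whose result shrinks from 15 countries to 1 must therefore compare in the direction "included minus selected": it lists countries already in SDG Pulse 2022 that the rule would drop. The following is a minimal Python sketch of the three steps under that reading. It uses the hypothetical nested layout from the earlier sketches, and `included` is a hypothetical stand-in for the SDG Pulse 2022 country list.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: ISO3 code -> series code -> yearly values (None = missing).
Panel = Dict[str, Dict[str, List[Optional[float]]]]


def select_countries(panel: Panel, min_points: int = 1) -> Set[str]:
    """Step 1: countries whose every series has at least `min_points` observations."""
    return {
        iso3
        for iso3, series in panel.items()
        if all(
            sum(v is not None for v in values) >= min_points
            for values in series.values()
        )
    }


def unresolved_countries(panel: Panel, included: Set[str], min_points: int = 1) -> List[str]:
    """Steps 2 and 3: previously included countries that the rule would drop."""
    return sorted(included - select_countries(panel, min_points))


# Toy data for illustration only; `included` stands in for a previous edition's list.
panel: Panel = {
    "AAA": {"S1": [1.0, 2.0], "S2": [3.0, 4.0]},
    "BBB": {"S1": [1.0, None], "S2": [None, 2.0]},
    "CCC": {"S1": [1.0, 2.0], "S2": [None, None]},
    "DDD": {"S1": [None, None], "S2": [None, None]},
}
included = {"AAA", "BBB", "CCC"}

for k in (2, 1):
    print(k, unresolved_countries(panel, included, k))
```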

Countries to investigate

(Figures: 15 geom_point() plots printed as a list; the images are not reproduced in this text. For each plot, ggplot2 reported removing between 193 and 300 rows containing missing values.)