Chapter 3 Missing data
Basic analysis of missing data
Missing data overview
The analysis of missing data reveals critical gaps for some indicators and years.
Number of countries/economies with collected data:
## [1] 194
Some countries report only basic economic indicators such as GDP per capita. What rule should we apply to decide which series will be imputed? What share of missing values is acceptable?
- Countries with at least 3 data points available for each series:
## [1] 74
- Countries with at least 2 data points available for each series:
## [1] 114
- Countries with at least 1 data point available for each series:
## [1] 132
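The counts above can be reproduced with a simple filter. The following is a minimal sketch in Python, assuming the rule means that every series of a country must have at least the stated number of observed values. The function name `countries_meeting_rule`, the record layout `(iso3, series, year, value)` and the toy data are all hypothetical and are not taken from the report's dataset.

```python
from collections import defaultdict

def countries_meeting_rule(records, all_series, min_points):
    """Return ISO3 codes for which every series has >= min_points observed values.

    Assumption: `records` holds (iso3, series, year, value) tuples and a
    missing observation is stored as None.
    """
    counts = defaultdict(lambda: defaultdict(int))
    countries = set()
    for iso3, series, _year, value in records:
        countries.add(iso3)
        if value is not None:
            counts[iso3][series] += 1
    return sorted(
        iso3 for iso3 in countries
        if all(counts[iso3][s] >= min_points for s in all_series)
    )

# Toy data (hypothetical country codes and values)
records = [
    ("AAA", "GDP_PC", 2019, 1.0), ("AAA", "GDP_PC", 2020, 1.1),
    ("AAA", "PALMA", 2019, 2.0), ("AAA", "PALMA", 2020, 2.1),
    ("BBB", "GDP_PC", 2019, 3.0), ("BBB", "GDP_PC", 2020, 3.2),
    ("BBB", "PALMA", 2019, None), ("BBB", "PALMA", 2020, 1.9),
    ("CCC", "GDP_PC", 2019, 5.0), ("CCC", "GDP_PC", 2020, None),
    ("CCC", "PALMA", 2019, None), ("CCC", "PALMA", 2020, None),
]
all_series = ["GDP_PC", "PALMA"]

for k in (1, 2):
    print(k, len(countries_meeting_rule(records, all_series, k)))
```

Passing the full list of series explicitly matters: a series with no rows at all for a country would otherwise be silently ignored instead of counted as missing.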
Number of series fully missing by country:
## ISO3 count_NA
## 1 FJI 1
## 2 GNB 1
## 3 JAM 1
## 4 KHM 1
## 5 MKD 1
## 6 MLT 1
## 7 NZL 1
## 8 PNG 1
## 9 QAT 1
## 10 SGP 1
## 11 STP 1
## 12 SWZ 1
## 13 SYR 1
## 14 TLS 1
## 15 AFG 2
## 16 BHR 2
## 17 CPV 2
## 18 DJI 2
## 19 GUY 2
## 20 HTI 2
## 21 KWT 2
## 22 OMN 2
## 23 SAU 2
## 24 SLB 2
## 25 SOM 2
## 26 TTO 2
## 27 YEM 2
## 28 ZMB 2
## 29 BHS 3
## 30 BLZ 3
## 31 BRB 3
## 32 GNQ 3
## 33 LCA 3
## 34 SUR 3
## 35 TON 3
## 36 VEN 3
## 37 VUT 3
## 38 WSM 3
## 39 BRN 4
## 40 LBY 4
## 41 SSD 4
## 42 TKM 4
## 43 VCT 4
## 44 CUB 5
## 45 PSE 5
## 46 HKG 6
## 47 ERI 8
## 48 MAC 10
## 49 SYC 13
## 50 DMA 14
## 51 KIR 14
## 52 GRD 15
## 53 MHL 17
## 54 PLW 17
## 55 TUV 18
## 56 ABW 20
## 57 GUM 20
## 58 AND 21
## 59 BMU 21
## 60 CYM 21
## 61 SMR 22
## 62 CUW 24
Countries with exactly one fully missing series, and the series concerned:
## ISO3 Series_missing
## 1 FJI FB_BNK_ACCSS
## 2 GNB FB_BNK_ACCSS
## 3 JAM MAR_AGE_MAL
## 4 KHM SI.POV.LMIC
## 5 MKD MAR_AGE_MAL
## 6 MLT NY.ADJ.NNTY.PC.KD
## 7 NZL SI.POV.LMIC
## 8 PNG FB_BNK_ACCSS
## 9 QAT PALMA
## 10 SGP SI.POV.LMIC
## 11 STP FB_BNK_ACCSS
## 12 SWZ LP.LPI.OVRL.XQ
## 13 SYR EG_EGY_PRIM
## 14 TLS FB_BNK_ACCSS
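The two tables above (number of fully missing series per country, and the single missing series where there is only one) can be derived in one pass over the data. This is a hedged sketch in Python under the same assumptions as before: records are `(iso3, series, year, value)` tuples with `None` for missing values, and the helper name `fully_missing_series` and the toy data are hypothetical.

```python
def fully_missing_series(records, all_series):
    """Map each ISO3 code to the list of series with no observed value at all."""
    observed = {}
    for iso3, series, _year, value in records:
        observed.setdefault(iso3, set())
        if value is not None:
            observed[iso3].add(series)
    return {
        iso3: [s for s in all_series if s not in seen]
        for iso3, seen in observed.items()
    }

# Toy data (hypothetical country codes and values)
records = [
    ("AAA", "GDP_PC", 2020, 1.0), ("AAA", "PALMA", 2020, 2.0),
    ("BBB", "GDP_PC", 2020, 3.0), ("BBB", "PALMA", 2020, None),
    ("CCC", "GDP_PC", 2020, None), ("CCC", "PALMA", 2020, None),
]
all_series = ["GDP_PC", "PALMA"]

missing = fully_missing_series(records, all_series)

# Count of fully missing series, keeping only countries with at least one
count_na = {iso3: len(m) for iso3, m in missing.items() if m}

# Countries with exactly one fully missing series, and which one it is
single_missing = {iso3: m[0] for iso3, m in missing.items() if len(m) == 1}

print(count_na)
print(single_missing)
```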
By indicator
Indicators with lowest coverage include:
- Indicator 2.7: Proportion of adults (15 years and older) with an account at a financial institution or with a mobile money-service company.
- Indicator 3.8: Average marriage age by sex
- Indicator 2.1: Logistics performance index
- Indicator 2.6: Universal health coverage - only available for a few years.
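Coverage by indicator can be ranked by computing, for each series, the share of countries with at least one observed value. The sketch below is illustrative only: the definition of coverage used here is an assumption (the report does not state how coverage was measured), and the function name `coverage_by_series` and the toy data are hypothetical.

```python
def coverage_by_series(records):
    """Return (series, share of countries with >= 1 observed value), lowest first."""
    countries = set()
    covered = {}
    for iso3, series, _year, value in records:
        countries.add(iso3)
        covered.setdefault(series, set())
        if value is not None:
            covered[series].add(iso3)
    shares = {s: len(c) / len(countries) for s, c in covered.items()}
    return sorted(shares.items(), key=lambda item: item[1])

# Toy data (hypothetical country codes and values)
records = [
    ("AAA", "GDP_PC", 2020, 1.0), ("BBB", "GDP_PC", 2020, 2.0),
    ("CCC", "GDP_PC", 2020, 3.0), ("DDD", "GDP_PC", 2020, 4.0),
    ("AAA", "PALMA", 2020, 1.5), ("BBB", "PALMA", 2020, None),
    ("CCC", "PALMA", 2020, 2.5), ("DDD", "PALMA", 2020, None),
    ("AAA", "FB_BNK_ACCSS", 2020, None), ("BBB", "FB_BNK_ACCSS", 2020, None),
    ("CCC", "FB_BNK_ACCSS", 2020, 40.0), ("DDD", "FB_BNK_ACCSS", 2020, None),
]

ranking = coverage_by_series(records)
for series, share in ranking:
    print(f"{series}: {share:.0%}")
```

A stricter definition, for example the share of country-year cells that are observed, would also penalise indicators that are available for only a few years.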
By country
Large economies such as China and India were not included in previous editions due to missing data. However, data availability and coverage have since improved enough for both countries to be included.
3.0.1 China
China was not included in the SDG Pulse analysis. What is the situation with missing data?
- PALMA ratio available for 2 data points
3.0.2 India
India was not included in the SDG Pulse analysis. What is the situation with missing data?
- PALMA ratio available for 3 data points
3.0.3 Other most populated countries (United States, Indonesia, Pakistan, Nigeria)
All are included in the SDG Pulse 2022.
Data availability for other countries is available on SharePoint.
A comparison between collected and imputed data by country is also available on SharePoint.
SDG Pulse 2022
Countries included in the SDG Pulse 2022
## [1] 97
Countries included in the SDG Pulse 2022 but not selected by the rule of “at least 2 data points available”:
## [1] "Algeria" "Australia" "Myanmar" "El Salvador"
## [5] "Djibouti" "Jordan" "Nigeria" "Paraguay"
## [9] "Russian Federation" "Singapore" "Thailand" "Türkiye"
## [13] "Uruguay"
Countries included in the SDG Pulse 2022 but not selected by the rule of “at least 1 data point available”:
## [1] "Djibouti" "Singapore"
Proposed decision rule:
- take countries with “at least 1 data point available”
- check countries included in the SDG Pulse
- all problematic countries are resolved except for Singapore
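Checking a candidate rule against a published edition amounts to two set differences: countries that were published but fail the rule, and countries that pass the rule but were not published. The following is a minimal sketch in Python; the function name `compare_with_edition` and the toy country lists are hypothetical and do not reproduce the report's data.

```python
def compare_with_edition(selected, published):
    """Compare the rule-based selection with the countries of a published edition."""
    selected, published = set(selected), set(published)
    return {
        # Published before, but the rule would now drop them: need investigation
        "published_not_selected": sorted(published - selected),
        # Pass the rule, but were not in the published edition: new candidates
        "selected_not_published": sorted(selected - published),
    }

# Toy lists (hypothetical ISO3 codes)
selected_by_rule = ["AAA", "BBB", "DDD"]
published_edition = ["AAA", "BBB", "CCC"]

result = compare_with_edition(selected_by_rule, published_edition)
print(result["published_not_selected"])
print(result["selected_not_published"])
```

Running the comparison for each threshold side by side makes the trade-off explicit: a stricter threshold leaves less to impute but drops more countries that were previously published.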
Countries to investigate
[Figures: 13 plots, numbered [[1]] to [[13]]; images not reproduced. Each plot raised a ggplot2 warning from `geom_point()` that rows containing missing values or values outside the scale range were removed (between 151 and 287 rows per plot).]