Chapter 3 Missing data
Basic analysis of missing data
Missing data overview
The analysis of missing data reveals critical gaps for some indicators and years.
Number of countries/economies with collected data:
## [1] 194
Some countries report only basic economic indicators such as GDP per capita. Which rule should we adopt to decide which series will be imputed? How many missing values are acceptable?
- Countries with at least 3 data points available for every series:
## [1] 67
- Countries with at least 2 data points available for every series:
## [1] 107
- Countries with at least 1 data point available for every series:
## [1] 128
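The three counts above can be reproduced by finding, for each country, the number of data points in its least-covered series and comparing that minimum with the threshold. The following is a minimal base R sketch on toy data; the data frame `obs`, its column names, and the helper `countries_meeting()` are assumptions for illustration, not the code behind this report.

```r
# Hypothetical toy data in long format: one row per country, series, and year.
# The column names (ISO3, series, year, value) are assumptions.
obs <- data.frame(
  ISO3   = rep(c("AAA", "BBB", "CCC"), each = 6),
  series = rep(rep(c("S1", "S2"), each = 3), times = 3),
  year   = rep(2018:2020, times = 6),
  value  = c(1, 2, 3,   4, 5, 6,     # AAA: 3 points in S1, 3 in S2
             1, NA, 3,  NA, 5, 6,    # BBB: 2 points in S1, 2 in S2
             1, 2, 3,   NA, NA, NA), # CCC: 3 points in S1, 0 in S2
  stringsAsFactors = FALSE
)

# Non-missing observations per country (rows) and series (columns).
n_obs <- tapply(!is.na(obs$value), list(obs$ISO3, obs$series), sum)

# A country qualifies when its least-covered series reaches the threshold.
min_obs <- apply(n_obs, 1, min)

# Hypothetical helper: number of countries with at least k points in every series.
countries_meeting <- function(k) sum(min_obs >= k)

sapply(c(3, 2, 1), countries_meeting)
```

In the toy data, only `AAA` meets the 3-point rule, while `CCC` fails every rule because one of its series is fully missing.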
Number of fully missing series by country. These 66 countries are the ones that fail the “at least 1 data point” rule (194 - 128 = 66):
## ISO3 count_NA
## 1 FJI 1
## 2 GNB 1
## 3 JAM 1
## 4 KHM 1
## 5 LBR 1
## 6 MKD 1
## 7 MLT 1
## 8 MWI 1
## 9 NZL 1
## 10 SGP 1
## 11 SWZ 1
## 12 TLS 1
## 13 BHR 2
## 14 BIH 2
## 15 CPV 2
## 16 HTI 2
## 17 KWT 2
## 18 LBN 2
## 19 OMN 2
## 20 PNG 2
## 21 QAT 2
## 22 SAU 2
## 23 SLB 2
## 24 SOM 2
## 25 STP 2
## 26 SYR 2
## 27 VNM 2
## 28 YEM 2
## 29 ZMB 2
## 30 AFG 3
## 31 BHS 3
## 32 BRB 3
## 33 GNQ 3
## 34 GUY 3
## 35 TON 3
## 36 VEN 3
## 37 VUT 3
## 38 WSM 3
## 39 BLZ 4
## 40 BRN 4
## 41 LBY 4
## 42 LCA 4
## 43 SUR 4
## 44 TTO 4
## 45 CUB 5
## 46 PSE 5
## 47 TKM 5
## 48 VCT 5
## 49 HKG 6
## 50 SSD 6
## 51 ERI 8
## 52 MAC 10
## 53 SYC 13
## 54 DMA 14
## 55 KIR 14
## 56 GRD 17
## 57 MHL 17
## 58 PLW 18
## 59 GUM 19
## 60 TUV 19
## 61 ABW 20
## 62 BMU 21
## 63 CYM 21
## 64 SMR 21
## 65 CUW 24
## 66 AND 25
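A table of this shape can be derived by flagging every country and series pair that has no observation at all and then summing the flags per country. The sketch below uses base R on toy data; the object names (`obs`, `all_missing`, `result`) and the input layout are assumptions, chosen only to mirror the `ISO3` and `count_NA` columns shown above.

```r
# Hypothetical toy data: 3 countries, 2 series, 2 years each.
obs <- data.frame(
  ISO3   = rep(c("AAA", "BBB", "CCC"), each = 4),
  series = rep(rep(c("S1", "S2"), each = 2), times = 3),
  year   = rep(2019:2020, times = 6),
  value  = c(1, NA, 3, 4,      # AAA: both series have data
             1, 2, NA, NA,     # BBB: S2 fully missing
             NA, NA, NA, NA),  # CCC: S1 and S2 fully missing
  stringsAsFactors = FALSE
)

# TRUE where a country has no observation at all for a series.
all_missing <- tapply(is.na(obs$value), list(obs$ISO3, obs$series), all)

# Count fully missing series per country and keep countries with at least one.
count_NA <- rowSums(all_missing)
result <- data.frame(ISO3 = names(count_NA),
                     count_NA = as.integer(count_NA),
                     stringsAsFactors = FALSE)
result <- result[result$count_NA > 0, ]

# Sort as in the table above: by count, then by country code.
result <- result[order(result$count_NA, result$ISO3), ]
result
```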
Countries with exactly one fully missing series, and the series that is missing:
## ISO3 Series_missing
## 1 FJI FB_BNK_ACCSS
## 2 GNB FB_BNK_ACCSS
## 3 JAM MAR_AGE_MAL
## 4 KHM SI.POV.LMIC
## 5 LBR NE.EXP.GNFS.ZS
## 6 MKD MAR_AGE_MAL
## 7 MLT NY.ADJ.NNTY.PC.KD
## 8 MWI NE.EXP.GNFS.ZS
## 9 NZL SI.POV.LMIC
## 10 SGP SI.POV.LMIC
## 11 SWZ LP.LPI.OVRL.XQ
## 12 TLS FB_BNK_ACCSS
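To name the missing series for countries that lack exactly one, the same country-by-series matrix of "fully missing" flags can be filtered to rows with a single `TRUE`. This is a hedged base R sketch on toy data; `obs`, `all_missing`, and the series codes `S1` to `S3` are placeholders, not the report's actual objects.

```r
# Hypothetical toy data: 3 countries, 3 series, 2 years each.
obs <- data.frame(
  ISO3   = rep(c("AAA", "BBB", "CCC"), each = 6),
  series = rep(rep(c("S1", "S2", "S3"), each = 2), times = 3),
  year   = rep(2019:2020, times = 9),
  value  = c(1, 2,   NA, NA,  5, 6,    # AAA: only S2 fully missing
             1, NA,  3, 4,    5, 6,    # BBB: nothing fully missing
             NA, NA, 3, 4,    NA, NA), # CCC: S1 and S3 fully missing
  stringsAsFactors = FALSE
)

# TRUE where a country has no observation at all for a series.
all_missing <- tapply(is.na(obs$value), list(obs$ISO3, obs$series), all)

# Keep countries with exactly one fully missing series.
one_missing <- all_missing[rowSums(all_missing) == 1, , drop = FALSE]

# For each remaining country, pick the name of the column flagged TRUE.
series_missing <- apply(one_missing, 1, function(flags) names(flags)[flags])

result <- data.frame(ISO3 = names(series_missing),
                     Series_missing = unname(series_missing),
                     stringsAsFactors = FALSE)
result
```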
By indicator
Indicators with the lowest coverage include:
- Indicator 2.7: Proportion of adults (15 years and older) with an account at a financial institution or with a mobile money-service company.
- Indicator 3.8: Average marriage age by sex.
- Indicator 2.1: Logistics performance index.
- Indicator 2.6: Universal health coverage (available for only a few years).
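Coverage by indicator can be summarised in two ways: the share of country-year cells that hold a value, and the number of distinct years with any value (which exposes series that exist for only a few years). The sketch below is an illustration in base R on toy data; `obs`, `coverage`, and `years_with_data` are assumed names.

```r
# Hypothetical toy data: 2 countries, 2 series, 3 years each.
obs <- data.frame(
  ISO3   = rep(c("AAA", "BBB"), each = 6),
  series = rep(rep(c("S1", "S2"), each = 3), times = 2),
  year   = rep(2018:2020, times = 4),
  value  = c(1, 2, 3,   NA, NA, 6,    # AAA: S1 complete, S2 only in 2020
             1, NA, 3,  NA, NA, NA),  # BBB: S1 two points, S2 empty
  stringsAsFactors = FALSE
)

# Share of non-missing country-year cells per series, lowest first.
coverage <- sort(sapply(split(!is.na(obs$value), obs$series), mean))

# Number of distinct years with at least one observation per series.
years_with_data <- sapply(split(obs, obs$series), function(d) {
  length(unique(d$year[!is.na(d$value)]))
})

coverage
years_with_data
```

Sorting by the coverage share puts the weakest series first; the year count helps distinguish a series that is sparse everywhere from one that is well covered but only for a short period.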
By country
Large economies such as China and India were not included in previous editions due to missing data. However, data availability and coverage have since improved enough for both countries to be included.
3.0.1 China
China was not included in the SDG Pulse analysis. What is the situation with missing data?
- Palma ratio available for 2 data points
3.0.2 India
India was not included in the SDG Pulse analysis. What is the situation with missing data?
- Palma ratio available for 3 data points
3.0.3 Other most populated countries (United States, Indonesia, Pakistan, Nigeria)
All four countries were included in the SDG Pulse 2022.
Data availability for all other countries is available on SharePoint.
A comparison between collected and imputed data by country is also available on SharePoint.
SDG Pulse 2022
Number of countries included in the SDG Pulse 2022:
## [1] 97
Countries included in the SDG Pulse 2022 but not selected under the rule of “at least 2 data points available”:
## [1] "Algeria" "Australia" "Bhutan"
## [4] "Comoros" "Djibouti" "El Salvador"
## [7] "Iceland" "Jordan" "Panama"
## [10] "Paraguay" "Russian Federation" "Singapore"
## [13] "Thailand" "Turkey" "Uruguay"
Countries included in the SDG Pulse 2022 but not selected under the rule of “at least 1 data point available”:
## [1] "Singapore"
Proposed decision rule:
- take countries with “at least 1 data point available” for every series
- check that the countries included in the SDG Pulse 2022 are retained
- all problematic countries are resolved except Singapore, which has no data for SI.POV.LMIC
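A candidate rule can be checked against the SDG Pulse 2022 country list with set differences in both directions. The sketch below is a base R illustration; `min_obs` (minimum number of data points across series for each country), `pulse_2022`, and the helper functions are hypothetical placeholders rather than the objects used in this report.

```r
# Hypothetical inputs: minimum data points across all series per country,
# and the list of countries in the earlier edition.
min_obs    <- c(AAA = 3, BBB = 2, CCC = 1, DDD = 0)
pulse_2022 <- c("AAA", "CCC", "DDD")

# Countries selected by the rule "at least k data points in every series".
selected_by <- function(k) names(min_obs)[min_obs >= k]

# Earlier-edition countries that the rule would drop.
dropped_by_rule <- function(k) setdiff(pulse_2022, selected_by(k))

# Countries the rule selects that were not in the earlier edition.
added_by_rule <- function(k) setdiff(selected_by(k), pulse_2022)

dropped_by_rule(2)  # "CCC" "DDD"
dropped_by_rule(1)  # "DDD"
added_by_rule(1)    # "BBB"
```

Loosening the threshold can only shrink the set of dropped countries, which is why the looser rule leaves fewer countries to investigate.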
Countries to investigate
[Figures not reproduced: 15 plots, one per list element. Rows with missing values removed by `geom_point()` in plots 1 to 15: 252, 279, 256, 277, 193, 300, 202, 246, 225, 205, 206, 278, 211, 200, 198.]
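The row counts reported by `geom_point()` correspond to observations with a missing value in a plotted variable, so they can be computed before plotting. The sketch below is a base R illustration on toy data; `plot_data` and its columns are assumptions about the input layout.

```r
# Hypothetical plotting data: one row per country and year.
plot_data <- data.frame(
  country = rep(c("AAA", "BBB"), each = 4),
  year    = rep(2017:2020, times = 2),
  value   = c(1, NA, NA, 4,   # AAA: 2 rows would be dropped
              NA, 2, 3, 4),   # BBB: 1 row would be dropped
  stringsAsFactors = FALSE
)

# Rows per country with a missing value in either plotted variable.
dropped <- sapply(split(plot_data, plot_data$country), function(d) {
  sum(!complete.cases(d[, c("year", "value")]))
})

dropped
```

If the counts are already known and the warnings are only noise, ggplot2 layers accept `na.rm = TRUE` (for example `geom_point(na.rm = TRUE)`), which removes the missing rows silently.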