1  M49 Classification

1.1 Objectives

Chapter section list

  1. Locate the file for the UN M49 geoscheme classification data, get it and save it as raw data (Section 1.2).
  2. Inspect the data thoroughly (Section 1.3).
  3. Clean the data (first step) (Section 1.5).
  4. Display the grouping results (Section 1.5.2).
  5. Clean the intermediate region to get the official results (Section 1.6).
  6. Display the new intermediate region group (Section 1.6.2).
  7. List UN and non-UN member states (Section 1.7).
  8. Summary and conclusion (Section 1.8).

1.2 Download M49 Data

An important, perhaps the most authoritative, classification system is developed and maintained by the United Nations. It was developed expressly for statistical purposes by the United Nations Statistics Division (UNSD) using the M49 methodology.

The result is called Standard country or area codes for statistical use (M49) and can be downloaded manually in different languages and formats (Copy into the clipboard, Excel or CSV) from the United Nations Methodology Overview page.

Resource 1.1 : United Nations Statistics Division M49 Classification

  • Manual download: The standard country or area codes for statistical use (M49) are available in different languages (English, Chinese, Russian, French, Spanish, Arabic) by clicking one of the buttons “Copy”, “Excel” or “CSV”. The page offers no URL for a programmatic download with an R script, because the buttons mentioned above are triggered by JavaScript.
  • Automatic download from the OMNIKA DataStore: With the OMNIKA DataStore (see footnote 1) I found an external source from which the classification file can be downloaded via R script (see footnote 2).

The OMNIKA URL for the download is used in the following code chunk:

R Code 1.1 : Download the United Nations M49 Classification

Run this code chunk manually if the file still needs to be downloaded.
Code
## create folders ###########
pb_create_folder(base::paste0(here::here(), "/data/"))
pb_create_folder(base::paste0(here::here(), "/data/unsd"))

## download m49 file ############
url <- "https://github.com/omnika-datastore/unsd-m49-standard-area-codes/raw/refs/heads/main/2022-09-24__CSV_UNSD_M49.csv"

downloader::download(
    url = url,
    destfile = base::paste0(here::here(), 
            "/data/unsd/m49_raw.csv")
)


## create R object ###############
m49_raw <- 
  readr::read_delim(
    file = base::paste0(here::here(), 
    "/data/unsd/m49_raw.csv"),
    delim = ";"
  )


## save as .rds file ################
pb_save_data_file(
  "unsd", 
  m49_raw, 
  "m49_raw.rds")
(No output is available for this R code chunk.)
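The instruction to run the chunk manually only when the file is still missing could also be expressed in code. The following is a minimal sketch, not the author's code: the helper name download_if_missing() and its fetch argument are hypothetical, and it relies on utils::download.file() from base R instead of the downloader package.

```r
# Hypothetical helper: download a file only if it is not already on disk.
# `fetch` defaults to utils::download.file() and can be swapped out for testing.
download_if_missing <- function(url, destfile, fetch = utils::download.file) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))          # file is already there, nothing to do
  }
  # make sure the target folder exists before writing into it
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  fetch(url, destfile, mode = "wb")   # binary mode keeps the bytes unchanged
  invisible(TRUE)                     # signals that a download took place
}
```

With the url object from R Code 1.1, a call such as download_if_missing(url, "data/unsd/m49_raw.csv") would then skip the download on every later run.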

1.3 Inspect M49 Data

To get a detailed understanding of the data structure I will provide the following two kinds of output for the raw data:

  1. Summary statistics with skimr::skim(), followed by an inspection of the first data rows with dplyr::glimpse().
  2. Several detailed outputs of the classification categories (regions) and their elements (countries) in different code chunks (tabs).

To facilitate the second task I have prepared the function pb_class_scheme() and stored it in “R./helper.r”.

R Code 1.2 : Inspect raw data of the UNSD M49 geoscheme classification

Code
m49_raw <- base::readRDS("data/unsd/m49_raw.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_raw)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_raw)
#> ******************* Using skimr::skim() ***************************
Data summary
Name m49_raw
Number of rows 249
Number of columns 15
_______________________
Column type frequency:
character 15
________________________
Group variables None

Variable type: character

skim_variable                            n_missing  complete_rate  min  max  empty  n_unique  whitespace
Global Code                                      0           1.00    3    3      0         1           0
Global Name                                      0           1.00    5    5      0         1           0
Region Code                                      1           1.00    3    3      0         5           0
Region Name                                      1           1.00    4    8      0         5           0
Sub-region Code                                  1           1.00    3    3      0        17           0
Sub-region Name                                  1           1.00    9   31      0        17           0
Intermediate Region Code                       141           0.43    3    3      0         8           0
Intermediate Region Name                       141           0.43    9   15      0         8           0
Country or Area                                  0           1.00    4   52      0       249           0
M49 Code                                         0           1.00    3    3      0       249           0
ISO-alpha2 Code                                  2           0.99    2    2      0       247           0
ISO-alpha3 Code                                  1           1.00    3    3      0       248           0
Least Developed Countries (LDC)                203           0.18    1    1      0         1           0
Land Locked Developing Countries (LLDC)        217           0.13    1    1      0         1           0
Small Island Developing States (SIDS)          196           0.21    1    1      0         1           0
#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 249
#> Columns: 15
#> $ `Global Code`                             <chr> "001", "001", "001", "001", …
#> $ `Global Name`                             <chr> "World", "World", "World", "…
#> $ `Region Code`                             <chr> "002", "002", "002", "002", …
#> $ `Region Name`                             <chr> "Africa", "Africa", "Africa"…
#> $ `Sub-region Code`                         <chr> "015", "015", "015", "015", …
#> $ `Sub-region Name`                         <chr> "Northern Africa", "Northern…
#> $ `Intermediate Region Code`                <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Intermediate Region Name`                <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Country or Area`                         <chr> "Algeria", "Egypt", "Libya",…
#> $ `M49 Code`                                <chr> "012", "818", "434", "504", …
#> $ `ISO-alpha2 Code`                         <chr> "DZ", "EG", "LY", "MA", "SD"…
#> $ `ISO-alpha3 Code`                         <chr> "DZA", "EGY", "LBY", "MAR", …
#> $ `Least Developed Countries (LDC)`         <chr> NA, NA, NA, NA, "x", NA, NA,…
#> $ `Land Locked Developing Countries (LLDC)` <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Small Island Developing States (SIDS)`   <chr> NA, NA, NA, NA, NA, NA, NA, …

1.4 Special cases

1.4.1 Missing values

The file has 15 columns as you can also see online from the Overview page.

The many missing values (NAs) in the categories LDC, LLDC and SIDS are easily explained: these three columns are coded with an ‘x’ if the country in that row belongs to the category, and are left empty otherwise. Recoding these three columns with 1 and 0 (1 = yes, belongs to this category; 0 = no, does not belong to this category) removes their missing values.
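The recoding idea can be tried out on a toy vector first. This is only a sketch in base R; the vector ldc_raw is made up for illustration and is not taken from the M49 file.

```r
# Toy vector mimicking a flag column: "x" marks membership, NA means no mark.
ldc_raw <- c(NA, "x", NA, "x")

# A row belongs to the category only if the flag is present and equals "x".
ldc_flag <- as.integer(!is.na(ldc_raw) & ldc_raw == "x")

ldc_flag
#> [1] 0 1 0 1
```

The explicit !is.na() test matters: comparing NA with "x" alone would again return NA instead of 0.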

1.4.2 Antarctica

One missing value in the regional categories (Region, Sub-region and Intermediate Region) is related to Antarctica, which the M49 scheme does not treat as a separate region. It therefore has no regional codes and names, apart from those of the all-encompassing global region. But it does have an M49 code as well as ISO-alpha codes.

1.4.3 Channel Island Sark

One of the missing values for ISO-alpha2 and ISO-alpha3 is related to Sark, which is “recognized by the United Nations Statistics Division (UNSD) as a separate territory” but has not been accepted by ISO for more than 20 years (McCarthy 2020). In 2020 a new 54-page submission for an ISO code (see PDF) was filed with ISO, but it still seems to be under consideration, because Sark currently has no ISO 3166 codes (see footnote 3).

For further processing of the M49 dataset, however, it is crucial to have a complete list of ISO codes, because these are the columns on which two datasets will be joined. There are two ways to get a complete list of ISO codes for all UN entries:

  1. Remove the Sark entry from the dataset.
  2. Add ISO-alpha2 and ISO-alpha3 for Sark.

The second possibility is not far-fetched, because Sark may finally succeed with its application. In that case it is highly likely that it will get “CQ” as ISO-alpha2 code and “SCQ” as ISO-alpha3 code. (The alternative “SK” is already taken by Slovakia, and Sercq is the original Norman dialect spelling of the island's name. See After 20-year battle, Channel island Sark finally earns the right to exist on the internet with its own top-level domain (McCarthy 2020).)
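Both possibilities can be compared on a small example. The sketch below uses base R and a made-up three-row excerpt with shortened, hypothetical column names (area, iso2, iso3); “CQ” and “SCQ” are the provisional codes discussed above, not officially assigned ones.

```r
# Toy excerpt of the Channel Islands entries; Sark has no ISO codes.
areas <- data.frame(
  area = c("Guernsey", "Jersey", "Sark"),
  iso2 = c("GG", "JE", NA),
  iso3 = c("GGY", "JEY", NA)
)

# Possibility 1: remove the entry that has no ISO codes.
areas_dropped <- areas[areas$area != "Sark", ]

# Possibility 2: keep Sark and fill in the provisional codes.
areas_filled <- areas
areas_filled$iso2[areas_filled$area == "Sark"] <- "CQ"
areas_filled$iso3[areas_filled$area == "Sark"] <- "SCQ"
```

The first variant loses a row, the second keeps all rows at the price of codes that are not (yet) part of ISO 3166.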

1.4.4 Namibia

The other missing value for ISO-alpha2 codes belongs to Namibia because its abbreviation NA is interpreted by R as a missing value!
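The effect is easy to reproduce with a two-row toy file. The sketch below uses utils::read.csv() from base R, whereas the chapter itself reads the data with readr; the readr reading functions have a comparable na argument. Setting na.strings = "" is only one way to avoid the problem, and it assumes that empty fields are the only genuine missing values in the file.

```r
# Two-row toy file: Namibia's ISO-alpha2 code is the literal string "NA".
csv_text <- "area;iso2\nNamibia;NA\nNorway;NO"

# Default behavior: the string "NA" is read as a missing value.
default_read <- utils::read.csv(text = csv_text, sep = ";")

# Treat only empty fields as missing, so "NA" survives as a string.
strict_read <- utils::read.csv(text = csv_text, sep = ";", na.strings = "")

is.na(default_read$iso2[1])
#> [1] TRUE
strict_read$iso2[1]
#> [1] "NA"
```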

1.5 Clean Data (first step)

1.5.1 Procedure

Procedure 1.1 : Clean M49 data of the UNSD geoscheme classification (first step)

  1. Load the original M49 dataset (“m49_raw.rds”).
  2. Remove the global codes and global names because they are redundant: all rows have global code “001” (“World”).
  3. Shorten the long column names to their abbreviations (“LDC”, “LLDC” and “SIDS”).
  4. Remove the row “Antarctica” because it is not treated as a separate region.
  5. Add the ISO codes “CQ” and “SCQ” to the row of Sark.
  6. Replace the NA in the column “ISO-alpha2 Code” of Namibia with the string “NA”.
  7. Relocate the columns ISO-alpha3 Code and Country or Area to the first two positions, because these two columns are always relevant for the later groupings and for joining with groupings from other sources.
  8. Recode the columns LDC, LLDC and SIDS with 0 and 1.
  9. Sort the data alphabetically by Country or Area.

1.5.2 Result (first step)

1.5.2.1 Data Structure

R Code 1.3 : Clean UNSD M49 geoscheme classification data (first step)

Code
## column renaming vector ########
m49_cols <- c(
  LDC = "Least Developed Countries (LDC)", 
  LLDC = "Land Locked Developing Countries (LLDC)", 
  SIDS = "Small Island Developing States (SIDS)"
  )


## clean data ###############################
m49_clean <- base::readRDS("data/unsd/m49_raw.rds") |>      # (1)
  dplyr::select(-(1:2)) |>                                  # (2)
  dplyr::rename(tidyselect::all_of(m49_cols)) |>            # (3) 
  dplyr::filter(`Country or Area` != "Antarctica") |>       # (4)
  dplyr::mutate(
    `ISO-alpha2 Code` =
        base::ifelse(`Country or Area` == "Sark", 
                     "CQ",  `ISO-alpha2 Code`),             # (5a)
    `ISO-alpha3 Code` =
        base::ifelse(`Country or Area` == "Sark", 
                     "SCQ",  `ISO-alpha3 Code`),            # (5b)
    `ISO-alpha2 Code` = 
        base::ifelse(`Country or Area` == "Namibia", 
                     "NA", `ISO-alpha2 Code`)               # (6)
    ) |>
  dplyr::relocate(
      ## any_of() takes the column names as a character vector
      tidyselect::any_of(
        c("ISO-alpha3 Code", "Country or Area")), 
            .before = `Region Code`) |>                     # (7)
  # .x = placeholder for the values of each selected column of m49_clean
  # "x" marks membership; NA (no mark) is recoded to 0
  dplyr::mutate(dplyr::across(
    LDC:SIDS, ~ dplyr::if_else(.x == "x", 1, 0, missing = 0)  # (8)
    )) |> 
  dplyr::arrange(`Country or Area`)                         # (9)

## save new tibble ##########
pb_save_data_file(
  "unsd",
  m49_clean,
  "m49_clean.rds"
)

## display results ##########
m49_clean <- base::readRDS("data/unsd/m49_clean.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_clean)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_clean)
#> ******************* Using skimr::skim() ***************************
Data summary
Name m49_clean
Number of rows 248
Number of columns 13
_______________________
Column type frequency:
character 10
numeric 3
________________________
Group variables None

Variable type: character

skim_variable             n_missing  complete_rate  min  max  empty  n_unique  whitespace
ISO-alpha3 Code                   0           1.00    3    3      0       248           0
Country or Area                   0           1.00    4   52      0       248           0
Region Code                       0           1.00    3    3      0         5           0
Region Name                       0           1.00    4    8      0         5           0
Sub-region Code                   0           1.00    3    3      0        17           0
Sub-region Name                   0           1.00    9   31      0        17           0
Intermediate Region Code        140           0.44    3    3      0         8           0
Intermediate Region Name        140           0.44    9   15      0         8           0
M49 Code                          0           1.00    3    3      0       248           0
ISO-alpha2 Code                   0           1.00    2    2      0       248           0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd  p0  p25  p50  p75  p100  hist
LDC                    0              1  0.19  0.39   0    0    0    0     1  ▇▁▁▁▂
LLDC                   0              1  0.13  0.34   0    0    0    0     1  ▇▁▁▁▁
SIDS                   0              1  0.21  0.41   0    0    0    0     1  ▇▁▁▁▂
#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 248
#> Columns: 13
#> $ `ISO-alpha3 Code`          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "…
#> $ `Country or Area`          <chr> "Afghanistan", "Albania", "Algeria", "Ameri…
#> $ `Region Code`              <chr> "142", "150", "002", "009", "150", "002", "…
#> $ `Region Name`              <chr> "Asia", "Europe", "Africa", "Oceania", "Eur…
#> $ `Sub-region Code`          <chr> "034", "039", "015", "061", "039", "202", "…
#> $ `Sub-region Name`          <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `Intermediate Region Code` <chr> NA, NA, NA, NA, NA, "017", "029", "029", "0…
#> $ `Intermediate Region Name` <chr> NA, NA, NA, NA, NA, "Middle Africa", "Carib…
#> $ `M49 Code`                 <chr> "004", "008", "012", "016", "020", "024", "…
#> $ `ISO-alpha2 Code`          <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AI", "…
#> $ LDC                        <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ LLDC                       <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0…
#> $ SIDS                       <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1…

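After this first cleaning step the columns ISO-alpha2 Code and ISO-alpha3 Code have no missing values any more: Sark, the only entry without official ISO codes, now carries the provisional codes “CQ” and “SCQ”, and the code of Namibia is stored as the string “NA”.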

1.5.2.2 Regional Groups

As we can see from R Code 1.3, the M49 classification of the United Nations has three different regional groupings (in addition to the overall region “World”).

Code Collection 1.1 : Display different regional groups

R Code 1.4 : Show Regions of the UNSD M49 Geoscheme

Code
(
    m49_region <-  pb_class_scheme(
            df = base::readRDS("data/unsd/m49_clean.rds"),
            sel1 = rlang::quo(`Country or Area`),
            sel2 = rlang::quo(`Region Name`)
        )
)

“Region” is a classification scheme with 248 countries in 5 regions.

R Code 1.5 : Show Sub-regions of the UNSD M49 Geoscheme

Code
(
    m49_sub_region <- pb_class_scheme(
            df = base::readRDS("data/unsd/m49_clean.rds"),
            sel1 = rlang::quo(`Country or Area`),
            sel2 = rlang::quo(`Sub-region Name`)
        )
)

“Sub-region” is a classification scheme with 248 countries in 17 regions.

R Code 1.6 : Show Intermediate Regions of the UNSD M49 Geoscheme

Code
(
    m49_intermediate_region <- pb_class_scheme(
                df = base::readRDS("data/unsd/m49_clean.rds"),
                sel1 = rlang::quo(`Country or Area`),
                sel2 = rlang::quo(`Intermediate Region Name`)
            )
)

“Intermediate Region” is a classification scheme with 248 countries in 9 regions.
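The 9 regions reported here, compared with the 8 unique values in the skimr::skim() output, would be consistent with the missing values being counted as a group of their own. Whether pb_class_scheme() counts this way is an assumption, because the function's source is not shown here. The base R sketch below only illustrates the effect on a made-up vector.

```r
# Toy column with two named groups and several missing values.
intermediate <- c("Middle Africa", NA, "Caribbean", NA, "Caribbean")

# unique() keeps NA as a value, so NA is counted as one extra group.
n_with_na <- length(unique(intermediate))

# Dropping NA first counts the named groups only.
n_without_na <- length(unique(intermediate[!is.na(intermediate)]))

c(n_with_na, n_without_na)
#> [1] 3 2
```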

1.6 Clean Data (second step)

The intermediate grouping does not result in the expected 22 (with Antarctica: 23) different regions mentioned in many documents. See, for instance, the Wikipedia article about the UN geoscheme, which features a colored world map and a list of countries grouped into the 22 regions.

[Figure: world map showing the 22 geographical sub-regions as defined by the UNSD in different colors. Antarctica is not shown.]
Graph 1.1: M49 Geoscheme developed and maintained by the United Nations Statistics Division (UNSD). (CC BY-SA 3.0, Wikimedia Commons)

The solution is to replace the NAs in Intermediate Region Name with the values of the sub-regions. Additionally, as can be seen in R Code 1.6, there is a second small problem: three small countries or areas are listed as an extra group “Channel Islands”. To get the official intermediate grouping we need to dissolve this group and move all three of them into the category “Northern Europe”.

1.6.1 Procedure

Procedure 1.2 : Clean M49 data of the UNSD geoscheme classification (second step)

  1. Replace the NA values of Intermediate Region Name with values from the Sub-region Name column.
  2. Replace the NA values of Intermediate Region Code with values from the Sub-region Code column.
  3. Replace the “Channel Islands” values in Intermediate Region Name with the value of “Northern Europe”.
  4. Replace the “Channel Islands” values (“830”) in Intermediate Region Code with the code of “Northern Europe” (“154”).
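The logic of these steps can be checked on a toy data frame before it is applied to the real data. This is a sketch in base R with shortened, hypothetical column names (sub_name, int_name); only the name column is shown, the code column would be treated in the same way.

```r
# Toy data: one NA, one regular value and one "Channel Islands" entry.
toy <- data.frame(
  sub_name = c("Northern Africa", "Sub-Saharan Africa", "Northern Europe"),
  int_name = c(NA, "Middle Africa", "Channel Islands")
)

# Step 1: fall back to the sub-region where no intermediate region exists.
toy$int_name <- ifelse(is.na(toy$int_name), toy$sub_name, toy$int_name)

# Step 3: fold the "Channel Islands" group into "Northern Europe".
toy$int_name[toy$int_name == "Channel Islands"] <- "Northern Europe"

toy$int_name
#> [1] "Northern Africa" "Middle Africa"   "Northern Europe"
```

The order matters: the NAs are replaced first, so the comparison with “Channel Islands” in the second assignment never meets a missing value.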

1.6.2 Result (second step)

1.6.2.1 Data Structure

R Code 1.7 : Clean UNSD M49 geoscheme classification data (second step)

Code
m49_clean2 <- base::readRDS("data/unsd/m49_clean.rds") |> 
## replace `NA`s of intermediate regions with sub-region values ######
  dplyr::mutate(`Intermediate Region Name` = 
         base::ifelse(is.na(`Intermediate Region Name`), 
                      `Sub-region Name`, `Intermediate Region Name`), # (1)
     `Intermediate Region Code` = 
         base::ifelse(is.na(`Intermediate Region Code`), 
                      `Sub-region Code`, `Intermediate Region Code`), # (2)
## replace "Channel Islands" with "Northern Europe" values ######
    `Intermediate Region Name` = 
         base::ifelse(`Intermediate Region Name` == "Channel Islands", 
                      "Northern Europe", `Intermediate Region Name`), # (3)
    `Intermediate Region Code` = 
         base::ifelse(`Intermediate Region Code` == "830", 
                      "154", `Intermediate Region Code`)              # (4) 
  )


## save new tibble as clean2 ##########
pb_save_data_file("unsd", m49_clean2, "m49_clean2.rds")


## display results ##########
m49_clean2 <- base::readRDS("data/unsd/m49_clean2.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_clean2)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_clean2)
#> ******************* Using skimr::skim() ***************************
Data summary
Name m49_clean2
Number of rows 248
Number of columns 13
_______________________
Column type frequency:
character 10
numeric 3
________________________
Group variables None

Variable type: character

skim_variable             n_missing  complete_rate  min  max  empty  n_unique  whitespace
ISO-alpha3 Code                   0              1    3    3      0       248           0
Country or Area                   0              1    4   52      0       248           0
Region Code                       0              1    3    3      0         5           0
Region Name                       0              1    4    8      0         5           0
Sub-region Code                   0              1    3    3      0        17           0
Sub-region Name                   0              1    9   31      0        17           0
Intermediate Region Code          0              1    3    3      0        22           0
Intermediate Region Name          0              1    9   25      0        22           0
M49 Code                          0              1    3    3      0       248           0
ISO-alpha2 Code                   0              1    2    2      0       248           0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd  p0  p25  p50  p75  p100  hist
LDC                    0              1  0.19  0.39   0    0    0    0     1  ▇▁▁▁▂
LLDC                   0              1  0.13  0.34   0    0    0    0     1  ▇▁▁▁▁
SIDS                   0              1  0.21  0.41   0    0    0    0     1  ▇▁▁▁▂
#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 248
#> Columns: 13
#> $ `ISO-alpha3 Code`          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "…
#> $ `Country or Area`          <chr> "Afghanistan", "Albania", "Algeria", "Ameri…
#> $ `Region Code`              <chr> "142", "150", "002", "009", "150", "002", "…
#> $ `Region Name`              <chr> "Asia", "Europe", "Africa", "Oceania", "Eur…
#> $ `Sub-region Code`          <chr> "034", "039", "015", "061", "039", "202", "…
#> $ `Sub-region Name`          <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `Intermediate Region Code` <chr> "034", "039", "015", "061", "039", "017", "…
#> $ `Intermediate Region Name` <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `M49 Code`                 <chr> "004", "008", "012", "016", "020", "024", "…
#> $ `ISO-alpha2 Code`          <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AI", "…
#> $ LDC                        <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ LLDC                       <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0…
#> $ SIDS                       <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1…

With the “trick” of replacing the NAs of Intermediate Region Name with the values of the Sub-region Name column, we also got rid of the many NAs in that column and in the accompanying Intermediate Region Code column.

1.6.2.2 Intermediate Region Again

R Code 1.8 : Show correct intermediate region for the UNSD M49 Geoscheme

Code
## show new intermediate result ###############
(
    m49_intermediate2 <- pb_class_scheme(
            df = base::readRDS("data/unsd/m49_clean2.rds"),
            sel1 = rlang::quo(`Country or Area`),
            sel2 = rlang::quo(`Intermediate Region Name`)
            )
) 

The new regional group Intermediate Region Name is a classification scheme with 248 countries in 22 regions. As Antarctica is not included, the grouping by Intermediate Region Name represents the correct M49 classification.

1.7 UN Countries

The cleaned UN M49 geoscheme classification data contains 13 columns and 248 rows (= countries or areas). This is far more than the current 193 member states of the United Nations. Even if we include the Holy See (Vatican) and the State of Palestine, which are non-member observer states, and the two disputed countries or areas (Taiwan and Kosovo), we are still far from the 248 countries or areas listed in the M49 geoscheme of the United Nations.

1.7.1 Procedure

Procedure 1.3 : Add Column for UN country membership

To get only the 193 member states and the two non-member observer states (= 195 UN states) I will apply the following procedure:

  1. Prepare a list with the names of the countries and their ISO-alpha3 codes. (The ISO-alpha3 code is needed for the later join with the cleaned data m49_clean2.rds.) Done manually, no program code.
  2. Save this two-column data file as “data/unsd/un_countries.csv”. Done manually, no program code.
  3. Load “data/unsd/un_countries.csv” into memory.
  4. Add a new column UN (for UN member state) to the data and set all its values to “1” (= member state).
  5. Save the result as R object un_countries.rds.
  6. Load m49_clean2.rds and un_countries.rds.
  7. Full-join these two tibbles via their ISO-alpha3 code columns.
  8. Replace all NAs of the UN column (these rows did not come from the “un_countries” file) with the value “0” (= not a UN state).
  9. Delete the redundant Country column, which originally came from the “un_countries” file.
  10. Save the result as new R object m49_clean3.rds.
  11. Inspect the result.
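Steps 7 to 9 of this procedure can be sketched on toy data. The example uses merge() from base R in place of dplyr::full_join(), and the two small data frames with their column names (iso3, area) are made up for illustration; only Code, Country and UN mirror the columns of the “un_countries” file.

```r
# Toy stand-ins for the M49 data and the list of UN states.
m49_toy <- data.frame(
  iso3 = c("FRA", "GGY", "NAM"),
  area = c("France", "Guernsey", "Namibia")
)
un_toy <- data.frame(
  Code    = c("FRA", "NAM"),
  Country = c("France", "Namibia"),
  UN      = 1
)

# Step 7: full join on the ISO-alpha3 code; unmatched rows get NA in UN.
joined <- merge(m49_toy, un_toy, by.x = "iso3", by.y = "Code", all = TRUE)

# Step 8: NA means the row did not come from the UN list.
joined$UN[is.na(joined$UN)] <- 0

# Step 9: drop the redundant name column.
joined$Country <- NULL

joined$UN
#> [1] 1 0 1
```

Note that merge() sorts the result by the key column, which dplyr::full_join() does not do.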

1.7.2 Result

Code Collection 1.2 : Display structure and content of the adapted UN M49 geoscheme dataset

R Code 1.9 : Add column with UN membership

Code
un_countries <- readr::read_csv("data/unsd/un_countries.csv",
                                show_col_types = FALSE) |>       # (3)
  dplyr::mutate(UN = 1)                                        # (4)
  
pb_save_data_file("unsd", un_countries, "un_countries.rds")      # (5)

x <- base::readRDS("data/unsd/m49_clean2.rds")                   # (6a)
y <- base::readRDS("data/unsd/un_countries.rds")                 # (6b)

m49_clean3 <- dplyr::full_join(
  x, y, dplyr::join_by(`ISO-alpha3 Code` == Code)
  ) |>                                                           # (7)
  dplyr::mutate(UN =
        base::ifelse(base::is.na(UN), 0, UN)
        ) |>                                                     # (8)
  dplyr::select(-Country)                                        # (9)

pb_save_data_file("unsd", m49_clean3, "m49_clean3.rds")          # (10)

skimr::skim(m49_clean3)                                          # (11)
Data summary
Name m49_clean3
Number of rows 248
Number of columns 14
_______________________
Column type frequency:
character 10
numeric 4
________________________
Group variables None

Variable type: character

skim_variable             n_missing  complete_rate  min  max  empty  n_unique  whitespace
ISO-alpha3 Code                   0              1    3    3      0       248           0
Country or Area                   0              1    4   52      0       248           0
Region Code                       0              1    3    3      0         5           0
Region Name                       0              1    4    8      0         5           0
Sub-region Code                   0              1    3    3      0        17           0
Sub-region Name                   0              1    9   31      0        17           0
Intermediate Region Code          0              1    3    3      0        22           0
Intermediate Region Name          0              1    9   25      0        22           0
M49 Code                          0              1    3    3      0       248           0
ISO-alpha2 Code                   0              1    2    2      0       248           0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd  p0  p25  p50  p75  p100  hist
LDC                    0              1  0.19  0.39   0    0    0    0     1  ▇▁▁▁▂
LLDC                   0              1  0.13  0.34   0    0    0    0     1  ▇▁▁▁▁
SIDS                   0              1  0.21  0.41   0    0    0    0     1  ▇▁▁▁▂
UN                     0              1  0.79  0.41   0    1    1    1     1  ▂▁▁▁▇

R Code 1.10 : List the 193 UN member states (plus Vatican and Palestine as non-member observers)

Code
base::readRDS("data/unsd/m49_clean3.rds") |> 
  dplyr::filter(UN == 1) |> 
  dplyr::select(1,2,4,8) |> 
  DT::datatable()

R Code 1.11 : List the countries or areas that are neither UN member states nor non-member observers

Code
base::readRDS("data/unsd/m49_clean3.rds") |> 
  dplyr::filter(UN == 0) |> 
  dplyr::select(1,2,4,8) |> 
  DT::datatable()

1.8 Summary

The UN M49 geoscheme data contains 249 countries or areas. This includes 193 UN members, 2 non-member states with observer status, Antarctica and 53 dependent territories. With the exception of Sark, all of these entries have ISO-alpha2 and ISO-alpha3 codes. These two columns are important because they facilitate joining data from other sources via these standardized codes. This is crucial because the spelling of country and area names is not always identical across sources. Therefore you often cannot join two datasets by country or area name alone but need a more systematic approach based on the ISO codes.
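The difference between joining by name and joining by code shows up even in a two-row example. The two data frames below are hypothetical sources made up for this sketch; merge() from base R performs an inner join by default.

```r
# Two hypothetical sources that spell one country name differently.
source_a <- data.frame(
  name    = c("Czechia", "Norway"),
  iso3    = c("CZE", "NOR"),
  value_a = c(1, 2)
)
source_b <- data.frame(
  name    = c("Czech Republic", "Norway"),
  iso3    = c("CZE", "NOR"),
  value_b = c(10, 20)
)

# Join by name: the differently spelled country drops out.
by_name <- merge(source_a, source_b[, c("name", "value_b")], by = "name")

# Join by ISO-alpha3 code: both countries are matched.
by_code <- merge(source_a, source_b[, c("iso3", "value_b")], by = "iso3")

c(nrow(by_name), nrow(by_code))
#> [1] 1 2
```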

The UN M49 geoscheme classification has three regional divisions with 5, 17 and 22 groups (always without Antarctica and not counting the overall group “World” with code “001”).

About the three groupings we can say (again not including Antarctica):

  • Region: It consists of five groups representing more or less the continents. But instead of the traditional separation between Northern and Southern America it unites these two continents into the “Americas”, which also includes the Caribbean countries.
  • Sub-region: It consists of 17 groups, some of which are very large. For instance, the many African countries are divided into only two groups: Northern Africa and Sub-Saharan Africa. This is in contrast to the more detailed “Intermediate region”, where we have Northern, Eastern, Western, Southern and Middle Africa. On the other hand, the smaller Europe is divided into four sub-regions. In my opinion the sub-regional division is therefore not a very consistent classification.
  • Intermediate region: It consists of 22 groups and is the most detailed regional classification for statistical purposes.

Additionally, LDC, LLDC and SIDS provide three other divisions that are driven not by regional location but by shared developmental and geographic features. As we will see in the other chapters, there are other approaches to a consistent country classification besides the UN M49 geoscheme.

References

McCarthy, Kieren. 2020. “After 20-Year Battle, Channel Island Sark Finally Earns the Right to Exist on the Internet with Its Own Top-Level Domain.” https://www.theregister.com/2020/03/23/sark_cctld_iso/.

  1. OMNIKA DataStore is an open-access data science resource for researchers, authors, and technologists. It is a 501(c)(3) nonprofit organization whose mission is to digitize, organize, and make important (free) content available to the general public. The service provides raw data from trusted sources, data visualizations, data analysis tools, and other digital resources.

  2. A check with base::all.equal() showed that the files from the two different sources (UNSD and OMNIKA) are identical.

  3. Even though the United Nations and ISO both list 249 entries (including Antarctica), there is a small difference between the ISO 3166 codes and the M49 geoscheme of the UN: the UN lists Sark (without ISO codes), whereas the ISO list includes “Taiwan (Province of China)” with the ISO codes “TW” and “TWN”.