1 M49 Classification

1.1 Objectives

Chapter section list

Locate the file for the UN M49 geoscheme classification data, get it and save it as raw data (Section 1.2).
Inspect the data thoroughly (Section 1.3).
Clean the data (first step) (Section 1.5).
Display the grouping results (Section 1.5.2).
Clean the intermediate region to get the official results (Section 1.6).
Display the new intermediate region group (Section 1.6.2).
List UN and non-UN member states (Section 1.7).
Summary and conclusion (Section 1.8).

1.2 Download M49 Data

An important, perhaps the most authoritative, classification system is developed and maintained by the United Nations. It was developed expressly for statistical purposes by the United Nations Statistics Division (UNSD) using the M49 methodology.

The result is called Standard country or area codes for statistical use (M49) and can be downloaded manually in different languages and formats (Copy into the clipboard, Excel or CSV) from the United Nations Methodology Overview page.

Resource 1.1 : United Nations Statistics Division M49 Classification

Manual download: The standard country or area codes for statistical use (M49) are available in different languages (English, Chinese, Russian, French, Spanish, Arabic) by clicking one of the buttons “Copy”, “Excel” or “CSV”. The page offers no URL for a programmatic download with an R script, because the buttons mentioned above are triggered by JavaScript.
Automatic download from the OMNIKA DataStore: With the OMNIKA DataStore ¹ I found an external source that allows downloading the classification file via an R script².

The OMNIKA URLs for download are:

EXCEL: 2022-09-24__Excel_UNSD_M49.xlsx
CSV: 2022-09-24__CSV_UNSD_M49.csv

R Code 1.1 : Download the United Nations M49 Classification

Run this code chunk manually if the file still needs to be downloaded.

Code

## create folders ###########
pb_create_folder(base::paste0(here::here(), "/data/"))
pb_create_folder(base::paste0(here::here(), "/data/unsd"))

## download m49 file ############
url <- "https://github.com/omnika-datastore/unsd-m49-standard-area-codes/raw/refs/heads/main/2022-09-24__CSV_UNSD_M49.csv"

downloader::download(
    url = url,
    destfile = base::paste0(here::here(), 
            "/data/unsd/m49_raw.csv")
)


## create R object ###############
m49_raw <- 
  readr::read_delim(
    file = base::paste0(here::here(), 
    "/data/unsd/m49_raw.csv"),
    delim = ";"
  )


## save as .rds file ################
pb_save_data_file(
  "unsd", 
  m49_raw, 
  "m49_raw.rds")

(No output is available for this R code chunk.)
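
The instruction to run the chunk only when the file still needs to be downloaded can also be expressed in code. The following is a minimal sketch and not part of the original workflow: `download_if_missing()` and its `fetch` argument are hypothetical names, and base R's `utils::download.file()` stands in for `downloader::download()`.

```r
## hypothetical helper: download only when the target file is absent
download_if_missing <- function(url, destfile,
                                fetch = utils::download.file) {
  if (base::file.exists(destfile)) {
    return(base::invisible(FALSE))           # file is there, nothing to do
  }
  base::dir.create(base::dirname(destfile),
                   recursive = TRUE, showWarnings = FALSE)
  fetch(url, destfile, mode = "wb")          # binary-safe download
  base::invisible(TRUE)
}

## demonstration without network access: the file already exists
existing_file <- base::tempfile(fileext = ".csv")
base::writeLines("a;b", existing_file)
was_downloaded <- download_if_missing(
  url = "https://example.org/not-used.csv",  # placeholder URL, never requested
  destfile = existing_file
)
```

The return value tells the caller whether a download actually happened, so the chunk can be run repeatedly without fetching the file again.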

1.3 Inspect M49 Data

To get a detailed understanding of the data structures I will provide the following two outputs of the raw data:

Summary statistics with skimr::skim(), followed by an inspection of the first data rows with dplyr::glimpse().
Several detailed outputs of the classification categories (regions) and their elements (countries) in different code chunks (tabs).

To facilitate the second task I have prepared the function pb_class_scheme() and stored it in “R./helper.r”.
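
The source of pb_class_scheme() is not shown in this chapter. Judging only from the tables it produces later (one row per group, the member countries collapsed into one string, and a count N), a function of this kind could look roughly like the sketch below. The name `class_scheme_sketch()` and its argument names are assumptions for illustration; the real helper takes quosures (`sel1`, `sel2`) and may work differently.

```r
## hypothetical stand-in for a grouping helper, NOT the author's pb_class_scheme()
class_scheme_sketch <- function(df, country, group) {
  df |>
    dplyr::group_by({{ group }}) |>
    dplyr::summarise(
      ## all countries of one group, sorted and collapsed into a single string
      Country = base::paste(base::sort({{ country }}), collapse = "; "),
      N = dplyr::n(),                        # number of countries per group
      .groups = "drop"
    )
}

## toy data with two groups
toy <- tibble::tibble(
  `Country or Area` = c("Sudan", "Egypt", "Chad"),
  `Sub-region Name` = c("Northern Africa", "Northern Africa",
                        "Sub-Saharan Africa")
)

scheme_toy <- class_scheme_sketch(toy, `Country or Area`, `Sub-region Name`)
```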

R Code 1.2 : Inspect raw data of the UNSD M49 geoscheme classification

Code

m49_raw <- base::readRDS("data/unsd/m49_raw.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_raw)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_raw)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	m49_raw
Number of rows	249
Number of columns	15
_______________________
Column type frequency:
character	15
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Global Code	0	1.00	3	3	1
Global Name	0	1.00	5	5	1
Region Code	1	1.00	3	3	5
Region Name	1	1.00	4	8	5
Sub-region Code	1	1.00	3	3	17
Sub-region Name	1	1.00	9	31	17
Intermediate Region Code	141	0.43	3	3	8
Intermediate Region Name	141	0.43	9	15	8
Country or Area	0	1.00	4	52	249
M49 Code	0	1.00	3	3	249
ISO-alpha2 Code	2	0.99	2	2	247
ISO-alpha3 Code	1	1.00	3	3	248
Least Developed Countries (LDC)	203	0.18	1	1	1
Land Locked Developing Countries (LLDC)	217	0.13	1	1	1
Small Island Developing States (SIDS)	196	0.21	1	1	1

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 249
#> Columns: 15
#> $ `Global Code`                             <chr> "001", "001", "001", "001", …
#> $ `Global Name`                             <chr> "World", "World", "World", "…
#> $ `Region Code`                             <chr> "002", "002", "002", "002", …
#> $ `Region Name`                             <chr> "Africa", "Africa", "Africa"…
#> $ `Sub-region Code`                         <chr> "015", "015", "015", "015", …
#> $ `Sub-region Name`                         <chr> "Northern Africa", "Northern…
#> $ `Intermediate Region Code`                <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Intermediate Region Name`                <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Country or Area`                         <chr> "Algeria", "Egypt", "Libya",…
#> $ `M49 Code`                                <chr> "012", "818", "434", "504", …
#> $ `ISO-alpha2 Code`                         <chr> "DZ", "EG", "LY", "MA", "SD"…
#> $ `ISO-alpha3 Code`                         <chr> "DZA", "EGY", "LBY", "MAR", …
#> $ `Least Developed Countries (LDC)`         <chr> NA, NA, NA, NA, "x", NA, NA,…
#> $ `Land Locked Developing Countries (LLDC)` <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Small Island Developing States (SIDS)`   <chr> NA, NA, NA, NA, NA, NA, NA, …

1.4 Special cases

1.4.1 Missing values

The file has 15 columns as you can also see online from the Overview page.

The many missing values (NAs) for the categories LDC, LLDC and SIDS are easily explained: These three columns are coded with an ‘x’ if the country of this row belongs to this category. Recoding these three columns with 1 and 0 (1 = yes, belongs to this category, 0 = no, does not belong to this category) will eliminate their missing values.
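
The recoding idea can be tried on a few rows first. This is a small sketch with a toy tibble (the object names `flags` and `flags_recoded` are made up for illustration); it uses the named `missing` argument of dplyr::if_else() so that NA is turned into 0.

```r
## toy data: "x" marks membership, NA marks non-membership
flags <- tibble::tibble(
  country = c("Afghanistan", "Albania", "Angola"),
  LDC     = c("x", NA, "x")
)

flags_recoded <- flags |>
  dplyr::mutate(
    ## "x" -> 1, anything else -> 0, NA -> 0
    LDC = dplyr::if_else(LDC == "x", 1, 0, missing = 0)
  )
```

Because the comparison `LDC == "x"` yields NA for missing cells, the `missing` argument is what actually produces the zeros here.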

1.4.2 Antarctica

One missing value in the regional categories (Region, Sub-Region and Intermediate Region) is related to Antarctica, which the M49 scheme does not treat as a separate region. It therefore has no regional codes and names, with the exception of the all-encompassing global region. But it has an M49 code as well as ISO-alpha codes.

1.4.3 Channel Island Sark

One of the missing values for ISO-alpha2 and ISO-alpha3 is related to Sark, which is “recognized by the United Nations Statistics Division (UNSD) as a separate territory” but has not been accepted by ISO for more than 20 years (McCarthy 2020). In 2020 a new 54-page submission for an ISO code (see PDF) was filed with ISO, but it still seems to be under consideration, because currently ³ Sark has no ISO 3166 codes.

For further processing of the M49 dataset, however, it is crucial to have a complete list of ISO codes, because these are the columns on which two datasets are joined. There are two ways to get a complete list of ISO codes for all UN entries:

Remove the Sark entry from the dataset.
Add ISO-alpha2 and ISO-alpha3 for Sark.

The second possibility is not far-fetched, because Sark may finally succeed with its application. In that case it is highly likely that it will get “CQ” as ISO-alpha2 and “SCQ” as ISO-alpha3 code. (The other possibility, “sk”, is already taken by Slovakia, and Sercq is the original Norman dialect spelling of the island's name. See After 20-year battle, Channel island Sark finally earns the right to exist on the internet with its own top-level domain (McCarthy 2020).)
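
Why a missing ISO code matters for joining can be seen in a small sketch. Both tibbles below are toy data: `other_toy` and its `value` column are invented for illustration, and it is an assumption that the second dataset would use “SCQ” for Sark.

```r
## M49 side: Sark has no ISO-alpha3 code
m49_toy <- tibble::tibble(
  `Country or Area` = c("Jersey", "Sark"),
  `ISO-alpha3 Code` = c("JEY", NA)
)

## hypothetical second dataset keyed by ISO-alpha3 (values are made up)
other_toy <- tibble::tibble(
  `ISO-alpha3 Code` = c("JEY", "SCQ"),
  value = c(10, 20)
)

## join with the missing key: Sark finds no partner
joined_raw <- dplyr::left_join(m49_toy, other_toy, by = "ISO-alpha3 Code")

## join after adding the provisional code for Sark
m49_patched <- m49_toy |>
  dplyr::mutate(
    `ISO-alpha3 Code` = dplyr::if_else(
      `Country or Area` == "Sark", "SCQ", `ISO-alpha3 Code`
    )
  )
joined_patched <- dplyr::left_join(m49_patched, other_toy,
                                   by = "ISO-alpha3 Code")
```

In `joined_raw` the Sark row carries a missing `value`, while `joined_patched` links both rows.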

1.4.4 Namibia

The other missing value for ISO-alpha2 codes belongs to Namibia because its abbreviation NA is interpreted by R as a missing value!
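
The effect comes from the default `na = c("", "NA")` of the readr import functions. As an alternative to repairing the value afterwards, the import itself can be told to treat only empty cells as missing. The following sketch demonstrates this with a small temporary file; the file content and the object names are made up for illustration.

```r
## write a tiny semicolon-delimited file with the two problematic rows
csv_file <- base::tempfile(fileext = ".csv")
base::writeLines(
  c("Country or Area;ISO-alpha2 Code", "Namibia;NA", "Sark;"),
  csv_file
)

all_chr <- readr::cols(.default = readr::col_character())

## default: both the string "NA" and the empty cell become missing
read_default <- readr::read_delim(csv_file, delim = ";",
                                  col_types = all_chr)

## na = "": only the empty cell is missing, "NA" stays a string
read_keep_na <- readr::read_delim(csv_file, delim = ";",
                                  na = "", col_types = all_chr)
```

With `na = ""` Namibia keeps its code, and Sark remains the only entry without an ISO-alpha2 code.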

1.5 Clean Data (first step)

1.5.1 Procedure

Procedure 1.1 : Clean M49 data of the UNSD geoscheme classification (first step)

Load the original M49 dataset (“m49_raw.rds”)
Remove the global codes and global names because they are redundant: All rows have global code “001” (“World”).
Shorten long names to their abbreviation (“LDC”, “LLDC” and “SIDS”).
Remove row “Antarctica” because it is not seen as a separate region.
Add ISO Codes “CQ” and “SCQ” to Country or Area of Sark.
Replace NA in the column “ISO-alpha2 Code” of Namibia with the string “NA”.
Relocate columns ISO-alpha3 Code and Country or Area to the first two columns because these two columns are always relevant for the later groupings and joining with groupings from other sources.
Recode the columns LDC, LLDC and SIDS with 0 and 1.
Sort the data alphabetically by Country or Area.

1.5.2 Result (first step)

1.5.2.1 Data Structure

R Code 1.3 : Clean UNSD M49 geoscheme classification data (first step)

Code

## column renaming vector ########
m49_cols = c(
  LDC = "Least Developed Countries (LDC)", 
  LLDC = "Land Locked Developing Countries (LLDC)", 
  SIDS = "Small Island Developing States (SIDS)"
  )


## clean data ###############################
m49_clean <- base::readRDS("data/unsd/m49_raw.rds") |>      # (1)
  dplyr::select(-(1:2)) |>                                  # (2)
  dplyr::rename(tidyselect::all_of(m49_cols)) |>            # (3) 
  dplyr::filter(`Country or Area` != "Antarctica") |>       # (4)
  dplyr::mutate(
    `ISO-alpha2 Code` =
        base::ifelse(`Country or Area` == "Sark", 
                     "CQ",  `ISO-alpha2 Code`),             # (5a)
    `ISO-alpha3 Code` =
        base::ifelse(`Country or Area` == "Sark", 
                     "SCQ",  `ISO-alpha3 Code`),            # (5b)
    `ISO-alpha2 Code` = 
        base::ifelse(`Country or Area` == "Namibia", 
                     "NA", `ISO-alpha2 Code`)               # (6)
    ) |>
  dplyr::relocate(
      ## any_of() expects column names as strings, not bare symbols
      tidyselect::any_of(
        c("ISO-alpha3 Code", "Country or Area")), 
            .before = `Region Code`) |>                     # (7)
  # `~` defines an anonymous function; `.x` = values in cols of m49_clean
  # "x" -> 1, NA -> 0; 999 would flag any unexpected value
  dplyr::mutate(dplyr::across(
    LDC:SIDS, ~ dplyr::if_else(.x == "x", 1, 999, 0)        # (8)
    )) |> 
  dplyr::arrange(`Country or Area`)                         # (9)

## save new tibble ##########
pb_save_data_file(
  "unsd",
  m49_clean,
  "m49_clean.rds"
)

## display results ##########
m49_clean <- base::readRDS("data/unsd/m49_clean.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_clean)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_clean)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	m49_clean
Number of rows	248
Number of columns	13
_______________________
Column type frequency:
character	10
numeric	3
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
ISO-alpha3 Code	0	1.00	3	3	248
Country or Area	0	1.00	4	52	248
Region Code	0	1.00	3	3	5
Region Name	0	1.00	4	8	5
Sub-region Code	0	1.00	3	3	17
Sub-region Name	0	1.00	9	31	17
Intermediate Region Code	140	0.44	3	3	8
Intermediate Region Name	140	0.44	9	15	8
M49 Code	0	1.00	3	3	248
ISO-alpha2 Code	0	1.00	2	2	248

Variable type: numeric

skim_variable	complete_rate	mean	sd	p100	hist
LDC	1	0.19	0.39	1	▇▁▁▁▂
LLDC	1	0.13	0.34	1	▇▁▁▁▁
SIDS	1	0.21	0.41	1	▇▁▁▁▂

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 248
#> Columns: 13
#> $ `ISO-alpha3 Code`          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "…
#> $ `Country or Area`          <chr> "Afghanistan", "Albania", "Algeria", "Ameri…
#> $ `Region Code`              <chr> "142", "150", "002", "009", "150", "002", "…
#> $ `Region Name`              <chr> "Asia", "Europe", "Africa", "Oceania", "Eur…
#> $ `Sub-region Code`          <chr> "034", "039", "015", "061", "039", "202", "…
#> $ `Sub-region Name`          <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `Intermediate Region Code` <chr> NA, NA, NA, NA, NA, "017", "029", "029", "0…
#> $ `Intermediate Region Name` <chr> NA, NA, NA, NA, NA, "Middle Africa", "Carib…
#> $ `M49 Code`                 <chr> "004", "008", "012", "016", "020", "024", "…
#> $ `ISO-alpha2 Code`          <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AI", "…
#> $ LDC                        <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ LLDC                       <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0…
#> $ SIDS                       <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1…

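As explained above, only Sark had no value for ISO-alpha2 Code and ISO-alpha3 Code. After adding “CQ” and “SCQ” for Sark and restoring the string “NA” for Namibia, both columns are now complete.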
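
Since the ISO columns serve as join keys, it can be worth checking them for missing and duplicated values before moving on. The helper below is a hypothetical sketch in base R (the name `check_join_key()` is not part of the original code), demonstrated on a toy data frame.

```r
## hypothetical helper: count missing and duplicated values of a key column
check_join_key <- function(df, key) {
  values <- df[[key]]
  present <- values[!base::is.na(values)]
  base::list(
    n_missing    = base::sum(base::is.na(values)),
    n_duplicated = base::sum(base::duplicated(present))
  )
}

## toy data with one missing and one duplicated key
toy_keys <- base::data.frame(
  iso3 = c("AFG", "ALB", NA, "ALB"),
  stringsAsFactors = FALSE
)
key_report <- check_join_key(toy_keys, "iso3")
```

Applied to the cleaned tibble with the key "ISO-alpha3 Code", both counts should be zero if the cleaning worked as intended.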

1.5.2.2 Regional Groups

As we can see from R Code 1.3, the M49 classification of the United Nations has three different regional groupings (in addition to the overall region World).

Code Collection 1.1 : Display different regional groups

R Code 1.4 : Show Regions of the UNSD M49 Geoscheme

Code

(
    m49_region <-  pb_class_scheme(
            df = base::readRDS("data/unsd/m49_clean.rds"),
            sel1 = rlang::quo(`Country or Area`),
            sel2 = rlang::quo(`Region Name`)
        )
)

	Region Name	Country	N
1	Africa	Algeria; Angola; Benin; Botswana; British Indian Ocean Territory; Burkina Faso; Burundi; Cabo Verde; Cameroon; Central African Republic; Chad; Comoros; Congo; Côte d’Ivoire; Democratic Republic of the Congo; Djibouti; Egypt; Equatorial Guinea; Eritrea; Eswatini; Ethiopia; French Southern Territories; Gabon; Gambia; Ghana; Guinea; Guinea-Bissau; Kenya; Lesotho; Liberia; Libya; Madagascar; Malawi; Mali; Mauritania; Mauritius; Mayotte; Morocco; Mozambique; Namibia; Niger; Nigeria; Rwanda; Réunion; Saint Helena; Sao Tome and Principe; Senegal; Seychelles; Sierra Leone; Somalia; South Africa; South Sudan; Sudan; Togo; Tunisia; Uganda; United Republic of Tanzania; Western Sahara; Zambia; Zimbabwe ;	60
2	Americas	Anguilla; Antigua and Barbuda; Argentina; Aruba; Bahamas; Barbados; Belize; Bermuda; Bolivia (Plurinational State of); Bonaire, Sint Eustatius and Saba; Bouvet Island; Brazil; British Virgin Islands; Canada; Cayman Islands; Chile; Colombia; Costa Rica; Cuba; Curaçao; Dominica; Dominican Republic; Ecuador; El Salvador; Falkland Islands (Malvinas); French Guiana; Greenland; Grenada; Guadeloupe; Guatemala; Guyana; Haiti; Honduras; Jamaica; Martinique; Mexico; Montserrat; Nicaragua; Panama; Paraguay; Peru; Puerto Rico; Saint Barthélemy; Saint Kitts and Nevis; Saint Lucia; Saint Martin (French Part); Saint Pierre and Miquelon; Saint Vincent and the Grenadines; Sint Maarten (Dutch part); South Georgia and the South Sandwich Islands; Suriname; Trinidad and Tobago; Turks and Caicos Islands; United States Virgin Islands; United States of America; Uruguay; Venezuela (Bolivarian Republic of) ;	57
3	Asia	Afghanistan; Armenia; Azerbaijan; Bahrain; Bangladesh; Bhutan; Brunei Darussalam; Cambodia; China; China, Hong Kong Special Administrative Region; China, Macao Special Administrative Region; Cyprus; Democratic People's Republic of Korea; Georgia; India; Indonesia; Iran (Islamic Republic of); Iraq; Israel; Japan; Jordan; Kazakhstan; Kuwait; Kyrgyzstan; Lao People's Democratic Republic; Lebanon; Malaysia; Maldives; Mongolia; Myanmar; Nepal; Oman; Pakistan; Philippines; Qatar; Republic of Korea; Saudi Arabia; Singapore; Sri Lanka; State of Palestine; Syrian Arab Republic; Tajikistan; Thailand; Timor-Leste; Turkmenistan; Türkiye; United Arab Emirates; Uzbekistan; Viet Nam; Yemen ;	50
4	Europe	Albania; Andorra; Austria; Belarus; Belgium; Bosnia and Herzegovina; Bulgaria; Croatia; Czechia; Denmark; Estonia; Faroe Islands; Finland; France; Germany; Gibraltar; Greece; Guernsey; Holy See; Hungary; Iceland; Ireland; Isle of Man; Italy; Jersey; Latvia; Liechtenstein; Lithuania; Luxembourg; Malta; Monaco; Montenegro; Netherlands; North Macedonia; Norway; Poland; Portugal; Republic of Moldova; Romania; Russian Federation; San Marino; Sark; Serbia; Slovakia; Slovenia; Spain; Svalbard and Jan Mayen Islands; Sweden; Switzerland; Ukraine; United Kingdom of Great Britain and Northern Ireland; Åland Islands ;	52
5	Oceania	American Samoa; Australia; Christmas Island; Cocos (Keeling) Islands; Cook Islands; Fiji; French Polynesia; Guam; Heard Island and McDonald Islands; Kiribati; Marshall Islands; Micronesia (Federated States of); Nauru; New Caledonia; New Zealand; Niue; Norfolk Island; Northern Mariana Islands; Palau; Papua New Guinea; Pitcairn; Samoa; Solomon Islands; Tokelau; Tonga; Tuvalu; United States Minor Outlying Islands; Vanuatu; Wallis and Futuna Islands ;	29

“Region” is a classification scheme with 248 countries in 5 regions.

R Code 1.5 : Show Sub-regions of the UNSD M49 Geoscheme

Code

(
    m49_sub_region <- pb_class_scheme(
            df = base::readRDS("data/unsd/m49_clean.rds"),
            sel1 = rlang::quo(`Country or Area`),
            sel2 = rlang::quo(`Sub-region Name`)
        )
)

	Sub-region Name	Country	N
1	Australia and New Zealand	Australia; Christmas Island; Cocos (Keeling) Islands; Heard Island and McDonald Islands; New Zealand; Norfolk Island ;	6
2	Central Asia	Kazakhstan; Kyrgyzstan; Tajikistan; Turkmenistan; Uzbekistan ;	5
3	Eastern Asia	China; China, Hong Kong Special Administrative Region; China, Macao Special Administrative Region; Democratic People's Republic of Korea; Japan; Mongolia; Republic of Korea ;	7
4	Eastern Europe	Belarus; Bulgaria; Czechia; Hungary; Poland; Republic of Moldova; Romania; Russian Federation; Slovakia; Ukraine ;	10
5	Latin America and the Caribbean	Anguilla; Antigua and Barbuda; Argentina; Aruba; Bahamas; Barbados; Belize; Bolivia (Plurinational State of); Bonaire, Sint Eustatius and Saba; Bouvet Island; Brazil; British Virgin Islands; Cayman Islands; Chile; Colombia; Costa Rica; Cuba; Curaçao; Dominica; Dominican Republic; Ecuador; El Salvador; Falkland Islands (Malvinas); French Guiana; Grenada; Guadeloupe; Guatemala; Guyana; Haiti; Honduras; Jamaica; Martinique; Mexico; Montserrat; Nicaragua; Panama; Paraguay; Peru; Puerto Rico; Saint Barthélemy; Saint Kitts and Nevis; Saint Lucia; Saint Martin (French Part); Saint Vincent and the Grenadines; Sint Maarten (Dutch part); South Georgia and the South Sandwich Islands; Suriname; Trinidad and Tobago; Turks and Caicos Islands; United States Virgin Islands; Uruguay; Venezuela (Bolivarian Republic of) ;	52
6	Melanesia	Fiji; New Caledonia; Papua New Guinea; Solomon Islands; Vanuatu ;	5
7	Micronesia	Guam; Kiribati; Marshall Islands; Micronesia (Federated States of); Nauru; Northern Mariana Islands; Palau; United States Minor Outlying Islands ;	8
8	Northern Africa	Algeria; Egypt; Libya; Morocco; Sudan; Tunisia; Western Sahara ;	7
9	Northern America	Bermuda; Canada; Greenland; Saint Pierre and Miquelon; United States of America ;	5
10	Northern Europe	Denmark; Estonia; Faroe Islands; Finland; Guernsey; Iceland; Ireland; Isle of Man; Jersey; Latvia; Lithuania; Norway; Sark; Svalbard and Jan Mayen Islands; Sweden; United Kingdom of Great Britain and Northern Ireland; Åland Islands ;	17
11	Polynesia	American Samoa; Cook Islands; French Polynesia; Niue; Pitcairn; Samoa; Tokelau; Tonga; Tuvalu; Wallis and Futuna Islands ;	10
12	South-eastern Asia	Brunei Darussalam; Cambodia; Indonesia; Lao People's Democratic Republic; Malaysia; Myanmar; Philippines; Singapore; Thailand; Timor-Leste; Viet Nam ;	11
13	Southern Asia	Afghanistan; Bangladesh; Bhutan; India; Iran (Islamic Republic of); Maldives; Nepal; Pakistan; Sri Lanka ;	9
14	Southern Europe	Albania; Andorra; Bosnia and Herzegovina; Croatia; Gibraltar; Greece; Holy See; Italy; Malta; Montenegro; North Macedonia; Portugal; San Marino; Serbia; Slovenia; Spain ;	16
15	Sub-Saharan Africa	Angola; Benin; Botswana; British Indian Ocean Territory; Burkina Faso; Burundi; Cabo Verde; Cameroon; Central African Republic; Chad; Comoros; Congo; Côte d’Ivoire; Democratic Republic of the Congo; Djibouti; Equatorial Guinea; Eritrea; Eswatini; Ethiopia; French Southern Territories; Gabon; Gambia; Ghana; Guinea; Guinea-Bissau; Kenya; Lesotho; Liberia; Madagascar; Malawi; Mali; Mauritania; Mauritius; Mayotte; Mozambique; Namibia; Niger; Nigeria; Rwanda; Réunion; Saint Helena; Sao Tome and Principe; Senegal; Seychelles; Sierra Leone; Somalia; South Africa; South Sudan; Togo; Uganda; United Republic of Tanzania; Zambia; Zimbabwe ;	53
16	Western Asia	Armenia; Azerbaijan; Bahrain; Cyprus; Georgia; Iraq; Israel; Jordan; Kuwait; Lebanon; Oman; Qatar; Saudi Arabia; State of Palestine; Syrian Arab Republic; Türkiye; United Arab Emirates; Yemen ;	18
17	Western Europe	Austria; Belgium; France; Germany; Liechtenstein; Luxembourg; Monaco; Netherlands; Switzerland ;	9

“Sub-region” is a classification scheme with 248 countries in 17 regions.

R Code 1.6 : Show Intermediate Regions of the UNSD M49 Geoscheme

Code

(
    m49_intermediate_region <- pb_class_scheme(
                df = base::readRDS("data/unsd/m49_clean.rds"),
                sel1 = rlang::quo(`Country or Area`),
                sel2 = rlang::quo(`Intermediate Region Name`)
            )
)

	Intermediate Region Name	Country	N
1	Caribbean	Anguilla; Antigua and Barbuda; Aruba; Bahamas; Barbados; Bonaire, Sint Eustatius and Saba; British Virgin Islands; Cayman Islands; Cuba; Curaçao; Dominica; Dominican Republic; Grenada; Guadeloupe; Haiti; Jamaica; Martinique; Montserrat; Puerto Rico; Saint Barthélemy; Saint Kitts and Nevis; Saint Lucia; Saint Martin (French Part); Saint Vincent and the Grenadines; Sint Maarten (Dutch part); Trinidad and Tobago; Turks and Caicos Islands; United States Virgin Islands ;	28
2	Central America	Belize; Costa Rica; El Salvador; Guatemala; Honduras; Mexico; Nicaragua; Panama ;	8
3	Channel Islands	Guernsey; Jersey; Sark ;	3
4	Eastern Africa	British Indian Ocean Territory; Burundi; Comoros; Djibouti; Eritrea; Ethiopia; French Southern Territories; Kenya; Madagascar; Malawi; Mauritius; Mayotte; Mozambique; Rwanda; Réunion; Seychelles; Somalia; South Sudan; Uganda; United Republic of Tanzania; Zambia; Zimbabwe ;	22
5	Middle Africa	Angola; Cameroon; Central African Republic; Chad; Congo; Democratic Republic of the Congo; Equatorial Guinea; Gabon; Sao Tome and Principe ;	9
6	South America	Argentina; Bolivia (Plurinational State of); Bouvet Island; Brazil; Chile; Colombia; Ecuador; Falkland Islands (Malvinas); French Guiana; Guyana; Paraguay; Peru; South Georgia and the South Sandwich Islands; Suriname; Uruguay; Venezuela (Bolivarian Republic of) ;	16
7	Southern Africa	Botswana; Eswatini; Lesotho; Namibia; South Africa ;	5
8	Western Africa	Benin; Burkina Faso; Cabo Verde; Côte d’Ivoire; Gambia; Ghana; Guinea; Guinea-Bissau; Liberia; Mali; Mauritania; Niger; Nigeria; Saint Helena; Senegal; Sierra Leone; Togo ;	17
9		Afghanistan; Albania; Algeria; American Samoa; Andorra; Armenia; Australia; Austria; Azerbaijan; Bahrain; Bangladesh; Belarus; Belgium; Bermuda; Bhutan; Bosnia and Herzegovina; Brunei Darussalam; Bulgaria; Cambodia; Canada; China; China, Hong Kong Special Administrative Region; China, Macao Special Administrative Region; Christmas Island; Cocos (Keeling) Islands; Cook Islands; Croatia; Cyprus; Czechia; Democratic People's Republic of Korea; Denmark; Egypt; Estonia; Faroe Islands; Fiji; Finland; France; French Polynesia; Georgia; Germany; Gibraltar; Greece; Greenland; Guam; Heard Island and McDonald Islands; Holy See; Hungary; Iceland; India; Indonesia; Iran (Islamic Republic of); Iraq; Ireland; Isle of Man; Israel; Italy; Japan; Jordan; Kazakhstan; Kiribati; Kuwait; Kyrgyzstan; Lao People's Democratic Republic; Latvia; Lebanon; Libya; Liechtenstein; Lithuania; Luxembourg; Malaysia; Maldives; Malta; Marshall Islands; Micronesia (Federated States of); Monaco; Mongolia; Montenegro; Morocco; Myanmar; Nauru; Nepal; Netherlands; New Caledonia; New Zealand; Niue; Norfolk Island; North Macedonia; Northern Mariana Islands; Norway; Oman; Pakistan; Palau; Papua New Guinea; Philippines; Pitcairn; Poland; Portugal; Qatar; Republic of Korea; Republic of Moldova; Romania; Russian Federation; Saint Pierre and Miquelon; Samoa; San Marino; Saudi Arabia; Serbia; Singapore; Slovakia; Slovenia; Solomon Islands; Spain; Sri Lanka; State of Palestine; Sudan; Svalbard and Jan Mayen Islands; Sweden; Switzerland; Syrian Arab Republic; Tajikistan; Thailand; Timor-Leste; Tokelau; Tonga; Tunisia; Turkmenistan; Tuvalu; Türkiye; Ukraine; United Arab Emirates; United Kingdom of Great Britain and Northern Ireland; United States Minor Outlying Islands; United States of America; Uzbekistan; Vanuatu; Viet Nam; Wallis and Futuna Islands; Western Sahara; Yemen; Åland Islands ;	140

“Intermediate Region” is a classification scheme with 248 countries in 9 regions.

1.6 Clean Data (second step)

The intermediate grouping does not result in the expected 22 (with Antarctica: 23) different regions mentioned in many documents. See for instance the Wikipedia article about the UN geoscheme, which features a colored world map and a list of countries grouped into the 22 different regions.

Graph 1.1: M49 Geoscheme developed and maintained by the United Nations Statistics Division (UNSD). The 22 geographical sub-regions as defined by the UNSD are shown in different colors; Antarctica is not shown. (CC BY-SA 3.0, Wikimedia Commons)

The solution is to replace the NAs in Intermediate Region Name with the values of the sub-regions. Additionally, as can be seen in R Code 1.6, there is a second small problem: three small territories are listed as an extra group “Channel Islands”. To get the official intermediate grouping we need to dissolve this group and sort all three of them into the category “Northern Europe”.

1.6.1 Procedure

Procedure 1.2 : Clean M49 data of the UNSD geoscheme classification (second step)

Replace the NA values of Intermediate Region Name with values from the Sub-region Name column.
Replace the NA values of Intermediate Region Code with values from the Sub-region Code column.
Replace the “Channel Islands” values in Intermediate Region Name with the value of “Northern Europe”.
Replace the “Channel Islands” values (“830”) in Intermediate Region Code with the code of “Northern Europe” (“154”).

1.6.2 Result (second step)

1.6.2.1 Data Structure

R Code 1.7 : Clean UNSD M49 geoscheme classification data (second step)

Code

m49_clean2 <- base::readRDS("data/unsd/m49_clean.rds") |> 
## replace `NA`s of intermediate regions with sub-region values ######
  dplyr::mutate(`Intermediate Region Name` = 
         base::ifelse(is.na(`Intermediate Region Name`), 
                      `Sub-region Name`, `Intermediate Region Name`), # (1)
     `Intermediate Region Code` = 
         base::ifelse(is.na(`Intermediate Region Code`), 
                      `Sub-region Code`, `Intermediate Region Code`), # (2)
## replace "Channel Islands" with "Northern Europe" values ######
    `Intermediate Region Name` = 
         base::ifelse(`Intermediate Region Name` == "Channel Islands", 
                      "Northern Europe", `Intermediate Region Name`), # (3)
    `Intermediate Region Code` = 
         base::ifelse(`Intermediate Region Code` == "830", 
                      "154", `Intermediate Region Code`)              # (4) 
  )


## save new tibble as clean2 ##########
pb_save_data_file("unsd", m49_clean2, "m49_clean2.rds")


## display results ##########
m49_clean2 <- base::readRDS("data/unsd/m49_clean2.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_clean2)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_clean2)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	m49_clean2
Number of rows	248
Number of columns	13
_______________________
Column type frequency:
character	10
numeric	3
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
ISO-alpha3 Code	1	3	3	248
Country or Area	1	4	52	248
Region Code	1	3	3	5
Region Name	1	4	8	5
Sub-region Code	1	3	3	17
Sub-region Name	1	9	31	17
Intermediate Region Code	1	3	3	22
Intermediate Region Name	1	9	25	22
M49 Code	1	3	3	248
ISO-alpha2 Code	1	2	2	248

Variable type: numeric

skim_variable	complete_rate	mean	sd	p100	hist
LDC	1	0.19	0.39	1	▇▁▁▁▂
LLDC	1	0.13	0.34	1	▇▁▁▁▁
SIDS	1	0.21	0.41	1	▇▁▁▁▂

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 248
#> Columns: 13
#> $ `ISO-alpha3 Code`          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "…
#> $ `Country or Area`          <chr> "Afghanistan", "Albania", "Algeria", "Ameri…
#> $ `Region Code`              <chr> "142", "150", "002", "009", "150", "002", "…
#> $ `Region Name`              <chr> "Asia", "Europe", "Africa", "Oceania", "Eur…
#> $ `Sub-region Code`          <chr> "034", "039", "015", "061", "039", "202", "…
#> $ `Sub-region Name`          <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `Intermediate Region Code` <chr> "034", "039", "015", "061", "039", "017", "…
#> $ `Intermediate Region Name` <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `M49 Code`                 <chr> "004", "008", "012", "016", "020", "024", "…
#> $ `ISO-alpha2 Code`          <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AI", "…
#> $ LDC                        <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ LLDC                       <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0…
#> $ SIDS                       <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1…

With the “trick” of replacing the NAs of Intermediate Region Name with the values of the Sub-region Name column we also got rid of the many NAs in that column and in the accompanying Intermediate Region Code column.
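
The same replacement can be written more compactly with dplyr::na_if() and dplyr::coalesce(). This is an alternative sketch on toy data, not the code used above: it first turns “Channel Islands” into NA and then fills every NA from the sub-region column. That shortcut only works because the sub-region of the Channel Islands already is “Northern Europe”.

```r
## toy data: one NA, one "Channel Islands" entry, one regular entry
regions_toy <- tibble::tibble(
  `Country or Area`          = c("Algeria", "Guernsey", "Cuba"),
  `Sub-region Name`          = c("Northern Africa", "Northern Europe",
                                 "Latin America and the Caribbean"),
  `Intermediate Region Name` = c(NA, "Channel Islands", "Caribbean")
)

regions_fixed <- regions_toy |>
  dplyr::mutate(
    `Intermediate Region Name` =
      ## "Channel Islands" -> NA, then every NA -> sub-region name
      dplyr::na_if(`Intermediate Region Name`, "Channel Islands") |>
      dplyr::coalesce(`Sub-region Name`)
  )
```

The code column could be handled the same way with the value "830" and the Sub-region Code column.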

1.6.2.2 Intermediate Region Again

R Code 1.8 : Show correct intermediate region for the UNSD M49 Geoscheme

Code

## show new intermediate result ###############
(
    m49_intermediate2 <- pb_class_scheme(
            df = base::readRDS("data/unsd/m49_clean2.rds"),
            sel1 = rlang::quo(`Country or Area`),
            sel2 = rlang::quo(`Intermediate Region Name`)
            )
)

	Intermediate Region Name	Country	N
1	Australia and New Zealand	Australia; Christmas Island; Cocos (Keeling) Islands; Heard Island and McDonald Islands; New Zealand; Norfolk Island ;	6
2	Caribbean	Anguilla; Antigua and Barbuda; Aruba; Bahamas; Barbados; Bonaire, Sint Eustatius and Saba; British Virgin Islands; Cayman Islands; Cuba; Curaçao; Dominica; Dominican Republic; Grenada; Guadeloupe; Haiti; Jamaica; Martinique; Montserrat; Puerto Rico; Saint Barthélemy; Saint Kitts and Nevis; Saint Lucia; Saint Martin (French Part); Saint Vincent and the Grenadines; Sint Maarten (Dutch part); Trinidad and Tobago; Turks and Caicos Islands; United States Virgin Islands ;	28
3	Central America	Belize; Costa Rica; El Salvador; Guatemala; Honduras; Mexico; Nicaragua; Panama ;	8
4	Central Asia	Kazakhstan; Kyrgyzstan; Tajikistan; Turkmenistan; Uzbekistan ;	5
5	Eastern Africa	British Indian Ocean Territory; Burundi; Comoros; Djibouti; Eritrea; Ethiopia; French Southern Territories; Kenya; Madagascar; Malawi; Mauritius; Mayotte; Mozambique; Rwanda; Réunion; Seychelles; Somalia; South Sudan; Uganda; United Republic of Tanzania; Zambia; Zimbabwe ;	22
6	Eastern Asia	China; China, Hong Kong Special Administrative Region; China, Macao Special Administrative Region; Democratic People's Republic of Korea; Japan; Mongolia; Republic of Korea ;	7
7	Eastern Europe	Belarus; Bulgaria; Czechia; Hungary; Poland; Republic of Moldova; Romania; Russian Federation; Slovakia; Ukraine ;	10
8	Melanesia	Fiji; New Caledonia; Papua New Guinea; Solomon Islands; Vanuatu ;	5
9	Micronesia	Guam; Kiribati; Marshall Islands; Micronesia (Federated States of); Nauru; Northern Mariana Islands; Palau; United States Minor Outlying Islands ;	8
10	Middle Africa	Angola; Cameroon; Central African Republic; Chad; Congo; Democratic Republic of the Congo; Equatorial Guinea; Gabon; Sao Tome and Principe ;	9
11	Northern Africa	Algeria; Egypt; Libya; Morocco; Sudan; Tunisia; Western Sahara ;	7
12	Northern America	Bermuda; Canada; Greenland; Saint Pierre and Miquelon; United States of America ;	5
13	Northern Europe	Denmark; Estonia; Faroe Islands; Finland; Guernsey; Iceland; Ireland; Isle of Man; Jersey; Latvia; Lithuania; Norway; Sark; Svalbard and Jan Mayen Islands; Sweden; United Kingdom of Great Britain and Northern Ireland; Åland Islands ;	17
14	Polynesia	American Samoa; Cook Islands; French Polynesia; Niue; Pitcairn; Samoa; Tokelau; Tonga; Tuvalu; Wallis and Futuna Islands ;	10
15	South America	Argentina; Bolivia (Plurinational State of); Bouvet Island; Brazil; Chile; Colombia; Ecuador; Falkland Islands (Malvinas); French Guiana; Guyana; Paraguay; Peru; South Georgia and the South Sandwich Islands; Suriname; Uruguay; Venezuela (Bolivarian Republic of) ;	16
16	South-eastern Asia	Brunei Darussalam; Cambodia; Indonesia; Lao People's Democratic Republic; Malaysia; Myanmar; Philippines; Singapore; Thailand; Timor-Leste; Viet Nam ;	11
17	Southern Africa	Botswana; Eswatini; Lesotho; Namibia; South Africa ;	5
18	Southern Asia	Afghanistan; Bangladesh; Bhutan; India; Iran (Islamic Republic of); Maldives; Nepal; Pakistan; Sri Lanka ;	9
19	Southern Europe	Albania; Andorra; Bosnia and Herzegovina; Croatia; Gibraltar; Greece; Holy See; Italy; Malta; Montenegro; North Macedonia; Portugal; San Marino; Serbia; Slovenia; Spain ;	16
20	Western Africa	Benin; Burkina Faso; Cabo Verde; Côte d’Ivoire; Gambia; Ghana; Guinea; Guinea-Bissau; Liberia; Mali; Mauritania; Niger; Nigeria; Saint Helena; Senegal; Sierra Leone; Togo ;	17
21	Western Asia	Armenia; Azerbaijan; Bahrain; Cyprus; Georgia; Iraq; Israel; Jordan; Kuwait; Lebanon; Oman; Qatar; Saudi Arabia; State of Palestine; Syrian Arab Republic; Türkiye; United Arab Emirates; Yemen ;	18
22	Western Europe	Austria; Belgium; France; Germany; Liechtenstein; Luxembourg; Monaco; Netherlands; Switzerland ;	9

Showing 1 to 22 of 22 entries

The new regional grouping by Intermediate Region Name is a classification scheme with 248 countries or areas in 22 regions. As Antarctica is not included, the grouping by Intermediate Region Name represents the correct M49 classification.
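The following minimal sketch shows how a grouping table of this shape (region, member list, count) could be built and checked with base R. It is not the implementation of pb_class_scheme(), which is not shown here; the data frame `toy` and its column names are hypothetical stand-ins for the real M49 data.

```r
# Toy data: a hypothetical four-row excerpt, not the real M49 file
toy <- data.frame(
  country = c("Kazakhstan", "Botswana", "Lesotho", "Uzbekistan"),
  region  = c("Central Asia", "Southern Africa",
              "Southern Africa", "Central Asia"),
  stringsAsFactors = FALSE
)

# Collapse the sorted country names of each region into one string
members <- tapply(toy$country, toy$region,
                  function(x) paste(sort(x), collapse = "; "))

# Count the countries per region (same group order as above)
n <- tapply(toy$country, toy$region, length)

scheme <- data.frame(
  region  = names(members),
  country = as.vector(members),
  n       = as.vector(n),
  stringsAsFactors = FALSE
)

scheme

# Sanity check: the group sizes must add up to the number of input rows
sum(scheme$n) == nrow(toy)
```

Applied to the real data, the same check would compare the sum of the N column with the 248 rows of the cleaned file.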

1.7 UN Countries

The UN M49 geoscheme classification data contains 15 columns and 248 rows (= countries or areas). This is many more than the current 193 member states of the United Nations. Even if we include the Holy See (Vatican) and the State of Palestine, which are non-member observer states, and the two controversial countries or areas (Taiwan and Kosovo), we are far from the 248 countries or areas listed in the M49 geoscheme of the United Nations.

1.7.1 Procedure

Procedure 1.3 : Add Column for UN country membership

To get only the 193 member states and the two non-member observer states (= 195 UN states) I will apply the following procedure:

1. Prepare a list with the names of the countries and their ISO-alpha3 codes. The ISO-alpha3 code is needed for the later join with the cleaned data m49_clean2.rds. (Done manually, no program code.)
2. Save this two-column data file as “data/unsd/un_countries.csv”. (Done manually, no program code.)
3. Load “data/unsd/un_countries.csv” into memory.
4. Add a new column UN (for UN member state) to the data and fill all values with “1” (= member state).
5. Save the result as R object un_countries.rds.
6. Load m49_clean2.rds and un_countries.rds.
7. Fully join these two data frames (tibbles) via their ISO code columns.
8. Replace all NAs of the UN column (these rows did not come from the “un_countries” file) with the value “0” (= non-UN member state).
9. Delete the redundant Country column, which originally came from the “un_countries” file.
10. Save the result as new R object m49_clean3.rds.
11. Inspect the result.

1.7.2 Result

Code Collection 1.2 : Display structure and content of the adapted UN M49 geoscheme dataset

R Code 1.9 : Add column with UN membership

Code

un_countries <- readr::read_csv("data/unsd/un_countries.csv",
                                show_col_types = FALSE) |>       # (3)
  dplyr::mutate(UN = 1)                                        # (4)
  
pb_save_data_file("unsd", un_countries, "un_countries.rds")      # (5)

x <- base::readRDS("data/unsd/m49_clean2.rds")                   # (6a)
y <- base::readRDS("data/unsd/un_countries.rds")                 # (6b)

m49_clean3 <- dplyr::full_join(
  x, y, dplyr::join_by(`ISO-alpha3 Code` == Code)
  ) |>                                                           # (7)
  dplyr::mutate(UN =
        base::ifelse(base::is.na(UN), 0, UN)
        ) |>                                                     # (8)
  dplyr::select(-Country)                                        # (9)

pb_save_data_file("unsd", m49_clean3, "m49_clean3.rds")          # (10)

skimr::skim(m49_clean3)                                          # (11)

Data summary
Name	m49_clean3
Number of rows	248
Number of columns	14
_______________________
Column type frequency:
character	10
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
ISO-alpha3 Code	1	3	3	248
Country or Area	1	4	52	248
Region Code	1	3	3	5
Region Name	1	4	8	5
Sub-region Code	1	3	3	17
Sub-region Name	1	9	31	17
Intermediate Region Code	1	3	3	22
Intermediate Region Name	1	9	25	22
M49 Code	1	3	3	248
ISO-alpha2 Code	1	2	2	248

Variable type: numeric

skim_variable	complete_rate	mean	sd	p25	p50	p75	p100	hist
LDC	1	0.19	0.39	0	0	0	1	▇▁▁▁▂
LLDC	1	0.13	0.34	0	0	0	1	▇▁▁▁▁
SIDS	1	0.21	0.41	0	0	0	1	▇▁▁▁▂
UN	1	0.79	0.41	1	1	1	1	▂▁▁▁▇
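A full join keeps unmatched rows from both tables, so a mistyped code in the manually prepared list would silently add a row instead of raising an error. The skim output above still shows 248 rows, which suggests that every code matched. The following base R sketch shows how such a check could be made explicit; the two toy data frames, the column name `iso3` and the wrong code "XXX" are hypothetical.

```r
# Toy stand-ins for m49_clean2 and un_countries (hypothetical values)
m49 <- data.frame(
  iso3 = c("AUT", "ABW", "ALB"),
  name = c("Austria", "Aruba", "Albania"),
  stringsAsFactors = FALSE
)
un <- data.frame(
  Code = c("AUT", "ALB", "XXX"),   # "XXX" is a deliberately wrong code
  UN   = 1,
  stringsAsFactors = FALSE
)

# Codes in the UN list without a partner in the M49 data
unmatched <- setdiff(un$Code, m49$iso3)

# Base R counterpart of a full join: all = TRUE keeps rows from both sides
joined <- merge(m49, un, by.x = "iso3", by.y = "Code", all = TRUE)

# Rows that did not come from the UN list get the value 0
joined$UN[is.na(joined$UN)] <- 0

unmatched                  # codes to fix before joining
nrow(joined) - nrow(m49)   # extra rows caused by unmatched codes
table(joined$UN)           # number of rows with UN = 0 and UN = 1
```

With the real files, an empty `unmatched` vector and a row difference of zero would confirm that the join only added the UN column.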

R Code 1.10 : List the 193 UN member states (plus Vatican and Palestine as non-member observers)

Code

base::readRDS("data/unsd/m49_clean3.rds") |> 
  dplyr::filter(UN == 1) |> 
  dplyr::select(1,2,4,8) |> 
  DT::datatable()

	ISO-alpha3 Code	Country or Area	Region Name	Intermediate Region Name
1	AFG	Afghanistan	Asia	Southern Asia
2	ALB	Albania	Europe	Southern Europe
3	DZA	Algeria	Africa	Northern Africa
4	AND	Andorra	Europe	Southern Europe
5	AGO	Angola	Africa	Middle Africa
6	ATG	Antigua and Barbuda	Americas	Caribbean
7	ARG	Argentina	Americas	South America
8	ARM	Armenia	Asia	Western Asia
9	AUS	Australia	Oceania	Australia and New Zealand
10	AUT	Austria	Europe	Western Europe

Showing 1 to 10 of 195 entries

R Code 1.11 : List the 53 countries or areas that are neither UN member states nor non-member observers

Code

base::readRDS("data/unsd/m49_clean3.rds") |> 
  dplyr::filter(UN == 0) |> 
  dplyr::select(1,2,4,8) |> 
  DT::datatable()

	ISO-alpha3 Code	Country or Area	Region Name	Intermediate Region Name
1	ASM	American Samoa	Oceania	Polynesia
2	AIA	Anguilla	Americas	Caribbean
3	ABW	Aruba	Americas	Caribbean
4	BMU	Bermuda	Americas	Northern America
5	BES	Bonaire, Sint Eustatius and Saba	Americas	Caribbean
6	BVT	Bouvet Island	Americas	South America
7	IOT	British Indian Ocean Territory	Africa	Eastern Africa
8	VGB	British Virgin Islands	Americas	Caribbean
9	CYM	Cayman Islands	Americas	Caribbean
10	HKG	China, Hong Kong Special Administrative Region	Asia	Eastern Asia

Showing 1 to 10 of 53 entries

1.8 Summary

The UN M49 geoscheme data contains 249 countries or areas. This includes 193 UN members, 2 non-member states with observer status, Antarctica and 53 dependent territories. With the exception of Sark, all of these countries or areas have ISO-alpha2 and ISO-alpha3 codes (columns ISO-alpha2 Code and ISO-alpha3 Code). These two columns are important because they facilitate joining data from other sources via these standardized codes. This is crucial because the spelling of the names of countries and areas is not always identical across sources. You therefore often cannot join two data sets just by country or area name but need a more systematic approach using the ISO codes.
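To illustrate why names are a fragile join key, here is a small base R sketch with two made-up sources that spell one country differently. The data frames `src_a` and `src_b`, their column names and the values are hypothetical.

```r
# Two hypothetical sources that spell one country differently
src_a <- data.frame(
  name = c("Czechia", "Viet Nam"),
  iso3 = c("CZE", "VNM"),
  stringsAsFactors = FALSE
)
src_b <- data.frame(
  name  = c("Czech Republic", "Viet Nam"),
  iso3  = c("CZE", "VNM"),
  value = c(10, 20),   # made-up values
  stringsAsFactors = FALSE
)

# Join by name: the differently spelled country drops out
by_name <- merge(src_a, src_b[, c("name", "value")], by = "name")

# Join by ISO-alpha3 code: both countries are matched
by_code <- merge(src_a, src_b[, c("iso3", "value")], by = "iso3")

nrow(by_name)
nrow(by_code)
```

The join by name loses the row whose spelling differs, whereas the join by code keeps both rows.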

The UN M49 geoscheme classification has three regional divisions with 5, 17 and 22 groups (always without Antarctica and not counting the overall group “World” with code “001”).

About the three groupings we can say (always not including Antarctica):

Region: It consists of five groups that more or less represent the continents. But instead of the traditional separation between North and South America, it unites these two continents into the “Americas”, which also includes the Caribbean countries.
Sub-region: It consists of 17 groups, some of which are very big. For instance, the many African countries are grouped into only two sub-regions: Northern Africa and Sub-Saharan Africa. This is in contrast to the more detailed “Intermediate region”, which distinguishes Northern, Eastern, Western, Southern and Middle Africa. On the other hand, the smaller Europe is divided into four sub-regions. In my opinion, the sub-regional division is therefore not a very consistent classification.
Intermediate region: It consists of 22 groups and is the most detailed regional classification for statistical purposes.

Additionally, LDC, LLDC and SIDS provide three other divisions that are driven not by region but by shared characteristics: development status (least developed countries) and geography (landlocked developing countries, small island developing states). As we will see in the other chapters, there are other approaches to a consistent country classification besides the UN M49 geoscheme.

References

McCarthy, Kieren. 2020. “After 20-Year Battle, Channel Island Sark Finally Earns the Right to Exist on the Internet with Its Own Top-Level Domain.” https://www.theregister.com/2020/03/23/sark_cctld_iso/.

OMNIKA DataStore is an open-access data science resource for researchers, authors, and technologists. It is a 501(c)(3) nonprofit organization whose mission is to digitize, organize, and make important (free) content available to the general public. The service provides raw data from trusted sources, data visualizations, data analysis tools, and other digital resources.
A check with base::all.equal() showed that the files from the two different sources (UNSD and OMNIKA) are identical.
Although the United Nations and ISO both list 249 entries (including Antarctica), there is a small difference between the ISO 3166 codes and the M49 geoscheme of the UN: The UN lists Sark (without ISO codes), whereas the ISO list includes “Taiwan (Province of China)” with the ISO codes “TW” and “TWN”.