2 World Bank (WB)


2.1 Download World Bank Data

The World Bank classification can be downloaded from the page “How does the World Bank classify countries?”. At the bottom of that page, the line “Download an Excel file of historical classifications by income.” provides a link labelled “Download”. Despite this description, the downloaded file CLASS.xlsx does not contain a historical classification by income but the general classification system for the last available year (2023).

There is, however, another Excel file, OGHIST.xlsx, with the historical cutoffs for income and lending categories from 1987 to 2023. The download link for this file is located on another web page: World Bank Country and Lending Groups. On that page you will also find the updated cutoffs for GNI per capita, which are important for the lending eligibility of countries. The page “World Bank country classifications by income level for 2024-2025” has the current values and the changes over the last year.

The file CLASS.xlsx, which I am interested in here, consists of three sheets:

“List of economies”,
“compositions”, and
“Notes”

I am going to download the file and read the first two sheets into R.

R Code 2.1 : Download the World Bank classification file

Run this code chunk manually if the file still needs to be downloaded.

Code

## create folders ###########
pb_create_folder(base::paste0(here::here(), "/data/"))
pb_create_folder(base::paste0(here::here(), "/data/wb"))

## download wb-class file ##############
downloader::download(
    url = "https://datacatalogfiles.worldbank.org/ddh-published/0037712/DR0090755/CLASS.xlsx",
    destfile = base::paste0(here::here(), 
            "/data/wb/wb-class.xlsx")
)

## create R object ############
wb_economies <- 
  readxl::read_xlsx(
    base::paste0(here::here(), 
    "/data/wb/wb-class.xlsx"),
           sheet = "List of economies") 


wb_compositions <- 
  readxl::read_xlsx(
    base::paste0(here::here(), 
    "/data/wb/wb-class.xlsx"),
           sheet = "compositions")

## save as .rds files ###############
pb_save_data_file(
  "wb", 
  wb_economies, 
  "wb_economies.rds"
)


pb_save_data_file(
  "wb", 
  wb_compositions, 
  "wb_compositions.rds"
)

(No output is available for this R code chunk)
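The advice to run the download chunk only when the file is still missing can also be expressed in code. The following is a minimal sketch under stated assumptions: `download_if_missing()` and `fake_download()` are hypothetical helpers that are not part of the original code, and the example URL is a placeholder. The download function is passed in as an argument so that the logic can be tried without network access.

```r
# Hypothetical helper (not part of the original code): download a file only
# when it is not yet on disk. `download_fun` is injectable so that the logic
# can be tested offline.
download_if_missing <- function(url, destfile,
                                download_fun = utils::download.file) {
  if (base::file.exists(destfile)) {
    return(base::invisible(FALSE))          # file already there, nothing to do
  }
  base::dir.create(base::dirname(destfile),
                   recursive = TRUE, showWarnings = FALSE)
  download_fun(url, destfile, mode = "wb")  # binary mode, needed for .xlsx
  base::invisible(TRUE)
}

# Offline demonstration with a stand-in for the real download function
fake_download <- function(url, destfile, mode) {
  base::writeLines("placeholder", destfile)
}

demo_file <- base::file.path(base::tempdir(), "wb-demo", "wb-class.xlsx")
base::unlink(demo_file)                     # start from a clean state

first_call  <- download_if_missing("https://example.org/CLASS.xlsx",
                                   demo_file, fake_download)
second_call <- download_if_missing("https://example.org/CLASS.xlsx",
                                   demo_file, fake_download)
```

The first call writes the file and returns TRUE; the second call finds the file and skips the download. With the default `download_fun`, the same call would use `utils::download.file()`.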

2.2 Inspect World Bank Data

To get a detailed understanding of the data structures I will provide the following outputs:

Summary statistics with skimr::skim(), followed by an inspection of the values in the first rows of each column with dplyr::glimpse().
Several detailed outputs of the classification categories (regions) and their elements (countries) in different code chunks (tabs).

2.3 Show Data

Code Collection 2.1 : Inspect the structure of the World Bank classification

WB economies
WB compositions

R Code 2.2 : Inspect sheet List of Economies of the World Bank classification file

Code

wb_economies <- base::readRDS("data/wb/wb_economies.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_economies)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_economies)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	wb_economies
Number of rows	267
Number of columns	5
_______________________
Column type frequency:
character	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Economy	1	1.00	4	50	266
Code	1	1.00	3	3	266
Region	49	0.82	10	26	7
Income group	50	0.81	10	19	4
Lending category	122	0.54	3	5	3

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 267
#> Columns: 5
#> $ Economy            <chr> "Afghanistan", "Albania", "Algeria", "American Samo…
#> $ Code               <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "A…
#> $ Region             <chr> "South Asia", "Europe & Central Asia", "Middle East…
#> $ `Income group`     <chr> "Low income", "Upper middle income", "Upper middle …
#> $ `Lending category` <chr> "IDA", "IBRD", "IBRD", NA, NA, "IBRD", "IBRD", "IBR…

R Code 2.3 : Inspect sheet compositions of the World Bank classification file

Code

wb_compositions <- base::readRDS("data/wb/wb_compositions.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_compositions)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_compositions)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	wb_compositions
Number of rows	2084
Number of columns	4
_______________________
Column type frequency:
character	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
WB_Group_Code	1	3	3	48
WB_Group_Name	1	5	50	48
WB_Country_Code	1	3	3	218
WB_Country_Name	1	4	30	218

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 2,084
#> Columns: 4
#> $ WB_Group_Code   <chr> "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "AFE"…
#> $ WB_Group_Name   <chr> "Africa Eastern and Southern", "Africa Eastern and Sou…
#> $ WB_Country_Code <chr> "AGO", "BWA", "BDI", "COM", "COD", "ERI", "SWZ", "ETH"…
#> $ WB_Country_Name <chr> "Angola", "Botswana", "Burundi", "Comoros", "Congo, De…

2.4 Description of WB Data

2.4.1 List of Economies

Caution 2.1

The term country, used interchangeably with economy, does not imply political independence but refers to any territory for which authorities report separate social or economic statistics.

2.4.1.1 Columns

To understand the missing data structure of the “List of Economies” one must know that the data consists of two different tables, separated by an empty line (= one NA for all columns).

The dataset consists of 5 columns:

Economy with a list of 218 economies (= countries or areas in M49 language, see Chapter 1).
Code with the ISO-alpha3 codes for 218 countries (Excel rows 2-219) and for 48 additional (regional, economic, and political) areas.
Region with seven different areas:
- East Asia & Pacific,
- Europe & Central Asia,
- Latin America & Caribbean,
- Middle East & North Africa,
- North America,
- South Asia, and
- Sub-Saharan Africa
Income group with four categories:
- Low income,
- Lower middle income,
- Upper middle income, and
- High income.
Lending category with three categories:
- IBRD,
- Blend, and
- IDA

2.4.1.2 Missing values

We now have a better understanding of the structure of the missing values:

The variables Economy and Code are complete. The single missing value results from the empty line that separates the two datasets.
The 49 missing values in the column Region come from the second part of the data, because Region is empty in all rows from 220 to 267 of the data frame (= 48 rows). Adding the one missing value from the empty separator line, we get exactly 49 missing values.
In contrast to Region, Income group has one additional missing value in the upper (first) part of the data: Venezuela, RB lacks an income group because it has been temporarily unclassified since July 2021, pending release of revised national accounts statistics. Venezuela, RB was classified as an upper-middle income country until FY21 (Fiscal Year 2021). It is currently again classified as Upper middle income (see the World Bank page about Venezuela, RB), but this renewed classification is not included in the somewhat older dataset from FY2023.
The many missing values in Lending category reflect the fact that only some countries are eligible for the World Bank's development lending.
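The two-table structure described above can be checked programmatically. The following base R sketch uses a small toy data frame (an assumption, not the real sheet) to locate the completely empty separator row and to split the data at that position.

```r
# Toy stand-in for the "List of economies" sheet: two economies,
# one completely empty separator row, and two group rows.
toy <- base::data.frame(
  Economy = c("Afghanistan", "Albania", NA,
              "Africa Eastern and Southern", "Arab World"),
  Code    = c("AFG", "ALB", NA, "AFE", "ARB"),
  Region  = c("South Asia", "Europe & Central Asia", NA, NA, NA),
  stringsAsFactors = FALSE
)

# A row is the separator if every column is NA
all_na <- base::rowSums(base::is.na(toy)) == base::ncol(toy)
sep    <- base::which(all_na)[1]

# Rows before the separator are economies, rows after it are groups
economies <- toy[base::seq_len(sep - 1), ]
groups    <- toy[(sep + 1):base::nrow(toy), c("Economy", "Code")]
```

Detecting the separator in this way avoids relying on a fixed row number, which could shift if the World Bank adds or removes an economy.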

2.4.1.3 Income groups

The cutoff limits for the income groups are (from the “Notes” sheet):

low income, $1,145 or less;
lower middle income, $1,146 to $4,515;
upper middle income, $4,516 to $14,005; and
high income, more than $14,005.
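As a small illustration of how these cutoffs work, the following sketch maps GNI per capita values to the four income groups. The helper `classify_income()` is hypothetical and the input values are made up for the demonstration; only the thresholds come from the list above.

```r
# Upper bounds of the first three income groups (US$), from the list above
income_breaks <- c(-Inf, 1145, 4515, 14005, Inf)
income_labels <- c("Low income", "Lower middle income",
                   "Upper middle income", "High income")

# Hypothetical helper: map GNI per capita to an income group.
# right = TRUE closes each interval on the right,
# so exactly 1145 still counts as "Low income".
classify_income <- function(gni_per_capita) {
  base::as.character(
    base::cut(gni_per_capita, breaks = income_breaks,
              labels = income_labels, right = TRUE)
  )
}

# Illustrative values around the cutoffs (not real country data)
demo_groups <- classify_income(c(800, 1145, 1146, 4516, 14005, 14006))
```

The vector `demo_groups` holds one label per input value; the values 1145 and 14005 fall into the lower of the two neighbouring groups because the cutoffs are inclusive upper bounds.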

2.4.1.4 Lending categories

The three lending categories and their relation to each other are:

IDA countries are those that lack the financial ability to borrow from IBRD. IDA credits are deeply concessional—interest-free loans and grants for programs aimed at boosting economic growth and improving living conditions. IBRD loans are non-concessional. Blend countries are eligible for IDA credits because of their low per capita incomes but are also eligible for IBRD because they are financially creditworthy.

The effective operational cutoff for IDA eligibility is $1,335 or less.

2.4.1.5 Miscellaneous

What follows is a quote about some details of the income classifications for the 2023 file:

Set on 1 July 2022 remain in effect until 1 July 2023. Venezuela has been temporarily unclassified since July 2021 pending release of revised national accounts statistics. Argentina, which was temporarily unclassified in July 2016 pending release of revised national accounts statistics, was classified as upper middle income for FY17 as of 29 September 2016 based on alternative conversion factors. Also effective 29 September 2016, Syrian Arab Republic is reclassified from IBRD lending category to IDA-only. On 29 March 2017, new country codes were introduced to align World Bank 3-letter codes with ISO 3-letter codes: Andorra (AND), Dem. Rep. Congo (COD), Isle of Man (IMN), Kosovo (XKX), Romania (ROU), Timor-Leste (TLS), and West Bank and Gaza (PSE). It is to be noted that Venezuela, RB classified as an upper-middle income country until FY21, has been unclassified since then due to the unavailability of data.

2.4.2 Compositions

The interpretation of the compositions Excel sheet is more difficult and less straightforward than that of the List of Economies. The compositions sheet consists of four columns with 2084 rows.

WB_Group_Code: A three-letter code similar to the ISO-alpha3 code, registered not by ISO but by the World Bank.
WB_Group_Name: The group name assigned to a regional, economic or political entity.
WB_Country_Code: The ISO-alpha3 code as specified in “List of Economies”, generated and maintained by ISO.
WB_Country_Name: The country or area name as specified in “List of Economies”.

The table is easier to understand if we relocate WB_Country_Code and WB_Country_Name to the first and second column, respectively. Then we can say that the compositions sheet displays all affiliations for a country. It is a combination of all the 48 group rows of the second part in the “List of Economies” dataset. Learning about these different groups and the categorization criteria behind them is of special interest to me and the main reason why I am writing this book.
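The relocation described above can be sketched with a few rows. The toy data frame below is an assumption that mimics the column names of the compositions sheet; it uses base R indexing, although dplyr::relocate() would achieve the same column order.

```r
# Toy excerpt with the same column names as the compositions sheet
toy_comp <- base::data.frame(
  WB_Group_Code   = c("AFE", "ARB", "AFE"),
  WB_Group_Name   = c("Africa Eastern and Southern", "Arab World",
                      "Africa Eastern and Southern"),
  WB_Country_Code = c("COM", "COM", "AGO"),
  WB_Country_Name = c("Comoros", "Comoros", "Angola"),
  stringsAsFactors = FALSE
)

# Put the country columns first and sort by country, then by group
country_first <- toy_comp[
  base::order(toy_comp$WB_Country_Name, toy_comp$WB_Group_Name),
  c("WB_Country_Code", "WB_Country_Name", "WB_Group_Code", "WB_Group_Name")
]

# All group affiliations of one country
comoros_groups <-
  country_first$WB_Group_Name[country_first$WB_Country_Code == "COM"]
```

Read this way, each block of rows answers the question “to which groups does this country belong?” instead of “which countries form this group?”.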

2.4.3 Interim Summary

The only missing data in the columns Economy and Code corresponds to the empty line (Excel row 220) that separates the country codes from the regional codes. The missing data in the other columns stems from the different structure of the second part of the data (starting with Excel row 221), which consists only of the two columns ‘Economy’ and ‘Code’.

Essentially this means that the CLASS.xlsx file contains two different data sets: one for economies and another that explicates regional, economic and political grouping codes. In the Excel sheet compositions you will find an extended list of all available group names and their three-letter codes combined with the country names and their three-letter codes. These group names comprise different kinds of regional groups but also names and codes for different combinations of country incomes and lending categories.

All these groups may be of interest for the analysis of different trends. But the regional (sub)groups of the compositions sheet do not add up to the complete number of countries (218). This is in contrast to the regional groups of the WID database, where each grouping scheme (region1 = 5, region2 = 18, region4 = 10, and region5 = 8 groups) includes all countries (in this case: 216).
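Whether a chosen set of groups covers all countries can be checked with a set difference. The sketch below uses invented country and group codes (“AAA”, “G1” and so on) and a hypothetical helper `uncovered()`; it only illustrates the kind of check meant here, not a result for the real data.

```r
# Toy data: four countries and two hypothetical regional groups
all_countries <- c("AAA", "BBB", "CCC", "DDD")
toy_groups <- base::data.frame(
  WB_Group_Code   = c("G1", "G1", "G2"),
  WB_Country_Code = c("AAA", "BBB", "CCC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which countries are not covered by the chosen groups?
uncovered <- function(groups, countries, comp) {
  members <- comp$WB_Country_Code[comp$WB_Group_Code %in% groups]
  base::setdiff(countries, members)
}

missing_countries <- uncovered(c("G1", "G2"), all_countries, toy_groups)
```

An empty result would mean that the groups cover every country; here the country “DDD” belongs to neither group and is therefore reported.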

2.5 Analysis of WB Data

The World Bank file CLASS.xlsx classifies all World Bank member countries (189) and all other economies with populations of more than 30,000 (29) in a coarse grid of only seven regions. For operational and analytical purposes, these economies are divided into different groups according to criteria such as:

gross national income (GNI) per capita in 2023, calculated using the World Bank Atlas method,
lending conditions,
demographic dividend, etc.

2.5.1 Separate list of countries from “List of Economies”

My first action for further analysis of the data is to isolate the first data part of the “List of Economies” and then to use the predefined group structure in this part.

Code Collection 2.2 : Separate country data from the Excel sheet List of Economies

WB economies separated
WB standard

R Code 2.4 : List the full data of the World Bank economies

Code

wb_clean <-  base::readRDS("data/wb/wb_economies.rds") |> 
    dplyr::slice(1:218) |> 
    dplyr::mutate(`Income group` =
           base::ifelse(base::is.na(`Income group`), "Upper middle income", `Income group`)
           )

pb_save_data_file("wb", wb_clean, "wb_clean.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_clean)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_clean)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	wb_clean
Number of rows	218
Number of columns	5
_______________________
Column type frequency:
character	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Economy	0	1.00	4	30	218
Code	0	1.00	3	3	218
Region	0	1.00	10	26	7
Income group	0	1.00	10	19	4
Lending category	73	0.67	3	5	3

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 218
#> Columns: 5
#> $ Economy            <chr> "Afghanistan", "Albania", "Algeria", "American Samo…
#> $ Code               <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "A…
#> $ Region             <chr> "South Asia", "Europe & Central Asia", "Middle East…
#> $ `Income group`     <chr> "Low income", "Upper middle income", "Upper middle …
#> $ `Lending category` <chr> "IDA", "IBRD", "IBRD", NA, NA, "IBRD", "IBRD", "IBR…

R Code 2.5 : Show pre-defined grouping of World Bank data

Code

(
    wb_standard <- pb_class_scheme(
            df = base::readRDS("data/wb/wb_clean.rds"),
            sel1 = rlang::quo(`Economy`),
            sel2 = rlang::quo(`Region`)
            )
)

	Region	Country	N
1	East Asia & Pacific	American Samoa; Australia; Brunei Darussalam; Cambodia; China; Fiji; French Polynesia; Guam; Hong Kong SAR, China; Indonesia; Japan; Kiribati; Korea, Dem. People's Rep.; Korea, Rep.; Lao PDR; Macao SAR, China; Malaysia; Marshall Islands; Micronesia, Fed. Sts.; Mongolia; Myanmar; Nauru; New Caledonia; New Zealand; Northern Mariana Islands; Palau; Papua New Guinea; Philippines; Samoa; Singapore; Solomon Islands; Taiwan, China; Thailand; Timor-Leste; Tonga; Tuvalu; Vanuatu; Vietnam ;	38
2	Europe & Central Asia	Albania; Andorra; Armenia; Austria; Azerbaijan; Belarus; Belgium; Bosnia and Herzegovina; Bulgaria; Channel Islands; Croatia; Cyprus; Czechia; Denmark; Estonia; Faroe Islands; Finland; France; Georgia; Germany; Gibraltar; Greece; Greenland; Hungary; Iceland; Ireland; Isle of Man; Italy; Kazakhstan; Kosovo; Kyrgyz Republic; Latvia; Liechtenstein; Lithuania; Luxembourg; Moldova; Monaco; Montenegro; Netherlands; North Macedonia; Norway; Poland; Portugal; Romania; Russian Federation; San Marino; Serbia; Slovak Republic; Slovenia; Spain; Sweden; Switzerland; Tajikistan; Türkiye; Turkmenistan; Ukraine; United Kingdom; Uzbekistan ;	58
3	Latin America & Caribbean	Antigua and Barbuda; Argentina; Aruba; Bahamas, The; Barbados; Belize; Bolivia; Brazil; British Virgin Islands; Cayman Islands; Chile; Colombia; Costa Rica; Cuba; Curaçao; Dominica; Dominican Republic; Ecuador; El Salvador; Grenada; Guatemala; Guyana; Haiti; Honduras; Jamaica; Mexico; Nicaragua; Panama; Paraguay; Peru; Puerto Rico; Sint Maarten (Dutch part); St. Kitts and Nevis; St. Lucia; St. Martin (French part); St. Vincent and the Grenadines; Suriname; Trinidad and Tobago; Turks and Caicos Islands; Uruguay; Venezuela, RB; Virgin Islands (U.S.) ;	42
4	Middle East & North Africa	Algeria; Bahrain; Djibouti; Egypt, Arab Rep.; Iran, Islamic Rep.; Iraq; Israel; Jordan; Kuwait; Lebanon; Libya; Malta; Morocco; Oman; Qatar; Saudi Arabia; Syrian Arab Republic; Tunisia; United Arab Emirates; West Bank and Gaza; Yemen, Rep. ;	21
5	North America	Bermuda; Canada; United States ;	3
6	South Asia	Afghanistan; Bangladesh; Bhutan; India; Maldives; Nepal; Pakistan; Sri Lanka ;	8
7	Sub-Saharan Africa	Angola; Benin; Botswana; Burkina Faso; Burundi; Cabo Verde; Cameroon; Central African Republic; Chad; Comoros; Congo, Dem. Rep.; Congo, Rep.; Côte d’Ivoire; Equatorial Guinea; Eritrea; Eswatini; Ethiopia; Gabon; Gambia, The; Ghana; Guinea; Guinea-Bissau; Kenya; Lesotho; Liberia; Madagascar; Malawi; Mali; Mauritania; Mauritius; Mozambique; Namibia; Niger; Nigeria; Rwanda; São Tomé and Príncipe; Senegal; Seychelles; Sierra Leone; Somalia; South Africa; South Sudan; Sudan; Tanzania; Togo; Uganda; Zambia; Zimbabwe ;	48

Region is a coarse classification scheme with only 7 regions formed by 218 countries.

The “standard” World Bank grouping system comprises 7 groups and therefore differs from the M49 group divisions of 5 regions, 17 sub-regions and 23 intermediate regions. In contrast to the coarsest UN geoscheme of 5 regions (or 6 continents with Antarctica), the coarsest WB classification includes transcontinental groupings such as Europe and Central Asia, Middle East and North Africa (MENA), and East Asia and Pacific.
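An overview table with one row per region, the member names collapsed into one string, and a count can be built in a few lines of base R. This is only a sketch with toy data: the source of the helper pb_class_scheme() is not shown in this chapter, so the code below is an assumption about the kind of aggregation involved, not its implementation.

```r
# Toy excerpt of the economies data
toy_wb <- base::data.frame(
  Economy = c("Canada", "Bermuda", "United States", "Nepal", "Bhutan"),
  Region  = c("North America", "North America", "North America",
              "South Asia", "South Asia"),
  stringsAsFactors = FALSE
)

# One character vector of economies per region
by_region <- base::split(toy_wb$Economy, toy_wb$Region)

# Collapse the names of each region into one string and count the members
overview <- base::data.frame(
  Region  = base::names(by_region),
  Country = base::unname(base::vapply(
    by_region,
    function(x) base::paste(base::sort(x), collapse = "; "),
    base::character(1)
  )),
  N       = base::unname(base::lengths(by_region)),
  stringsAsFactors = FALSE
)
```

The column N gives the number of economies per region; summing it over all regions should reproduce the total number of rows in the input.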

2.5.2 Differences between M49 and World Bank classification

The separation shows 218 “Economies”, i.e. 30 rows fewer than the 248 countries or areas of the United Nations geoscheme (Section 1.5.2). It is vital to understand the differences.

Procedure 2.1 : Differences between United Nations and World Bank country classifications

To understand the differences between the UN M49 and the World Bank classification I will apply the following procedure:

Prepare datasets for the comparison (Tab: Prepare data)

Load the data with more rows (UN M49: m49_clean3.rds) into memory and call this tibble object df_un.
Select the columns ISO-alpha3 Code, Country or Area, and UN (= UN member state) for further processing.
Rename the first two columns as Code and UN Country.
Relocate Code as first column
Load the World Bank data wb_clean.rds into the memory and call this tibble object df_wb.
Select the columns Economy and Code.
Rename column Economy to WB Economy.

Compare the two datasets

Join both datasets by Code and keep all observations of df_un without a match in df_wb (Tab: M49 vs. WB).
Join both datasets by Code and keep all observations of df_wb without a match in df_un (Tab: WB vs. M49).

Code Collection 2.3 : Differences of UN M49 and World Bank classification

R Code 2.6 : Prepare the datasets of UN M49 geoscheme with World Bank classification

Code

df_un <- base::readRDS("data/unsd/m49_clean3.rds") |>     # (1)
    dplyr::select(c(1,2, UN)) |>                          # (2)
    dplyr::rename(
        Code = `ISO-alpha3 Code`,
        `UN Country` = `Country or Area`
        ) |>                                              # (3)
    dplyr::relocate(Code, .before = `UN Country`)         # (4)

df_wb <- base::readRDS("data/wb/wb_clean.rds") |>         # (5)
    dplyr::select(1,2) |>                                 # (6)
    dplyr::rename(`WB Economy` = Economy)                 # (7)

(No output is available for this R code chunk)

R Code 2.7 : Show M49 rows that are not included in the World Bank classification data

Code

(
    m49_wb_diff <- dplyr::anti_join(                    # (8)
        df_un, df_wb, dplyr::join_by(Code == Code)
        ) |> 
        DT::datatable(options = list(pageLength = 50))  # full argument name is `options`
)

	Code	UN Country	UN
1	AIA	Anguilla	0
2	BES	Bonaire, Sint Eustatius and Saba	0
3	BVT	Bouvet Island	0
4	IOT	British Indian Ocean Territory	0
5	CXR	Christmas Island	0
6	CCK	Cocos (Keeling) Islands	0
7	COK	Cook Islands	0
8	FLK	Falkland Islands (Malvinas)	0
9	GUF	French Guiana	0
10	ATF	French Southern Territories	0
11	GLP	Guadeloupe	0
12	GGY	Guernsey	0
13	HMD	Heard Island and McDonald Islands	0
14	VAT	Holy See	1
15	JEY	Jersey	0
16	MTQ	Martinique	0
17	MYT	Mayotte	0
18	MSR	Montserrat	0
19	NIU	Niue	0
20	NFK	Norfolk Island	0
21	PCN	Pitcairn	0
22	REU	Réunion	0
23	BLM	Saint Barthélemy	0
24	SHN	Saint Helena	0
25	SPM	Saint Pierre and Miquelon	0
26	SCQ	Sark	0
27	SGS	South Georgia and the South Sandwich Islands	0
28	SJM	Svalbard and Jan Mayen Islands	0
29	TKL	Tokelau	0
30	UMI	United States Minor Outlying Islands	0
31	WLF	Wallis and Futuna Islands	0
32	ESH	Western Sahara	0
33	ALA	Åland Islands	0

R Code 2.8 : Show World Bank economy rows that are not included in the UN M49 classification data

Code

(
    wb_m49_diff <- dplyr::anti_join(
        df_wb, df_un, dplyr::join_by(Code == Code)      # (9)
        ) 
)

#> # A tibble: 3 × 2
#>   `WB Economy`    Code 
#>   <chr>           <chr>
#> 1 Channel Islands CHI  
#> 2 Kosovo          XKX  
#> 3 Taiwan, China   TWN

2.5.3 Interim Summary

The M49 geoscheme of the United Nations is the more comprehensive classification system. It includes 33 areas that are missing from the WB classification. But the difference between the two systems is 248 - 218 = 30 and not 33 countries, because the World Bank classification contains three geographical areas that are not part of the M49 system:

Channel Islands
Taiwan and
Kosovo
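The bookkeeping behind these numbers can be reproduced with two set differences. The sketch below uses a handful of toy codes (an assumption, not the full data) and then repeats the arithmetic with the counts reported in this chapter.

```r
# Toy codes: five "UN" entries and four "WB" entries, three of them shared
un_codes <- c("AFG", "ALB", "AIA", "GGY", "DZA")
wb_codes <- c("AFG", "ALB", "DZA", "XKX")

only_un <- base::setdiff(un_codes, wb_codes)  # analogous to anti_join(df_un, df_wb)
only_wb <- base::setdiff(wb_codes, un_codes)  # analogous to anti_join(df_wb, df_un)

# The difference in row counts equals the difference of the one-sided counts
count_diff <- base::length(un_codes) - base::length(wb_codes)
toy_check  <- count_diff == base::length(only_un) - base::length(only_wb)

# Same bookkeeping with the numbers reported in this chapter
chapter_check <- (248 - 218) == (33 - 3)
```

Both checks evaluate to TRUE: the row-count difference of 30 is the net result of 33 areas found only in M49 and 3 economies found only in the World Bank data.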

Channel Islands

In the World Bank classification system, “Channel Islands” is listed under Economy (= equivalent to Country or Area in the M49 taxonomy). But in the original UN notation, “Channel Islands” is an “Intermediate Region Name”.

The three biggest Channel Islands (Jersey, Guernsey and Sark) are already included as areas in the UN M49 scheme, although Sark does not (yet?) have an official ISO code. I will therefore remove the Economy row for “Channel Islands” from the World Bank data, because its constituent islands are already included in the M49 listing.

Taiwan (TW, TWN)

The status of Taiwan is a disputed one:

China considers Taiwan a part of its territory and rejects any notion of Taiwanese independence. As one of the world’s largest and most influential countries, China wields significant political and economic power, which can influence other nations’ decisions regarding Taiwan’s status. Many countries have chosen to adhere to China’s “One China” policy, which recognizes Taiwan as a part of China rather than an independent country. (WorldAtlas)

But for analytical reasons it could be interesting to analyze the well-advanced economy of Taiwan separately. I will therefore include Taiwan in my working country and area classification system.

Kosovo (XK, XKX)

Recognizing Kosovo as an independent country has been a contentious issue in international politics for many years. While the United States has recognized Kosovo as a sovereign state since it declared independence from Serbia in 2008, other countries and international bodies do not. (WorldAtlas)

The same reasoning as for Taiwan applies here: it may be interesting to analyse this region/country separately from the other countries of the region, so I include Kosovo in my classification system.

2.6 Combining M49 and WB

As an interim conclusion (without analysing the second Excel sheet compositions of the World Bank classification system) I will combine the World Bank data (excluding Channel Islands) with the UN M49 geoscheme.

Procedure 2.2 : Combine M49 with the World Bank classification

Combine M49 and World Bank data and display “Structure” and “Data Values”

Create directory “data/meta” for data combined from different chapters.
Load the cleaned version of the World Bank classification (“data/wb/wb_clean.rds”)
Remove the Economy row with the value “Channel Islands”.
Rename the column Region to WB Region as a reminder of where this regional grouping comes from.
Replace the NA values of Lending category with “NLC” (= No Lending Category) to distinguish them from the NAs that arise after the join for countries or areas without a matching World Bank economy.
Save this new cleaned World Bank dataset as “data/wb/wb_clean2.rds”.
Load the latest cleaned version of the m49 data (“data/unsd/m49_clean3.rds”) into memory
Join the UN m49 with the World Bank dataset
Remove the now redundant Economy column.
Save the new dataset as “data/meta/m49_ext.rds” (“ext” stands for extended)
Display the structure of the new dataset m49_ext.
Display the dataset with DT::datatable() for explorations and further analysis. (Tab: “Data Values”)
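The reason for the “NLC” replacement before the join can be seen in a toy example. The sketch below uses base::merge() with all = TRUE as a stand-in for a full join and simplified column names (both are assumptions for illustration); the lending value for XKX is illustrative and not taken from the file.

```r
# Toy M49 data: Anguilla (AIA) has no counterpart in the World Bank data
toy_m49 <- base::data.frame(
  Code    = c("AFG", "AIA", "AND"),
  country = c("Afghanistan", "Anguilla", "Andorra"),
  stringsAsFactors = FALSE
)

# Toy World Bank data: Andorra has no lending category (NA in the file)
toy_wb <- base::data.frame(
  Code             = c("AFG", "AND", "XKX"),
  lending_category = c("IDA", NA, "IDA"),  # XKX value is illustrative only
  stringsAsFactors = FALSE
)

# Step 1: inside the WB data, NA means "no lending category"
toy_wb$lending_category[base::is.na(toy_wb$lending_category)] <- "NLC"

# Step 2: full join keeps the rows of both data frames
joined <- base::merge(toy_m49, toy_wb, by = "Code", all = TRUE)

# NAs that remain now mean "not part of the World Bank data"
lending_aia <- joined$lending_category[joined$Code == "AIA"]
lending_and <- joined$lending_category[joined$Code == "AND"]
```

After the join, Andorra carries “NLC” while Anguilla carries NA, so the two kinds of missingness stay distinguishable.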

Code Collection 2.4 : Combine and show the structure and data values of the integrated dataset

Structure
Data Values

R Code 2.9 : Show structure of the combined M49 and World Bank classification

Code

pb_create_folder("data/meta")

wb_clean2 <- base::readRDS("data/wb/wb_clean.rds") |> 
    dplyr::filter(Economy != "Channel Islands") |> 
    dplyr::rename(`WB Region` = Region) |> 
    dplyr::mutate(`Lending category` =
        base::ifelse(base::is.na(`Lending category`), 
                     "NLC", `Lending category`)
        )

pb_save_data_file("wb", wb_clean2, "wb_clean2.rds")

m49_clean3 <- base::readRDS("data/unsd/m49_clean3.rds")

m49_ext <- dplyr::full_join(
    m49_clean3, wb_clean2, dplyr::join_by(`ISO-alpha3 Code` == Code)
) |> 
    dplyr::select(-Economy) |> 
    dplyr::mutate(
      `Country or Area` =
          base::ifelse(`ISO-alpha3 Code` == "XKX", 
                     "Kosovo", `Country or Area`),
      `ISO-alpha2 Code`  =
          base::ifelse(`ISO-alpha3 Code` == "XKX", 
                       "XK", `ISO-alpha2 Code`)
        
        ) |> 
    dplyr::mutate(
      `Country or Area` =
          base::ifelse(`ISO-alpha3 Code` == "TWN", 
                     "Taiwan, Province of China", `Country or Area`),
      `ISO-alpha2 Code`  =  
          base::ifelse(`ISO-alpha3 Code` == "TWN", 
                     "TW", `ISO-alpha2 Code`),   # "TW" is Taiwan; "TN" is Tunisia
        ) |> 
    dplyr::arrange(`Country or Area`)

pb_save_data_file("meta", m49_ext, "m49_ext.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_ext)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_ext)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	m49_ext
Number of rows	250
Number of columns	17
_______________________
Column type frequency:
character	13
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
ISO-alpha3 Code	0	1.00	3	3	250
Country or Area	0	1.00	4	52	250
Region Code	2	0.99	3	3	5
Region Name	2	0.99	4	8	5
Sub-region Code	2	0.99	3	3	17
Sub-region Name	2	0.99	9	31	17
Intermediate Region Code	2	0.99	3	3	22
Intermediate Region Name	2	0.99	9	25	22
M49 Code	2	0.99	3	3	248
ISO-alpha2 Code	0	1.00	2	2	249
WB Region	33	0.87	10	26	7
Income group	33	0.87	10	19	4
Lending category	33	0.87	3	5	4

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p25	p50	p75	p100	hist
LDC	2	0.99	0.19	0.39	0	0	0	1	▇▁▁▁▂
LLDC	2	0.99	0.13	0.34	0	0	0	1	▇▁▁▁▁
SIDS	2	0.99	0.21	0.41	0	0	0	1	▇▁▁▁▂
UN	2	0.99	0.79	0.41	1	1	1	1	▂▁▁▁▇

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 250
#> Columns: 17
#> $ `ISO-alpha3 Code`          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "…
#> $ `Country or Area`          <chr> "Afghanistan", "Albania", "Algeria", "Ameri…
#> $ `Region Code`              <chr> "142", "150", "002", "009", "150", "002", "…
#> $ `Region Name`              <chr> "Asia", "Europe", "Africa", "Oceania", "Eur…
#> $ `Sub-region Code`          <chr> "034", "039", "015", "061", "039", "202", "…
#> $ `Sub-region Name`          <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `Intermediate Region Code` <chr> "034", "039", "015", "061", "039", "017", "…
#> $ `Intermediate Region Name` <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `M49 Code`                 <chr> "004", "008", "012", "016", "020", "024", "…
#> $ `ISO-alpha2 Code`          <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AI", "…
#> $ LDC                        <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ LLDC                       <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0…
#> $ SIDS                       <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1…
#> $ UN                         <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1…
#> $ `WB Region`                <chr> "South Asia", "Europe & Central Asia", "Mid…
#> $ `Income group`             <chr> "Low income", "Upper middle income", "Upper…
#> $ `Lending category`         <chr> "IDA", "IBRD", "IBRD", "NLC", "NLC", "IBRD"…

R Code 2.10 : Show data values of the combined M49 and World Bank classification

Code

(
    m49_ext <- base::readRDS("data/meta/m49_ext.rds") |> 
        DT::datatable()
)

	ISO-alpha3 Code	Country or Area	Region Code	Region Name	Sub-region Code	Sub-region Name	Intermediate Region Code	Intermediate Region Name	M49 Code	ISO-alpha2 Code	LDC	LLDC	SIDS	UN	WB Region	Income group	Lending category
1	AFG	Afghanistan	142	Asia	034	Southern Asia	034	Southern Asia	004	AF	1	1	0	1	South Asia	Low income	IDA
2	ALB	Albania	150	Europe	039	Southern Europe	039	Southern Europe	008	AL	0	0	0	1	Europe & Central Asia	Upper middle income	IBRD
3	DZA	Algeria	002	Africa	015	Northern Africa	015	Northern Africa	012	DZ	0	0	0	1	Middle East & North Africa	Upper middle income	IBRD
4	ASM	American Samoa	009	Oceania	061	Polynesia	061	Polynesia	016	AS	0	0	1	0	East Asia & Pacific	High income	NLC
5	AND	Andorra	150	Europe	039	Southern Europe	039	Southern Europe	020	AD	0	0	0	1	Europe & Central Asia	High income	NLC
6	AGO	Angola	002	Africa	202	Sub-Saharan Africa	017	Middle Africa	024	AO	1	0	0	1	Sub-Saharan Africa	Lower middle income	IBRD
7	AIA	Anguilla	019	Americas	419	Latin America and the Caribbean	029	Caribbean	660	AI	0	0	1	0
8	ATG	Antigua and Barbuda	019	Americas	419	Latin America and the Caribbean	029	Caribbean	028	AG	0	0	1	1	Latin America & Caribbean	High income	IBRD
9	ARG	Argentina	019	Americas	419	Latin America and the Caribbean	005	South America	032	AR	0	0	0	1	Latin America & Caribbean	Upper middle income	IBRD
10	ARM	Armenia	142	Asia	145	Western Asia	145	Western Asia	051	AM	0	1	0	1	Europe & Central Asia	Upper middle income	IBRD

2.7 Analysing `compositions` data

2.7.1 Multiple Group Membership

Looking again at the compositions dataset (Tab: “WB compositions” of Code Collection 2.1), we have to realize that the many grouping criteria overlap and therefore multiply the relations to specific countries. Every country belongs to at least three different groups. The leader in this respect is the Union of the Comoros, an archipelagic country made up of three islands in Southeastern Africa, located at the northern end of the Mozambique Channel in the Indian Ocean. According to the World Bank classification, Comoros is a member of 18 different groups.

R Code 2.11 : Multiple group membership according to the different WB criteria

Code

wb_compositions <- base::readRDS("data/wb/wb_compositions.rds") 

glue::glue("********* List countries with the most group memberships **********")
wb_compositions |> 
    dplyr::group_by(WB_Country_Name) |> 
    dplyr::summarize(n = dplyr::n()) |> 
    dplyr::arrange(dplyr::desc(n)) |> 
    utils::head()


glue::glue("")
glue::glue("************** Display group membership of Comoros ***************")
wb_compositions |> 
    dplyr::filter(WB_Country_Name == "Comoros") |> 
    print(n = 20)

#> ********* List countries with the most group memberships **********
#> # A tibble: 6 × 2
#>   WB_Country_Name           n
#>   <chr>                 <int>
#> 1 Comoros                  18
#> 2 São Tomé and Príncipe    16
#> 3 Djibouti                 15
#> 4 Kiribati                 15
#> 5 Mauritania               15
#> 6 Solomon Islands          15
#> 
#> ************** Display group membership of Comoros ***************
#> # A tibble: 18 × 4
#>    WB_Group_Code WB_Group_Name                   WB_Country_Code WB_Country_Name
#>    <chr>         <chr>                           <chr>           <chr>          
#>  1 AFE           Africa Eastern and Southern     COM             Comoros        
#>  2 ARB           Arab World                      COM             Comoros        
#>  3 FCS           Fragile and conflict affected … COM             Comoros        
#>  4 HPC           Heavily indebted poor countrie… COM             Comoros        
#>  5 IBT           IDA & IBRD total                COM             Comoros        
#>  6 IDX           IDA only                        COM             Comoros        
#>  7 IDA           IDA total                       COM             Comoros        
#>  8 LDC           Least developed countries: UN … COM             Comoros        
#>  9 LMY           Low & middle income             COM             Comoros        
#> 10 LMC           Lower middle income             COM             Comoros        
#> 11 MIC           Middle income                   COM             Comoros        
#> 12 OSS           Other small states              COM             Comoros        
#> 13 PRE           Pre-demographic dividend        COM             Comoros        
#> 14 SST           Small states (SST)              COM             Comoros        
#> 15 SSF           Sub-Saharan Africa              COM             Comoros        
#> 16 SSA           Sub-Saharan Africa (excluding … COM             Comoros        
#> 17 TSS           Sub-Saharan Africa (IDA & IBRD) COM             Comoros        
#> 18 WLD           World                           COM             Comoros
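
The statement that every country belongs to at least three groups can be checked by looking at the smallest membership count. The following is only a minimal sketch: the tibble `toy` and its rows are made up for illustration and stand in for wb_compositions, and `dplyr::count()` is used as a shorter equivalent of the group_by/summarize/arrange chain.

```r
# Toy stand-in for wb_compositions (illustrative rows, not the real data)
toy <- tibble::tribble(
  ~WB_Group_Code, ~WB_Country_Name,
  "WLD", "Comoros",
  "SSF", "Comoros",
  "AFE", "Comoros",
  "LDC", "Comoros",
  "WLD", "Bermuda",
  "NAC", "Bermuda",
  "HIC", "Bermuda"
)

# One row per country with its number of group memberships,
# sorted from most to fewest memberships
memberships <- toy |>
  dplyr::count(WB_Country_Name, name = "n", sort = TRUE)

# Smallest number of memberships of any country
min_memberships <- base::min(memberships$n)
```

With the real data, replacing `toy` by wb_compositions should give the minimum for all 218 economies.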

2.7.2 More African Groupings

A second observation is that the countries classified in “Sub-Saharan Africa” are also members of either “Africa Western and Central” or “Africa Eastern and Southern”. We therefore have a second regional grouping in addition to the already displayed “WB Standard” grouping (Tab: “WB Standard” in Code Collection 2.2): besides the seven groups of “WB Standard” there is another classification with eight regional areas.

R Code 2.12 : Another regional grouping of the WB classification: WB Standard2

Code

wb_compositions <- base::readRDS("data/wb/wb_compositions.rds") 

wb_grouping2 <- wb_compositions |> 
  # keep only the eight regional groups of the second WB grouping
  dplyr::filter(
      WB_Group_Code %in% 
        base::c("AFE", "AFW", "EAS", "ECS", "LCN", "MEA", "NAC", "SAS")
  )

pb_save_data_file("wb", wb_grouping2, "wb_grouping2.rds")

(
    wb_standard2 <- pb_class_scheme(
            df = wb_grouping2,
            sel1 = rlang::quo(WB_Country_Name),
            sel2 = rlang::quo(WB_Group_Name)
            )
)


	WB_Group_Name	Country	N
1	Africa Eastern and Southern	Angola; Botswana; Burundi; Comoros; Congo, Dem. Rep.; Eritrea; Eswatini; Ethiopia; Kenya; Lesotho; Madagascar; Malawi; Mauritius; Mozambique; Namibia; Rwanda; São Tomé and Príncipe; Seychelles; Somalia; South Africa; South Sudan; Sudan; Tanzania; Uganda; Zambia; Zimbabwe ;	26
2	Africa Western and Central	Benin; Burkina Faso; Cabo Verde; Cameroon; Central African Republic; Chad; Congo, Rep.; Côte d’Ivoire; Equatorial Guinea; Gabon; Gambia, The; Ghana; Guinea; Guinea-Bissau; Liberia; Mali; Mauritania; Niger; Nigeria; Senegal; Sierra Leone; Togo ;	22
3	East Asia & Pacific	American Samoa; Australia; Brunei Darussalam; Cambodia; China; Fiji; French Polynesia; Guam; Hong Kong SAR, China; Indonesia; Japan; Kiribati; Korea, Dem. People's Rep.; Korea, Rep.; Lao PDR; Macao SAR, China; Malaysia; Marshall Islands; Micronesia, Fed. Sts.; Mongolia; Myanmar; Nauru; New Caledonia; New Zealand; Northern Mariana Islands; Palau; Papua New Guinea; Philippines; Samoa; Singapore; Solomon Islands; Taiwan, China; Thailand; Timor-Leste; Tonga; Tuvalu; Vanuatu; Vietnam ;	38
4	Europe & Central Asia	Albania; Andorra; Armenia; Austria; Azerbaijan; Belarus; Belgium; Bosnia and Herzegovina; Bulgaria; Channel Islands; Croatia; Cyprus; Czechia; Denmark; Estonia; Faroe Islands; Finland; France; Georgia; Germany; Gibraltar; Greece; Greenland; Hungary; Iceland; Ireland; Isle of Man; Italy; Kazakhstan; Kosovo; Kyrgyz Republic; Latvia; Liechtenstein; Lithuania; Luxembourg; Moldova; Monaco; Montenegro; Netherlands; North Macedonia; Norway; Poland; Portugal; Romania; Russian Federation; San Marino; Serbia; Slovak Republic; Slovenia; Spain; Sweden; Switzerland; Tajikistan; Türkiye; Turkmenistan; Ukraine; United Kingdom; Uzbekistan ;	58
5	Latin America & Caribbean	Antigua and Barbuda; Argentina; Aruba; Bahamas, The; Barbados; Belize; Bolivia; Brazil; British Virgin Islands; Cayman Islands; Chile; Colombia; Costa Rica; Cuba; Curaçao; Dominica; Dominican Republic; Ecuador; El Salvador; Grenada; Guatemala; Guyana; Haiti; Honduras; Jamaica; Mexico; Nicaragua; Panama; Paraguay; Peru; Puerto Rico; Sint Maarten (Dutch part); St. Kitts and Nevis; St. Lucia; St. Martin (French part); St. Vincent and the Grenadines; Suriname; Trinidad and Tobago; Turks and Caicos Islands; Uruguay; Venezuela, RB; Virgin Islands (U.S.) ;	42
6	Middle East & North Africa	Algeria; Bahrain; Djibouti; Egypt, Arab Rep.; Iran, Islamic Rep.; Iraq; Israel; Jordan; Kuwait; Lebanon; Libya; Malta; Morocco; Oman; Qatar; Saudi Arabia; Syrian Arab Republic; Tunisia; United Arab Emirates; West Bank and Gaza; Yemen, Rep. ;	21
7	North America	Bermuda; Canada; United States ;	3
8	South Asia	Afghanistan; Bangladesh; Bhutan; India; Maldives; Nepal; Pakistan; Sri Lanka ;	8


The WB_Group_Name grouping is similar to the classification scheme “WB Standard”, but it has eight regions instead of seven, while the number of countries (218) stays the same.

The difference is caused by the division of the meta-region “Sub-Saharan Africa” into “Africa Eastern and Southern” and “Africa Western and Central”. All the other groupings are the same as in the Tab “WB Standard” of Code Collection 2.2.
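
Whether the two African groups split “Sub-Saharan Africa” without gaps or overlaps can be checked with set operations. The following is a minimal sketch under stated assumptions: `toy` is a small made-up stand-in for wb_compositions, `members()` is a hypothetical helper, and the group codes SSF, AFE and AFW are taken from the output and filter shown above.

```r
# Toy stand-in for wb_compositions (illustrative rows only)
toy <- tibble::tribble(
  ~WB_Group_Code, ~WB_Country_Code,
  "SSF", "COM",
  "SSF", "AGO",
  "SSF", "BEN",
  "AFE", "COM",
  "AFE", "AGO",
  "AFW", "BEN"
)

# Hypothetical helper: country codes belonging to one group
members <- function(df, code) {
  df |>
    dplyr::filter(WB_Group_Code == code) |>
    dplyr::pull(WB_Country_Code)
}

ssf <- members(toy, "SSF")
afe <- members(toy, "AFE")
afw <- members(toy, "AFW")

# TRUE if no country is a member of both subgroups
no_overlap <- base::length(base::intersect(afe, afw)) == 0

# TRUE if both subgroups together cover exactly Sub-Saharan Africa
same_countries <- base::setequal(base::union(afe, afw), ssf)
```

If both values are TRUE for the real data as well, the two subgroups form a clean partition of the meta-region.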

2.7.3 Integrating 2nd regional grouping

I will add this second regional grouping of the World Bank classification under the name WB Region2 to my combined meta dataset “data/meta/m49_ext.rds”, where I collect all the different grouping criteria.

R Code 2.13 : Integrating the second regional World Bank grouping into the extended M49 dataset

Code

m49_ext <- base::readRDS("data/meta/m49_ext.rds")
wb_grouping2 <- base::readRDS("data/wb/wb_grouping2.rds") |> 
  dplyr::filter(WB_Country_Name != "Channel Islands")
  

m49_ext2 <- dplyr::full_join(
  m49_ext, wb_grouping2, 
  by = dplyr::join_by(`ISO-alpha3 Code` == WB_Country_Code)
  ) |> 
  dplyr::select(-WB_Country_Name) |> 
  dplyr::rename(`WB Region2` = WB_Group_Name) |> 
  dplyr::relocate(WB_Group_Code, .before = `WB Region`) |> 
  dplyr::relocate(`WB Region2`, .after = `WB Region`)

pb_save_data_file("meta", m49_ext2, "m49_ext2.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_ext2)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_ext2)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	m49_ext2
Number of rows	250
Number of columns	19
_______________________
Column type frequency:
character	15
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
ISO-alpha3 Code	0	1.00	3	3	250
Country or Area	0	1.00	4	52	250
Region Code	2	0.99	3	3	5
Region Name	2	0.99	4	8	5
Sub-region Code	2	0.99	3	3	17
Sub-region Name	2	0.99	9	31	17
Intermediate Region Code	2	0.99	3	3	22
Intermediate Region Name	2	0.99	9	25	22
M49 Code	2	0.99	3	3	248
ISO-alpha2 Code	0	1.00	2	2	249
WB_Group_Code	33	0.87	3	3	8
WB Region	33	0.87	10	26	7
WB Region2	33	0.87	10	27	8
Income group	33	0.87	10	19	4
Lending category	33	0.87	3	5	4

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p25	p50	p75	p100	hist
LDC	2	0.99	0.19	0.39	0	0	0	1	▇▁▁▁▂
LLDC	2	0.99	0.13	0.34	0	0	0	1	▇▁▁▁▁
SIDS	2	0.99	0.21	0.41	0	0	0	1	▇▁▁▁▂
UN	2	0.99	0.79	0.41	1	1	1	1	▂▁▁▁▇

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 250
#> Columns: 19
#> $ `ISO-alpha3 Code`          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "…
#> $ `Country or Area`          <chr> "Afghanistan", "Albania", "Algeria", "Ameri…
#> $ `Region Code`              <chr> "142", "150", "002", "009", "150", "002", "…
#> $ `Region Name`              <chr> "Asia", "Europe", "Africa", "Oceania", "Eur…
#> $ `Sub-region Code`          <chr> "034", "039", "015", "061", "039", "202", "…
#> $ `Sub-region Name`          <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `Intermediate Region Code` <chr> "034", "039", "015", "061", "039", "017", "…
#> $ `Intermediate Region Name` <chr> "Southern Asia", "Southern Europe", "Northe…
#> $ `M49 Code`                 <chr> "004", "008", "012", "016", "020", "024", "…
#> $ `ISO-alpha2 Code`          <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AI", "…
#> $ LDC                        <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ LLDC                       <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0…
#> $ SIDS                       <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1…
#> $ UN                         <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1…
#> $ WB_Group_Code              <chr> "SAS", "ECS", "MEA", "EAS", "ECS", "AFE", N…
#> $ `WB Region`                <chr> "South Asia", "Europe & Central Asia", "Mid…
#> $ `WB Region2`               <chr> "South Asia", "Europe & Central Asia", "Mid…
#> $ `Income group`             <chr> "Low income", "Upper middle income", "Upper…
#> $ `Lending category`         <chr> "IDA", "IBRD", "IBRD", "NLC", "NLC", "IBRD"…

Checking the missing values: UN M49 - WB = 250 (incl. Sark) - 218 - 1 (Channel Islands) + 2 (Kosovo and Taiwan, which were already integrated from the World Bank dataset into the extended UN M49) = 33 missing values.
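
The number of unmatched rows can also be counted directly instead of by hand. The sketch below uses two made-up toy tibbles (`m49_toy`, `wb_toy`) in place of the real datasets; only the column names are taken from the code above.

```r
# Toy stand-ins (illustrative rows only)
m49_toy <- tibble::tibble(
  `ISO-alpha3 Code` = c("AFG", "AIA", "ALB")
)
wb_toy <- tibble::tibble(
  WB_Country_Code = c("AFG", "ALB"),
  WB_Group_Name   = c("South Asia", "Europe & Central Asia")
)

# Same kind of join as for m49_ext2
joined <- dplyr::full_join(
  m49_toy, wb_toy,
  by = dplyr::join_by(`ISO-alpha3 Code` == WB_Country_Code)
)

# Number of rows without a World Bank region
n_missing <- base::sum(base::is.na(joined$WB_Group_Name))

# The unmatched rows themselves
unmatched <- dplyr::anti_join(
  m49_toy, wb_toy,
  by = dplyr::join_by(`ISO-alpha3 Code` == WB_Country_Code)
)
```

Applied to the real data, counting the NA values of the WB Region2 column in m49_ext2 should reproduce the 33 missing values reported by skimr, and the anti join lists the affected areas by name.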

2.7.4 Integrating simple groups

My additions to the UN M49 change the original file. To differentiate it from the UN M49 source I will call it M49 extended.

In the next step I will add simple groups from the World Bank classification to the M49 extended dataset. By “simple” I mean groups with only two values: Yes and No, or in my case 1 and 0.

R Code 2.14 : Integrating simple groups into the M49 extended dataset

Code

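## Function for creating simple groups #########################
## df_comp:    WB compositions data (one row per group/country pair)
## df_ext:     extended M49 dataset to which the new column is added
## group_name: WB group name; also used as name of the new 0/1 column
pb_add_simple_group <- function(df_comp, df_ext, group_name){
  df_comp <- df_comp |> 
    # keep only the members of the requested group
    dplyr::filter(WB_Group_Name == !!group_name) |>
    # columns 2 and 3: WB_Group_Name and WB_Country_Code
    dplyr::select(2,3) |> 
    # turn the group name column into an indicator column (1 = member)
    dplyr::rename(!!group_name := WB_Group_Name) |> 
    dplyr::mutate(!!group_name := 1)
  
  # non-members get NA here; they are set to 0 after all groups are joined
  dplyr::full_join(
      df_ext, df_comp,
      by = dplyr::join_by(`ISO-alpha3 Code` == WB_Country_Code)
    )
}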

##  Calling function to create simple groups   ################

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  base::readRDS("data/meta/m49_ext2.rds"),
  "Arab World"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "Euro area"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "OECD members"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "European Union"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "Small states (SST)"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "Central Europe and the Baltics"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "Fragile and conflict affected situations"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "Heavily indebted poor countries (HIPC)"
  )

## identical with M49 `LDC` column ###################
## Least developed countries: UN classification 

# m49_ext3 <- pb_add_simple_group(
#   base::readRDS("data/wb/wb_compositions.rds"),
#   m49_ext3,
#   "Least developed countries: UN classification"
#   )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "IBRD only"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "IDA & IBRD total"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "IDA blend"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "IDA only"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "IDA total"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "Middle income"
  )

m49_ext3 <- pb_add_simple_group(
  base::readRDS("data/wb/wb_compositions.rds"),
  m49_ext3,
  "Low & middle income"
  )

m49_ext3 <- m49_ext3 |>
  dplyr::mutate(dplyr::across(
    `Arab World`:`Low & middle income`, ~ tidyr::replace_na(., 0)))

  
pb_save_data_file("meta", m49_ext3, "m49_ext3.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_ext3)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_ext3)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	m49_ext3
Number of rows	250
Number of columns	34
_______________________
Column type frequency:
character	15
numeric	19
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
ISO-alpha3 Code	0	1.00	3	3	250
Country or Area	0	1.00	4	52	250
Region Code	2	0.99	3	3	5
Region Name	2	0.99	4	8	5
Sub-region Code	2	0.99	3	3	17
Sub-region Name	2	0.99	9	31	17
Intermediate Region Code	2	0.99	3	3	22
Intermediate Region Name	2	0.99	9	25	22
M49 Code	2	0.99	3	3	248
ISO-alpha2 Code	0	1.00	2	2	249
WB_Group_Code	33	0.87	3	3	8
WB Region	33	0.87	10	26	7
WB Region2	33	0.87	10	27	8
Income group	33	0.87	10	19	4
Lending category	33	0.87	3	5	4

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p25	p50	p75	p100	hist
LDC	2	0.99	0.19	0.39	0	0	0	1	▇▁▁▁▂
LLDC	2	0.99	0.13	0.34	0	0	0	1	▇▁▁▁▁
SIDS	2	0.99	0.21	0.41	0	0	0	1	▇▁▁▁▂
UN	2	0.99	0.79	0.41	1	1	1	1	▂▁▁▁▇
Arab World	0	1.00	0.09	0.28	0	0	0	1	▇▁▁▁▁
Euro area	0	1.00	0.08	0.27	0	0	0	1	▇▁▁▁▁
OECD members	0	1.00	0.15	0.36	0	0	0	1	▇▁▁▁▂
European Union	0	1.00	0.11	0.31	0	0	0	1	▇▁▁▁▁
Small states (SST)	0	1.00	0.16	0.37	0	0	0	1	▇▁▁▁▂
Central Europe and the Baltics	0	1.00	0.04	0.21	0	0	0	1	▇▁▁▁▁
Fragile and conflict affected situations	0	1.00	0.16	0.36	0	0	0	1	▇▁▁▁▂
Heavily indebted poor countries (HIPC)	0	1.00	0.16	0.36	0	0	0	1	▇▁▁▁▂
IBRD only	0	1.00	0.27	0.45	0	0	1	1	▇▁▁▁▃
IDA & IBRD total	0	1.00	0.58	0.49	0	1	1	1	▆▁▁▁▇
IDA blend	0	1.00	0.07	0.25	0	0	0	1	▇▁▁▁▁
IDA only	0	1.00	0.24	0.43	0	0	0	1	▇▁▁▁▂
IDA total	0	1.00	0.31	0.46	0	0	1	1	▇▁▁▁▃
Middle income	0	1.00	0.42	0.49	0	0	1	1	▇▁▁▁▆
Low & middle income	0	1.00	0.52	0.50	0	1	1	1	▇▁▁▁▇

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 250
#> Columns: 34
#> $ `ISO-alpha3 Code`                          <chr> "AFG", "ALB", "DZA", "ASM",…
#> $ `Country or Area`                          <chr> "Afghanistan", "Albania", "…
#> $ `Region Code`                              <chr> "142", "150", "002", "009",…
#> $ `Region Name`                              <chr> "Asia", "Europe", "Africa",…
#> $ `Sub-region Code`                          <chr> "034", "039", "015", "061",…
#> $ `Sub-region Name`                          <chr> "Southern Asia", "Southern …
#> $ `Intermediate Region Code`                 <chr> "034", "039", "015", "061",…
#> $ `Intermediate Region Name`                 <chr> "Southern Asia", "Southern …
#> $ `M49 Code`                                 <chr> "004", "008", "012", "016",…
#> $ `ISO-alpha2 Code`                          <chr> "AF", "AL", "DZ", "AS", "AD…
#> $ LDC                                        <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, …
#> $ LLDC                                       <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ SIDS                                       <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, …
#> $ UN                                         <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, …
#> $ WB_Group_Code                              <chr> "SAS", "ECS", "MEA", "EAS",…
#> $ `WB Region`                                <chr> "South Asia", "Europe & Cen…
#> $ `WB Region2`                               <chr> "South Asia", "Europe & Cen…
#> $ `Income group`                             <chr> "Low income", "Upper middle…
#> $ `Lending category`                         <chr> "IDA", "IBRD", "IBRD", "NLC…
#> $ `Arab World`                               <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, …
#> $ `Euro area`                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `OECD members`                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `European Union`                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `Small states (SST)`                       <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, …
#> $ `Central Europe and the Baltics`           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `Fragile and conflict affected situations` <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `Heavily indebted poor countries (HIPC)`   <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `IBRD only`                                <dbl> 0, 1, 1, 0, 0, 1, 0, 1, 1, …
#> $ `IDA & IBRD total`                         <dbl> 1, 1, 1, 0, 0, 1, 0, 1, 1, …
#> $ `IDA blend`                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `IDA only`                                 <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `IDA total`                                <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `Middle income`                            <dbl> 0, 1, 1, 0, 0, 1, 0, 0, 1, …
#> $ `Low & middle income`                      <dbl> 1, 1, 1, 0, 0, 1, 0, 0, 1, …
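
The fifteen near-identical calls in R Code 2.14 could also be written as a fold over a vector of group names, which reads the compositions data only once. The following is a sketch, not the code used for the results above: `add_flag()` is a hypothetical, simplified variant of pb_add_simple_group() (it uses a left join, which suffices here), and `comp_toy` and `ext_toy` are made-up toy data.

```r
# Toy stand-ins (illustrative rows only)
comp_toy <- tibble::tribble(
  ~WB_Group_Name, ~WB_Country_Code,
  "Arab World",   "DZA",
  "Euro area",    "AUT",
  "OECD members", "AUT"
)
ext_toy <- tibble::tibble(`ISO-alpha3 Code` = c("AFG", "AUT", "DZA"))

# Hypothetical simplified variant of pb_add_simple_group():
# adds one indicator column (1 = member, NA = not a member)
add_flag <- function(df_ext, group_name, df_comp) {
  flags <- df_comp |>
    dplyr::filter(WB_Group_Name == group_name) |>
    dplyr::mutate(!!group_name := 1) |>
    dplyr::select(WB_Country_Code, dplyr::all_of(group_name))

  dplyr::left_join(
    df_ext, flags,
    by = dplyr::join_by(`ISO-alpha3 Code` == WB_Country_Code)
  )
}

groups <- c("Arab World", "Euro area", "OECD members")

# Fold over the group names: each step adds one column to the result so far,
# afterwards the NA values of non-members are replaced by 0
ext_flags <- base::Reduce(
  function(df, g) add_flag(df, g, comp_toy),
  groups,
  ext_toy
) |>
  dplyr::mutate(dplyr::across(
    dplyr::all_of(groups), ~ tidyr::replace_na(.x, 0)))
```

Adding a further group then only requires one more entry in `groups`.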

2.7.5 Integrating composite groups

By “composite” groups I mean columns that have more than two values. In contrast to the simple groups, where the column types are numeric, the composite groups are character columns. “0” means that the criterion of this column is not applicable to the country.

R Code 2.15 : Integrating composite groups into the M49 extended dataset

Code

## Preparing composition data #################
## for unique string selection
wb_compositions2 <- base::readRDS("data/wb/wb_compositions.rds") |> 
    dplyr::filter(
      WB_Group_Name != "IDA & IBRD",
      WB_Group_Name != "IDA & IBRD total") |> 
    dplyr::filter(
      WB_Group_Name != "High income",
      WB_Group_Name != "Low income",
      WB_Group_Name != "Upper middle income",
      WB_Group_Name != "Lower middle income"
      )

pb_save_data_file("wb", wb_compositions2, "wb_compositions2.rds")


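## function for composite groups
## df_ext:        extended M49 dataset to which the new column is added
## group_name:    name of the new character column
## search_string: pattern selecting the WB group names for this column;
##                stringr::str_detect() reads it as a regular expression,
##                so parentheses act as a regex group, not as literal characters
pb_new_composite_group <- function(df_ext, group_name, search_string){
  df_comp2 <- base::readRDS("data/wb/wb_compositions2.rds")
  df_comp2 <- df_comp2 |> 
    dplyr::filter(stringr::str_detect(WB_Group_Name, !!search_string)) |>
    dplyr::rename(!!group_name := WB_Group_Name) |> 
    # columns 2 and 3: the renamed group column and WB_Country_Code
    dplyr::select(2,3) 
  
  dplyr::full_join(
    df_ext, df_comp2,
    by = dplyr::join_by(`ISO-alpha3 Code` == WB_Country_Code)
    )
}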


## create composite groups ##################

m49_ext3 <- base::readRDS("data/meta/m49_ext3.rds")

m49_ext4 <- pb_new_composite_group(
  m49_ext3,
  "Small States", 
  "small states"
  )

m49_ext4 <- pb_new_composite_group(
  m49_ext4,
  "Excluding High Income", 
  "(excluding high income)"
  )

m49_ext4 <- pb_new_composite_group(
  m49_ext4,
  "Demographic Divididend", 
  "demographic dividend"
  )

m49_ext4 <- pb_new_composite_group(
  m49_ext4,
  "IDA & IBRD", 
  "(IDA & IBRD)"
  )

m49_ext4 <- m49_ext4 |>
  dplyr::mutate(dplyr::across(
    `Small States`:`IDA & IBRD`, ~ tidyr::replace_na(., "0")))


pb_save_data_file("meta", m49_ext4, "m49_ext4.rds")


glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(m49_ext4)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(m49_ext4)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	m49_ext4
Number of rows	250
Number of columns	38
_______________________
Column type frequency:
character	19
numeric	19
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
ISO-alpha3 Code	0	1.00	3	3	250
Country or Area	0	1.00	4	52	250
Region Code	2	0.99	3	3	5
Region Name	2	0.99	4	8	5
Sub-region Code	2	0.99	3	3	17
Sub-region Name	2	0.99	9	31	17
Intermediate Region Code	2	0.99	3	3	22
Intermediate Region Name	2	0.99	9	25	22
M49 Code	2	0.99	3	3	248
ISO-alpha2 Code	0	1.00	2	2	249
WB_Group_Code	33	0.87	3	3	8
WB Region	33	0.87	10	26	7
WB Region2	33	0.87	10	27	8
Income group	33	0.87	10	19	4
Lending category	33	0.87	3	5	4
Small States	0	1.00	1	27	4
Excluding High Income	0	1.00	1	50	6
Demographic Divididend	0	1.00	1	26	5
IDA & IBRD	0	1.00	1	39	7

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p25	p50	p75	p100	hist
LDC	2	0.99	0.19	0.39	0	0	0	1	▇▁▁▁▂
LLDC	2	0.99	0.13	0.34	0	0	0	1	▇▁▁▁▁
SIDS	2	0.99	0.21	0.41	0	0	0	1	▇▁▁▁▂
UN	2	0.99	0.79	0.41	1	1	1	1	▂▁▁▁▇
Arab World	0	1.00	0.09	0.28	0	0	0	1	▇▁▁▁▁
Euro area	0	1.00	0.08	0.27	0	0	0	1	▇▁▁▁▁
OECD members	0	1.00	0.15	0.36	0	0	0	1	▇▁▁▁▂
European Union	0	1.00	0.11	0.31	0	0	0	1	▇▁▁▁▁
Small states (SST)	0	1.00	0.16	0.37	0	0	0	1	▇▁▁▁▂
Central Europe and the Baltics	0	1.00	0.04	0.21	0	0	0	1	▇▁▁▁▁
Fragile and conflict affected situations	0	1.00	0.16	0.36	0	0	0	1	▇▁▁▁▂
Heavily indebted poor countries (HIPC)	0	1.00	0.16	0.36	0	0	0	1	▇▁▁▁▂
IBRD only	0	1.00	0.27	0.45	0	0	1	1	▇▁▁▁▃
IDA & IBRD total	0	1.00	0.58	0.49	0	1	1	1	▆▁▁▁▇
IDA blend	0	1.00	0.07	0.25	0	0	0	1	▇▁▁▁▁
IDA only	0	1.00	0.24	0.43	0	0	0	1	▇▁▁▁▂
IDA total	0	1.00	0.31	0.46	0	0	1	1	▇▁▁▁▃
Middle income	0	1.00	0.42	0.49	0	0	1	1	▇▁▁▁▆
Low & middle income	0	1.00	0.52	0.50	0	1	1	1	▇▁▁▁▇

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 250
#> Columns: 38
#> $ `ISO-alpha3 Code`                          <chr> "AFG", "ALB", "DZA", "ASM",…
#> $ `Country or Area`                          <chr> "Afghanistan", "Albania", "…
#> $ `Region Code`                              <chr> "142", "150", "002", "009",…
#> $ `Region Name`                              <chr> "Asia", "Europe", "Africa",…
#> $ `Sub-region Code`                          <chr> "034", "039", "015", "061",…
#> $ `Sub-region Name`                          <chr> "Southern Asia", "Southern …
#> $ `Intermediate Region Code`                 <chr> "034", "039", "015", "061",…
#> $ `Intermediate Region Name`                 <chr> "Southern Asia", "Southern …
#> $ `M49 Code`                                 <chr> "004", "008", "012", "016",…
#> $ `ISO-alpha2 Code`                          <chr> "AF", "AL", "DZ", "AS", "AD…
#> $ LDC                                        <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, …
#> $ LLDC                                       <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ SIDS                                       <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, …
#> $ UN                                         <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, …
#> $ WB_Group_Code                              <chr> "SAS", "ECS", "MEA", "EAS",…
#> $ `WB Region`                                <chr> "South Asia", "Europe & Cen…
#> $ `WB Region2`                               <chr> "South Asia", "Europe & Cen…
#> $ `Income group`                             <chr> "Low income", "Upper middle…
#> $ `Lending category`                         <chr> "IDA", "IBRD", "IBRD", "NLC…
#> $ `Arab World`                               <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, …
#> $ `Euro area`                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `OECD members`                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `European Union`                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `Small states (SST)`                       <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, …
#> $ `Central Europe and the Baltics`           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `Fragile and conflict affected situations` <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `Heavily indebted poor countries (HIPC)`   <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `IBRD only`                                <dbl> 0, 1, 1, 0, 0, 1, 0, 1, 1, …
#> $ `IDA & IBRD total`                         <dbl> 1, 1, 1, 0, 0, 1, 0, 1, 1, …
#> $ `IDA blend`                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `IDA only`                                 <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `IDA total`                                <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ `Middle income`                            <dbl> 0, 1, 1, 0, 0, 1, 0, 0, 1, …
#> $ `Low & middle income`                      <dbl> 1, 1, 1, 0, 0, 1, 0, 0, 1, …
#> $ `Small States`                             <chr> "0", "0", "0", "0", "0", "0…
#> $ `Excluding High Income`                    <chr> "0", "Europe & Central Asia…
#> $ `Demographic Divididend`                   <chr> "Pre-demographic dividend",…
#> $ `IDA & IBRD`                               <chr> "South Asia (IDA & IBRD)", …
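
One detail of the search strings in R Code 2.15 is worth illustrating: `stringr::str_detect()` interprets its pattern as a regular expression by default, so the parentheses in "(IDA & IBRD)" are not matched literally. This is presumably why “IDA & IBRD total” has to be removed from the compositions data beforehand. The sketch below compares the default behaviour with `stringr::fixed()`; the three group names are taken from the output shown in this chapter.

```r
group_names <- c(
  "IDA & IBRD total",
  "South Asia (IDA & IBRD)",
  "Sub-Saharan Africa (IDA & IBRD)"
)

# As a regular expression the parentheses only group the pattern,
# so every name containing "IDA & IBRD" matches
as_regex <- stringr::str_detect(group_names, "(IDA & IBRD)")

# With fixed() the parentheses are matched literally,
# so "IDA & IBRD total" no longer matches
as_fixed <- stringr::str_detect(group_names, stringr::fixed("(IDA & IBRD)"))
```

Wrapping the search string in `stringr::fixed()` would be one way to make the selection independent of the preparatory filter step; whether it gives the same result for all four composite groups would have to be checked against the real data.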

This information comes from the “Notes” sheet of the Excel file.
