Appendix B: Countries & Variables

Inspect countries and variables of the Summary Table

The goal of this chapter is to get an overview of what kind of data is available for different countries. Scrolling through the Summary Table interactively is not very efficient. Even filtering by country does not always work (for instance, it works for Austria but not for Germany).¹

One idea that comes to mind is a list of countries with the number of rows for each, as an indicator of how many variables are available. I may add other filters as well to understand the data better.

B.1 Data

B.1.1 Download Summary Table

R Code B.1 : Download Summary Table

## define variables
url <- "https://wid.world/wp-content/uploads/2019/05/WID_SummaryTable_24May2019.xlsx"
chapter_folder <- "chap81"
file_name <- "summary_table.xlsx"

## create folder for chapter if not already done
chap_folder <- here::here("data", chapter_folder)
if (!base::dir.exists(chap_folder)) {
    base::dir.create(chap_folder, recursive = TRUE)
}

## get summary table (mode = "wb" writes the .xlsx as a binary file)
destfile <- base::file.path(chap_folder, file_name)
utils::download.file(url, destfile, mode = "wb")
summary_table <- readxl::read_xlsx(destfile)

## save data as .rds file
## (pb_save_data_file() is a project helper that is not defined in this chapter)
pb_save_data_file(chapter_folder, summary_table, "summary_table.rds")

(No output is available for this R code chunk.)

B.1.2 Explore Summary Table

R Code B.2 : Explore Data of the Summary Table

summary_table <- readRDS("data/chap81/summary_table.rds")

skimr::skim(summary_table)

Data summary
Name	summary_table
Number of rows	6253
Number of columns	17
_______________________
Column type frequency:
character	16
numeric	1
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Country	0	1.00	3	53	331
Country Code	0	1.00	2	6	332
Short name of variable	0	1.00	5	59	184
Type(s) of variable	0	1.00	1	31	12
Variable category	45	0.99	20	27	5
Variable level(s)	45	0.99	1	7	8
Years	0	1.00	4	9	341
Age groups	0	1.00	7	60	8
Population categories	0	1.00	8	55	29
WID.world code	0	1.00	10	22	312
Percentiles	0	1.00	5	1087	85
Years available for these percentiles	0	1.00	31	41	3
Simple description of variable	0	1.00	30	990	183
Technical description of variable	3659	0.41	48	348	79
Method used for computation	3766	0.40	1	1445	544
Source	1426	0.77	12	2045	223

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
data_quality	5735	0.08	2.85	0.75	2	2	3	3	5	▆▇▁▂▁

B.2 Inspect Summary Table

B.2.1 Number of Variables per Country

Code Collection B.1

Variables per Country
Without Subregions

R Code B.3 : Number of Records (=Variables) per Country

summary_table |> 
    dplyr::count(Country, `Country Code`) |> 
    DT::datatable()

	Country	Country Code	n
1	Afghanistan	AF	13
2	Africa	QB	10
3	Alabama	US-AL	5
4	Alaska	US-AK	5
5	Albania	AL	21
6	Algeria	DZ	13
7	American Samoa	AS	6
8	Americas	QC	10
9	Andorra	AD	6
10	Angola	AO	13

Showing 1 to 10 of 332 entries

R Code B.4 : Number of Records (=Variables) per Country without Subregions

summary_table |> 
    dplyr::filter(!stringr::str_detect(`Country Code`, '-')) |> 
    dplyr::count(Country, `Country Code`) |> 
    DT::datatable()

	Country	Country Code	n
1	Afghanistan	AF	13
2	Africa	QB	10
3	Albania	AL	21
4	Algeria	DZ	13
5	American Samoa	AS	6
6	Americas	QC	10
7	Andorra	AD	6
8	Angola	AO	13
9	Anguilla	AI	7
10	Antigua and Barbuda	AG	13

Showing 1 to 10 of 265 entries

Even after the sub-region codes are removed, the list still does not contain only countries, because it includes “meta”-areas such as “Africa” and “Americas”.
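
One way to drop such aggregates as well is to keep a hand-maintained vector of their codes and exclude them together with the sub-regions. The following is only a minimal sketch: the toy data frame repeats a few rows from the tables above, `meta_codes` is a hypothetical name, and it contains just the two aggregate codes visible there (QB, QC), so it would have to be extended after inspecting the full list.

```r
## toy data: a few rows copied from the tables above
toy_counts <- data.frame(
    Country = c("Afghanistan", "Africa", "Alabama", "Albania", "Americas"),
    `Country Code` = c("AF", "QB", "US-AL", "AL", "QC"),
    n = c(13, 10, 5, 21, 10),
    check.names = FALSE
)

## hypothetical and incomplete list of codes for "meta"-areas
meta_codes <- c("QB", "QC")

countries_only <- toy_counts |>
    dplyr::filter(
        !stringr::str_detect(`Country Code`, "-"),  # drop sub-regions
        !(`Country Code` %in% meta_codes)           # drop "meta"-areas
    )

countries_only
```

An explicit list is used here instead of a pattern on the first letter of the code, because such a pattern could also match regular country codes.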

B.2.2 Display Many Columns At Once

Another useful way to inspect the data is an interactive table with all columns except the very long descriptions in columns 13-16. For a better view I have created a separate page with a layout that covers the full width of the browser: see Appendix C.
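
Dropping columns by position could look like the sketch below. It is only an illustration under assumptions: `drop_long_columns()` is a hypothetical helper name, and the toy data frame merely mimics the 17-column layout of the Summary Table. The table itself is shown in Appendix C.

```r
## hypothetical helper: remove the long description columns 13-16 by position
drop_long_columns <- function(df, long_cols = 13:16) {
    dplyr::select(df, !dplyr::all_of(long_cols))
}

## toy stand-in with 17 columns (V1 ... V17), like the Summary Table
toy_table <- as.data.frame(matrix(seq_len(34), nrow = 2, ncol = 17))

short_table <- drop_long_columns(toy_table)
base::ncol(short_table)
```

Piping the result into `DT::datatable()` would then give the interactive view.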

B.2.3 List Variables of the Summary Table

Another important table for understanding the datasets lists the short name of each variable together with its simple description, its technical description and the method used for computation.

R Code B.5 : Unique Short name of variables

summary_table |>
    dplyr::distinct(`Short name of variable`)

#> # A tibble: 184 × 1
#>    `Short name of variable`                        
#>    <chr>                                           
#>  1 National income                                 
#>  2 Net domestic product                            
#>  3 Consumption of fixed capital of national economy
#>  4 Gross domestic product                          
#>  5 Net foreign income                              
#>  6 National income price index                     
#>  7 Population                                      
#>  8 PPP conversion factor, LCU per EUR              
#>  9 Market exchange rate, LCU per EUR               
#> 10 PPP conversion factor, LCU per USD              
#> # ℹ 174 more rows
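
To see the names together with their descriptions, the same call can take several columns. This is a minimal sketch on made-up toy rows: the description texts are placeholders and not taken from the Summary Table; only the column names come from the skim output in Section B.1.2.

```r
## toy rows; the description texts are placeholders
toy_vars <- data.frame(
    Country = c("Austria", "Germany", "Austria"),
    `Short name of variable` = c("National income", "National income", "Population"),
    `Simple description of variable` = c("placeholder A", "placeholder A", "placeholder B"),
    `Technical description of variable` = c(NA, NA, "placeholder C"),
    `Method used for computation` = c(NA, NA, NA_character_),
    check.names = FALSE
)

## one row per unique combination of name and descriptions
var_descriptions <- toy_vars |>
    dplyr::distinct(
        `Short name of variable`,
        `Simple description of variable`,
        `Technical description of variable`,
        `Method used for computation`
    )

var_descriptions
```

If the descriptions of a variable differ between countries, its name appears in more than one row of the result.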

B.2.4 Explain All 184 Variables of the Summary Table

It turned out that the number of unique values of `Short name of variable` equals the number of rows of the country with the most rows (USA, with 184 rows). It is therefore a reasonable assumption that all unique variable names can be found in the USA part of the database. See Section C.2.
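
This assumption can be checked directly instead of being inferred from the row counts. The sketch below uses toy data and assumes that the USA rows carry the country code "US" (suggested by sub-region codes such as US-AL, but not confirmed above); `missing_in_usa` is a hypothetical name.

```r
## toy data; the real check would use summary_table instead
toy_table <- data.frame(
    `Country Code` = c("US", "US", "US", "AT", "AT"),
    `Short name of variable` = c(
        "National income", "Population", "Gross domestic product",
        "National income", "Population"
    ),
    check.names = FALSE
)

all_names <- base::unique(toy_table$`Short name of variable`)
us_names <- toy_table |>
    dplyr::filter(`Country Code` == "US") |>   # assumed code for the USA
    dplyr::pull(`Short name of variable`) |>
    base::unique()

## variable names that never occur in the USA rows
missing_in_usa <- base::setdiff(all_names, us_names)
base::length(missing_in_usa)
```

An empty `missing_in_usa` would support the assumption; any names left over would be variables that exist only for other countries.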

B.2.5 Explain Variables for Austria of the Summary Table

I believe that I will not need most of the 184 variables explained in the Summary Table. As a first step I could start with the 21 variables that are listed for Austria. See Section C.3.

Caution B.1: Post- & Pre-tax national income occur four times

It turned out that two variables (Post-tax national income and Pre-tax national income) each occur four times for Austria. This is also the case for other countries, such as the USA, Germany and Belgium.

Maybe these variables were added by different research projects?

Each of the two variables therefore has three surplus rows, six in total. So in the end there are not 184 but 178 variables in total, and Austria has not 21 but only 15 variables!
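
Repeated names can be listed per country by counting each combination of country and variable name. This is a minimal sketch on toy rows; the rows are invented for illustration and do not reproduce the Austrian data.

```r
## toy rows: one variable repeated four times for Austria
toy_table <- data.frame(
    Country = c(rep("Austria", 5), "Germany"),
    `Short name of variable` = c(
        rep("Pre-tax national income", 4), "Population", "Population"
    ),
    check.names = FALSE
)

## combinations of country and variable name that occur more than once
duplicated_vars <- toy_table |>
    dplyr::count(Country, `Short name of variable`) |>
    dplyr::filter(n > 1)

duplicated_vars
```

Applied to `summary_table`, the same two calls would show for which countries and variables the repetition occurs.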

¹ Maybe a bug in the web interface?