Appendix B: Countries & Variables


Inspect countries and variables of the Summary Table

The aim of this chapter is to find out what kind of data is available for different countries. Scrolling through the Summary Table interactively is not very efficient. Even filtering by country does not always work (for instance, it works for Austria but not for Germany).[1]

One idea is to list the countries together with their number of rows, as an indicator of how many variables are available for each country. I may add other filters later to understand the data better.

B.1 Data

B.1.1 Download Summary Table

R Code B.1 : Download Summary Table

Code
## define variables
url <- "https://wid.world/wp-content/uploads/2019/05/WID_SummaryTable_24May2019.xlsx"
chapter_folder <- "chap81"
file_name <- "summary_table.xlsx"

## create folder for chapter if not already done
chap_folder <- base::paste0(here::here(), "/data/", chapter_folder, "/")
if (!base::dir.exists(chap_folder)) {
    base::dir.create(chap_folder, recursive = TRUE)
}

## get summary table (mode = "wb" keeps the binary .xlsx file intact on Windows)
destfile <- base::paste0(chap_folder, file_name)
utils::download.file(url, destfile, mode = "wb")
summary_table <- readxl::read_xlsx(destfile)

## save data as .rds file (pb_save_data_file() is this project's own helper function)
pb_save_data_file(chapter_folder, summary_table, "summary_table.rds")

(No output is available for this R code chunk.)

B.1.2 Explore Summary Table

R Code B.2 : Explore Data of the Summary Table

Code
summary_table <- readRDS("data/chap81/summary_table.rds")

skimr::skim(summary_table)
Data summary
Name                    summary_table
Number of rows          6253
Number of columns       17
_______________________
Column type frequency:
character               16
numeric                 1
________________________
Group variables         None

Variable type: character

skim_variable                          n_missing  complete_rate  min   max  empty  n_unique  whitespace
Country                                        0           1.00    3    53      0       331           0
Country Code                                   0           1.00    2     6      0       332           0
Short name of variable                         0           1.00    5    59      0       184           0
Type(s) of variable                            0           1.00    1    31      0        12           0
Variable category                             45           0.99   20    27      0         5           0
Variable level(s)                             45           0.99    1     7      0         8           0
Years                                          0           1.00    4     9      0       341           0
Age groups                                     0           1.00    7    60      0         8           0
Population categories                          0           1.00    8    55      0        29           0
WID.world code                                 0           1.00   10    22      0       312           0
Percentiles                                    0           1.00    5  1087      0        85           0
Years available for these percentiles          0           1.00   31    41      0         3           0
Simple description of variable                 0           1.00   30   990      0       183           0
Technical description of variable           3659           0.41   48   348      0        79           0
Method used for computation                 3766           0.40    1  1445      0       544           0
Source                                      1426           0.77   12  2045      0       223           0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd  p0  p25  p50  p75  p100  hist
data_quality        5735           0.08  2.85  0.75   2    2    3    3     5  ▆▇▁▂▁

B.2 Inspect Summary Table

B.2.1 Number of Variables per Country

Code Collection B.1  

R Code B.3 : Number of Records (=Variables) per Country

Code
summary_table |> 
    dplyr::count(Country, `Country Code`) |> 
    DT::datatable()

R Code B.4 : Number of Records (=Variables) per Country without Subregions

Code
summary_table |> 
    dplyr::filter(!stringr::str_detect(`Country Code`, '-')) |> 
    dplyr::count(Country, `Country Code`) |> 
    DT::datatable()

Even after the sub-region codes are removed, the list still does not contain only countries, because it includes “meta” areas such as “Africa” and “Americas”.
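One possible way to drop these aggregate areas as well is to filter against a hand-maintained list of names. The following is only a minimal sketch on a small made-up table: `toy_table`, its rows and its codes are invented for illustration, and the vector `meta_areas` is an assumption that contains just the two names mentioned above. It would have to be extended after inspecting the real data.

```r
## Toy stand-in for summary_table; all rows and codes are made up for illustration
toy_table <- tibble::tibble(
    Country = c("Austria", "Austria", "Germany", "Africa", "Americas", "Some sub-region"),
    `Country Code` = c("AT", "AT", "DE", "X1", "X2", "AT-1")
)

## ASSUMPTION: hand-maintained list of aggregate areas, to be extended as needed
meta_areas <- c("Africa", "Americas")

countries_only <- toy_table |>
    dplyr::filter(!stringr::str_detect(`Country Code`, "-")) |>  # drop sub-regions
    dplyr::filter(!Country %in% meta_areas) |>                   # drop aggregate areas
    dplyr::count(Country, `Country Code`)

countries_only
```

A name list is easy to read but must be maintained by hand. If the aggregate areas turn out to share a recognizable code pattern, filtering on `Country Code` would be more robust.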

B.2.2 Display Many Columns At Once

Another useful table for inspecting the data is an interactive table with all columns except the very long descriptions in columns 13-16. For a better view I have created a special page with a full-page layout that covers the full width of the browser: Go to Appendix C.
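Dropping the long text columns could look roughly like the sketch below. It selects by name rather than by position. The four names in `long_cols` are assumed to be columns 13-16, based on the order of the `skim()` listing above, and `toy_table` is a made-up one-row stand-in for `summary_table`.

```r
## Toy stand-in for summary_table with placeholder content
toy_table <- tibble::tibble(
    Country = "Austria",
    `Short name of variable` = "National income",
    `Simple description of variable` = "placeholder text",
    `Technical description of variable` = "placeholder text",
    `Method used for computation` = "placeholder text",
    Source = "placeholder text"
)

## ASSUMPTION: these four names correspond to columns 13-16
long_cols <- c(
    "Simple description of variable",
    "Technical description of variable",
    "Method used for computation",
    "Source"
)

## keep every column except the long description columns
compact_table <- toy_table |>
    dplyr::select(!dplyr::all_of(long_cols))

base::names(compact_table)
```

The resulting table could then be passed to `DT::datatable()` in the same way as in the code chunks above.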

B.2.3 List Variables of the Summary Table

Another important table for understanding the datasets lists the short name of each variable together with its simple, technical and methodological description.

R Code B.5 : Unique short names of variables

Code
summary_table |>
    dplyr::distinct(`Short name of variable`)
#> # A tibble: 184 × 1
#>    `Short name of variable`                        
#>    <chr>                                           
#>  1 National income                                 
#>  2 Net domestic product                            
#>  3 Consumption of fixed capital of national economy
#>  4 Gross domestic product                          
#>  5 Net foreign income                              
#>  6 National income price index                     
#>  7 Population                                      
#>  8 PPP conversion factor, LCU per EUR              
#>  9 Market exchange rate, LCU per EUR               
#> 10 PPP conversion factor, LCU per USD              
#> # ℹ 174 more rows
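R Code B.5 lists only the names. To attach the descriptions as well, one option is to keep the first row per short name. This is a sketch on a made-up `toy_table` with placeholder texts, not the method used in Appendix C.

```r
## Toy stand-in for summary_table; descriptions are placeholders
toy_table <- tibble::tibble(
    Country = c("Austria", "Germany", "Austria"),
    `Short name of variable` = c("National income", "National income", "Population"),
    `Simple description of variable` = c("placeholder A", "placeholder A", "placeholder B"),
    `Technical description of variable` = c(NA, NA, "placeholder C"),
    `Method used for computation` = c("method AT", "method DE", NA)
)

## one row per short name, keeping the descriptions of its first occurrence
variable_list <- toy_table |>
    dplyr::distinct(`Short name of variable`, .keep_all = TRUE) |>
    dplyr::select(
        `Short name of variable`,
        `Simple description of variable`,
        `Technical description of variable`,
        `Method used for computation`
    )

variable_list
```

Note that `.keep_all = TRUE` keeps only the first occurrence of each name. Descriptions that differ between countries, as the method does in this toy example, are silently dropped.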

B.2.4 Explain All 184 Variables of the Summary Table

It turned out that the number of unique values of `Short name of variable` equals the number of rows of the country with the most rows (the USA, with 184 rows). It is therefore a reasonable assumption that all unique variable names can be found in the USA part of the database. See Section C.2.
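This assumption can be checked directly by comparing the set of all short names with the set found for the USA. The sketch below uses a made-up `toy_table`. The label "USA" is taken from the text above, and the actual value in the `Country` column of the real data may be spelled differently.

```r
## Toy stand-in for summary_table; rows are made up for illustration
toy_table <- tibble::tibble(
    Country = c("USA", "USA", "USA", "Austria", "Austria"),
    `Short name of variable` = c(
        "National income", "Population", "Gross domestic product",
        "National income", "Population"
    )
)

## all distinct short names in the whole table
all_names <- base::unique(toy_table$`Short name of variable`)

## distinct short names of the USA rows only
usa_names <- toy_table |>
    dplyr::filter(Country == "USA") |>
    dplyr::pull(`Short name of variable`) |>
    base::unique()

## names that occur somewhere else but not for the USA
missing_in_usa <- base::setdiff(all_names, usa_names)
base::length(missing_in_usa) == 0
```

If `missing_in_usa` is empty, every variable name also occurs for the USA. Equal counts alone would not prove that.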

B.2.5 Explain Variables for Austria of the Summary Table

I believe that I will not need most of the 184 variables explained in the Summary Table. As a first approach I could start with the 21 variables that are listed for Austria. See Section C.3.

Caution B.1: Post- & Pre-tax national income occur four times

It turned out that two variables (Post-tax national income and Pre-tax national income) each occur four times. This is also the case for other countries, such as the USA, Germany and Belgium.

Maybe these variables were added by different research projects?

So in the end there are not 184 but 178 variables in total (two variables with three surplus rows each, i.e. 184 - 6), and Austria has not 21 but only 15 variables (21 - 6)!
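Repeated variable names and the resulting corrected totals can be found by counting rows per name within a country. This is a minimal sketch on a made-up `toy_table` that contains only three variable names. With the real data the filter would be applied to `summary_table`.

```r
## Toy stand-in for the Austrian part of summary_table (made up, 9 rows)
toy_table <- tibble::tibble(
    Country = "Austria",
    `Short name of variable` = c(
        "National income",
        base::rep("Post-tax national income", 4),
        base::rep("Pre-tax national income", 4)
    )
)

## variable names that occur more than once within a country
duplicates <- toy_table |>
    dplyr::count(Country, `Short name of variable`, name = "n_rows") |>
    dplyr::filter(n_rows > 1)

## every repeated name contributes (n_rows - 1) surplus rows
surplus_rows <- base::sum(duplicates$n_rows - 1)
n_unique_vars <- base::nrow(toy_table) - surplus_rows

duplicates
surplus_rows
n_unique_vars
```

In the toy table two names occur four times each, which gives 2 x 3 = 6 surplus rows. This is the same correction that is applied to the totals above.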


  1. Maybe a bug in the web interface?