2 Pipeline Explanation

This page explains each step of the R script that processes the GDP data.

2.1 Data Processing ETL Pipeline

# Extract
get_unsd_gdp_data() %>%
  get_taiwan_gdp_data() %>%

  # Transform
  compute_special_cases() %>%
  estimate_last_year() %>%
  round_values() %>%
  delete_data_out_of_valid_range() %>%
  add_economy_labels() %>%
  compute_groups_of_economies() %>%
  add_comments() %>%

  # Load
  export_to_general_csv("gdp_update.csv") %>%
  export_to_usis_csv("gdp_update_usis.csv")
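
For the chain above to work, every stage has to accept the data frame as its first argument and return a data frame, including the export steps. The following is a minimal sketch of that contract in base R; the `*_sketch` functions, the `value` column and the temporary output paths are hypothetical stand-ins, not the script's actual code.

```r
# Hypothetical stand-ins for pipeline stages; bodies are illustrative only.
round_values_sketch <- function(df) {
  df$value <- round(df$value)
  df
}

export_csv_sketch <- function(df, filename) {
  write.csv(df, filename, row.names = FALSE)
  invisible(df)  # hand the data on so a further stage can be chained
}

df <- data.frame(economy = c("A", "B"), year = 2023, value = c(1.4, 2.6))
out_a <- file.path(tempdir(), "gdp_update.csv")
out_b <- file.path(tempdir(), "gdp_update_usis.csv")

# `x %>% f(y)` is equivalent to `f(x, y)`, so this nested call corresponds to
# df %>% round_values_sketch() %>% export_csv_sketch(out_a) %>% export_csv_sketch(out_b)
result <- export_csv_sketch(
  export_csv_sketch(round_values_sketch(df), out_a),
  out_b
)
```

Returning the data frame invisibly from an export step is one common way to let two exports follow each other in a single chain.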

2.2 Pipeline Functions

`get_unsd_gdp_data()`

Downloads GDP data using the UNSD API (both constant and current prices).
Merges, reshapes, and formats the dataset.

`get_taiwan_gdp_data(df)`

Downloads GDP data from the Taiwan NSO (both constant and current prices).
Rebases the GDP at constant prices from the year 2021 to 2015.
Calculates TWD to USD exchange rates.
- GDP data at current prices in USD are converted from TWD using annual period-average exchange rates.
- GDP data at constant prices in USD are converted from TWD using the annual period-average exchange rate of the base year (2015) for all years.
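
The rebasing and conversion steps can be illustrated with a small worked example. This is a sketch under stated assumptions: the figures are invented, the column names are hypothetical, and the rebasing uses the common convention of scaling the constant-price series so that it equals current-price GDP in the new base year, which may differ from the script's exact method.

```r
# Illustrative figures only, not actual Taiwan data.
twd <- data.frame(
  year              = c(2015, 2021),
  current_twd       = c(1000, 1300),  # GDP at current prices, TWD
  constant_2021_twd = c(1040, 1300),  # GDP at constant 2021 prices, TWD
  twd_per_usd       = c(32, 28)       # annual period-average exchange rate
)

base <- twd$year == 2015

# Rebase 2021 -> 2015: scale so constant equals current prices in 2015.
rebase_factor <- twd$current_twd[base] / twd$constant_2021_twd[base]
twd$constant_2015_twd <- twd$constant_2021_twd * rebase_factor

# Current prices: convert each year with that year's exchange rate.
twd$current_usd <- twd$current_twd / twd$twd_per_usd

# Constant prices: convert every year with the base-year (2015) rate.
twd$constant_2015_usd <- twd$constant_2015_twd / twd$twd_per_usd[base]
```

Using a single base-year rate for the constant-price series keeps exchange-rate movements out of the volume measure, so changes over time reflect real growth only.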

`compute_special_cases(df)`

Handles historical and geopolitical inconsistencies by merging country records (e.g., Yugoslavia, USSR).
Described in depth in the Special Cases chapter.

`estimate_last_year(df)`

Nominal GDP for the last year is estimated using growth rates found in `GDP growth rates.xlsx`.
Real GDP for the last year is estimated using GDP deflators and CPI.
- GDP deflators are used where available, CPI otherwise.
- GDP deflators are converted using exchange rates from national currencies to USD and rebased to the year 2015.
- CPI is rebased to the year 2015.
This function is skipped if `enable_estimate_last_year` is set to `FALSE`.
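
The arithmetic behind these steps can be sketched as follows. Everything here is an assumption made for illustration: the figures are invented, the growth rate is taken to be expressed in percent, the price index stands for either a USD-based GDP deflator or the CPI, and real GDP is obtained by dividing nominal GDP by the index rebased to 2015 = 100. The script itself may differ in its details.

```r
# Hypothetical inputs; all figures are illustrative.
nominal_prev <- 500   # nominal GDP in the previous year, USD
growth_pct   <- 4     # nominal growth rate, assumed to be in percent

# Nominal GDP for the last year from the growth rate.
nominal_last <- nominal_prev * (1 + growth_pct / 100)

# Price index on an arbitrary base (GDP deflator where available, else CPI).
index <- c("2015" = 80, "2023" = 104)

# Rebase so that 2015 = 100.
index_2015 <- index / index[["2015"]] * 100

# Deflate nominal GDP to obtain GDP at constant 2015 prices.
real_last <- nominal_last / (index_2015[["2023"]] / 100)
```

With these numbers, nominal GDP rises from 500 to 520, the rebased index for 2023 is 130, and the resulting real GDP is 400.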

`round_values(df)`

Rounds values to the nearest integer using the base R `round()` function.
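
One detail worth keeping in mind is that base R `round()` follows the IEC 60559 convention of rounding halves to the even digit, so values ending in exactly .5 do not always round up. A short illustration:

```r
# Exact halves go to the nearest even integer.
halves <- round(c(0.5, 1.5, 2.5, 3.5))   # 0 2 2 4

# All other values round to the nearest integer as expected.
others <- round(c(1234.4, 1234.6))       # 1234 1235
```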

`delete_data_out_of_valid_range(df)`

Filters out data points falling outside the valid year range for each country.
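
A minimal sketch of such a filter in base R is shown below. The lookup table `valid` and its columns `start_year` and `end_year` are hypothetical; the script's actual source for valid ranges is not described here.

```r
# Illustrative data: two economies, two years each.
gdp <- data.frame(
  economy = c("A", "A", "B", "B"),
  year    = c(1990, 2000, 1990, 2000),
  value   = c(1, 2, 3, 4)
)

# Hypothetical table of valid year ranges per economy.
valid <- data.frame(
  economy    = c("A", "B"),
  start_year = c(1995, 1970),
  end_year   = c(2023, 1991)
)

# Attach the range to every observation, then keep rows inside it.
merged <- merge(gdp, valid, by = "economy")
in_range <- merged$year >= merged$start_year & merged$year <= merged$end_year
kept <- merged[in_range, c("economy", "year", "value")]
```

Here the 1990 value for economy A and the 2000 value for economy B are dropped because they fall outside the respective ranges.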

`compute_groups_of_economies(df)`

Computes GDP aggregates for groups using hierarchical mappings.
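
As a rough sketch, and assuming that a group aggregate is the sum of its members' values for each year, the computation could look like this in base R. The `membership` table and all column names are hypothetical.

```r
# Hypothetical flat mapping of groups to individual economies.
membership <- data.frame(
  group   = c("G1", "G1", "G2"),
  economy = c("A", "B", "C")
)

# Illustrative GDP values.
gdp <- data.frame(
  economy = c("A", "B", "C", "A", "B", "C"),
  year    = c(2020, 2020, 2020, 2021, 2021, 2021),
  value   = c(10, 20, 5, 11, 22, 6)
)

# Attach each observation to its group(s), then sum by group and year.
joined <- merge(membership, gdp, by = "economy")
totals <- aggregate(value ~ group + year, data = joined, FUN = sum)
```

An economy that belongs to several groups simply appears once per group in `membership`, so it contributes to each of those totals.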

`add_economy_labels(df)`

Joins human-readable economy labels using economy codes.
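
A join of this kind can be sketched with `match()`, which keeps the original row order and leaves unmatched codes as `NA`. The column names and codes below are hypothetical.

```r
# Illustrative data keyed by economy code.
gdp <- data.frame(
  economy_code = c("B", "A", "Z"),
  value        = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table of labels.
labels <- data.frame(
  economy_code = c("A", "B"),
  label        = c("Alpha", "Beta"),
  stringsAsFactors = FALSE
)

# Left join: every GDP row is kept, labels are filled in where a code matches.
gdp$label <- labels$label[match(gdp$economy_code, labels$economy_code)]
```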

`add_comments(df)`

Adds `CommentEN` and `CommentFR` columns that flag values calculated in `compute_missing_values(df)`.

`export_to_generic_csv(df, filename)`

Saves the dataset to a generic CSV file.

`export_to_usis_csv(df, filename)`

Saves the dataset to a CSV file used by USIS for upload to UNCTADstat.

2.3 Helper functions

`read_unsd(file_id)`

Calls the UNSD API to retrieve a file given its ID.
Used for collecting GDP at constant and current prices.

`read_usis(series, source, measure)`

Reads a dataset from USIS given its Series, Source and Measure codes.
Used for collecting GDP deflators, exchange rates and CPI.

`get_gdp_deflators(estimate_constant)`

Collects GDP deflators and CPI from USIS.
CPI is used as a proxy when a GDP deflator is not available.
Economies estimated using CPI are saved to `estimated_with_cpi.csv`.
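
The fallback logic can be sketched as below. The data, the column names and the use of a temporary directory for the output file are assumptions made for illustration.

```r
# Illustrative price indices; NA marks a missing series.
prices <- data.frame(
  economy  = c("A", "B", "C"),
  deflator = c(105, NA, 98),
  cpi      = c(107, 112, NA),
  stringsAsFactors = FALSE
)

# Use the CPI only where the GDP deflator is missing and a CPI exists.
use_cpi <- is.na(prices$deflator) & !is.na(prices$cpi)
prices$index <- ifelse(use_cpi, prices$cpi, prices$deflator)

# Record which economies relied on the CPI proxy.
estimated_with_cpi <- prices[use_cpi, "economy", drop = FALSE]
cpi_file <- file.path(tempdir(), "estimated_with_cpi.csv")
write.csv(estimated_with_cpi, cpi_file, row.names = FALSE)
```

Keeping a record of the proxied economies makes it possible to review or annotate those estimates later.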

`expand_hierarchy(df)`

Expands the groups defined in `Dim_Countries_Hierarchy_All.csv`.
For example, Americas is defined as an aggregate of Northern America and Latin America and the Caribbean. The latter is defined as a combination of the Caribbean, Central America and South America. These three groups are defined using individual economies.
This function expands all definitions such that each group is defined only using its individual economies.
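
The expansion can be sketched as a recursive walk over parent-child pairs. The `parent` and `child` column names and the tiny example hierarchy are hypothetical, the real file's layout may differ, and the sketch assumes the hierarchy contains no cycles.

```r
# Hypothetical parent-child table; one economy per leaf group for brevity.
hierarchy <- data.frame(
  parent = c("Americas", "Americas",
             "Latin America and the Caribbean",
             "Latin America and the Caribbean",
             "Latin America and the Caribbean",
             "Northern America", "Caribbean",
             "Central America", "South America"),
  child  = c("Northern America", "Latin America and the Caribbean",
             "Caribbean", "Central America", "South America",
             "Canada", "Cuba", "Mexico", "Brazil"),
  stringsAsFactors = FALSE
)

# Return the individual economies below a node, descending through subgroups.
leaves_of <- function(node, h) {
  children <- h$child[h$parent == node]
  if (length(children) == 0) return(node)  # no children: an individual economy
  unique(unlist(lapply(children, leaves_of, h = h)))
}

# Flat table: every group paired directly with its individual economies.
groups <- unique(hierarchy$parent)
expanded <- do.call(rbind, lapply(groups, function(g) {
  data.frame(group = g, economy = leaves_of(g, hierarchy),
             stringsAsFactors = FALSE)
}))
```

In this example, `leaves_of("Americas", hierarchy)` yields Canada, Cuba, Mexico and Brazil, so the Americas aggregate can be computed directly from individual economies without passing through its subgroups.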
