2 Pipeline Explanation

This page explains each step of the R script that processes the GDP data.


2.1 Data Processing ETL Pipeline

# Extract
get_unsd_gdp_data() %>%
  get_taiwan_gdp_data() %>%
  
  # Transform
  compute_missing_values() %>%
  estimate_last_year() %>%
  round_values() %>%
  delete_data_out_of_valid_range() %>%
  add_economy_labels() %>%
  compute_groups_of_economies() %>%
  add_comments() %>%
  
  # Load
  export_to_general_csv("gdp_update.csv") %>%
  export_to_usis_csv("gdp_update_usis.csv")

2.2 Pipeline Functions

get_unsd_gdp_data()

  • Downloads GDP data at both constant and current prices using the UNSD API.
  • Merges, reshapes, and formats the dataset.

get_taiwan_gdp_data(df)

  • Downloads GDP data at both constant and current prices from the Taiwan NSO.
  • Rebases GDP at constant prices from base year 2021 to base year 2015.
  • Calculates TWD to USD exchange rates.
    • GDP at current prices is converted from TWD to USD using each year's annual period-average exchange rate.
    • GDP at constant prices is converted from TWD to USD using the annual period-average exchange rate of the base year (2015) for all years.
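
The rebasing and conversion rules above can be sketched as follows. This is a minimal illustration, not the script's code: the column names and all figures are made up, and rebasing by scaling the series so that it equals current-price GDP in 2015 is an assumed (though common) method.

```r
# Hypothetical TWD figures for illustration only (not real data)
twd <- data.frame(
  year           = c(2015, 2020, 2021),
  current_twd    = c(100, 130, 150),  # GDP at current prices, TWD
  constant_2021  = c(120, 140, 150),  # GDP at constant 2021 prices, TWD
  fx_twd_per_usd = c(32, 30, 28)      # annual period-average exchange rate
)

base_year <- 2015
is_base <- twd$year == base_year

# Assumed rebasing rule: scale the constant-price series so that it equals
# current-price GDP in the new base year
rebase_factor <- twd$current_twd[is_base] / twd$constant_2021[is_base]
twd$constant_2015 <- twd$constant_2021 * rebase_factor

# Current prices: convert with each year's own exchange rate
twd$current_usd <- twd$current_twd / twd$fx_twd_per_usd

# Constant prices: convert with the base-year exchange rate for all years
twd$constant_usd <- twd$constant_2015 / twd$fx_twd_per_usd[is_base]
```

With this convention, current-price and constant-price GDP in USD coincide in the base year, because both use the 2015 exchange rate.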

compute_missing_values(df)

  • Handles historical and geopolitical inconsistencies by merging the records of economies that split or merged over time (e.g., Yugoslavia, USSR).
  • Described in depth in the chapter Special Cases.

estimate_last_year(df)

  • Nominal GDP for the last year is estimated using growth rates found in GDP growth rates.xlsx.
  • Real GDP for the last year is estimated using GDP deflators and CPI.
    • GDP deflators are used where available, CPI otherwise.
    • GDP deflators are converted from national currencies to USD using exchange rates and rebased to the year 2015.
    • CPI is rebased to the year 2015.
  • This function is skipped if enable_estimate_last_year is set to FALSE.
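
The estimation logic above might look roughly like the sketch below for a single economy. All names and numbers are hypothetical, and two points are assumptions rather than facts from the script: that growth rates are expressed in percent, and that real GDP is obtained by dividing nominal GDP by the rebased price index.

```r
# Hypothetical inputs for one economy (illustrative numbers only)
nominal_prev <- 2000  # nominal GDP in the previous year, USD
growth_pct   <- 5     # nominal growth rate for the last year, in percent

# Price index by year (a GDP deflator, or CPI where no deflator exists)
price_index <- c("2015" = 80, "2023" = 100, "2024" = 104)

# Rebase the index so that 2015 = 100
rebased <- price_index / price_index[["2015"]] * 100

# Nominal GDP: apply the growth rate to the previous year's value
nominal_last <- nominal_prev * (1 + growth_pct / 100)

# Real GDP: deflate nominal GDP with the rebased index (assumed formula)
real_last <- nominal_last / (rebased[["2024"]] / 100)
```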

round_values(df)

  • Rounds values to the nearest integer using the base R round() function.
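
One property of base R's round() is worth keeping in mind here: it follows the IEC 60559 convention of rounding exact halves to the even digit, so a value ending in exactly .5 is not always rounded up. The values below are arbitrary examples, not GDP figures.

```r
x <- c(0.5, 1.5, 2.5, 1234.4, 1234.6)

# Halves go to the nearest even integer; other values round as usual
rounded <- round(x)
# values: 0, 2, 2, 1234, 1235
```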

delete_data_out_of_valid_range(df)

  • Filters out data points falling outside the valid year range for each country.
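
A filter of this kind can be sketched as a join against a table of valid year ranges. The lookup table, its column names and the data are hypothetical; how the script actually stores the valid ranges is not documented here.

```r
# Hypothetical GDP records
gdp <- data.frame(
  economy = c("A", "A", "A", "B", "B"),
  year    = c(1989, 1990, 1991, 1990, 1991),
  value   = c(10, 11, 12, 20, 21),
  stringsAsFactors = FALSE
)

# Hypothetical validity table: economy A ends in 1990, economy B starts in 1991
valid_range <- data.frame(
  economy    = c("A", "B"),
  start_year = c(1970, 1991),
  end_year   = c(1990, 9999),
  stringsAsFactors = FALSE
)

# Attach each economy's range, then keep only the years inside it
merged   <- merge(gdp, valid_range, by = "economy")
in_range <- merged$year >= merged$start_year & merged$year <= merged$end_year
filtered <- merged[in_range, c("economy", "year", "value")]
```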

compute_groups_of_economies(df)

  • Computes GDP aggregates for groups using hierarchical mappings.
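
As a rough illustration, a group aggregate is the sum over the group's member economies for each year. The membership table and values below are hypothetical, and the sketch does not address how the script treats members with missing values.

```r
# Hypothetical GDP values for three economies in one year
gdp <- data.frame(
  economy = c("X", "Y", "Z"),
  year    = 2020,
  value   = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical flat mapping of groups to their individual economies
members <- data.frame(
  group   = c("G1", "G1", "G2", "G2", "G2"),
  economy = c("X", "Y", "X", "Y", "Z"),
  stringsAsFactors = FALSE
)

# Join members to their values, then sum per group and year
joined     <- merge(members, gdp, by = "economy")
aggregates <- aggregate(value ~ group + year, data = joined, FUN = sum)
```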

add_economy_labels(df)

  • Joins human-readable economy labels using economy codes.
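
A label join of this kind is typically a left join, so that rows without a matching label are kept rather than dropped. The codes, labels and column names below are hypothetical.

```r
# Hypothetical data keyed by economy code
gdp <- data.frame(
  economy_code = c("E01", "E02", "E99"),
  value        = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical label table (no entry for E99)
labels <- data.frame(
  economy_code  = c("E01", "E02"),
  economy_label = c("Economy One", "Economy Two"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every GDP row; unmatched codes get an NA label
labelled <- merge(gdp, labels, by = "economy_code", all.x = TRUE)
```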

add_comments(df)

  • Adds “CommentEN” and “CommentFR” columns that flag the values calculated in compute_missing_values(df).

export_to_general_csv(df, filename)

  • Saves the dataset to a generic CSV file.

export_to_usis_csv(df, filename)

  • Saves the dataset to a CSV file used by USIS for upload to UNCTADstat.
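
Because the two export steps are chained in the pipeline, the first one presumably returns its input data frame so that the second can receive it. A minimal sketch of that pattern, using a hypothetical function name and temporary files:

```r
# Hypothetical export step: writes the file, then passes the data on unchanged
export_csv_sketch <- function(df, filename) {
  write.csv(df, filename, row.names = FALSE)
  invisible(df)  # return the input so further steps can be chained
}

df <- data.frame(economy = c("A", "B"), value = c(1, 2))
file_one <- tempfile(fileext = ".csv")
file_two <- tempfile(fileext = ".csv")

# The result of the first export feeds the second
passed_on <- export_csv_sketch(df, file_one)
export_csv_sketch(passed_on, file_two)
```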

2.3 Helper Functions

read_unsd(file_id)

  • Calls the UNSD API to download a file given its ID.
  • Used for collecting GDP at constant and current prices.

read_usis(series, source, measure)

  • Reads a dataset from USIS given its Series, Source and Measure codes.
  • Used for collecting GDP deflators, exchange rates and CPI.

get_gdp_deflators(estimate_constant)

  • Collects GDP deflators and CPI from USIS.
  • CPI is used as a proxy when a GDP deflator is not available.
  • Economies estimated using CPI are saved to estimated_with_cpi.csv.
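
The fallback rule can be sketched as below. The column names and figures are hypothetical, and the file is written to a temporary directory here rather than to the script's real output location.

```r
# Hypothetical price indices; NA marks a missing series
prices <- data.frame(
  economy  = c("A", "B", "C"),
  deflator = c(110, NA, 95),
  cpi      = c(108, 120, NA),
  stringsAsFactors = FALSE
)

# Use the GDP deflator where available, CPI otherwise
used_cpi <- is.na(prices$deflator) & !is.na(prices$cpi)
prices$price_index <- ifelse(is.na(prices$deflator), prices$cpi, prices$deflator)

# Record which economies relied on CPI
estimated_with_cpi <- prices[used_cpi, "economy", drop = FALSE]
out_file <- file.path(tempdir(), "estimated_with_cpi.csv")
write.csv(estimated_with_cpi, out_file, row.names = FALSE)
```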

expand_hierarchy(df)

  • Expands the groups defined in Dim_Countries_Hierarchy_All.csv.
  • For example, Americas is defined as an aggregate of Northern America and Latin America and the Caribbean. The latter is in turn defined as a combination of the Caribbean, Central America and South America. These three groups are defined using individual economies.
  • This function expands all definitions such that each group is defined only using its individual economies.
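
The expansion described above can be sketched as a recursive lookup. The parent/child layout is an assumption about how the hierarchy file is structured, the function name is hypothetical, and only a small illustrative subset of economies is listed.

```r
# Hypothetical parent/child table (illustrative subset of the hierarchy)
hierarchy <- data.frame(
  parent = c("Americas", "Americas",
             "Latin America and the Caribbean",
             "Latin America and the Caribbean",
             "Latin America and the Caribbean",
             "Northern America", "Northern America",
             "Caribbean", "Central America", "South America"),
  child  = c("Northern America", "Latin America and the Caribbean",
             "Caribbean", "Central America", "South America",
             "Canada", "United States of America",
             "Cuba", "Mexico", "Brazil"),
  stringsAsFactors = FALSE
)

# Recursively replace each sub-group by its members until only
# individual economies (entries with no children) remain
expand_group <- function(group, hierarchy) {
  children <- hierarchy$child[hierarchy$parent == group]
  if (length(children) == 0) {
    return(group)  # a leaf: an individual economy
  }
  unique(unlist(lapply(children, expand_group, hierarchy = hierarchy)))
}

groups   <- unique(hierarchy$parent)
expanded <- lapply(setNames(groups, groups), expand_group, hierarchy = hierarchy)
```

In this sketch, expanded[["Americas"]] contains only the five individual economies, with no intermediate group names left.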