2 Pipeline Explanation
This page explains the GDP data processing R script step by step.
2.1 Data Processing ETL Pipeline
# Extract
get_unsd_gdp_data() %>%
get_taiwan_gdp_data() %>%
# Transform
compute_missing_values() %>%
estimate_last_year() %>%
round_values() %>%
delete_data_out_of_valid_range() %>%
add_economy_labels() %>%
compute_groups_of_economies() %>%
add_comments() %>%
# Load
export_to_general_csv("gdp_update.csv") %>%
export_to_usis_csv("gdp_update_usis.csv")
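The chaining pattern used above can be sketched with toy stand-ins. This is a minimal illustration, not the script's code: extract_stage, transform_stage and load_stage are hypothetical names, and the base R native pipe (R 4.1 or later) is used instead of %>% so that no package is required. The point it shows is that every stage takes a data frame and returns one, and that an export step has to return its input (here invisibly) so that a second export can follow it.

```r
# Toy stand-ins for pipeline stages; names and bodies are hypothetical.
extract_stage <- function() {
  data.frame(economy = c("A", "B"), value = c(1.4, 2.6))
}

transform_stage <- function(df) {
  df$value <- round(df$value)
  df
}

load_stage <- function(df, filename) {
  write.csv(df, filename, row.names = FALSE)
  invisible(df)  # hand the data frame on so another export can be chained
}

out_dir <- tempdir()

result <- extract_stage() |>
  transform_stage() |>
  load_stage(file.path(out_dir, "general.csv")) |>
  load_stage(file.path(out_dir, "usis.csv"))
```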
2.2 Pipeline Functions
get_unsd_gdp_data()
- Downloads GDP data using the UNSD API (both constant and current prices).
- Merges, reshapes, and formats the dataset.
get_taiwan_gdp_data(df)
- Downloads GDP data from Taiwan NSO (both constant and current prices).
- Rebases the GDP at constant prices from the year 2021 to 2015.
- Calculates TWD to USD exchange rates.
- GDP data at current prices in USD are converted from TWD using annual period-average exchange rates.
- GDP data in constant prices in USD are converted from TWD using the annual period-average exchange rate of the base year (2015) for all years.
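The rebasing and conversion steps above can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the figures are made up, the column names are hypothetical, the exchange rate is assumed to be expressed as TWD per USD, and rebasing is assumed to scale the constant-price series so that it equals current-price GDP in the new base year.

```r
# Illustrative figures only, not real Taiwan data.
twd <- data.frame(
  year          = c(2015, 2021),
  current_twd   = c(100, 150),  # GDP at current prices, TWD
  constant_2021 = c(120, 150),  # GDP at constant 2021 prices, TWD
  twd_per_usd   = c(32, 28)     # assumed: annual period-average rate, TWD per USD
)

base_year <- 2015
is_base   <- twd$year == base_year

# Rebase (assumed method): scale the series so that constant-price GDP
# equals current-price GDP in the new base year.
rebase_factor     <- twd$current_twd[is_base] / twd$constant_2021[is_base]
twd$constant_2015 <- twd$constant_2021 * rebase_factor

# Current prices: each year is converted with its own exchange rate.
twd$current_usd <- twd$current_twd / twd$twd_per_usd

# Constant prices: every year is converted with the base-year (2015) rate.
twd$constant_2015_usd <- twd$constant_2015 / twd$twd_per_usd[is_base]
```

Using a single base-year rate for the constant-price series keeps exchange rate movements out of the volume measure, which is why the two series are converted differently.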
compute_missing_values(df)
- Handles historical and geopolitical inconsistencies by merging country records (e.g., Yugoslavia, USSR).
- Described in depth in the Special Cases chapter.
estimate_last_year(df)
- Nominal GDP for the last year is estimated using growth rates found in GDP growth rates.xlsx.
- Real GDP for the last year is estimated using GDP deflators and CPI.
- GDP deflators are used where available, CPI otherwise.
- GDP deflators are converted using exchange rates from national currencies to USD and rebased to the year 2015.
- CPI is rebased to the year 2015.
- This function is skipped if enable_estimate_last_year is set to FALSE.
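The arithmetic behind these bullets can be sketched as below. Several parts are assumptions rather than facts from the script: the helper rebase_index is hypothetical, the growth rate is assumed to be stored as a fraction, and real GDP is assumed to be obtained by deflating the nominal estimate with a price index rebased to 2015 = 100. All figures are invented.

```r
# Hypothetical helper: rebase an index so that it equals 100 in base_year.
rebase_index <- function(index, years, base_year = 2015) {
  index / index[years == base_year] * 100
}

# Illustrative figures only.
years         <- c(2015, 2022, 2023)
deflator      <- c(80, 100, 104)               # USD-based price index, arbitrary base
deflator_2015 <- rebase_index(deflator, years) # same index with 2015 = 100

nominal_2022 <- 1000  # last observed nominal GDP
growth_2023  <- 0.04  # assumed to be stored as a fraction, not a percentage

enable_estimate_last_year <- TRUE

if (enable_estimate_last_year) {
  # Nominal GDP: previous year scaled by the growth rate.
  nominal_2023 <- nominal_2022 * (1 + growth_2023)

  # Real GDP (assumed method): nominal GDP deflated to 2015 prices.
  real_2023 <- nominal_2023 / deflator_2015[years == 2023] * 100
}
```

Where no deflator exists for an economy, the same calculation would run with the rebased CPI in place of deflator_2015.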
round_values(df)
- Rounds values to the nearest integer using the base R round() function.
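One consequence of relying on base R round() is worth keeping in mind: values exactly halfway between two integers are rounded to the even neighbour (the IEC 60559 convention), not away from zero. A short demonstration:

```r
# Exact halves go to the nearest even integer in base R.
halves  <- c(0.5, 1.5, 2.5, 3.5)
rounded <- round(halves)

# Non-halves behave as expected.
others <- round(c(1.4, 2.6))
```

Here rounded holds 0, 2, 2, 4 and others holds 1, 3. For GDP figures this only matters for values ending in exactly .5.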
delete_data_out_of_valid_range(df)
- Filters out data points falling outside the valid year range for each country.
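A filter of this kind can be sketched in base R as follows. The lookup table valid_range and its columns first_year and last_year are hypothetical; the document does not say where the script stores the valid ranges.

```r
# Illustrative data: economy "A" is valid up to 1990, "B" from 1992 on.
gdp <- data.frame(
  economy = c("A", "A", "A", "B", "B"),
  year    = c(1989, 1990, 1991, 1991, 1992),
  value   = c(10, 11, 12, 20, 21)
)

# Hypothetical lookup of valid years per economy.
valid_range <- data.frame(
  economy    = c("A", "B"),
  first_year = c(1970, 1992),
  last_year  = c(1990, 2023)
)

# Attach each economy's range, then keep only rows inside it.
merged   <- merge(gdp, valid_range, by = "economy")
in_range <- merged$year >= merged$first_year & merged$year <= merged$last_year
gdp_valid <- merged[in_range, c("economy", "year", "value")]
```

Rows for "A" in 1991 and "B" in 1991 are dropped, which is the behaviour needed for economies that ceased to exist or came into existence part-way through the series.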
add_comments(df)
- Adds “CommentEN” and “CommentFR” columns remarking on values which were calculated in compute_missing_values(df).
export_to_usis_csv(df, filename)
- Saves the dataset to a CSV file used by USIS for upload to UNCTADstat.
2.3 Helper functions
read_unsd(file_id)
- Calls the UNSD API to retrieve a file given its ID.
- Used for collecting GDP in constant and current prices.
read_usis(series, source, measure)
- Reads a dataset from USIS given its Series, Source and Measure codes.
- Used for collecting GDP deflators, exchange rates and CPI.
get_gdp_deflators(estimate_constant)
- Collects GDP deflators and CPI from USIS.
- CPI is used as a proxy when a GDP deflator is not available.
- Economies estimated using CPI are saved to estimated_with_cpi.csv.
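The fallback from GDP deflator to CPI can be sketched as below. The data frame layout and column names are hypothetical and the figures are invented; only the file name estimated_with_cpi.csv comes from the description above, and here it is written to a temporary directory.

```r
# Illustrative data: economy "B" has no GDP deflator, "C" has no CPI.
prices <- data.frame(
  economy  = c("A", "B", "C"),
  deflator = c(110, NA, 95),
  cpi      = c(112, 120, NA)
)

# Flag economies where CPI has to stand in for the missing deflator.
used_cpi <- is.na(prices$deflator) & !is.na(prices$cpi)

# Prefer the deflator; fall back to CPI only where the deflator is missing.
prices$price_index <- ifelse(is.na(prices$deflator), prices$cpi, prices$deflator)

# Record which economies were estimated with CPI.
estimated_with_cpi <- prices[used_cpi, "economy", drop = FALSE]
out_file <- file.path(tempdir(), "estimated_with_cpi.csv")
write.csv(estimated_with_cpi, out_file, row.names = FALSE)
```

Keeping a record of the fallback cases makes it possible to review afterwards which estimates rest on the weaker proxy.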
expand_hierarchy(df)
- Expands the groups defined in Dim_Countries_Hierarchy_All.csv.
- For example, Americas is defined as an aggregate of Northern America and Latin America and the Caribbean. The latter is defined as a combination of the Caribbean, Central America and South America. These three groups are defined using individual economies.
- This function expands all definitions such that each group is defined only using its individual economies.
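The expansion can be sketched as a recursive lookup. The parent/child layout below is an assumption about how the hierarchy could be represented, not the actual structure of Dim_Countries_Hierarchy_All.csv, and the member lists are deliberately shortened for illustration. The helper expand_group is hypothetical and assumes the hierarchy contains no cycles.

```r
# Hypothetical parent/child representation with shortened member lists.
hierarchy <- data.frame(
  parent = c("Americas", "Americas",
             "Latin America and the Caribbean",
             "Latin America and the Caribbean",
             "Latin America and the Caribbean",
             "Northern America", "Northern America",
             "Caribbean", "Central America", "South America"),
  child  = c("Northern America", "Latin America and the Caribbean",
             "Caribbean", "Central America", "South America",
             "Canada", "United States of America",
             "Cuba", "Mexico", "Brazil"),
  stringsAsFactors = FALSE
)

# Recursively replace every sub-group by its members until only
# individual economies remain (assumes the hierarchy has no cycles).
expand_group <- function(group, hierarchy) {
  children <- hierarchy$child[hierarchy$parent == group]
  if (length(children) == 0) {
    return(group)  # no children: this is an individual economy
  }
  unique(unlist(lapply(children, expand_group, hierarchy = hierarchy)))
}

# One row per (group, economy) pair, for every group in the hierarchy.
groups   <- unique(hierarchy$parent)
expanded <- do.call(rbind, lapply(groups, function(g) {
  data.frame(group = g, economy = expand_group(g, hierarchy),
             stringsAsFactors = FALSE)
}))
```

With the flattened table, a group total reduces to a single join and sum over individual economies, with no need to aggregate intermediate groups first.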