Chapter 4 Data transformation

Some of the source indices had to be inverted to make the interpretation of the PCA results more straightforward. Values were inverted by subtracting the original value from the theoretical maximum of that variable or, in the absence of a clear theoretical maximum, from the observed maximum. After this transformation, a higher value signifies a better result for all source indices.

Inversion is especially relevant for the Equality pillar. For instance, the income concentration ratio (IGI 3.1), based on the Gini coefficient, ranges between 0 and 100, where 0 signifies total equality and 100 total inequality. The Gini was inverted (using the theoretical maximum of 100) so that, contrary to the usual reading of the Gini, a higher value signifies less income inequality, i.e. 100 now represents maximum equality. The poverty headcount ratio (IGI 3.2), which also ranges between 0 and 100 per cent of the population, was inverted in the same way. In the inverted form, a higher value signifies less poverty, i.e. fewer people with an income below US$3.65 per day. Where necessary, indicators in the Equality pillar were also transformed into parity ratios, i.e. ratios that compare two population groups.

Indicators in other pillars were inverted as well. The under-five mortality rate (deaths per 1,000 live births) (IGI 2.3) was inverted using the observed maximum among the countries, so that a higher value signifies fewer deaths of children under five. CO\(_2\) emissions per unit of GDP (IGI 4.1) were inverted in the same way, so that a higher value signifies fewer CO\(_2\) emissions per unit of GDP. The same applies to energy intensity (IGI 4.2).
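
The two inversion variants described above can be sketched as follows. This is a minimal illustration, assuming the values of one indicator are held in a plain Python list with no missing entries; the function names and the sample figures are made up for the example and are not taken from the IGI data.

```python
def invert_with_theoretical_max(values, theoretical_max=100.0):
    """Invert so that higher is better: theoretical_max - value."""
    return [theoretical_max - v for v in values]


def invert_with_observed_max(values):
    """Invert with the largest observed value when no theoretical maximum exists."""
    observed_max = max(values)
    return [observed_max - v for v in values]


# Made-up Gini values on the 0-100 scale (theoretical maximum of 100).
gini = [25.0, 40.0, 63.0]
print(invert_with_theoretical_max(gini))  # [75.0, 60.0, 37.0]

# Made-up under-five mortality rates (deaths per 1,000 live births).
mortality = [4.0, 30.0, 110.0]
print(invert_with_observed_max(mortality))  # [106.0, 80.0, 0.0]
```

With the observed maximum, the country with the worst original value always receives 0 after inversion, so the result depends on which countries are in the sample.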

A symmetric transformation was undertaken for indicators of secondary school enrollment (IGI 3.3), ratio of female to male employment-to-population ratio (IGI 3.4), ratio of youth to adult employment-to-population ratio (IGI 3.5), gender parity in the number of seats held by women and men in national parliaments (IGI 3.6), ratio of female to male labour force participation rate (IGI 3.7), ratio of female age of first marriage to male age of first marriage (IGI 3.8) and ratio of the share of wage and salaried workers in women’s to men’s employment (IGI 3.9). After the transformation, the same rate for female and male (or youth and adults) equals one - the best possible value. The proportion of seats held by women in national parliaments (percentage of total number of seats) was transformed so that a 50-50 parity in Parliament (the optimal solution) equates to the highest possible value (value = 1) and all other distributions or solutions are less than one.
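
As an illustration of the parliamentary parity case, the sketch below first forms the ratio of women's to men's seats and then applies the symmetric transformation, using the formulas given in section 4.2. The function names and the sample shares are hypothetical and chosen only for this example.

```python
def parity_ratio(share_women_pct):
    """Ratio of seats held by women to seats held by men.

    share_women_pct is the women's share of seats in per cent;
    the men's share is taken as the remainder (100 - share).
    """
    return share_women_pct / (100.0 - share_women_pct)


def symmetric(value):
    """Symmetric transformation: 1 at parity, lower the further the ratio is from 1."""
    return 1.0 - abs(value - 1.0)


# Made-up shares of seats held by women, in per cent.
for share in (50.0, 25.0, 60.0):
    ratio = parity_ratio(share)
    print(share, round(ratio, 3), round(symmetric(ratio), 3))
# 50.0 1.0 1.0
# 25.0 0.333 0.333
# 60.0 1.5 0.5
```

The sketch does not guard against a share of exactly 100, where the men's share is zero and the ratio is undefined.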

4.1 By indicator

  • IGI 2.3 (SH_DYN_MORT): Under-five mortality rate (deaths per 1,000 live births)

    • The indicator was inverted by using the observed maximum among the countries.
  • IGI 3.1 (SI.POV.GINI): Gini index

    • The income concentration ratio, based on the Gini coefficient, ranges between 0 and 100, where 0 signifies total equality and 100 total inequality. The Gini was inverted (using the theoretical maximum of 100) so that, contrary to the usual reading of the Gini, a higher value signifies less income inequality, i.e. 100 now represents maximum equality.
  • IGI 3.2 (SI.POV.LMIC): Poverty headcount ratio at $3.65 a day (2017 PPP) (% of population)

    • The poverty headcount ratio, which ranges between 0 and 100 per cent of the population, was inverted using the theoretical maximum of 100. In the inverted form, a higher value signifies less poverty, i.e. fewer people with an income below US$3.65 per day.
  • IGI 3.3 (SE.ENR.SECO.FM.ZS): School enrolment, secondary (gross), gender parity index (GPI)

    • A symmetric transformation was undertaken.
    • After the transformation, the same rate for female and male equals one - the best possible value.
  • IGI 3.4 (SL.EMP.TOTL.SP): Ratio of female to male employment-to-population ratio (%) (modeled ILO estimate)

    • First, a ratio is calculated using the indicator disaggregated by sex.
    • A symmetric transformation was undertaken.
    • After the transformation, the same rate for female and male equals one - the best possible value.
  • IGI 3.5 (EMP_2WAP): Ratio of youth to adult employment-to-population ratio (modeled ILO estimate)

    • First, a ratio is calculated using the indicator disaggregated by age group (youth and adult).
    • A symmetric transformation was undertaken.
    • After the transformation, the same rate for youth and adult equals one - the best possible value.
  • IGI 3.6 (SG_GEN_PARL_PAR): Gender parity in the number of seats held by women and men in national parliaments

    • Using the indicator “Proportion of seats held by women in national parliaments (% of total number of seats)” (SG_GEN_PARL), the share of seats held by men is calculated as the remainder, and the ratio of the two shares is taken.
    • A symmetric transformation was undertaken.
    • After the transformation, a 50-50 parity in parliament (the optimal solution) equals one, the highest possible value; all other distributions are less than one.

    \[SG\_GEN\_PARL\_PAR = \frac{SG\_GEN\_PARL}{100 - (SG\_GEN\_PARL)}\]

  • IGI 3.7 (SL.TLF.CACT.FM.ZS): Ratio of female to male labour force participation rate (%) (modeled ILO estimate)

    • A symmetric transformation was undertaken.
    • After the transformation, the same rate for female and male equals one - the best possible value.
  • IGI 3.8 (AFMR): Ratio of female age of first marriage to male age of first marriage

    • A symmetric transformation was undertaken.
    • After the transformation, the same rate for female and male equals one - the best possible value.
  • IGI 3.9 (FMEMP): Ratio of the share of wage and salaried workers in women’s to men’s employment

    • A symmetric transformation was undertaken.
    • After the transformation, the same rate for female and male equals one - the best possible value.
  • IGI 3.10 (MCARE): Employment in services, female (% of female employment) (modeled ILO estimate), raised to the power of the inverse of the Palma ratio

    • The indicator is calculated as the product of “Employment in services, female (% of female employment) (modeled ILO estimate)” (SL.SRV.EMPL.FE.ZS) and “Labor force, female (% of total labor force)” (SL.TLF.TOTL.FE.ZS), raised to the power of the inverse of the “Palma ratio”.

    \[MCARE = {\bigg(SL.SRV.EMPL.FE.ZS*SL.TLF.TOTL.FE.ZS\bigg)}^{(\frac{1}{PALMA})}\]

  • IGI 4.1 (EN_ATM_CO2GDP): Carbon dioxide emissions per unit of GDP PPP (kilogrammes of CO2 per constant 2017 United States dollars)

    • The indicator was inverted by using the observed maximum among the countries. In the inverted form, a higher value signifies fewer CO2 emissions per unit of GDP.
  • IGI 4.2 (EG_EGY_PRIM): Energy intensity level of primary energy (megajoules per constant 2017 purchasing power parity GDP)

    • The indicator was inverted by using the observed maximum among the countries. In the inverted form, a higher value signifies less energy used per unit of GDP.
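
The IGI 3.10 calculation can be sketched as below. The sketch follows the MCARE formula as written (the product of the two series raised to the power of one over the Palma ratio) and assumes that both series enter in per cent exactly as published; the function name, argument names and sample inputs are hypothetical.

```python
def mcare(services_emp_female_pct, labor_force_female_pct, palma):
    """(SL.SRV.EMPL.FE.ZS * SL.TLF.TOTL.FE.ZS) ** (1 / PALMA), as in the formula."""
    if palma <= 0:
        # Assumption of this sketch: a Palma ratio is a positive number.
        raise ValueError("Palma ratio must be positive")
    return (services_emp_female_pct * labor_force_female_pct) ** (1.0 / palma)


# Made-up inputs: 50% services employment, 40% labour force share, Palma ratio of 2.
print(round(mcare(50.0, 40.0, 2.0), 2))  # 44.72, the square root of 2000
```

A larger Palma ratio (more income concentration) shrinks the exponent and therefore lowers the result for any product above one.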

4.2 By transformation

  • Special calculations

    • Ratio: 3.4, 3.8, 3.9 (female to male) and 3.5 (youth to adult)

    \[ratio = female/male \quad \text{or} \quad ratio = youth/adult\]

    • Gender parity: 3.6

    \[SG\_GEN\_PARL\_PAR = \frac{SG\_GEN\_PARL}{100 - (SG\_GEN\_PARL)}\]

    • Adjusted for the inverse of the Palma ratio: 3.10

    \[MCARE = {\bigg(SL.SRV.EMPL.FE.ZS*SL.TLF.TOTL.FE.ZS\bigg)}^{(\frac{1}{PALMA})}\]

  • Inverse transformation with 100

    • Used for indicators: 3.1, 3.2

    \[inv(value) = 100 - value\]

  • Inverse transformation with max

    • Used for indicators: 2.3, 4.1, 4.2

    \[inv(value) = max(value) - value\]

  • Inverse symmetric transformation

    • Used for indicators: 3.4, 3.5, 3.6, 3.8, 3.9

    \[inv(value) = 1 - abs(value-1)\]

    • Used for indicators: 3.3, 3.7

    \[inv(value) = 1 - abs(value/100-1)\]
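
The grouping above can be read as a lookup from indicator number to transformation. The sketch below is one possible way to express that lookup, assuming each indicator's values arrive as a list of floats after any special calculation (ratio or gender parity) has already been applied; the dictionary, the transformation labels and the sample values are hypothetical.

```python
# Hypothetical lookup: indicator number -> transformation, following the lists above.
TRANSFORMATION = {
    "3.1": "inverse_100", "3.2": "inverse_100",
    "2.3": "inverse_max", "4.1": "inverse_max", "4.2": "inverse_max",
    "3.4": "symmetric", "3.5": "symmetric", "3.6": "symmetric",
    "3.8": "symmetric", "3.9": "symmetric",
    "3.3": "symmetric_pct", "3.7": "symmetric_pct",
}


def transform(indicator, values):
    """Apply the transformation listed for the given indicator number."""
    kind = TRANSFORMATION[indicator]
    if kind == "inverse_100":
        return [100.0 - v for v in values]
    if kind == "inverse_max":
        top = max(values)
        return [top - v for v in values]
    if kind == "symmetric":
        return [1.0 - abs(v - 1.0) for v in values]
    if kind == "symmetric_pct":
        return [1.0 - abs(v / 100.0 - 1.0) for v in values]
    raise ValueError(f"unknown transformation: {kind}")


# Made-up values for three indicators.
print(transform("3.2", [10.0, 35.0]))   # [90.0, 65.0]
print(transform("3.7", [50.0, 100.0]))  # [0.5, 1.0]
print(transform("3.4", [0.5, 1.25]))    # [0.5, 0.75]
```

IGI 3.10 is deliberately absent from the lookup because the lists above assign it only a special calculation, so calling `transform("3.10", ...)` raises a `KeyError` in this sketch.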