Chapter 5 Principal Component Analysis
Compiling composite indicators requires multivariate analysis in order to investigate the overall structure of the indicators, assess the suitability of the data and guide methodological choices, e.g. the weighting and aggregation of the indicator components. Composite indicators calculated arbitrarily, with little attention paid to the interrelationships between the source variables, may produce misleading results that are unhelpful for policy purposes.
Principal component analysis (PCA) is one of the most widely used techniques for multivariate analysis. First introduced by Pearson (1901) and developed independently by Hotelling (1933), PCA can be used to reveal interrelationships among a set of variables. This is done by transforming potentially correlated variables into a set of uncorrelated variables using their covariance matrix or its standardized form, the correlation matrix. This makes it possible to identify how the variability in the underlying information is distributed across the \(N\) variables. As such, PCA can be used to emphasize patterns in multivariate data.
Through an orthogonal linear transformation, PCA calculates the projection of the original data onto a new set of \(N\) coordinates, known as principal components. This new space has some interesting characteristics: its coordinates are mutually orthogonal, and they are ordered in decreasing order of the amount of information they retain from the original variables. Therefore, the first principal component (PC1) accounts for the largest amount of the total variability in the set of \(N\) original variables. The second component (PC2), orthogonal to the first, accounts for the largest amount of the remaining variability in the original variables. Each succeeding PC is linearly uncorrelated with the others and accounts for the largest amount of the remaining variability (Jolliffe 2002). By selecting only the first \(n\) principal components, the number of dimensions included in an analysis can be reduced (from \(N\) to \(n\)) while retaining as much of the information in the original variables as possible, a process called dimensionality reduction. The relative importance of each principal component (the proportion of total variability it captures) is given by its associated eigenvalue.
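The transformation described above can be sketched in a few lines of NumPy. The data below are synthetic (the actual inclusive-growth indicators are not reproduced here); the sketch eigendecomposes the correlation matrix, sorts the components by eigenvalue, and projects the standardized data onto the first \(n\) of them:

```python
import numpy as np

# Illustrative data: 200 observations of N = 5 correlated variables
# (synthetic stand-in for the indicator set discussed in the text).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.3 * rng.normal(size=(200, 5))

# Standardize, then eigendecompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]      # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the first n components (dimensionality reduction: N -> n).
n = 2
scores = Z @ eigvecs[:, :n]

# Each eigenvalue's share of the total measures that PC's importance.
explained = eigvals / eigvals.sum()
```

The resulting `scores` columns are mutually uncorrelated, and `explained` is non-increasing, mirroring the ordering property of the principal components.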
The principal components of all variables identified as relevant for measuring inclusive growth are calculated. By retaining only those principal components with an eigenvalue greater than 1 and an explained variance greater than 10%, a smaller number of independent indices of inclusive growth can be generated.
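The retention rule just described (eigenvalue above the Kaiser threshold of 1 and at least 10% of explained variance) can be expressed as a small helper; the function name and eigenvalues below are illustrative, not from the source:

```python
import numpy as np

def retained_components(eigvals, kaiser=1.0, min_share=0.10):
    """Return indices of PCs passing both retention criteria.

    kaiser and min_share default to the thresholds used in the text;
    the function itself is an illustrative sketch.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    share = eigvals / eigvals.sum()          # explained-variance shares
    keep = (eigvals > kaiser) & (share > min_share)
    return np.flatnonzero(keep)

# Hypothetical eigenvalues from a 6-variable correlation matrix (trace = 6):
evals = [2.8, 1.4, 0.9, 0.5, 0.3, 0.1]
kept = retained_components(evals)  # components 0 and 1 pass both criteria
```

Note that the two criteria interact: a component can pass the Kaiser rule yet fall below the 10% share when many variables are involved, so both checks are applied.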
Before undertaking PCA, it was necessary to convert the original variables into comparable units, as differing scales could affect the application of the method. Each variable was therefore standardized to have a mean of zero and a standard deviation of one. PCA was then applied to the completed, standardized data. The results presented here correspond to the PCA output after an orthogonal (varimax) rotation. The rotation increases the specificity of each component, leading to a simpler structure and easier interpretation of the results.
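The pipeline of standardization, PCA and varimax rotation can be sketched as follows. The data are synthetic and the `varimax` routine is a standard textbook implementation of the rotation, not the authors' code; loadings are taken as eigenvectors scaled by the square roots of their eigenvalues:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loadings matrix (standard algorithm)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        )
        R = u @ vt
        new_var = s.sum()
        if new_var < var * (1 + tol):   # stop when the criterion stalls
            break
        var = new_var
    return L @ R

# Synthetic stand-in for the standardized indicator data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # mean 0, sd 1, as in the text

# Loadings: eigenvectors of the correlation matrix scaled by sqrt(eigenvalue).
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
rotated = varimax(loadings)
```

Because varimax is an orthogonal rotation, each variable's communality (row sum of squared loadings) is unchanged; the rotation only redistributes loadings across components so that each component loads strongly on fewer variables, which is what makes the rotated solution easier to interpret.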