Chapter 3 Relative importance

Relative importance analysis is a statistical technique used to quantify how much each predictor variable contributes to a regression model. It compares the predictive power of models that include a given predictor with that of the corresponding reduced models from which the predictor is removed. The drop in predictive power when a variable is left out, averaged over all such pairs of models, quantifies the relative importance of that variable.

In a multiple regression model, multicollinearity can make it difficult to determine the unique contribution of each predictor variable. Relative importance analysis addresses this problem by calculating the “partial” or “unique” contribution of each predictor variable to the overall R-squared of the model. This is done by measuring how much the R-squared increases when a predictor is added to a model, and averaging this increase over all possible subsets of the remaining predictors.

The contribution of each predictor variable to the overall R-squared of the model is then the average difference between the R-squared of a model that contains the variable and the R-squared of the same model without it, taken over all possible pairs of full and reduced models containing the variable of interest. Dividing this average by the R-squared of the full model gives the proportion of explained variation that is uniquely attributable to each predictor.

To better understand how the relative importance weights are calculated, consider a set of three independent variables \(x_{1}\), \(x_{2}\), \(x_{3}\) and ask how much of the variation in the dependent variable \(Y\) they explain. To obtain the importance of a single variable \(x_{i}\) with 3 independent variables, we have \(2^3-1\) models to estimate, listed below. (To estimate the relative importance of all \(n\) covariates we have to estimate \(n \cdot 2^n-n\) regressions. This number grows large relatively fast as we add more explanatory variables, so depending on the number of independent variables the computation time may become extensive.)

| N. covariates | Covariates | R2 |
|---------------|------------|----|
| 0 | \(\emptyset\) | 0 |
| 1 | \(x_{1}\) | \(R2_{x_{1}}\) |
| 1 | \(x_{2}\) | \(R2_{x_{2}}\) |
| 1 | \(x_{3}\) | \(R2_{x_{3}}\) |
| 2 | \(x_{1}\), \(x_{2}\) | \(R2_{x_{1}x_{2}}\) |
| 2 | \(x_{1}\), \(x_{3}\) | \(R2_{x_{1}x_{3}}\) |
| 2 | \(x_{2}\), \(x_{3}\) | \(R2_{x_{2}x_{3}}\) |
| 3 | \(x_{1}\), \(x_{2}\), \(x_{3}\) | \(R2_{x_{1}x_{2}x_{3}}\) |
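
As a quick illustration of how fast the workload grows, we can tabulate \(n \cdot 2^n-n\) for the first few values of \(n\) (a standalone computation, independent of any data):

## number of regressions needed to score all n covariates
n <- 1:10
data.frame(n = n, regressions = n * 2^n - n)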

Let’s calculate the total variance of \(Y\) explained by \(x_{1}\) as the average difference in R2 between models with and without \(x_{1}\), grouped by the number of covariates in the full model:

| N. covariates (full model) | \(\Delta\)R2 |
|----------------------------|--------------|
| 1 | \(\Delta_{0}=R2_{x_{1}}-0\) |
| 2 | \(\Delta_{1}=\text{mean}(R2_{x_{1}x_{2}}-R2_{x_{2}};\ R2_{x_{1}x_{3}}-R2_{x_{3}})\) |
| 3 | \(\Delta_{2}=R2_{x_{1}x_{2}x_{3}}-R2_{x_{2}x_{3}}\) |

The contribution of \(x_{1}\) to the total R2 (let’s call it \(\tilde{x_{1}}\)) is finally calculated as: \[\begin{equation} \tilde{x_{1}}=\text{mean}(\Delta_{0};\ \Delta_{1};\ \Delta_{2}) \end{equation}\]

Please note that in this way the contributions are weighted so that the weight for adding \(x_{1}\) to a model with two predictors equals the total weight for adding \(x_{1}\) to a model with one predictor, which in turn equals the weight for adding \(x_{1}\) to a model with no predictors. So in this case we have: \[\begin{equation} \tilde{x_{1}}=\frac{1}{3}(R2_{x_{1}x_{2}x_{3}}-R2_{x_{2}x_{3}})+\frac{1}{6}(R2_{x_{1}x_{2}}-R2_{x_{2}})+\frac{1}{6}(R2_{x_{1}x_{3}}-R2_{x_{3}})+\frac{1}{3}(R2_{x_{1}}-0) \end{equation}\]

And the relative importance \(rel(\tilde{x_{1}})\) is computed as \(rel(\tilde{x_{1}})=\frac{\tilde{x_{1}}}{R2}\), where R2 is the total variation of the dependent variable explained by the full set of covariates {\(x_{1}\), \(x_{2}\), \(x_{3}\)}.

We then calculate the contributions of the remaining variables (\(x_{2}\) and \(x_{3}\)) in the same way. The three contributions sum to the total R2.
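
In other words, the contributions decompose the total R2 exactly, and the relative importances sum to one: \[\begin{equation} \tilde{x_{1}}+\tilde{x_{2}}+\tilde{x_{3}}=R2_{x_{1}x_{2}x_{3}}, \qquad rel(\tilde{x_{1}})+rel(\tilde{x_{2}})+rel(\tilde{x_{3}})=1 \end{equation}\]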

Consider the following example with educational data. We have a data frame containing information about students, including their GPA in High School, their GPA in the first year of university, and their verbal and math SAT scores. Our objective is to identify the variable with the strongest predictive power for the GPA in the first year of university. Let’s start by calculating how much variance is explained by the GPA in High School.

### load data 
load(paste0("C:\\Users\\roal2007\\Desktop\\learning\\Tools for Market Research\\FirstYearGPA.rda"))
### independent variables
vars <- c("SATV", "SATM", "HSGPA")
### dependent variable
dep_var <- "GPA"
## variable of interest 
xvar <- "HSGPA"

## we start by computing how much R2 "HSGPA" adds when it is added to "SATV" and "SATM"
## First step: Number of covariates=3 
## estimate full model (all covariates: "HSGPA", "SATV" and "SATM")
full_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var,vars)])
R2_full <- summary(full_model)[["r.squared"]]

### subset covariates that do not contain xvar 
cov <- vars[vars!=xvar]
## estimate reduced model (all covariates but "HSGPA")
reduced_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, cov)])
R2_reduced <- summary(reduced_model)[["r.squared"]]
## calculate how much additional R2 is added by including "HSGPA"
seq_r2_n3 <- R2_full-R2_reduced

## Second step: Number of covariates=2 

## we have n choose k combinations; in this case choose(3, 2) = 3

comb_vars <- t(combn(vars, 2))

## we select only the combinations that include xvar (the variable of interest)
# flag rows that contain xvar (exact match is safer than grepl's partial match)
xvar_rows <- apply(comb_vars, 1, function(x) any(x == xvar))
# subset the matrix to keep only the rows that contain xvar
comb_vars <- comb_vars[xvar_rows, ]

## loop through each combination 
seq_r2 <- c()
for(i in 1:nrow(comb_vars)){
  ### set of covariates containing xvar 
  cov <- comb_vars[i,] 
  ### estimate the full model; it contains two variables, one of which is xvar
  full_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, cov)])
  R2_full <- summary(full_model)[["r.squared"]]
  
  ### subset covariates that do not contain xvar
  cov <- cov[cov!=xvar]
  ## estimate reduced model (exclude xvar)
  reduced_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, cov)])
  R2_reduced <- summary(reduced_model)[["r.squared"]]
  ## store the additional R2 in seq_r2
  seq_r2_temp <- R2_full-R2_reduced
  seq_r2 <- c(seq_r2, seq_r2_temp)
}

seq_r2_n2 <- seq_r2


## Third step: Number of covariates=1 
## the full model contains only xvar as independent variable 
full_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, xvar)])
R2_full <- summary(full_model)[["r.squared"]]
## in this case there's no reduced model
R2_reduced <- 0

seq_r2_n1 <- R2_full-R2_reduced


## average the additional R2 within each model size, then across sizes
r2_n1 <- seq_r2_n1
r2_n2 <- mean(seq_r2_n2)
r2_n3 <- seq_r2_n3
mean(c(r2_n1, r2_n2, r2_n3))
## [1] 0.1725661
| N. covariates (full model) | \(\Delta\)R2 | Result |
|----------------------------|--------------|--------|
| 1 | \(\Delta_{0}=R2_{x_{1}}-0\) | 0.1997083 |
| 2 | \(\Delta_{1}=\text{mean}(R2_{x_{1}x_{2}}-R2_{x_{2}};\ R2_{x_{1}x_{3}}-R2_{x_{3}})\) | 0.1657953 |
| 3 | \(\Delta_{2}=R2_{x_{1}x_{2}x_{3}}-R2_{x_{2}x_{3}}\) | 0.1521946 |

Finally, to compute the contribution of HSGPA we take the mean of the additional R2 from each step; in this example it is 0.17. To obtain the relative importance we divide this by the total R2 (how much variation is explained when all independent variables are included).

## total R2: variance explained when all covariates are included
R2tot <- summary(lm(GPA ~ ., data = FirstYearGPA[c(dep_var,vars)]))[["r.squared"]]

r2_xvar <- mean(c(r2_n1,r2_n2,r2_n3))

paste0(round(100*r2_xvar/R2tot,2), "%")
## [1] "70.04%"

We then proceed to compute the relative importance of each independent variable.

## let's create a function that we can call for each variable of interest

rel_weight <- function(xvar, yvar, vars, dt){
  
  seq_r2 <- c()
  for(i in length(vars):1){
    
    comb_vars <- t(combn(vars, i))

    if(ncol(comb_vars)>1){
      xvar_rows <- apply(comb_vars, 1, function(x) any(grepl(xvar, x)))
      comb_vars <- comb_vars[xvar_rows, ]
      Nloop <- ifelse(is.null(nrow(comb_vars))==T, 1, nrow(comb_vars))
      temp_r2 <- c()
    for(j in 1:Nloop){
      if(Nloop==1){
      cov <- comb_vars
      }else{
      cov <- comb_vars[j,]
      }
      full_model <- lm(reformulate(".", response = yvar), data = FirstYearGPA[c(yvar, cov)])
      R2_full <- summary(full_model)[["r.squared"]]
      
      cov <- cov[cov!=xvar]
      
      reduced_model <- lm(reformulate(".", response = yvar), data = FirstYearGPA[c(yvar, cov)])
      R2_reduced <- summary(reduced_model)[["r.squared"]]
      
      seq_r2_temp <- R2_full-R2_reduced
      temp_r2 <- c(temp_r2, seq_r2_temp)
    }
      
      seq_r2 <- c(seq_r2, mean(temp_r2))
      }else{
      full_model <- lm(reformulate(".", response = yvar), data = FirstYearGPA[c(yvar, xvar)])
      R2_full <- summary(full_model)[["r.squared"]]
      seq_r2_temp <- R2_full-0
      seq_r2 <- c(seq_r2, seq_r2_temp)
    }
    
    
  }
  
  return(seq_r2)
}

imp_HSGPA <- 100*mean(rel_weight("HSGPA", "GPA", vars, FirstYearGPA))/R2tot
imp_SATV <- 100*mean(rel_weight("SATV", "GPA", vars, FirstYearGPA))/R2tot
imp_SATM <- 100*mean(rel_weight("SATM", "GPA", vars, FirstYearGPA))/R2tot
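
As a quick sanity check, the contributions decompose the total R2, so the three shares computed above should sum to 100% (up to floating-point error):

## the three relative importances should sum to 100%
imp_HSGPA + imp_SATV + imp_SATM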

Finally, we plot the results in descending order, as sketched below.
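
A minimal sketch of such a plot with base R's barplot(), assuming the imp_* objects computed above:

## collect the scores, sort them in descending order and plot
imp <- sort(c(HSGPA = imp_HSGPA, SATV = imp_SATV, SATM = imp_SATM),
            decreasing = TRUE)
barplot(imp, ylab = "Relative importance (% of total R2)",
        main = "Relative importance of predictors of first-year GPA")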

To test whether the relative importance scores of the covariates are statistically different from each other we can use the bootstrap. Beware of running this analysis blindly: check the total R2 first to see how much variance of the dependent variable the covariates are able to capture.
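
A minimal sketch of such a bootstrap, using the rel_weight() function defined above (here comparing HSGPA and SATV; the choice of pair and the number of replications are illustrative): we resample rows of FirstYearGPA with replacement, recompute the importance weights on each bootstrap sample, and inspect a percentile interval for their difference.

set.seed(123)
B <- 500  ## number of bootstrap replications (illustrative)
boot_diff <- replicate(B, {
  ## resample rows of the data with replacement
  dt_b <- FirstYearGPA[sample(nrow(FirstYearGPA), replace = TRUE), ]
  ## recompute the raw importance weights on the bootstrap sample
  w_hsgpa <- mean(rel_weight("HSGPA", "GPA", vars, dt_b))
  w_satv  <- mean(rel_weight("SATV", "GPA", vars, dt_b))
  w_hsgpa - w_satv
})
## 95% percentile interval for the difference; if it excludes zero,
## the two importance scores are statistically different
quantile(boot_diff, c(0.025, 0.975))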