Chapter 3 Relative importance
Relative importance analysis is a statistical technique for quantifying how much each predictor variable contributes to a regression model. It compares the predictive power of the full model with that of a series of reduced models from which predictors have been removed: the drop in predictive power when a particular variable is left out quantifies the relative importance of that variable in the model.
In a multiple regression model, multicollinearity can make it difficult to determine the unique contribution of each predictor variable. This method addresses the problem by calculating the “partial” or “unique” contribution of each predictor variable to the overall R-squared of the model. This is done by fitting a model for every possible subset of predictors and comparing, for each subset, the R-squared of the model with and without the variable of interest.
The contribution of each predictor variable to the overall R-squared is then the average difference in R-squared across all pairs of full and reduced models that differ only by that variable. Dividing this average by the R-squared of the full model gives the proportion of explained variation attributed uniquely to each predictor.
To understand better how the relative importance weights are calculated, consider a set of three independent variables \(x_{1}\), \(x_{2}\), \(x_{3}\) and how much of the variation in the dependent variable \(Y\) they explain. To obtain the importance of a single variable \(x_{i}\) among 3 independent variables we have \(2^3-1\) models to estimate, listed below. (To estimate the relative importance of all \(n\) covariates we have to estimate \(n \cdot 2^{n}-n\) regressions. This number grows quickly as explanatory variables are added, so depending on the number of independent variables the computation time may become extensive.)
N. covariates | Covariates | R2 |
---|---|---|
0 | \(\emptyset\) | 0 |
1 | \(x_{1}\) | \(R2_{x_{1}}\) |
1 | \(x_{2}\) | \(R2_{x_{2}}\) |
1 | \(x_{3}\) | \(R2_{x_{3}}\) |
2 | \(x_{1}\), \(x_{2}\) | \(R2_{x_{1}x_{2}}\) |
2 | \(x_{1}\), \(x_{3}\) | \(R2_{x_{1}x_{3}}\) |
2 | \(x_{2}\), \(x_{3}\) | \(R2_{x_{2}x_{3}}\) |
3 | \(x_{1}\), \(x_{2}\), and \(x_{3}\) | \(R2_{x_{1}x_{2}x_{3}}\) |
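These subsets can be enumerated in R with `combn()`; a minimal sketch (the variable names here are placeholders):

```r
## enumerate all non-empty subsets of three predictors
vars <- c("x1", "x2", "x3")
subsets <- unlist(lapply(1:3, function(k) combn(vars, k, FUN = paste, collapse = ", ")))
subsets          # "x1" "x2" "x3" "x1, x2" "x1, x3" "x2, x3" "x1, x2, x3"
length(subsets)  # 7 = 2^3 - 1
```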
Let’s calculate the total variance of \(Y\) explained by \(x_{1}\) as the average difference in R2 between models with and without \(x_{1}\):
N. covariates (full model) | R2 |
---|---|
1 | \(\Delta_{0}= R2_{x_{1}}\)-0 |
2 | \(\Delta_{1}=\)mean(\(R2_{x_{1}x_{2}}\)-\(R2_{x_{2}}\); \(R2_{x_{1}x_{3}}\)-\(R2_{x_{3}}\)) |
3 | \(\Delta_{2}=R2_{x_{1}x_{2}x_{3}}\)-\(R2_{x_{2}x_{3}}\) |
The contribution of \(x_{1}\) to the total R2 (call it \(\tilde{x}_{1}\)) is finally calculated as: \[\begin{equation} \tilde{x}_{1}=\operatorname{mean}(\Delta_{0}, \Delta_{1}, \Delta_{2}) \end{equation}\]
Please note that each increment is weighted so that the total weight for adding \(x_{1}\) to the models with two predictors equals the total weight for adding \(x_{1}\) to the models with one predictor, which in turn equals the weight for adding \(x_{1}\) to the model with no predictors. So in this case we have: \[\begin{equation} \tilde{x}_{1}=\frac{1}{3}(R2_{x_{1}x_{2}x_{3}}-R2_{x_{2}x_{3}})+\frac{1}{6}(R2_{x_{1}x_{2}}-R2_{x_{2}})+\frac{1}{6}(R2_{x_{1}x_{3}}-R2_{x_{3}})+\frac{1}{3}(R2_{x_{1}}-0) \end{equation}\]
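Equivalently, these are the weights obtained by averaging over all orderings of the predictors (the Shapley/LMG decomposition): with \(n\) predictors, the increment from adding \(x_{i}\) to a subset \(S\) of \(k\) other predictors receives weight \(\frac{1}{n}\binom{n-1}{k}^{-1}\):
\[\begin{equation} \tilde{x}_{i}=\frac{1}{n}\sum_{k=0}^{n-1}\binom{n-1}{k}^{-1}\sum_{\substack{S\subseteq\{x_{1},\dots,x_{n}\}\setminus\{x_{i}\},\\ |S|=k}}\left(R2_{S\cup\{x_{i}\}}-R2_{S}\right) \end{equation}\]
For \(n=3\) this yields exactly the weights \(\frac{1}{3}\), \(\frac{1}{6}\), \(\frac{1}{6}\), \(\frac{1}{3}\).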
The relative importance \(rel(\tilde{x}_{1})\) is then computed as \(rel(\tilde{x}_{1})=\frac{\tilde{x}_{1}}{R2}\), where R2 is the total variation of the dependent variable explained by the full set of covariates {\(x_{1}\), \(x_{2}\), \(x_{3}\)}.
We then calculate the contributions of the remaining variables (\(x_{2}\) and \(x_{3}\)) in the same way. The three contributions sum to the total R2, so the relative importances sum to one.
Consider the following example with educational data. We have a dataframe containing information about students, including their high-school GPA, their GPA in the first year of university, and their verbal and math SAT scores. Our objective is to identify the variable with the strongest predictive power for first-year GPA. Let’s start by calculating how much variance is explained by the high-school GPA.
```r
### load data
load(paste0("C:\\Users\\roal2007\\Desktop\\learning\\Tools for Market Research\\FirstYearGPA.rda"))

### independent variables
vars <- c("SATV", "SATM", "HSGPA")
### dependent variable
dep_var <- "GPA"
## variable of interest
xvar <- "HSGPA"
```
```r
## we start by computing how much R2 "HSGPA" adds when it is added to "SATV" and "SATM"

## First step: number of covariates = 3
## estimate full model (all covariates: "HSGPA", "SATV" and "SATM")
full_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, vars)])
R2_full <- summary(full_model)[["r.squared"]]

### subset covariates that do not contain xvar
cov <- vars[vars != xvar]
## estimate reduced model (all covariates but "HSGPA")
reduced_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, cov)])
R2_reduced <- summary(reduced_model)[["r.squared"]]
## calculate how much additional R2 is added by including "HSGPA"
seq_r2_n3 <- R2_full - R2_reduced
```
```r
## Second step: number of covariates = 2
## we have n-choose-k combinations, in this case 3
comb_vars <- t(combn(vars, 2))

## we select only the combinations that contain xvar (the variable of interest)
xvar_rows <- apply(comb_vars, 1, function(x) any(grepl(xvar, x)))
comb_vars <- comb_vars[xvar_rows, ]

## loop through each combination
seq_r2 <- c()
for (i in 1:nrow(comb_vars)) {
  ### set of covariates containing xvar
  cov <- comb_vars[i, ]
  ### estimate the full model: it contains two variables, one of which is xvar
  full_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, cov)])
  R2_full <- summary(full_model)[["r.squared"]]

  ### subset covariates that do not contain xvar
  cov <- cov[cov != xvar]
  ## estimate reduced model (exclude xvar)
  reduced_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, cov)])
  R2_reduced <- summary(reduced_model)[["r.squared"]]
  ## store the additional R2 in seq_r2
  seq_r2_temp <- R2_full - R2_reduced
  seq_r2 <- c(seq_r2, seq_r2_temp)
}
seq_r2_n2 <- seq_r2
```
```r
## Third step: number of covariates = 1
## the full model contains only xvar as independent variable
full_model <- lm(GPA ~ ., data = FirstYearGPA[c(dep_var, xvar)])
R2_full <- summary(full_model)[["r.squared"]]
## in this case there's no reduced model
R2_reduced <- 0
seq_r2_n1 <- R2_full - R2_reduced

## average the R2 increments within each step, then across steps
r2_n1 <- seq_r2_n1
r2_n2 <- mean(seq_r2_n2)
r2_n3 <- seq_r2_n3
mean(c(r2_n1, r2_n2, r2_n3))
## [1] 0.1725661
```
N. covariates (full model) | R2 | Result |
---|---|---|
1 | \(\Delta_{0}= R2_{x_{1}}\)-0 | 0.1997083 |
2 | \(\Delta_{1}=\)mean(\(R2_{x_{1}x_{2}}\)-\(R2_{x_{2}}\); \(R2_{x_{1}x_{3}}\)-\(R2_{x_{3}}\)) | 0.1657953 |
3 | \(\Delta_{2}=R2_{x_{1}x_{2}x_{3}}\)-\(R2_{x_{2}x_{3}}\) | 0.1521946 |
Finally, the contribution of the variable is the mean of the R2 increments across the three steps; in this example it is 0.17. To obtain the relative importance we divide this by the total R2 (how much variation is explained when all independent variables are included).
```r
R2tot <- summary(lm(GPA ~ ., data = FirstYearGPA[c(dep_var, vars)]))[["r.squared"]]

r2_xvar <- mean(c(r2_n1, r2_n2, r2_n3))

paste0(round(100 * r2_xvar / R2tot, 2), "%")
## [1] "70.04%"
```
We then compute the relative importance of each independent variable in the same way.
```r
## let's create a function that we can call for each variable of interest
rel_weight <- function(xvar, yvar, vars, dt){

  seq_r2 <- c()
  for (i in length(vars):1) {
    comb_vars <- t(combn(vars, i))

    if (ncol(comb_vars) > 1) {
      ## keep only the combinations that contain xvar
      xvar_rows <- apply(comb_vars, 1, function(x) any(grepl(xvar, x)))
      comb_vars <- comb_vars[xvar_rows, ]
      ## a single remaining combination is dropped to a vector, so nrow() is NULL
      Nloop <- ifelse(is.null(nrow(comb_vars)), 1, nrow(comb_vars))
      temp_r2 <- c()
      for (j in 1:Nloop) {
        if (Nloop == 1) {
          cov <- comb_vars
        } else {
          cov <- comb_vars[j, ]
        }
        ## full model: the j-th combination of covariates, including xvar
        full_model <- lm(reformulate(".", response = yvar), data = dt[c(yvar, cov)])
        R2_full <- summary(full_model)[["r.squared"]]

        cov <- cov[cov != xvar]

        ## reduced model: same covariates without xvar
        reduced_model <- lm(reformulate(".", response = yvar), data = dt[c(yvar, cov)])
        R2_reduced <- summary(reduced_model)[["r.squared"]]

        seq_r2_temp <- R2_full - R2_reduced
        temp_r2 <- c(temp_r2, seq_r2_temp)
      }
      seq_r2 <- c(seq_r2, mean(temp_r2))
    } else {
      ## one covariate: the full model contains xvar alone and there is no reduced model
      full_model <- lm(reformulate(".", response = yvar), data = dt[c(yvar, xvar)])
      R2_full <- summary(full_model)[["r.squared"]]
      seq_r2_temp <- R2_full - 0
      seq_r2 <- c(seq_r2, seq_r2_temp)
    }
  }
  return(seq_r2)
}

imp_GPA  <- 100 * mean(rel_weight("HSGPA", "GPA", vars, FirstYearGPA)) / R2tot
imp_SATV <- 100 * mean(rel_weight("SATV", "GPA", vars, FirstYearGPA)) / R2tot
imp_SATM <- 100 * mean(rel_weight("SATM", "GPA", vars, FirstYearGPA)) / R2tot
```
And plot the results in descending order.
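A minimal sketch of such a plot with base R graphics (assuming the `imp_GPA`, `imp_SATV` and `imp_SATM` values computed above):

```r
## sort the relative importance scores (in % of the total R2) in descending order
imp <- sort(c(HSGPA = imp_GPA, SATV = imp_SATV, SATM = imp_SATM), decreasing = TRUE)
barplot(imp,
        ylab = "Relative importance (% of total R2)",
        main = "Relative importance of predictors of first-year GPA")
```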
To test whether the relative importance scores of the covariates are statistically different from each other we can use the bootstrap. Beware of running this analysis blindly: check the total R2 to see how much variance of the dependent variable the covariates are able to capture.
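A minimal bootstrap sketch for a single variable (assuming `rel_weight()` uses its `dt` argument; the number of replicates and the percentile interval are illustrative choices, not prescriptions):

```r
set.seed(123)
B <- 500  # number of bootstrap replicates
boot_imp <- replicate(B, {
  ## resample rows with replacement
  dt_b <- FirstYearGPA[sample(nrow(FirstYearGPA), replace = TRUE), ]
  ## total R2 and HSGPA's relative importance in the bootstrap sample (in %)
  R2_b <- summary(lm(GPA ~ ., data = dt_b[c("GPA", vars)]))[["r.squared"]]
  100 * mean(rel_weight("HSGPA", "GPA", vars, dt_b)) / R2_b
})
## 95% percentile interval for HSGPA's relative importance
quantile(boot_imp, c(0.025, 0.975))
```

Repeating this for each covariate and comparing the intervals gives an informal check of whether the scores differ.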