Examples

Protein in Pregnancy

Recall that these data were collected to investigate whether the level of protein changes in expectant mothers over the course of their pregnancy. Observations were taken on 19 healthy women, each at a different stage of pregnancy (gestation).

Suppose we already know that \(TSS = 0.8618\) and \(RSS = 0.2251\). This gives

\[\begin{aligned} R^2 &= 1-\frac{0.2251}{0.8618}\\ &= 0.7388 \end{aligned}\]

Therefore, 73.88% of the variability in the response of protein is explained by the fitted model with gestation. This indicates that the model describes the data quite well.
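The arithmetic above can be checked with a short Python sketch. The function name `r_squared` is only an illustrative choice; the inputs are the \(TSS\) and \(RSS\) values quoted above.

```python
def r_squared(rss: float, tss: float) -> float:
    """Coefficient of determination: R^2 = 1 - RSS / TSS."""
    return 1 - rss / tss


# Protein in pregnancy example: RSS = 0.2251, TSS = 0.8618
r2_protein = r_squared(rss=0.2251, tss=0.8618)
print(round(r2_protein, 4))  # 0.7388
```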

Giving in the Church of England

This data contains the amount of annual giving per church member in a sample of 20 dioceses in the Church of England. Three other potentially relevant factors are also recorded: employment rate, the percentage of the population on the electoral roll of the church, and the percentage of the population who usually attend church.

The aim of this study is to identify the factors that are associated with giving to the church.

We fit the model with all three predictors.

\[E(\mbox{giving}_i) = \alpha + \beta\mbox{employ}_i + \gamma\mbox{elect}_i + \delta\mbox{attend}_i\] where

  • \(i=1,\ldots,20\)

  • giving\(_i\) is the annual giving in church \(i\)

  • employ\(_i\) is the employment rate among those who attend church \(i\)

  • elect\(_i\) is the percentage of the population on the electoral roll associated with church \(i\)

  • attend\(_i\) is the attendance rate of church \(i\).

Fitting this model gives \(RSS = 791.87\) and \(TSS=1636\). Therefore

\[\begin{aligned} R^2 &= 1-\frac{791.87}{1636}\\ &= 0.516 \end{aligned}\]

Since we have three explanatory variables and 20 churches, \(p=3\) and \(n=20\), so

\[\begin{aligned} R^2(adj) &= 1-(1-R^2)\frac{n-1}{n-p-1}\\ &= 1-(1-0.516)\frac{20-1}{20-3-1}\\ &= 0.425 \end{aligned}\]

Note that \(R^2\) is 0.516 and \(R^2(adj)\) is 0.425 for \(p=3\).
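As a check on the calculation, here is a minimal Python sketch of the adjusted \(R^2\) formula applied to the giving example. The function name `adjusted_r_squared` is hypothetical; the numbers are the \(RSS\), \(TSS\), \(n\) and \(p\) given above.

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Giving example: RSS = 791.87, TSS = 1636, n = 20 observations, p = 3 predictors
r2_giving = 1 - 791.87 / 1636
r2_adj_giving = adjusted_r_squared(r2_giving, n=20, p=3)

print(round(r2_giving, 3))      # 0.516
print(round(r2_adj_giving, 3))  # 0.425
```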

Interpretation of \(R^2\) and \(R^2(adj)\)

  • \(R^2\): 51.6% of the variability in the response of giving is explained by the fitted model. This model provides a reasonable fit to the data based on this metric.
  • When we adjust for there being more than one explanatory variable in the model, we would conclude that 42.52% of the variability is explained by the three predictors.

For any multiple regression model, \(R^2\) will either stay the same or increase when more explanatory variables are added, even if they have no relationship with the response. This is where \(R^2(adj)\) helps: it penalizes you for adding variables that do not 'improve' your existing model.

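Hence, if you are building a multiple linear regression model, it is generally advisable to use \(R^2(adj)\) rather than \(R^2\) to judge the goodness of fit of your model. With only one explanatory variable (\(p=1\)), the two values are close but not identical: the factor \((n-1)/(n-p-1)\) becomes \((n-1)/(n-2)\), which is slightly greater than 1, so \(R^2(adj)\) is slightly smaller than \(R^2\).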

Typically, the more variables you add into the model with no relationship to the response variable, the greater the difference between R-squared and adjusted R-squared.
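This behaviour can be illustrated with a small simulation. The sketch below uses synthetic data generated with NumPy (not either of the data sets above); the helper `fit_r2`, the seed, and the choice of five pure-noise predictors are assumptions made for illustration only. It fits a sequence of least-squares models, each adding one more unrelated predictor, and records \(R^2\) and \(R^2(adj)\).

```python
import numpy as np


def fit_r2(X, y):
    """Return (R^2, adjusted R^2) for a least-squares fit with an intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    rss = float(resid @ resid)                     # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    r2 = 1 - rss / tss
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, r2_adj


rng = np.random.default_rng(0)                     # arbitrary seed
n = 20
x = rng.normal(size=n)                             # one genuinely useful predictor
y = 1.0 + 2.0 * x + rng.normal(size=n)             # synthetic response
noise = rng.normal(size=(n, 5))                    # predictors unrelated to y

results = []
for k in range(6):
    # Model with the real predictor plus the first k noise predictors
    X = np.column_stack([x, noise[:, :k]])
    r2, r2_adj = fit_r2(X, y)
    results.append((r2, r2_adj))
    print(f"noise predictors = {k}: R^2 = {r2:.3f}, R^2(adj) = {r2_adj:.3f}")
```

In a run like this, \(R^2\) never decreases as noise predictors are added, whereas \(R^2(adj)\) is free to fall, and it always sits below \(R^2\).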