A summary of the fitted model
The Simple Linear Regression
Suppose we have one response variable \(y\) and an explanatory variable \(x\) and two models as follows
Data: \((y_i,x_{i}),\quad i=1,\dots,n\)
Model 0: \(E(y_i) = \alpha\)
Model 1: \(E(y_i) = \alpha+\beta x_{i}\)
In the case of simple linear regression with only one explanatory variable, this compares a line that slopes through the data (Model 1) with a line that runs through the data but lies parallel to the horizontal axis (Model 0).
In order to fit Model 0 to the data, that is, to estimate the parameter \(\alpha\) by least squares, we minimise
\[S(\alpha) = \sum_{i=1}^n(y_i-\alpha)^2,\]
which gives
\[\hat{\alpha} = \bar{y},\]
as illustrated in the top left-hand plot below. Therefore, the residual sum-of-squares for Model 0 is \[\begin{aligned} S(\hat{\alpha}) &= \sum_{i=1}^n(y_i-\bar{y})^2 \\ &= S_{yy}, \end{aligned}\] corresponding to the top right-hand plot below.
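To see why \(\hat{\alpha} = \bar{y}\), differentiate \(S(\alpha)\) and set the derivative to zero:
\[\frac{dS(\alpha)}{d\alpha} = -2\sum_{i=1}^n(y_i-\alpha) = 0 \quad\Longrightarrow\quad n\hat{\alpha} = \sum_{i=1}^n y_i \quad\Longrightarrow\quad \hat{\alpha} = \bar{y}.\]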
Denote the residual sum-of-squares for Model 1 as
\[\begin{aligned} S(\hat{\alpha}, \hat{\beta}) &= \sum_{i=1}^n(y_i-\{\hat{\alpha}+\hat{\beta} x_{i}\})^2 \\ &= \sum_{i=1}^n(y_i-\hat{y}_i)^2 \end{aligned}\]
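Recall also the least squares estimates for Model 1, written in the \(S_{xx} = \sum_{i=1}^n(x_i-\bar{x})^2\) and \(S_{xy} = \sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})\) notation used later in this section:
\[\hat{\beta} = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\alpha} = \bar{y}-\hat{\beta}\bar{x}, \qquad \hat{y}_i = \hat{\alpha}+\hat{\beta}x_i.\]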
Recall that we are calculating the distances between the observed values \(y_1,\ldots,y_n\) and the fitted values \(\hat{y}_1,\ldots,\hat{y}_n\), corresponding to the bottom left-hand plot above.
For completeness, we can also look at the difference between the fitted values obtained from Model 0 and those obtained from Model 1,
\[\sum_{i=1}^{n}(\bar{y} - \hat{y}_i)^2,\]
corresponding to the bottom right-hand plot above.
Sums of Squares
The residual sum of squares of Model 0 is referred to as the Total (corrected) sum of squares \(TSS\), the residual sum of squares of Model 1, \(S(\hat{\alpha},\hat{\beta})\), is referred to as the Residual sum of squares \(RSS\), and the sum of squares between the fitted values obtained from Model 0 and Model 1 is referred to as the Model sum of squares \(MSS\). The three values \(TSS\), \(MSS\) and \(RSS\) are related such that \[TSS=MSS+RSS.\]
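This identity is easy to check numerically. The sketch below is a minimal illustration, assuming a small simulated dataset and numpy; neither the data nor the code comes from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: n = 30 points from a straight-line model with noise
n = 30
x = np.linspace(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Model 0: E(y_i) = alpha, fitted by the sample mean
y_bar = y.mean()

# Model 1: E(y_i) = alpha + beta * x_i, fitted by least squares
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y_bar))
beta_hat = Sxy / Sxx
alpha_hat = y_bar - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x

TSS = np.sum((y - y_bar) ** 2)      # residual sum of squares of Model 0
RSS = np.sum((y - y_hat) ** 2)      # residual sum of squares of Model 1
MSS = np.sum((y_hat - y_bar) ** 2)  # model sum of squares

print(TSS, MSS + RSS)  # the two values agree up to rounding error
```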
Coefficient of Determination \(R^2\)
In our discussion of least squares, the residual sum-of-squares for a particular model was proposed as a numerical measure of how well the model fits the data. This leads to a natural measure of how much variation in the data our model has explained, by comparing \(RSS\) with \(TSS\). A simple but useful measure of model fit is given by
\[R^2 = 1-\frac{RSS}{TSS}\]
where \(RSS\) is the residual sum-of-squares for Model 1, the fitted model of interest, and \(TSS = \sum_{i=1}^n(y_i-\bar{y})^2 = S_{yy}\) is the residual sum of squares of the null model, Model 0. Since Model 0 is more restricted, its residual sum-of-squares can never be smaller than that of Model 1; that is, \(TSS \geq RSS\).
\(R^2\) quantifies how much of a drop in the residual sum-of-squares is accounted for by fitting the proposed model, and is often referred to as the coefficient of determination. This is expressed on a helpful scale, as a proportion of the total variation in the data.
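As a small numerical illustration (the numbers are invented purely for this example): if \(TSS = 100\) and \(RSS = 20\), then \(R^2 = 1 - 20/100 = 0.8\), so fitting Model 1 accounts for 80% of the variation left unexplained by Model 0.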
Values of \(R^2\) approaching 1 indicate that the model is a good fit.
Values of \(R^2\) less than 0.5 suggest that the model gives an OK fit to the data.
Working with real data, we often observe very small \(R^2\) values, less than 0.5.
In the case of simple linear regression
Model 1: \(E(y_i) = \alpha+\beta x_i\)
\[R^2=r^2\]
where \(R^2\) is the coefficient of determination and \(r\) is the sample correlation coefficient. To show this, recall that
\[RSS = \sum_{i=1}^n (y_i-\{\hat{\alpha}+\hat{\beta} x_i\})^2 = S_{yy}-\frac{(S_{xy})^2}{S_{xx}}.\]
Then
\[\begin{aligned} R^2 &= 1-\frac{RSS}{TSS}\\ &=1-\frac{\sum_{i=1}^n({y_i}-\hat{y}_i)^2}{\sum_{i=1}^n(y_i-\bar{y})^2}\\ &=\frac{S_{yy}-\left(S_{yy}-\frac{(S_{xy})^2}{S_{xx}}\right)}{S_{yy}}\\ &=\frac{(S_{xy})^2}{S_{xx}S_{yy}} \\ &= r^2. \end{aligned}\]
Hence \(R^2 = r^2\): in the case of simple linear regression, the coefficient of determination is the squared sample correlation coefficient. This result does not extend to multiple linear regression.
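The identity can also be checked numerically. The sketch below is a minimal illustration, assuming simulated data and numpy's corrcoef for the sample correlation; it is not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data from a simple linear regression
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.8 * x + rng.normal(scale=2.0, size=50)

# Least squares fit of Model 1
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
beta_hat = Sxy / Sxx
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x

RSS = np.sum((y - y_hat) ** 2)
TSS = np.sum((y - y.mean()) ** 2)

R2 = 1 - RSS / TSS
r = np.corrcoef(x, y)[0, 1]  # sample correlation coefficient

print(R2, r ** 2)  # the two values agree up to rounding error
```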
Nested Models
In the case of the simple linear regression
Model 0: \(E(y_i) = \alpha\)
Model 1: \(E(y_i) = \alpha+\beta x_{i}\)
these models are nested. By setting \(\beta=0\) in Model 1 we retrieve Model 0. In other words, the simpler Model 0 is a special case of the more complex Model 1.
In the case of simple linear regression through the origin
Model 0: \(E(y_i) = \alpha\)
Model 1: \(E(y_i) = \beta x_{i}\)
the formula for \(R^2\), with \(TSS = \sum_i (y_i-\bar{y})^2\), cannot be used, because Model 0 and Model 1 are not nested: there is no value of \(\beta\) for which Model 1 reduces to Model 0.
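To see why the usual formula misbehaves here, the sketch below fits a regression through the origin to a small invented dataset (both the data and the use of numpy are assumptions for illustration) and evaluates \(1 - RSS/TSS\); the result can even be negative, which makes no sense as a proportion of explained variation.

```python
import numpy as np

# Illustrative data: y varies little around a large mean, x is well away from 0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.1, 9.8, 10.0, 10.2, 9.9])

# Least squares fit of Model 1 through the origin: E(y_i) = beta * x_i
beta_hat = np.sum(x * y) / np.sum(x ** 2)
y_hat = beta_hat * x

RSS = np.sum((y - y_hat) ** 2)
TSS = np.sum((y - y.mean()) ** 2)

print(1 - RSS / TSS)  # a large negative value: the usual R^2 formula breaks down
```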
The Multiple Linear Regression
Suppose now we have one response variable \(y\) and \(k\) explanatory variables \(x_1, \ldots, x_k\) and two models as follows
Data: \((y_i, x_{1i}, \ldots, x_{ki}),\quad i=1,\dots,n\)
Model 0: \(E(y_i) = \alpha\)
Model 1: \(E(y_i) = \alpha+\beta_1 x_{1i} + \ldots + \beta_{k} x_{ki}\)
Then we can calculate the coefficient of determination \(R^2\) in the same way. However, in the case of multiple linear regression, where there is more than one explanatory variable in the model, we often refer to a quantity called the adjusted \(R^2\), written \(R^2\)(adj), instead of \(R^2\). As explanatory variables are added to a model, \(R^2\) can only increase (or stay the same), even when the added variables are of no real use; \(R^2\)(adj) adjusts for the number of explanatory variables in the model.
\(R^2\)(adj) as a measure of model fit
For any multiple linear regression \(E(y_i) = \alpha+\beta_1x_{1i}+\dots+\beta_{k}x_{ki}\) the \(R^2\)(adj) is defined as \[R^2 \mbox{(adj)} = 1-\frac{\frac{RSS}{n-k-1}}{\frac{TSS}{n-1}},\] where \(k\) is the number of explanatory variables, i.e. the number of coefficients in the model excluding the constant \(\alpha\) term.
\(R^2\)(adj) can also be calculated from the following identity
\[R^{2} \mbox{(adj)} = 1-(1-R^{2})\frac{n-1}{n-k-1}.\]
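As a sketch of these calculations (the data, the noise variable and the helper function below are assumptions made purely for illustration): adding an irrelevant explanatory variable increases \(R^2\) but need not increase \(R^2\)(adj).

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: y depends on x1 only; x2 is pure noise
n = 40
x1 = rng.uniform(0, 10, size=n)
x2 = rng.normal(size=n)  # irrelevant explanatory variable
y = 1.0 + 0.5 * x1 + rng.normal(scale=1.0, size=n)

def r2_and_adj(X, y):
    """Fit E(y) = alpha + X beta by least squares; return R^2 and R^2(adj)."""
    n, k = X.shape  # k = number of explanatory variables
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    r2_adj = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
    return r2, r2_adj

print(r2_and_adj(x1.reshape(-1, 1), y))          # model with x1 only
print(r2_and_adj(np.column_stack([x1, x2]), y))  # after adding the noise variable
```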