3.4 Sums of squares

Let’s take a moment to talk about a useful concept that pops up pretty often: sums of squares.

Sums of squares are, well, just what they sound like: sums of squared things. But there are some particularly handy sums of squares to know about.

A sum of squares that you’ve seen before is the sum of squared residuals or errors, which is sometimes just called the SSE: \[ \sum_{i=1}^n e_i^2\] where \[ e_i = y_i - \hat{y}_i \]

If you’re using the estimated coefficients \(b\), then this is a sum of squared residuals, not “true errors.” It is not unusual to refer to it as the “sum of squared errors” anyway. Such is life.

But we can also have sums of squares of other quantities. For example, in simple (one-predictor) linear regression, we can look at the sum of squares for x around its mean. This reflects the difference between each \(x_i\) and the average of all the predictor values: \[S_{xx} = \sum{(x_i - \bar{x})^2}\]
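If it helps to see these as calculations, here’s a minimal sketch in Python with numpy (the small x and y vectors are made up purely for illustration):

```python
import numpy as np

# Made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a simple linear regression to get fitted values and residuals
b1, b0 = np.polyfit(x, y, deg=1)    # slope, intercept
y_hat = b0 + b1 * x
e = y - y_hat                       # residuals e_i = y_i - y_hat_i

SSE = np.sum(e**2)                  # sum of squared residuals
S_xx = np.sum((x - x.mean())**2)    # sum of squares for x around its mean

print(SSE, S_xx)
```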

We can do the same for the response values, the sum of squares for y: \[S_{yy} = \sum{(y_i - \bar{y})^2}\]

Does it bother you that these other sums of squares involve subtracting a mean, but the SSE doesn’t? Fear not! You can write the SSE as \(\sum{(e_i - \bar{e})^2}\), so it has the same form as \(S_{xx}\) and \(S_{yy}\). Why is this the same as \(\sum e_i^2\)? (Hint: what is \(\bar{e}\), the average of the residuals, equal to?)
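If you want to check that hint numerically, here’s a quick sketch (same made-up data as before; the residuals come from a least-squares fit with an intercept, which is what makes \(\bar{e}\) zero up to floating-point error):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)
e = y - (b0 + b1 * x)

print(e.mean())                                  # ~0 (floating-point error aside)
print(np.sum(e**2), np.sum((e - e.mean())**2))   # same value either way
```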

And we can also do a combination of the two, the sum of squares for x and y: \[S_{xy} = \sum{(x_i - \bar{x})(y_i - \bar{y})}\]
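Computing all three of these takes only a couple of lines (again a sketch in Python with numpy and the same made-up data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

S_xx = np.sum((x - x.mean())**2)               # spread of x around its mean
S_yy = np.sum((y - y.mean())**2)               # spread of y around its mean
S_xy = np.sum((x - x.mean()) * (y - y.mean())) # how x and y vary together

print(S_xx, S_yy, S_xy)
```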

Do these remind you of anything? Look back at the definitions for variance and covariance! Sums of squares are used in defining these concepts. Given a set of sample values for \(X\) and \(Y\), the sample variance is: \[Var(X) = \frac{S_{xx}}{n-1} \quad\quad Var(Y) = \frac{S_{yy}}{n-1}\] and while we’re at it, the sample covariance is: \[Cov(X,Y) = \frac{S_{xy}}{n-1}\] which incidentally means that the sample correlation can be written: \[r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\]

What I’m hoping you do here is look at the variance/covariance definitions and go “oh yeah, there is a sum-of-squarey thing in there!” Formally proving these statements involves some algebra that I’m not sufficiently interested in to print here. You can do it yourself if you like :)
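You can verify these relationships numerically, comparing the sums-of-squares versions against numpy’s built-in sample variance, covariance, and correlation (a sketch with the same made-up data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

S_xx = np.sum((x - x.mean())**2)
S_yy = np.sum((y - y.mean())**2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

# Sample variance and covariance
print(S_xx / (n - 1), np.var(x, ddof=1))
print(S_xy / (n - 1), np.cov(x, y, ddof=1)[0, 1])

# Sample correlation
print(S_xy / np.sqrt(S_xx * S_yy), np.corrcoef(x, y)[0, 1])
```

The `ddof=1` arguments tell numpy to use the \(n-1\) denominator, matching the sample versions above.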

Like variance and covariance, \(S_{xx}\), \(S_{yy}\), and \(S_{xy}\) reflect the spread of the values. The difference is that they aren’t divided by \(n-1\), so they don’t account for how many data points you’re looking at.

You can do lots of cute math with rewriting sums of squares. For example, using the fact that \(\sum x_i = n\bar{x}\):

\[\begin{align*} S_{xx} & = \sum(x_i - \bar{x})^2 \\ & = \sum(x_i^2 - 2x_i\bar{x} + \bar{x}^2) \\ & = \sum x_i^2 - 2\sum(x_i \bar{x}) + \sum \bar{x}^2 \\ & = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\ & = \sum x_i^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2 \\ & = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\ & = \sum x_i^2 - n\bar{x}^2 \end{align*}\]
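And a quick numerical check of that identity (a sketch in Python with numpy, made-up data as before):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(x)

lhs = np.sum((x - x.mean())**2)        # S_xx by its definition
rhs = np.sum(x**2) - n * x.mean()**2   # the rewritten form
print(lhs, rhs)                        # same value, up to floating-point error
```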

We’ll see some more sums of squares later on. The important thing to remember about them is that they express spread – the way values vary around a mean.