C Some basic algebra for linear models

If you want to understand linear models better, some linear algebra (a.k.a. matrix algebra) comes in handy. For linear algebra, you need to know how to work with matrices: how to multiply them (calculating the product of two matrices), how to invert them (roughly the matrix analogue of taking a reciprocal), and how to transpose them (flipping rows and columns). We briefly explain these operations here, then show how they relate to ordinary least squares (OLS) estimation, and demonstrate how to perform the linear algebra in R.

If you want to compute the product of matrix \(\mathbf{A}\) and matrix \(\mathbf{B}\), the order of the matrices is important. Suppose we have the matrix \(\mathbf{A}\)

\[ \mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\]

and matrix \(\mathbf{B}\)

\[ \mathbf{B} = \begin{bmatrix} e & f \\ g & h \end{bmatrix}\]

If you want to calculate the product \(\mathbf{A}\mathbf{B}\), you take each row of \(\mathbf{A}\) and compute the sum of its crossproducts with each column of \(\mathbf{B}\). That is, if \(\mathbf{A}\mathbf{B} = \mathbf{C}\), then the first element of \(\mathbf{C}\), \(c_{11}\) (row 1, column 1), equals \(a\times e + b \times g\). In general, if \(\mathbf{C} = \mathbf{A}\mathbf{B}\) and \(c_{ij}\) is the element in the \(i\)-th row and the \(j\)-th column, we have

\[c_{ij} = A_{i1} B_{1j} + A_{i2} B_{2j} + \dots + A_{iK} B_{Kj}\] where \(K\) is the number of columns in \(\mathbf{A}\), which must equal the number of rows in \(\mathbf{B}\).

If we do that for each row of \(\mathbf{A}\) and each column of \(\mathbf{B}\), then we get matrix \(\mathbf{C}\):

\[ \mathbf{C} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} a e + b g & a f + b h \\ c e + d g & c f + dh \end{bmatrix}\]

Example C.1 (Matrix Multiplication) Let \[ \mathbf{A} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\]

and

\[ \mathbf{B} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}\]

Then the product \(\mathbf{A}\mathbf{B}\) equals

\[ \mathbf{A}\mathbf{B} = \begin{bmatrix} 2 & 4 \\ 3 & 7 \end{bmatrix} \]

whereas the product \(\mathbf{B}\mathbf{A}\) equals

\[ \mathbf{B}\mathbf{A} = \begin{bmatrix} 6 & 4 \\ 4 & 3 \end{bmatrix} \] Note that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are different: this is why the order of the matrices matters. In R, multiplying matrices is done with the operator %*%:

A <- rbind(c(1, 0),
           c(1, 1))
B <- rbind(c(2, 4),
           c(1, 3))
A%*%B  # A x B
##      [,1] [,2]
## [1,]    2    4
## [2,]    3    7
B%*%A  # B x A
##      [,1] [,2]
## [1,]    6    4
## [2,]    4    3
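
To see the sum-of-crossproducts rule at work, you can rebuild the product element by element. The following sketch (the matrix C and the loop are only used here for illustration) mirrors the general formula for \(c_{ij}\), using the A and B defined above:

C <- matrix(NA, nrow = nrow(A), ncol = ncol(B))
for (i in 1:nrow(A)) {
  for (j in 1:ncol(B)) {
    C[i, j] <- sum(A[i, ] * B[, j])  # sum of crossproducts of row i of A and column j of B
  }
}
C  # gives the same result as A %*% B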

Let \(\mathbf{y}\) be a vector containing the values of the dependent variable \(Y\). Let \(\mathbf{X}\) be the design matrix (see Chapter 10), where each row represents a unit of observation, and each column represents a numeric independent variable (remember that each categorical variable is always coded into one or more numeric variables, e.g. dummy variables).

When you do some matrix algebra with \(\mathbf{X}\) and \(\mathbf{y}\), you obtain the OLS estimates of the parameters (intercept and slopes) of a linear model:

\[ (\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal \mathbf{y}\]

where \(\mathbf{X}^\intercal\) is the transpose of \(\mathbf{X}\): the matrix turned on its side, so that the first row becomes the first column, the second row becomes the second column, and so on. For instance, suppose matrix \(\mathbf{X}\) is

\[ \mathbf{X} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 1& 1\\ 0 & 1 \end{bmatrix}\]

then the transpose of \(\mathbf{X}\) is \(\mathbf{X}^\intercal\)

\[ \mathbf{X^\intercal} = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}\]

In R, the transpose can be calculated using t():

M <- rbind(c(0, 1, 1, 0),
           c(1, 1, 1, 1))
M
##      [,1] [,2] [,3] [,4]
## [1,]    0    1    1    0
## [2,]    1    1    1    1
t(M)  # the transpose
##      [,1] [,2]
## [1,]    0    1
## [2,]    1    1
## [3,]    1    1
## [4,]    0    1

The superscript \(-1\) denotes a matrix inverse. The inverse of a matrix \(\mathbf{X}\) is the matrix \(\mathbf{X}^{-1}\) for which \(\mathbf{X}\mathbf{X}^{-1}= \mathbf{I}\), where \(\mathbf{I}\) is the identity matrix. (In the formula above, it is the square matrix \(\mathbf{X}^\intercal\mathbf{X}\) that is inverted.)

\[ \mathbf{I} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\] that is, a matrix with 1s on the diagonal from top-left to bottom-right and 0s everywhere else.
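
As an aside, an identity matrix of any size can be created in base R with diag():

diag(2)  # the 2 x 2 identity matrix
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1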

In R, the inverse of a matrix can be calculated using the ginv() function in the MASS package (Chapter 10). The function ginv() can be used for any matrix (it actually computes the generalised inverse). A matrix that has as many rows as columns is called “square”, and for a square (invertible) matrix the base function solve() can also be used.

X <- rbind(c(1, 1),  # a square matrix X
           c(0, 1))
X 
##      [,1] [,2]
## [1,]    1    1
## [2,]    0    1
solve(X) # the inverse of X
##      [,1] [,2]
## [1,]    1   -1
## [2,]    0    1
X %*% solve(X) # the product of a square matrix and its inverse is the identity matrix
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
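
For completeness, here is what the ginv() route mentioned above looks like for the same matrix; a minimal sketch, with the caveat that ginv() works numerically, so its result may differ from solve(X) by floating-point rounding:

library(MASS)  # provides ginv()
ginv(X)        # generalised inverse of X; for an invertible square matrix
               # this equals solve(X) (up to rounding)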

Example C.2 (Obtaining linear model parameters) Suppose we have the data set in Table C.1, where group is a categorical independent variable with three levels.

Table C.1: Small data example.

  y  group
  1      1
  0      1
  2      2
 -4      2
 -1      3
  1      3

Let response vector \(\mathbf{y}\) contain the observed values of the dependent variable \(Y\):

\[ \mathbf{y} = \begin{bmatrix} 1 \\ 0 \\ 2 \\ -4 \\ -1 \\ 1 \end{bmatrix}\] Let design matrix \(\mathbf{X}\) have three columns. The first column is by default a vector of 1s, coding for the intercept. The second and third columns are dummy variables coding for group. These new columns are given in Table C.2 to show how they relate to the original data.

Table C.2: Small data example with the columns of the design matrix.

  y  group  intercept  group2  group3
  1      1          1       0       0
  0      1          1       0       0
  2      2          1       1       0
 -4      2          1       1       0
 -1      3          1       0       1
  1      3          1       0       1

\[ \mathbf{X} = \begin{bmatrix} 1 &0 & 0\\ 1 &0 & 0\\ 1 &1 &0\\ 1 &1 &0\\ 1 &0 &1\\ 1& 0&1 \end{bmatrix}\]

Then the vector with the OLS parameter values (intercept and slopes) is obtained by calculating

\[ (\mathbf{X}^\intercal \mathbf{X})^{-1}\mathbf{X}^\intercal \mathbf{y} = \begin{bmatrix} 0.5 \\ -1.5 \\ -0.5 \end{bmatrix}\]

Let R do the algebra for you:

y <- c(1, 0, 2, -4, -1, 1)
group2 <- c(0, 0, 1, 1, 0, 0)
group3 <- c(0, 0, 0, 0, 1, 1)
X <- cbind(intercept = 1, 
           group2, 
           group3)
X
##      intercept group2 group3
## [1,]         1      0      0
## [2,]         1      0      0
## [3,]         1      1      0
## [4,]         1      1      0
## [5,]         1      0      1
## [6,]         1      0      1
solve(t(X)%*%X) %*% t(X) %*% y
##           [,1]
## intercept  0.5
## group2    -1.5
## group3    -0.5
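
These three estimates are simply the group means expressed in dummy coding: the intercept is the mean of group 1, and each slope is the difference between that group's mean and the mean of group 1. A small sketch to check this with the objects defined above (the names b and groups are only used here for illustration):

b <- solve(t(X) %*% X) %*% t(X) %*% y  # the OLS estimates computed above
X %*% b                                # fitted values: every observation gets its group mean
groups <- c(1, 1, 2, 2, 3, 3)
tapply(y, groups, mean)                # group means: 0.5, -1 and 0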

You obtain the same values with the function lm():

library(tidyverse)                # for tibble() and %>% (if not already loaded)
tibble(y = c(1, 0, 2, -4, -1, 1), # dependent variable
       group = factor(c(1, 1, 2, 2, 3, 3))) %>%  # categorical indep variable
  lm(y ~ group, data = .) # run a linear model with ordinary least squares
## 
## Call:
## lm(formula = y ~ group, data = .)
## 
## Coefficients:
## (Intercept)       group2       group3  
##         0.5         -1.5         -0.5

The design matrix that lm() used can be inspected with model.matrix():

tibble(y = c(1, 0, 2, -4, -1, 1), # dependent variable
       group = factor(c(1, 1, 2, 2, 3, 3))) %>%  # categorical indep variable
  lm(y ~ group, data = .) %>% # run a linear model with ordinary least squares
  model.matrix() # show design matrix X
##   (Intercept) group2 group3
## 1           1      0      0
## 2           1      0      0
## 3           1      1      0
## 4           1      1      0
## 5           1      0      1
## 6           1      0      1
## attr(,"assign")
## [1] 0 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"

You see the intercept column and the two dummy variables that R creates by default for the factor variable group.