5.2 Homework Problems (Spring 2020)

Exercise 5.12 (Homework 1, Problem 5) Consider the covariance matrix
\[
\Sigma=\begin{pmatrix} 1 & 1/2 & 1/4\\ 1/2 & 1 & 1/2\\ 1/4 & 1/2 & 1 \end{pmatrix}
\]

Find its three eigenvalues and eigenvectors. Make the eigenvectors have a squared norm of unity. Now identify the components in the decomposition $P\Sigma P^T=D$, where $P$ is an orthogonal matrix satisfying $PP^T=P^TP=I$, and $D$ is a diagonal matrix.

Proof. We first compute the eigenvalues of the matrix from its characteristic polynomial
\[
|\lambda I-\Sigma|=\begin{vmatrix} \lambda-1 & -1/2 & -1/4\\ -1/2 & \lambda-1 & -1/2\\ -1/4 & -1/2 & \lambda-1 \end{vmatrix}
=\left(\lambda-\frac{3}{4}\right)\left(\lambda-\frac{9+\sqrt{33}}{8}\right)\left(\lambda-\frac{9-\sqrt{33}}{8}\right)
\]

Hence, the three eigenvalues are $\frac{3}{4}$, $\frac{9+\sqrt{33}}{8}$, and $\frac{9-\sqrt{33}}{8}$. For eigenvalue $\frac{3}{4}$, solving the system of linear equations $(\frac{3}{4}I-\Sigma)\mathbf{x}=0$ gives the corresponding normalized eigenvector $\left(\frac{\sqrt{2}}{2},0,-\frac{\sqrt{2}}{2}\right)^T$. Similarly, the normalized eigenvector corresponding to eigenvalue $\frac{9+\sqrt{33}}{8}$ is $\left(\frac{4}{\sqrt{66-2\sqrt{33}}},\frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}},\frac{4}{\sqrt{66-2\sqrt{33}}}\right)^T$, and the one corresponding to eigenvalue $\frac{9-\sqrt{33}}{8}$ is $\left(\frac{4}{\sqrt{66+2\sqrt{33}}},-\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}},\frac{4}{\sqrt{66+2\sqrt{33}}}\right)^T$.

Now, we define the matrices $P$ and $D$ as
\[
P=\begin{pmatrix}
\frac{\sqrt{2}}{2} & 0 & -\frac{\sqrt{2}}{2}\\
\frac{4}{\sqrt{66-2\sqrt{33}}} & \frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}} & \frac{4}{\sqrt{66-2\sqrt{33}}}\\
\frac{4}{\sqrt{66+2\sqrt{33}}} & -\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}} & \frac{4}{\sqrt{66+2\sqrt{33}}}
\end{pmatrix},\quad
D=\mathrm{diag}\left(\frac{3}{4},\frac{9+\sqrt{33}}{8},\frac{9-\sqrt{33}}{8}\right)
\]

It can be easily verified that $P\Sigma P^T=D$ and $PP^T=P^TP=I$ are satisfied. We check this numerically in R; a minimal sketch of the check (constructing $\Sigma$ and $P$ as above) and its output follow.
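
```r
# Verify the decomposition numerically: P Sigma P^T should equal D,
# and P should be orthogonal. zapsmall() replaces tiny floating-point
# residuals with exact zeros so the printed matrices are easy to read.
Sigma <- matrix(c(1,   1/2, 1/4,
                  1/2, 1,   1/2,
                  1/4, 1/2, 1), nrow = 3, byrow = TRUE)
P <- rbind(c(sqrt(2)/2, 0, -sqrt(2)/2),
           c(4, sqrt(33) - 1,    4) / sqrt(66 - 2 * sqrt(33)),
           c(4, -(sqrt(33) + 1), 4) / sqrt(66 + 2 * sqrt(33)))
zapsmall(P %*% Sigma %*% t(P))   # diag(3/4, (9 + sqrt(33))/8, (9 - sqrt(33))/8)
zapsmall(P %*% t(P))             # identity
zapsmall(t(P) %*% P)             # identity
```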

##      [,1]    [,2]      [,3]
## [1,] 0.75 0.00000 0.0000000
## [2,] 0.00 1.84307 0.0000000
## [3,] 0.00 0.00000 0.4069297
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
Exercise 5.13 (Homework 1, Problem 6) Give an example of a 2×2 covariance matrix of a random vector that has a zero eigenvalue.

Proof. Suppose the $2\times 2$ matrix has two eigenvalues $1$ and $0$ corresponding to eigenvectors $(1,0)^T$ and $(0,1)^T$, respectively. From Problem 5 we know that $PP^T=I$ and $P\Sigma P^T=D$, so $\Sigma=P^TDP$. In this setting, we have
\[
P=\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix},\quad D=\begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}
\]
Hence,
\[
\Sigma=\begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}
\]
is the covariance matrix we are looking for. It is a legitimate covariance matrix, being symmetric and positive semi-definite; indeed, it is the covariance matrix of the degenerate random vector $(Z,0)^T$ with $Z\sim N(0,1)$.
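
A quick numerical illustration in R (a small sketch of the construction above):

```r
# Sigma = diag(1, 0) has eigenvalues 1 and 0; it is the covariance matrix
# of the degenerate random vector (Z, 0)^T with Z ~ N(0, 1).
Sigma <- matrix(c(1, 0,
                  0, 0), nrow = 2, byrow = TRUE)
eigen(Sigma)$values        # 1 0

set.seed(1)
Y <- cbind(rnorm(1e5), 0)  # draws of (Z, 0)^T, one per row
cov(Y)                     # approximately diag(1, 0)
```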

Exercise 5.14 (Homework 1, Problem 8) Suppose that $A$ is an $m\times n$ matrix and $B$ is an $n\times m$ matrix. Show that
\[
tr(AB)=tr(BA)
\]
Proof. Write the $m\times n$ matrix $A=(a_{ij})$ and the $n\times m$ matrix $B=(b_{ij})$. Then we have
\[
tr(AB)=\sum_{i=1}^m\sum_{j=1}^n a_{ij}b_{ji}
\]
and
\[
\begin{split}
tr(BA)&=\sum_{i=1}^n\sum_{j=1}^m b_{ij}a_{ji}\\
&=b_{11}a_{11}+b_{12}a_{21}+\cdots+b_{1m}a_{m1}+b_{21}a_{12}+b_{22}a_{22}+\cdots+b_{2m}a_{m2}\\
&\quad+\cdots+b_{n1}a_{1n}+b_{n2}a_{2n}+\cdots+b_{nm}a_{mn}\qquad(\text{sum by column})\\
&=a_{11}b_{11}+a_{12}b_{21}+\cdots+a_{1n}b_{n1}+a_{21}b_{12}+a_{22}b_{22}+\cdots+a_{2n}b_{n2}\\
&\quad+\cdots+a_{m1}b_{1m}+a_{m2}b_{2m}+\cdots+a_{mn}b_{nm}\\
&=\sum_{i=1}^m\sum_{j=1}^n a_{ij}b_{ji}
\end{split}
\]
Hence, we have $tr(AB)=tr(BA)$, as desired.
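
A quick numerical check of this identity in R (with arbitrary random matrices chosen for illustration):

```r
# tr(AB) = tr(BA) for a random 3x5 matrix A and 5x3 matrix B
set.seed(42)
A <- matrix(rnorm(15), nrow = 3, ncol = 5)
B <- matrix(rnorm(15), nrow = 5, ncol = 3)
sum(diag(A %*% B))   # trace of the 3x3 product AB
sum(diag(B %*% A))   # trace of the 5x5 product BA; equal up to floating-point error
```
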
Exercise 5.15 (Homework 1, Problem 9) Suppose that $\mathbf{Y}\sim N_n(0,\Sigma)$ and that $A$ is an $n\times n$ matrix. Show that
\[
E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma)
\]
Repeat the problem if $\mathbf{Y}\sim N_n(\boldsymbol{\mu},\Sigma)$.

Proof. Notice that $\mathbf{Y}^TA\mathbf{Y}$ is just a scalar, so we have \begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=E(tr(\mathbf{Y}^TA\mathbf{Y}))=E(tr(A\mathbf{Y}\mathbf{Y}^T)) \tag{5.63} \end{equation} Since both the trace and the expectation are linear, we further have \begin{equation} E(tr(A\mathbf{Y}\mathbf{Y}^T))=tr(E(A\mathbf{Y}\mathbf{Y}^T))=tr(AE(\mathbf{Y}\mathbf{Y}^T)) \tag{5.64} \end{equation} Finally, since $\mathbf{Y}\sim MVN(0,\Sigma)$, we have $E(\mathbf{Y}\mathbf{Y}^T)=\Sigma$. Hence \begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma) \tag{5.65} \end{equation} as desired.

As for $\mathbf{Y}\sim MVN(\boldsymbol{\mu},\Sigma)$, this time we have $E(\mathbf{Y}\mathbf{Y}^T)=\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T$. Substituting back into (5.64) gives \begin{equation} \begin{split} E(\mathbf{Y}^TA\mathbf{Y})&=tr(A(\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T))\\ &=tr(A\Sigma)+tr(A\boldsymbol{\mu}\boldsymbol{\mu}^T)\\ &=tr(A\Sigma)+tr(\boldsymbol{\mu}^TA\boldsymbol{\mu})\\ &=tr(A\Sigma)+\boldsymbol{\mu}^TA\boldsymbol{\mu} \end{split} \tag{5.66} \end{equation} The third line uses $tr(AB)=tr(BA)$ from Problem 8, and the last line holds because $\boldsymbol{\mu}^TA\boldsymbol{\mu}$ is just a scalar.

The trace of a scalar is the scalar itself, and exchanging trace and expectation is a classic trick for this kind of problem.
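
A Monte Carlo check of (5.66) in R; the particular $A$, $\boldsymbol{\mu}$, and $\Sigma$ below are arbitrary choices for illustration, and mvrnorm from the MASS package is assumed to be available:

```r
# Monte Carlo check of E(Y^T A Y) = tr(A Sigma) + mu^T A mu
library(MASS)
set.seed(2020)
Sigma <- matrix(c(1,   1/2, 1/4,
                  1/2, 1,   1/2,
                  1/4, 1/2, 1), nrow = 3, byrow = TRUE)
A  <- matrix(c(2, 1, 0,
               1, 3, 1,
               0, 1, 2), nrow = 3, byrow = TRUE)
mu <- c(1, -1, 2)
Y  <- mvrnorm(1e5, mu = mu, Sigma = Sigma)          # one draw per row
mean(rowSums((Y %*% A) * Y))                        # Monte Carlo estimate of E(Y^T A Y)
sum(diag(A %*% Sigma)) + drop(t(mu) %*% A %*% mu)   # theoretical value tr(A Sigma) + mu^T A mu
```
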
Exercise 5.16 (Homework 1, Problem 10) Consider the two linear models \begin{equation} \mathbf{Y}=X_1\boldsymbol{\beta}_1+\boldsymbol{\epsilon},\quad \mathbf{Y}=X_2\boldsymbol{\beta}_2+\boldsymbol{\epsilon} \tag{5.67} \end{equation} Suppose that the second linear model contains all factors in the first model. Show that $M_2M_1=M_1$, where \begin{equation} M_1=X_1(X_1^TX_1)^{-1}X_1^T,\quad M_2=X_2(X_2^TX_2)^{-1}X_2^T \tag{5.68} \end{equation}

Proof. First, proving $M_2M_1=M_1$ is equivalent to proving $(I-M_2)M_1=0$, which in turn is equivalent to showing that $\mathrm{Im}(M_1)\subset \mathrm{Ker}(I-M_2)$. Since $M_2$ is an orthogonal projection matrix, we have $\mathrm{Ker}(I-M_2)=\mathrm{Im}(M_2)$, so we are left with showing $\mathrm{Im}(M_1)\subset \mathrm{Im}(M_2)$.

Since both $M_1$ and $M_2$ are $n\times n$ matrices, their images are subspaces of $\mathbb{R}^n$. Because the inverses $(X_1^TX_1)^{-1}$ and $(X_2^TX_2)^{-1}$ exist, both $X_1$ and $X_2$ have full column rank, and the image of each projection matrix $M_i=X_i(X_i^TX_i)^{-1}X_i^T$ is exactly the column space $C(X_i)$: for any vector $\mathbf{v}$ we have $M_i\mathbf{v}=X_i\left[(X_i^TX_i)^{-1}X_i^T\mathbf{v}\right]\in C(X_i)$, and $M_iX_i=X_i$ shows that every element of $C(X_i)$ is attained.

Since the second linear model contains all factors in the first model, every column of $X_1$ is a linear combination of columns of $X_2$, i.e. every column of $X_1$ lies in $C(X_2)$. Therefore \begin{equation} \mathrm{Im}(M_1)=C(X_1)\subset C(X_2)=\mathrm{Im}(M_2) \tag{5.69} \end{equation} which is exactly what we needed. Thus, we have the desired result $M_2M_1=M_1$.
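
A small numerical illustration in R (the nested design matrices below are an arbitrary example chosen for the check):

```r
# Check M2 M1 = M1 when the second model contains all columns of the first
set.seed(7)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
X1 <- cbind(1, x1)          # first model: intercept + x1
X2 <- cbind(1, x1, x2)      # second model adds one more covariate
M1 <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)
M2 <- X2 %*% solve(t(X2) %*% X2) %*% t(X2)
max(abs(M2 %*% M1 - M1))    # essentially zero (floating-point error only)
```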
