5.2 Homework Problems (Spring 2020)

Exercise 5.12 (Homework 1, Problem 5) Consider the covariance matrix \[\begin{equation} \Sigma=\begin{pmatrix} 1 & 1/2 & 1/4 \\ 1/2 & 1 & 1/2\\ 1/4 & 1/2 & 1 \end{pmatrix} \tag{5.54} \end{equation}\]

Find its three eigenvalues and eigenvectors. Make the eigenvectors have a squared norm of unity. Now identify the components in the decomposition \[\begin{equation} P\Sigma P^T=D \tag{5.55} \end{equation}\] where \(P\) is an orthogonal matrix satisfying \(PP^T=P^TP=I\), and \(D\) is a diagonal matrix.

Proof. We first compute the eigenvalues of the matrix from its characteristic polynomial \[\begin{equation} \begin{split} |\lambda I-\Sigma|&=\begin{vmatrix} \lambda-1 & -1/2 & -1/4 \\ -1/2 & \lambda-1 & -1/2\\ -1/4 & -1/2 & \lambda-1\end{vmatrix}\\ &=\left(\lambda-\frac{3}{4}\right)\left(\lambda-\frac{9+\sqrt{33}}{8}\right)\left(\lambda-\frac{9-\sqrt{33}}{8}\right) \end{split} \tag{5.56} \end{equation}\]

Hence, the three eigenvalues are \(\frac{3}{4}\), \(\frac{9+\sqrt{33}}{8}\) and \(\frac{9-\sqrt{33}}{8}\). For the eigenvalue \(\frac{3}{4}\), solving the linear system \((\frac{3}{4}\mathbf{I}-\Sigma)\mathbf{x}=\mathbf{0}\) gives the normalized eigenvector \((-\frac{\sqrt{2}}{2},0,\frac{\sqrt{2}}{2})^T\). Similarly, the normalized eigenvector corresponding to the eigenvalue \(\frac{9+\sqrt{33}}{8}\) is \((\frac{4}{\sqrt{66-2\sqrt{33}}},\frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}},\frac{4}{\sqrt{66-2\sqrt{33}}})^T\), and the one corresponding to \(\frac{9-\sqrt{33}}{8}\) is \((\frac{4}{\sqrt{66+2\sqrt{33}}},-\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}},\frac{4}{\sqrt{66+2\sqrt{33}}})^T\).

Now define the matrices \(P\) and \(D\), with the normalized eigenvectors as the rows of \(P\), as \[\begin{equation} \begin{split} &P=\begin{pmatrix} -\frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2}\\ \frac{4}{\sqrt{66-2\sqrt{33}}} & \frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}} & \frac{4}{\sqrt{66-2\sqrt{33}}}\\ \frac{4}{\sqrt{66+2\sqrt{33}}} & -\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}}& \frac{4}{\sqrt{66+2\sqrt{33}}} \end{pmatrix}\\ &D=\mathrm{diag}\left(\frac{3}{4},\frac{9+\sqrt{33}}{8},\frac{9-\sqrt{33}}{8}\right) \end{split} \tag{5.57} \end{equation}\]

It is then straightforward to verify that \(P\Sigma P^T=D\) and \(PP^T=P^TP=I\). We check this numerically with the R code below.
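A minimal R sketch of the check (the rows of \(P\) are entered from (5.57); zapsmall() merely suppresses floating-point round-off in the printed output that follows):

```r
# Covariance matrix from (5.54)
Sigma <- matrix(c(1, 1/2, 1/4,
                  1/2, 1, 1/2,
                  1/4, 1/2, 1), nrow = 3, byrow = TRUE)

# Rows of P are the normalized eigenvectors from (5.57)
P <- rbind(c(-sqrt(2)/2, 0, sqrt(2)/2),
           c(4, sqrt(33) - 1, 4) / sqrt(66 - 2 * sqrt(33)),
           c(4, -(sqrt(33) + 1), 4) / sqrt(66 + 2 * sqrt(33)))

zapsmall(P %*% Sigma %*% t(P))   # should equal D = diag(3/4, (9+sqrt(33))/8, (9-sqrt(33))/8)
zapsmall(P %*% t(P))             # should be the identity
zapsmall(t(P) %*% P)             # should be the identity
```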

```
##      [,1]    [,2]      [,3]
## [1,] 0.75 0.00000 0.0000000
## [2,] 0.00 1.84307 0.0000000
## [3,] 0.00 0.00000 0.4069297
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
```

Exercise 5.13 (Homework 1, Problem 6) Give an example of a \(2\times 2\) covariance matrix of a random vector that has a zero eigenvalue.

Proof. Suppose the \(2\times2\) matrix has eigenvalues \(1\) and \(0\) with corresponding eigenvectors \((1,0)^T\) and \((0,1)^T\), respectively. From Problem 5 we know that \(PP^T=I\) and \(P\Sigma P^T=D\), so \(\Sigma=P^TDP\). In this setting, we have \[\begin{equation} \begin{split} &P=\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\\ &D=\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \end{split} \tag{5.58} \end{equation}\] Hence, \(\Sigma=\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\) is a covariance matrix with a zero eigenvalue. It is a valid covariance matrix since it is symmetric and positive semi-definite; for instance, it is the covariance matrix of the degenerate random vector \((Z,0)^T\) with \(Z\sim N(0,1)\).
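A quick numerical sanity check in R (a sketch; the matrix is the one constructed above):

```r
# Covariance matrix of the degenerate random vector (Z, 0) with Z ~ N(0, 1)
Sigma <- matrix(c(1, 0,
                  0, 0), nrow = 2, byrow = TRUE)
eigen(Sigma)$values   # 1 0, so one eigenvalue is zero
```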

Exercise 5.14 (Homework 1, Problem 8) Suppose that \(A\) is an \(m\times n\) matrix and \(B\) is an \(n\times m\) matrix. Show that \[\begin{equation} tr(AB)=tr(BA) \tag{5.59} \end{equation}\]
Proof. Write the \(m\times n\) matrix \(A=(a_{ij})\) and the \(n\times m\) matrix \(B=(b_{ij})\). Then we have \[\begin{equation} tr(AB)=\sum_{i=1}^m\sum_{j=1}^na_{ij}b_{ji} \tag{5.60} \end{equation}\] and \[\begin{equation} \begin{split} tr(BA)&=\sum_{i=1}^n\sum_{j=1}^mb_{ij}a_{ji}\\ &=b_{11}a_{11}+b_{12}a_{21}+\cdots+b_{1m}a_{m1}\\ &\quad+b_{21}a_{12}+b_{22}a_{22}+\cdots+b_{2m}a_{m2}\\ &\quad+\cdots\\ &\quad+b_{n1}a_{1n}+b_{n2}a_{2n}+\cdots+b_{nm}a_{mn}\\ &=a_{11}b_{11}+a_{12}b_{21}+\cdots+a_{1n}b_{n1}\quad(\text{regrouping the terms column by column})\\ &\quad+a_{21}b_{12}+a_{22}b_{22}+\cdots+a_{2n}b_{n2}\\ &\quad+\cdots\\ &\quad+a_{m1}b_{1m}+a_{m2}b_{2m}+\cdots+a_{mn}b_{nm}\\ &=\sum_{i=1}^m\sum_{j=1}^na_{ij}b_{ji} \end{split} \tag{5.61} \end{equation}\] Hence \(tr(AB)=tr(BA)\), as desired.
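A quick numerical illustration in R (a sketch with arbitrary dimensions \(m=3\), \(n=4\)):

```r
# Numerical check of tr(AB) = tr(BA) for random rectangular matrices
set.seed(1)
A <- matrix(rnorm(12), nrow = 3, ncol = 4)   # m x n
B <- matrix(rnorm(12), nrow = 4, ncol = 3)   # n x m
sum(diag(A %*% B))   # tr(AB)
sum(diag(B %*% A))   # tr(BA), same value
```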
Exercise 5.15 (Homework 1, Problem 9) Suppose that \(\mathbf{Y}\sim N_n(\mathbf{0},\Sigma)\) and that \(A\) is an \(n\times n\) matrix. Show that \[\begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma) \tag{5.62} \end{equation}\] Repeat the problem if \(\mathbf{Y}\sim N_n(\boldsymbol{\mu},\Sigma)\).

Proof. Notice that \(\mathbf{Y}^TA\mathbf{Y}\) is a scalar, so it equals its own trace, and by the cyclic property of the trace \[\begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=E(tr(\mathbf{Y}^TA\mathbf{Y}))=E(tr(A\mathbf{Y}\mathbf{Y}^T)) \tag{5.63} \end{equation}\] Both the trace and the expectation are linear, so we further have \[\begin{equation} E(tr(A\mathbf{Y}\mathbf{Y}^T))=tr(E(A\mathbf{Y}\mathbf{Y}^T))=tr(AE(\mathbf{Y}\mathbf{Y}^T)) \tag{5.64} \end{equation}\] Finally, since \(\mathbf{Y}\sim N_n(\mathbf{0},\Sigma)\), we have \(E(\mathbf{Y}\mathbf{Y}^T)=\Sigma\). Hence \[\begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma) \tag{5.65} \end{equation}\] as desired.

As for \(\mathbf{Y}\sim N_n(\boldsymbol{\mu},\Sigma)\), we now have \(E(\mathbf{Y}\mathbf{Y}^T)=\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T\); substituting back into (5.64) gives \[\begin{equation} \begin{split} E(\mathbf{Y}^TA\mathbf{Y})&=tr(A(\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T))\\ &=tr(A\Sigma)+tr(A\boldsymbol{\mu}\boldsymbol{\mu}^T)\\ &=tr(A\Sigma)+tr(\boldsymbol{\mu}^TA\boldsymbol{\mu})\\ &=tr(A\Sigma)+\boldsymbol{\mu}^TA\boldsymbol{\mu} \end{split} \tag{5.66} \end{equation}\] The last two equalities hold by the cyclic property of the trace and because \(\boldsymbol{\mu}^TA\boldsymbol{\mu}\) is a scalar.

Writing a scalar as its own trace and exchanging the trace with the expectation is a classic trick for this kind of problem.
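A Monte Carlo sketch of (5.66) in R (assuming the MASS package for multivariate normal draws; the particular \(\boldsymbol{\mu}\), \(A\) and \(\Sigma\) below are arbitrary choices for illustration):

```r
# Monte Carlo check of E(Y^T A Y) = tr(A Sigma) + mu^T A mu
library(MASS)   # for mvrnorm()
set.seed(1)
Sigma <- matrix(c(1, 1/2, 1/4,
                  1/2, 1, 1/2,
                  1/4, 1/2, 1), nrow = 3, byrow = TRUE)
mu <- c(1, -1, 2)
A  <- matrix(c(2, 1, 0,
               1, 3, 1,
               0, 1, 1), nrow = 3, byrow = TRUE)
Y <- mvrnorm(1e5, mu = mu, Sigma = Sigma)            # each row is a draw of Y
mean(rowSums((Y %*% A) * Y))                         # Monte Carlo estimate of E(Y^T A Y)
sum(diag(A %*% Sigma)) + drop(t(mu) %*% A %*% mu)    # tr(A Sigma) + mu^T A mu
```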
Exercise 5.16 (Homework 1, Problem 10) Consider the two linear models \[\begin{equation} \mathbf{Y}=X_1\boldsymbol{\beta}_1+\boldsymbol{\epsilon},\quad \mathbf{Y}=X_2\boldsymbol{\beta}_2+\boldsymbol{\epsilon} \tag{5.67} \end{equation}\] Suppose that the second linear model contains all factors in the first model. Show that \(M_2M_1=M_1\), where \[\begin{equation} M_1=X_1(X_1^TX_1)^{-1}X_1^T,\quad M_2=X_2(X_2^TX_2)^{-1}X_2^T \tag{5.68} \end{equation}\]

Proof. Proving \(M_2M_1=M_1\) is equivalent to proving \((I-M_2)M_1=0\), which in turn is equivalent to showing that \(Im(M_1)\subset Ker(I-M_2)\). Since \(M_2\) is an orthogonal projection matrix, \(Ker(I-M_2)=Im(M_2)\), so we are left with showing \(Im(M_1)\subset Im(M_2)\).

Since the second linear model contains all factors in the first model, every column of \(X_1\) lies in the column space of \(X_2\); equivalently, \(X_1=X_2B\) for some matrix \(B\), and hence \(Col(X_1)\subset Col(X_2)\).

From the properties of projection matrices, \(M_1=X_1(X_1^TX_1)^{-1}X_1^T\) is the orthogonal projection onto \(Col(X_1)\), so \(Im(M_1)=Col(X_1)\); likewise \(Im(M_2)=Col(X_2)\). Therefore \[\begin{equation} Im(M_1)=Col(X_1)\subset Col(X_2)=Im(M_2) \tag{5.69} \end{equation}\] which is exactly the containment we needed. Thus, we have the desired result \(M_2M_1=M_1\).
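A numerical illustration in R (a sketch with an arbitrary pair of nested design matrices):

```r
# Check M2 M1 = M1 when X2 contains all columns of X1
set.seed(1)
n  <- 20
X1 <- cbind(1, rnorm(n))    # intercept plus one factor
X2 <- cbind(X1, rnorm(n))   # the second model adds a further factor
M1 <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)
M2 <- X2 %*% solve(t(X2) %*% X2) %*% t(X2)
max(abs(M2 %*% M1 - M1))    # essentially zero (numerical round-off only)
```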
