5.2 Homework Problems (Spring 2020)
Exercise 5.12 (Homework 1, Problem 5) Consider the covariance matrix \[\begin{equation} \Sigma=\begin{pmatrix} 1 & 1/2 & 1/4 \\ 1/2 & 1 & 1/2\\ 1/4 & 1/2 & 1 \end{pmatrix} \tag{5.54} \end{equation}\]
Find its three eigenvalues and eigenvectors. Normalize the eigenvectors to have unit norm. Now identify the components in the decomposition \[\begin{equation} P\Sigma P^T=D \tag{5.55} \end{equation}\] where \(P\) is an orthogonal matrix satisfying \(PP^T=P^TP=I\), and \(D\) is a diagonal matrix.

Proof. We first compute the eigenvalues of the matrix from its characteristic polynomial: \[\begin{equation} \begin{split} |\lambda I-\Sigma|&=\begin{vmatrix} \lambda-1 & -1/2 & -1/4 \\ -1/2 & \lambda-1 & -1/2\\ -1/4 & -1/2 & \lambda-1\end{vmatrix}\\ &=\lambda^3-3\lambda^2+\frac{39}{16}\lambda-\frac{9}{16}\\ &=(\lambda-\frac{3}{4})(\lambda-\frac{9+\sqrt{33}}{8})(\lambda-\frac{9-\sqrt{33}}{8}) \end{split} \tag{5.56} \end{equation}\]
Hence, the three eigenvalues are \(\frac{3}{4}\), \(\frac{9+\sqrt{33}}{8}\) and \(\frac{9-\sqrt{33}}{8}\). For the eigenvalue \(\frac{3}{4}\), solving the linear system \((\frac{3}{4}\mathbf{I}-\Sigma)\mathbf{x}=\mathbf{0}\) gives the corresponding normalized eigenvector \((-\frac{\sqrt{2}}{2},0,\frac{\sqrt{2}}{2})^T\). Similarly, the normalized eigenvector corresponding to the eigenvalue \(\frac{9+\sqrt{33}}{8}\) is \((\frac{4}{\sqrt{66-2\sqrt{33}}},\frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}},\frac{4}{\sqrt{66-2\sqrt{33}}})^T\), and the one corresponding to \(\frac{9-\sqrt{33}}{8}\) is \((\frac{4}{\sqrt{66+2\sqrt{33}}},-\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}},\frac{4}{\sqrt{66+2\sqrt{33}}})^T\).
Now we define the matrices \(P\) and \(D\) by stacking these eigenvectors as the rows of \(P\) and placing the corresponding eigenvalues on the diagonal of \(D\): \[\begin{equation} \begin{split} &P=\begin{pmatrix} -\frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2}\\ \frac{4}{\sqrt{66-2\sqrt{33}}} & \frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}} & \frac{4}{\sqrt{66-2\sqrt{33}}}\\ \frac{4}{\sqrt{66+2\sqrt{33}}} & -\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}}& \frac{4}{\sqrt{66+2\sqrt{33}}} \end{pmatrix}\\ &D=\mathrm{diag}\left(\frac{3}{4},\frac{9+\sqrt{33}}{8},\frac{9-\sqrt{33}}{8}\right) \end{split} \tag{5.57} \end{equation}\]
It is then easily verified that \(P\Sigma P^T=D\) and \(PP^T=P^TP=I\). We verify this numerically with the following R code.
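Since the original code chunk is not displayed, the snippet below is a minimal reconstruction of the check; it uses `zapsmall()` to suppress floating-point noise, so the formatting of the printed result may differ slightly from the output reproduced below.

```r
# Assumed reconstruction of the verification chunk: build Sigma and P numerically
# and check that P Sigma P^T = D and P P^T = P^T P = I.
Sigma <- matrix(c(1, 1/2, 1/4,
                  1/2, 1, 1/2,
                  1/4, 1/2, 1), nrow = 3, byrow = TRUE)
P <- rbind(c(-sqrt(2)/2, 0, sqrt(2)/2),
           c(4, sqrt(33) - 1, 4) / sqrt(66 - 2 * sqrt(33)),
           c(4, -(sqrt(33) + 1), 4) / sqrt(66 + 2 * sqrt(33)))
zapsmall(P %*% Sigma %*% t(P))   # should equal D = diag(3/4, (9+sqrt(33))/8, (9-sqrt(33))/8)
zapsmall(P %*% t(P))             # should equal the identity matrix
zapsmall(t(P) %*% P)             # should equal the identity matrix
```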
```
##      [,1]    [,2]      [,3]
## [1,] 0.75 0.00000 0.0000000
## [2,] 0.00 1.84307 0.0000000
## [3,] 0.00 0.00000 0.4069297
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
```
Proof. Suppose the \(2\times2\) matrix has the two eigenvalues \(1\) and \(0\) corresponding to the eigenvectors \((1,0)\) and \((0,1)\), respectively. From Problem 5 we know that \(PP^T=I\) and \(P\Sigma P^T=D\), so \(\Sigma=P^TDP\). In this setting we have \[\begin{equation} \begin{split} &P=\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\\ &D=\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \end{split} \tag{5.58} \end{equation}\] Hence \(\Sigma=P^TDP=\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\) is the covariance matrix we are looking for, and it is a valid covariance matrix since it is symmetric and positive semi-definite.
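As a quick numerical check of this construction (under the eigenvalues \(1\), \(0\) and eigenvectors \((1,0)\), \((0,1)\) stated above), `eigen()` recovers exactly this spectrum:

```r
# Sanity check of the constructed 2x2 covariance matrix (illustrative only).
Sigma <- matrix(c(1, 0,
                  0, 0), nrow = 2, byrow = TRUE)
eigen(Sigma)$values    # 1 0
eigen(Sigma)$vectors   # columns are (1, 0) and (0, 1), up to sign
```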
Proof. Notice that \(\mathbf{Y}^TA\mathbf{Y}\) is a scalar, so \[\begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=E(tr(\mathbf{Y}^TA\mathbf{Y}))=E(tr(A\mathbf{Y}\mathbf{Y}^T)) \tag{5.63} \end{equation}\] Both the trace and the expectation are linear, so we further have \[\begin{equation} E(tr(A\mathbf{Y}\mathbf{Y}^T))=tr(E(A\mathbf{Y}\mathbf{Y}^T))=tr(AE(\mathbf{Y}\mathbf{Y}^T)) \tag{5.64} \end{equation}\] Finally, since \(\mathbf{Y}\sim MVN(0,\Sigma)\), we have \(E(\mathbf{Y}\mathbf{Y}^T)=\Sigma\). Hence \[\begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma) \tag{5.65} \end{equation}\] as desired.
As for \(\mathbf{Y}\sim MVN(\boldsymbol{\mu},\Sigma)\), we now have \(E(\mathbf{Y}\mathbf{Y}^T)=\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T\); substituting back into (5.64) gives \[\begin{equation} \begin{split} E(\mathbf{Y}^TA\mathbf{Y})&=tr(A(\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T))\\ &=tr(A\Sigma)+tr(A\boldsymbol{\mu}\boldsymbol{\mu}^T)\\ &=tr(A\Sigma)+tr(\boldsymbol{\mu}^TA\boldsymbol{\mu})\\ &=tr(A\Sigma)+\boldsymbol{\mu}^TA\boldsymbol{\mu} \end{split} \tag{5.66} \end{equation}\] The second-to-last equality uses the cyclic property of the trace, and the last one holds because \(\boldsymbol{\mu}^TA\boldsymbol{\mu}\) is a scalar. The trace of a scalar is the scalar itself, and exchanging trace and expectation is a classic trick for this kind of problem.
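The identity can also be checked by simulation. The sketch below uses `MASS::mvrnorm()` with illustrative choices of \(A\), \(\boldsymbol{\mu}\) and \(\Sigma\) (none of them come from the problem) and compares the sample mean of \(\mathbf{Y}^TA\mathbf{Y}\) with \(tr(A\Sigma)+\boldsymbol{\mu}^TA\boldsymbol{\mu}\).

```r
# Monte Carlo check of E(Y^T A Y) = tr(A Sigma) + mu^T A mu (illustrative values).
library(MASS)                       # for mvrnorm()
set.seed(1)
Sigma <- matrix(c(2, 0.5,
                  0.5, 1), 2, 2)
mu    <- c(1, -2)
A     <- matrix(c(1, 0.3,
                  0.3, 2), 2, 2)
Y <- mvrnorm(1e5, mu = mu, Sigma = Sigma)           # rows are independent draws of Y
mean(rowSums((Y %*% A) * Y))                        # empirical mean of Y^T A Y
sum(diag(A %*% Sigma)) + drop(t(mu) %*% A %*% mu)   # tr(A Sigma) + mu^T A mu
```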
Proof. First, proving \(M_2M_1=M_1\) is equivalent to proving \((I-M_2)M_1=0\), which in turn amounts to showing \(Im(M_1)\subset Ker(I-M_2)\). Since \(M_2\) is an orthogonal projection matrix, we have \(Ker(I-M_2)=Im(M_2)\), so we are left with showing \(Im(M_1)\subset Im(M_2)\).

Both \(M_1\) and \(M_2\) are \(n\times n\) matrices, so their images are subspaces of \(\mathbb{R}^n\). Moreover, the image of an orthogonal projection matrix is the column space of its design matrix: \(Im(M_1)=C(X_1)\) and \(Im(M_2)=C(X_2)\). Thus, in order to show \(Im(M_1)\subset Im(M_2)\), it suffices to show \(C(X_1)\subset C(X_2)\).

Since the inverses \((X_1^TX_1)^{-1}\) and \((X_2^TX_2)^{-1}\) exist, both \(X_1\) and \(X_2\) have full column rank, and the projection matrix \(X_i(X_i^TX_i)^{-1}X_i^T\) projects onto \(C(X_i)\) with \(Rank(M_i)=Rank(X_i)\). Because the second linear model contains all the factors of the first one, every column of \(X_1\) lies in \(C(X_2)\), so \(C(X_1)\subset C(X_2)\) and, in particular, \[\begin{equation} Rank(M_1)=Rank(X_1)\leq Rank(X_2)=Rank(M_2) \tag{5.69} \end{equation}\] Therefore \(Im(M_1)\subset Im(M_2)\), which is exactly what we needed in the proof, and we obtain the desired result \(M_2M_1=M_1\). A numerical illustration follows below.
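In the sketch below the design matrices \(X_1\) and \(X_2\) are hypothetical (an intercept plus one covariate, and the same columns plus a second covariate), chosen only so that the second model contains all factors of the first.

```r
# Numerical illustration with made-up nested design matrices:
# X2 contains every column of X1, so M2 %*% M1 should equal M1.
set.seed(1)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
X1 <- cbind(1, x1)        # first model: intercept + x1
X2 <- cbind(1, x1, x2)    # second model: intercept + x1 + x2
proj <- function(X) X %*% solve(t(X) %*% X) %*% t(X)  # orthogonal projection onto C(X)
M1 <- proj(X1)
M2 <- proj(X2)
max(abs(M2 %*% M1 - M1))  # essentially zero (floating-point error only)
```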