5.2 Homework Problems (Spring 2020)
Exercise 5.12 (Homework 1, Problem 5) Consider the covariance matrix
\[
\Sigma=\begin{pmatrix}1 & 1/2 & 1/4\\ 1/2 & 1 & 1/2\\ 1/4 & 1/2 & 1\end{pmatrix}
\]
Find its three eigenvalues and eigenvectors. Make the eigenvectors have a squared norm of unity. Now identify the components in the decomposition $P\Sigma P^T=D$, where $P$ is an orthogonal matrix satisfying $PP^T=P^TP=I$ and $D$ is a diagonal matrix.

Proof. We first compute the eigenvalues of the matrix from the characteristic polynomial
\[
|\lambda I-\Sigma|=\begin{vmatrix}\lambda-1 & -1/2 & -1/4\\ -1/2 & \lambda-1 & -1/2\\ -1/4 & -1/2 & \lambda-1\end{vmatrix}=\left(\lambda-\frac{3}{4}\right)\left(\lambda-\frac{9+\sqrt{33}}{8}\right)\left(\lambda-\frac{9-\sqrt{33}}{8}\right)
\]
Hence, the three eigenvalues are $\frac{3}{4}$, $\frac{9+\sqrt{33}}{8}$ and $\frac{9-\sqrt{33}}{8}$. For the eigenvalue $\frac{3}{4}$, solving the system of linear equations $\left(\frac{3}{4}I-\Sigma\right)x=0$ gives the corresponding normalized eigenvector $\left(-\frac{\sqrt{2}}{2},\,0,\,\frac{\sqrt{2}}{2}\right)^T$. Similarly, the normalized eigenvector corresponding to the eigenvalue $\frac{9+\sqrt{33}}{8}$ is $\left(\frac{4}{\sqrt{66-2\sqrt{33}}},\,\frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}},\,\frac{4}{\sqrt{66-2\sqrt{33}}}\right)^T$, and the one corresponding to $\frac{9-\sqrt{33}}{8}$ is $\left(\frac{4}{\sqrt{66+2\sqrt{33}}},\,-\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}},\,\frac{4}{\sqrt{66+2\sqrt{33}}}\right)^T$.
Now, we define the matrices $P$ and $D$ as
\[
P=\begin{pmatrix}
-\frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2}\\[4pt]
\frac{4}{\sqrt{66-2\sqrt{33}}} & \frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}} & \frac{4}{\sqrt{66-2\sqrt{33}}}\\[4pt]
\frac{4}{\sqrt{66+2\sqrt{33}}} & -\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}} & \frac{4}{\sqrt{66+2\sqrt{33}}}
\end{pmatrix},
\qquad
D=\operatorname{diag}\left(\frac{3}{4},\,\frac{9+\sqrt{33}}{8},\,\frac{9-\sqrt{33}}{8}\right)
\]
It is easy to verify that $P\Sigma P^T=D$ and $PP^T=P^TP=I$ hold; we check this numerically with the following R code.
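A minimal sketch of such a check, with the matrix entries taken from the expressions above (rounding to 10 decimal places is an assumption, used only to suppress floating-point noise in the printed results):

Sigma <- matrix(c(1,   1/2, 1/4,
                  1/2, 1,   1/2,
                  1/4, 1/2, 1), nrow = 3, byrow = TRUE)
# rows of P are the normalized eigenvectors found above
P <- rbind(c(-sqrt(2)/2, 0, sqrt(2)/2),
           c(4, sqrt(33) - 1, 4) / sqrt(66 - 2*sqrt(33)),
           c(4, -(sqrt(33) + 1), 4) / sqrt(66 + 2*sqrt(33)))
round(P %*% Sigma %*% t(P), 10)  # should equal D = diag(3/4, (9+sqrt(33))/8, (9-sqrt(33))/8)
round(P %*% t(P), 10)            # should be the 3x3 identity
round(t(P) %*% P, 10)            # should be the 3x3 identity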
## [,1] [,2] [,3]
## [1,] 0.75 0.00000 0.0000000
## [2,] 0.00 1.84307 0.0000000
## [3,] 0.00 0.00000 0.4069297
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
Proof. Suppose the $2\times 2$ matrix has the two eigenvalues $1$ and $0$, corresponding to the eigenvectors $(1,0)$ and $(0,1)$, respectively. From Problem 5 we know that $PP^T=I$ and $P\Sigma P^T=D$, so $\Sigma=P^TDP$. In this setting we have
\[
P=\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},\qquad D=\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}
\]
Hence
\[
\Sigma=P^TDP=\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}
\]
is the covariance matrix we are looking for; it is a valid covariance matrix since it is symmetric and positive semi-definite.
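As a quick numerical sanity check (a small sketch, not part of the original solution), one can confirm in R that this $\Sigma$ has eigenvalues $1$ and $0$, so it is positive semi-definite but singular:

Sigma2 <- matrix(c(1, 0,
                   0, 0), nrow = 2, byrow = TRUE)
eigen(Sigma2)$values  # 1 0: nonnegative, so Sigma2 is a valid (degenerate) covariance matrix
det(Sigma2)           # 0: the matrix is singular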
Proof. Since $\mathbf{Y}^TA\mathbf{Y}$ is a scalar, it equals its own trace, and by the cyclic property of the trace we have
\begin{equation}
E(\mathbf{Y}^TA\mathbf{Y})=E(tr(\mathbf{Y}^TA\mathbf{Y}))=E(tr(A\mathbf{Y}\mathbf{Y}^T))
\tag{5.63}
\end{equation}
Trace and expectation are both linear, so we further have
\begin{equation}
E(tr(A\mathbf{Y}\mathbf{Y}^T))=tr(E(A\mathbf{Y}\mathbf{Y}^T))=tr(AE(\mathbf{Y}\mathbf{Y}^T))
\tag{5.64}
\end{equation}
Finally, since $\mathbf{Y}\sim MVN(0,\Sigma)$, we have $E(\mathbf{Y}\mathbf{Y}^T)=\Sigma$. Hence
\begin{equation}
E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma)
\tag{5.65}
\end{equation}
as desired.

As for $\mathbf{Y}\sim MVN(\boldsymbol{\mu},\Sigma)$, this time we have $E(\mathbf{Y}\mathbf{Y}^T)=\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T$, so substituting back into (5.64) we have
\begin{equation}
\begin{split}
E(\mathbf{Y}^TA\mathbf{Y})&=tr(A(\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T))\\
&=tr(A\Sigma)+tr(\boldsymbol{\mu}^TA\boldsymbol{\mu})\\
&=tr(A\Sigma)+\boldsymbol{\mu}^TA\boldsymbol{\mu}
\end{split}
\tag{5.66}
\end{equation}
The second line uses linearity and the cyclic property of the trace, and the last equality holds because $\boldsymbol{\mu}^TA\boldsymbol{\mu}$ is a scalar.

The trace of a scalar is itself, and exchanging trace and expectation is a classic trick for this kind of problem.
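As an illustration (not part of the original solution), the identity $E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma)+\boldsymbol{\mu}^TA\boldsymbol{\mu}$ can be checked by simulation in R; the matrices $A$, $\Sigma$ and the mean $\boldsymbol{\mu}$ below are arbitrary example values:

set.seed(1)
library(MASS)                       # for mvrnorm
A     <- matrix(c(2, 1, 1, 3), 2, 2)
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
mu    <- c(1, -1)
Y <- mvrnorm(1e5, mu, Sigma)        # 1e5 draws from MVN(mu, Sigma), one per row
mean(rowSums((Y %*% A) * Y))                        # Monte Carlo estimate of E(Y^T A Y)
sum(diag(A %*% Sigma)) + drop(t(mu) %*% A %*% mu)   # tr(A Sigma) + mu^T A mu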
Proof. Firstly, since we would like to prove $M_2M_1=M_1$, it is equivalent to prove that $(I-M_2)M_1=0$, which in turn is equivalent to showing that $Im(M_1)\subset Ker(I-M_2)$. Since $M_2$ is an orthogonal projection matrix, we have $Ker(I-M_2)=Im(M_2)$, so we are left with showing $Im(M_1)\subset Im(M_2)$.
Since both $M_1$ and $M_2$ are $n\times n$ matrices, their images are subspaces of $\mathbb{R}^n$. To identify these images, recall the standard property of projection matrices: if the $n\times r$ matrix $X$ has full column rank, so that $(X^TX)^{-1}$ exists, then $M=X(X^TX)^{-1}X^T$ is the orthogonal projection onto the column space of $X$. By assumption $(X_1^TX_1)^{-1}$ and $(X_2^TX_2)^{-1}$ exist, so $X_1$ and $X_2$ have full column rank and
\begin{equation}
Im(M_1)=C(X_1),\qquad Im(M_2)=C(X_2)
\tag{5.69}
\end{equation}
Since the second linear model contains all factors in the first model, every column of $X_1$ is a column of (or a linear combination of the columns of) $X_2$, hence $C(X_1)\subset C(X_2)$ and therefore $Im(M_1)\subset Im(M_2)$. This is exactly what we needed in the proof, and we obtain the desired result $M_2M_1=M_1$.
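A small numerical illustration of this fact in R, using made-up nested design matrices (the data and column choices below are arbitrary, not from the original problem):

set.seed(42)
n  <- 20
x  <- rnorm(n)
z  <- rnorm(n)
X1 <- cbind(1, x)        # smaller model: intercept and x
X2 <- cbind(1, x, z)     # larger model contains all columns of X1
M1 <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)
M2 <- X2 %*% solve(t(X2) %*% X2) %*% t(X2)
max(abs(M2 %*% M1 - M1)) # numerically zero, so M2 M1 = M1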