5.2 Homework Problems (Spring 2020)

Exercise 5.12 (Homework 1, Problem 5) Consider the covariance matrix
\[
\Sigma=\begin{pmatrix} 1 & 1/2 & 1/4\\ 1/2 & 1 & 1/2\\ 1/4 & 1/2 & 1 \end{pmatrix}
\]

Find its three eigenvalues and eigenvectors. Make the eigenvectors have a squared norm of unity. Now identify the components in the decomposition $P\Sigma P^T=D$, where $P$ is an orthogonal matrix satisfying $PP^T=P^TP=I$, and $D$ is a diagonal matrix.

Proof. We first compute the eigenvalues of the matrix from its characteristic polynomial
\[
|\lambda I-\Sigma|=\begin{vmatrix} \lambda-1 & -1/2 & -1/4\\ -1/2 & \lambda-1 & -1/2\\ -1/4 & -1/2 & \lambda-1 \end{vmatrix}
=\left(\lambda-\frac{3}{4}\right)\left(\lambda-\frac{9+\sqrt{33}}{8}\right)\left(\lambda-\frac{9-\sqrt{33}}{8}\right)
\]

Hence, the three eigenvalues are $\frac{3}{4}$, $\frac{9+\sqrt{33}}{8}$, and $\frac{9-\sqrt{33}}{8}$. For eigenvalue $\frac{3}{4}$, solving the system of linear equations $(\frac{3}{4}I-\Sigma)\mathbf{x}=0$ gives the corresponding normalized eigenvector $\left(\frac{\sqrt{2}}{2},0,-\frac{\sqrt{2}}{2}\right)^T$. Similarly, the normalized eigenvector corresponding to eigenvalue $\frac{9+\sqrt{33}}{8}$ is $\left(\frac{4}{\sqrt{66-2\sqrt{33}}},\frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}},\frac{4}{\sqrt{66-2\sqrt{33}}}\right)^T$, and the one corresponding to eigenvalue $\frac{9-\sqrt{33}}{8}$ is $\left(\frac{4}{\sqrt{66+2\sqrt{33}}},-\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}},\frac{4}{\sqrt{66+2\sqrt{33}}}\right)^T$.

Now, we define the matrices $P$ and $D$ as
\[
P=\begin{pmatrix}
\frac{\sqrt{2}}{2} & 0 & -\frac{\sqrt{2}}{2}\\
\frac{4}{\sqrt{66-2\sqrt{33}}} & \frac{\sqrt{33}-1}{\sqrt{66-2\sqrt{33}}} & \frac{4}{\sqrt{66-2\sqrt{33}}}\\
\frac{4}{\sqrt{66+2\sqrt{33}}} & -\frac{\sqrt{33}+1}{\sqrt{66+2\sqrt{33}}} & \frac{4}{\sqrt{66+2\sqrt{33}}}
\end{pmatrix},\quad
D=\mathrm{diag}\left(\frac{3}{4},\frac{9+\sqrt{33}}{8},\frac{9-\sqrt{33}}{8}\right)
\]

It can be easily verified that $P\Sigma P^T=D$ and $PP^T=P^TP=I$ are satisfied. We check this numerically in R; a minimal sketch of the check (constructing $\Sigma$ and $P$ as above) and its output follow.
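
```r
# Verify the decomposition numerically: P Sigma P^T should equal D,
# and P should be orthogonal. zapsmall() replaces tiny floating-point
# residuals with exact zeros so the printed matrices are easy to read.
Sigma <- matrix(c(1,   1/2, 1/4,
                  1/2, 1,   1/2,
                  1/4, 1/2, 1), nrow = 3, byrow = TRUE)
P <- rbind(c(sqrt(2)/2, 0, -sqrt(2)/2),
           c(4, sqrt(33) - 1,    4) / sqrt(66 - 2 * sqrt(33)),
           c(4, -(sqrt(33) + 1), 4) / sqrt(66 + 2 * sqrt(33)))
zapsmall(P %*% Sigma %*% t(P))   # diag(3/4, (9 + sqrt(33))/8, (9 - sqrt(33))/8)
zapsmall(P %*% t(P))             # identity
zapsmall(t(P) %*% P)             # identity
```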

##      [,1]    [,2]      [,3]
## [1,] 0.75 0.00000 0.0000000
## [2,] 0.00 1.84307 0.0000000
## [3,] 0.00 0.00000 0.4069297
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
Exercise 5.13 (Homework 1, Problem 6) Give an example of a 2×2 covariance matrix of a random vector that has a zero eigenvalue.

Proof. Suppose the $2\times 2$ matrix has two eigenvalues $1$ and $0$ corresponding to eigenvectors $(1,0)^T$ and $(0,1)^T$, respectively. From Problem 5 we know that $PP^T=I$ and $P\Sigma P^T=D$, so $\Sigma=P^TDP$. In this setting, we have
\[
P=\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix},\quad D=\begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}
\]
Hence,
\[
\Sigma=\begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}
\]
is the covariance matrix we are looking for. It is a legitimate covariance matrix, being symmetric and positive semi-definite; indeed, it is the covariance matrix of the degenerate random vector $(Z,0)^T$ with $Z\sim N(0,1)$.
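
A quick numerical illustration in R (a small sketch of the construction above):

```r
# Sigma = diag(1, 0) has eigenvalues 1 and 0; it is the covariance matrix
# of the degenerate random vector (Z, 0)^T with Z ~ N(0, 1).
Sigma <- matrix(c(1, 0,
                  0, 0), nrow = 2, byrow = TRUE)
eigen(Sigma)$values        # 1 0

set.seed(1)
Y <- cbind(rnorm(1e5), 0)  # draws of (Z, 0)^T, one per row
cov(Y)                     # approximately diag(1, 0)
```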

Exercise 5.14 (Homework 1, Problem 8) Suppose that $A$ is an $m\times n$ matrix and $B$ is an $n\times m$ matrix. Show that
\[
tr(AB)=tr(BA)
\]
Proof. Write the $m\times n$ matrix $A=(a_{ij})$ and the $n\times m$ matrix $B=(b_{ij})$. Then we have
\[
tr(AB)=\sum_{i=1}^m\sum_{j=1}^n a_{ij}b_{ji}
\]
and
\[
\begin{split}
tr(BA)&=\sum_{i=1}^n\sum_{j=1}^m b_{ij}a_{ji}\\
&=b_{11}a_{11}+b_{12}a_{21}+\cdots+b_{1m}a_{m1}+b_{21}a_{12}+b_{22}a_{22}+\cdots+b_{2m}a_{m2}\\
&\quad+\cdots+b_{n1}a_{1n}+b_{n2}a_{2n}+\cdots+b_{nm}a_{mn}\qquad(\text{sum by column})\\
&=a_{11}b_{11}+a_{12}b_{21}+\cdots+a_{1n}b_{n1}+a_{21}b_{12}+a_{22}b_{22}+\cdots+a_{2n}b_{n2}\\
&\quad+\cdots+a_{m1}b_{1m}+a_{m2}b_{2m}+\cdots+a_{mn}b_{nm}\\
&=\sum_{i=1}^m\sum_{j=1}^n a_{ij}b_{ji}
\end{split}
\]
Hence, we have $tr(AB)=tr(BA)$, as desired.
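
A quick numerical check of this identity in R (with arbitrary random matrices chosen for illustration):

```r
# tr(AB) = tr(BA) for a random 3x5 matrix A and 5x3 matrix B
set.seed(42)
A <- matrix(rnorm(15), nrow = 3, ncol = 5)
B <- matrix(rnorm(15), nrow = 5, ncol = 3)
sum(diag(A %*% B))   # trace of the 3x3 product AB
sum(diag(B %*% A))   # trace of the 5x5 product BA; equal up to floating-point error
```
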
Exercise 5.15 (Homework 1, Problem 9) Suppose that $\mathbf{Y}\sim N_n(0,\Sigma)$ and that $A$ is an $n\times n$ matrix. Show that
\[
E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma)
\]
Repeat the problem if $\mathbf{Y}\sim N_n(\boldsymbol{\mu},\Sigma)$.

Proof. Notice that $\mathbf{Y}^TA\mathbf{Y}$ is just a scalar, so we have \begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=E(tr(\mathbf{Y}^TA\mathbf{Y}))=E(tr(A\mathbf{Y}\mathbf{Y}^T)) \tag{5.63} \end{equation} Since both the trace and the expectation are linear, we further have \begin{equation} E(tr(A\mathbf{Y}\mathbf{Y}^T))=tr(E(A\mathbf{Y}\mathbf{Y}^T))=tr(AE(\mathbf{Y}\mathbf{Y}^T)) \tag{5.64} \end{equation} Finally, since $\mathbf{Y}\sim MVN(0,\Sigma)$, we have $E(\mathbf{Y}\mathbf{Y}^T)=\Sigma$. Hence \begin{equation} E(\mathbf{Y}^TA\mathbf{Y})=tr(A\Sigma) \tag{5.65} \end{equation} as desired.

As for $\mathbf{Y}\sim MVN(\boldsymbol{\mu},\Sigma)$, this time we have $E(\mathbf{Y}\mathbf{Y}^T)=\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T$. Substituting back into (5.64) gives \begin{equation} \begin{split} E(\mathbf{Y}^TA\mathbf{Y})&=tr(A(\Sigma+\boldsymbol{\mu}\boldsymbol{\mu}^T))\\ &=tr(A\Sigma)+tr(A\boldsymbol{\mu}\boldsymbol{\mu}^T)\\ &=tr(A\Sigma)+tr(\boldsymbol{\mu}^TA\boldsymbol{\mu})\\ &=tr(A\Sigma)+\boldsymbol{\mu}^TA\boldsymbol{\mu} \end{split} \tag{5.66} \end{equation} The third line uses $tr(AB)=tr(BA)$ from Problem 8, and the last line holds because $\boldsymbol{\mu}^TA\boldsymbol{\mu}$ is just a scalar.

The trace of a scalar is the scalar itself, and exchanging trace and expectation is a classic trick for this kind of problem.
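
A Monte Carlo check of (5.66) in R; the particular $A$, $\boldsymbol{\mu}$, and $\Sigma$ below are arbitrary choices for illustration, and mvrnorm from the MASS package is assumed to be available:

```r
# Monte Carlo check of E(Y^T A Y) = tr(A Sigma) + mu^T A mu
library(MASS)
set.seed(2020)
Sigma <- matrix(c(1,   1/2, 1/4,
                  1/2, 1,   1/2,
                  1/4, 1/2, 1), nrow = 3, byrow = TRUE)
A  <- matrix(c(2, 1, 0,
               1, 3, 1,
               0, 1, 2), nrow = 3, byrow = TRUE)
mu <- c(1, -1, 2)
Y  <- mvrnorm(1e5, mu = mu, Sigma = Sigma)          # one draw per row
mean(rowSums((Y %*% A) * Y))                        # Monte Carlo estimate of E(Y^T A Y)
sum(diag(A %*% Sigma)) + drop(t(mu) %*% A %*% mu)   # theoretical value tr(A Sigma) + mu^T A mu
```
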
Exercise 5.16 (Homework 1, Problem 10) Consider the two linear models \begin{equation} \mathbf{Y}=X_1\boldsymbol{\beta}_1+\boldsymbol{\epsilon},\quad \mathbf{Y}=X_2\boldsymbol{\beta}_2+\boldsymbol{\epsilon} \tag{5.67} \end{equation} Suppose that the second linear model contains all factors in the first model. Show that $M_2M_1=M_1$, where \begin{equation} M_1=X_1(X_1^TX_1)^{-1}X_1^T,\quad M_2=X_2(X_2^TX_2)^{-1}X_2^T \tag{5.68} \end{equation}

Proof. First, proving $M_2M_1=M_1$ is equivalent to proving $(I-M_2)M_1=0$, which in turn is equivalent to showing that $\mathrm{Im}(M_1)\subset \mathrm{Ker}(I-M_2)$. Since $M_2$ is an orthogonal projection matrix, we have $\mathrm{Ker}(I-M_2)=\mathrm{Im}(M_2)$, so we are left with showing $\mathrm{Im}(M_1)\subset \mathrm{Im}(M_2)$.

Since both $M_1$ and $M_2$ are $n\times n$ matrices, their images are subspaces of $\mathbb{R}^n$. Because the inverses $(X_1^TX_1)^{-1}$ and $(X_2^TX_2)^{-1}$ exist, both $X_1$ and $X_2$ have full column rank, and the image of each projection matrix $M_i=X_i(X_i^TX_i)^{-1}X_i^T$ is exactly the column space $C(X_i)$: for any vector $\mathbf{v}$ we have $M_i\mathbf{v}=X_i\left[(X_i^TX_i)^{-1}X_i^T\mathbf{v}\right]\in C(X_i)$, and $M_iX_i=X_i$ shows that every element of $C(X_i)$ is attained.

Since the second linear model contains all factors in the first model, every column of $X_1$ is a linear combination of columns of $X_2$, i.e. every column of $X_1$ lies in $C(X_2)$. Therefore \begin{equation} \mathrm{Im}(M_1)=C(X_1)\subset C(X_2)=\mathrm{Im}(M_2) \tag{5.69} \end{equation} which is exactly what we needed. Thus, we have the desired result $M_2M_1=M_1$.
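
A small numerical illustration in R (the nested design matrices below are an arbitrary example chosen for the check):

```r
# Check M2 M1 = M1 when the second model contains all columns of the first
set.seed(7)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
X1 <- cbind(1, x1)          # first model: intercept + x1
X2 <- cbind(1, x1, x2)      # second model adds one more covariate
M1 <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)
M2 <- X2 %*% solve(t(X2) %*% X2) %*% t(X2)
max(abs(M2 %*% M1 - M1))    # essentially zero (floating-point error only)
```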
