Portfolio Risk Measures - PART II

Case study

The goal of this mini project is to apply the Normality Test and Principal Component Analysis techniques learned in the lecture to daily Dow Jones 30 stock prices from 1991-01-02 to 2000-12-29. The dataset is provided by the QRM and qrmtools libraries, which offer comprehensive functions for modelling and managing risk.

library("QRM")
library("qrmtools")

To complete the tasks of this session, you will also need to load the following libraries.

library("mvtnorm")  
library("xts")
library("tseries")

To access the data, alongside a list of constituent names, run the following code:

data(DJ)        # Dow Jones 30 stock prices
names(DJ)       # List constituents
##  [1] "AA"   "AXP"  "T"    "BA"   "CAT"  "C"    "KO"   "DD"   "EK"   "XOM"  "GE"   "GM"   "HWP"  "HD"   "HON"  "INTC"
## [17] "IBM"  "IP"   "JPM"  "JNJ"  "MCD"  "MRK"  "MSFT" "MMM"  "MO"   "PG"   "SBC"  "UTX"  "WMT"  "DIS"

Select 10 of the 30 companies from the list above to form your own portfolio.

The list below outlines the tasks that you need to complete.

Normality Tests:

  1. Construct daily, weekly, monthly and quarterly returns of your selected stocks for the period 01/01/1993-31/12/2000.

  2. Conduct Mardia’s test of multivariate normality on the daily, weekly, monthly and quarterly returns, at the suggested 95% confidence level.

  3. Conduct the Jarque-Bera test of univariate normality on each column of the periodic-return datasets.

Principal Component Analysis:

  1. Use the Dow Jones Index as the benchmark returns, i.e., \(\mu\), which can be accessed by data(dji). Calculate the daily log-returns of the index over the same observation window.

  2. Apply the spectral decomposition to the covariance matrix of your selected stocks' returns to obtain its eigenvalues and eigenvectors.

  3. Calculate the principal components, compute the cumulative contribution to the total variance, and plot your results.

Reference codes

Clear your current workspace before running the R scripts, so that leftover objects from previous sessions do not interfere with the results.

rm(list=ls(all=TRUE))
  • Load libraries and data.
library("QRM")
library("qrmtools")
library("mvtnorm")  
library("xts")
library("tseries")
library("factoextra")  # Package for PCA visualization
data(DJ)     # Dow Jones 30 stock prices
names(DJ)    # List constituents
##  [1] "AA"   "AXP"  "T"    "BA"   "CAT"  "C"    "KO"   "DD"   "EK"   "XOM"  "GE"   "GM"   "HWP"  "HD"   "HON"  "INTC"
## [17] "IBM"  "IP"   "JPM"  "JNJ"  "MCD"  "MRK"  "MSFT" "MMM"  "MO"   "PG"   "SBC"  "UTX"  "WMT"  "DIS"

Normality Tests.

  • Select 10 out of 30 companies and construct a time series of their stock prices.
stock.selection <- c("MO","KO","EK","HWP","INTC","MSFT","IBM","MCD","WMT","DIS")
prices <- DJ[,stock.selection]
head(prices, 3)
## GMT
##                 MO     KO      EK    HWP   INTC   MSFT     IBM    MCD    WMT    DIS
## 1991-01-02 10.5506 9.8760 23.0207 3.9108 1.1968 2.0764 27.8566 7.0782 7.0547 7.8604
## 1991-01-03 10.2654 9.6032 22.9507 3.9108 1.1968 2.0903 27.9498 6.9851 7.0547 7.8215
## 1991-01-04 10.2395 9.8487 22.7408 3.8492 1.2046 2.1076 27.8566 7.0161 6.9665 7.7923
  • Calculate the log-prices/returns of your selected stocks.
logprices <- log(prices)
head(logprices, 3)
## GMT
##                  MO       KO       EK      HWP      INTC      MSFT     IBM      MCD      WMT      DIS
## 1991-01-02 2.356183 2.290108 3.136394 1.363742 0.1796513 0.7306356 3.32707 1.957020 1.953694 2.061837
## 1991-01-03 2.328779 2.262096 3.133348 1.363742 0.1796513 0.7373076 3.33041 1.943779 1.953694 2.056876
## 1991-01-04 2.326253 2.287339 3.124161 1.347865 0.1861476 0.7455499 3.32707 1.948208 1.941113 2.053136
X <- diff(logprices)
head(X, 3)
## GMT
##                      MO          KO           EK         HWP        INTC        MSFT          IBM          MCD
## 1991-01-02           NA          NA           NA          NA          NA          NA           NA           NA
## 1991-01-03 -0.027403713 -0.02801120 -0.003045374  0.00000000 0.000000000 0.006671971  0.003340122 -0.013240329
## 1991-01-04 -0.002526227  0.02524309 -0.009187769 -0.01587662 0.006496233 0.008242263 -0.003340122  0.004428199
##                    WMT          DIS
## 1991-01-02          NA           NA
## 1991-01-03  0.00000000 -0.004961144
## 1991-01-04 -0.01258111 -0.003740285
  • Generate daily, weekly, monthly and quarterly returns for the period 01/01/1993-31/12/2000. Since log-returns are additive over time, the lower-frequency returns below are obtained by summing the daily log-returns within each period.
# Set up the observation window.
start.date <- timeCalendar(d=1,m=1,y=1993)   
end.date <- timeCalendar(d=31,m=12,y=2000)
# Generate the daily returns. 
X.daily <- window(X,start.date,end.date)
head(X.daily, 3)
## GMT
##                      MO           KO         EK         HWP         INTC         MSFT         IBM          MCD
## 1993-01-04 -0.009770992  0.002980470 0.00921909 -0.02536816 -0.005784864 -0.002927875 -0.00497376  0.005115859
## 1993-01-05 -0.038368795 -0.021052283 0.02417146  0.02715578  0.050720405  0.016016750 -0.02525502 -0.005115859
## 1993-01-06 -0.022357554  0.006060298 0.01481383  0.03852908  0.080455953  0.028501204 -0.01806417  0.005115859
##                     WMT          DIS
## 1993-01-04 -0.017731728 -0.005834979
## 1993-01-05 -0.003981952  0.007052471
## 1993-01-06 -0.026290350  0.022984826
# Generate the weekly returns.  
by <- timeSequence(from = start.date,  to = end.date, by = "week")
X.weekly <- aggregate(X.daily, by, sum)
head(X.weekly, 3)
## GMT
##                      MO          KO         EK         HWP         INTC         MSFT         IBM         MCD
## 1993-01-08 -0.041360853 -0.03958237 0.03039892 0.012446760  0.126662785  0.020295283 -0.08004138  0.01526972
## 1993-01-15 -0.017034284  0.07188892 0.14728506 0.029586987  0.123669259  0.029690911  0.03694181  0.00000000
## 1993-01-22 -0.006895799 -0.05342234 0.01028333 0.006838848 -0.002247904 -0.004198196  0.00774399 -0.02298876
##                    WMT          DIS
## 1993-01-08 -0.07500427  0.004118734
## 1993-01-15  0.01876717  0.039778811
## 1993-01-22  0.03452342 -0.014025256
# Generate the monthly returns.  
by <- timeSequence(from = start.date,  to = end.date, by = "month")
X.monthly <- aggregate(X.daily, by, sum)
head(X.monthly, 3)
## GMT
##                      MO          KO         EK         HWP        INTC        MSFT         IBM         MCD
## 1993-02-01 -0.016338434  0.02650974 0.20319343 0.043747849  0.24586009  0.02460165  0.04369213 0.010205679
## 1993-03-01 -0.150860571 -0.02055765 0.06607521 0.003419414  0.03531165 -0.07103510  0.03960127 0.007582427
## 1993-04-01 -0.007403213  0.02153990 0.03738810 0.031914237 -0.03308760  0.13198666 -0.05394390 0.075186498
##                     WMT         DIS
## 1993-02-01  0.028880128  0.06047087
## 1993-03-01  0.001894728 -0.04196360
## 1993-04-01 -0.057542567  0.01699591
# Generate the quarterly returns.  
by <- timeSequence(from = start.date,  to = end.date, by = "quarter")
X.quarterly <- aggregate(X.daily,by,sum)
head(X.quarterly, 3)
## GMT
##                     MO           KO          EK         HWP       INTC        MSFT        IBM         MCD
## 1993-04-01 -0.17460222  0.027491988  0.30665674  0.07908150  0.2480841  0.08555320  0.0293495  0.09297460
## 1993-07-01 -0.26863069  0.007105448 -0.06918209  0.05310853 -0.0318939 -0.06525792 -0.0519166 -0.08785874
## 1993-10-01 -0.03677305 -0.019579060  0.17247651 -0.14659544  0.3014893 -0.06062233 -0.1155608  0.06899506
##                    WMT         DIS
## 1993-04-01 -0.02676771  0.03550317
## 1993-07-01 -0.18837533 -0.08661695
## 1993-10-01 -0.03836991 -0.07163943
  • Conduct Mardia’s test of multivariate normality on each set of the periodic returns.

    Recall that Mardia’s test of multivariate normality relies on hypothesis tests of the multivariate measures of skewness (\(K_3\)) and kurtosis (\(K_4\)), whose asymptotic distributions under the null hypothesis are:

    \[ \frac{n}{6} K_3 \sim \chi^2_{d(d+1)(d+2)/6}, \qquad \frac{K_4-d(d+2)}{\sqrt{8d(d+2)/n}} \sim \mathcal{N}(0,1). \]

    Reject the null hypothesis that the periodic returns of the selected stocks follow a multivariate normal distribution whenever the resulting p-value is below 0.05.

MardiaTest(X.daily)
##         K3 K3 p-value         K4 K4 p-value 
##   10.86037    0.00000  263.76064    0.00000
MardiaTest(X.weekly)
##         K3 K3 p-value         K4 K4 p-value 
##   11.90346    0.00000  187.48002    0.00000
MardiaTest(X.monthly)
##           K3   K3 p-value           K4   K4 p-value 
## 2.484269e+01 6.166512e-12 1.471844e+02 0.000000e+00
MardiaTest(X.quarterly)
##          K3  K3 p-value          K4  K4 p-value 
##  46.1454125   0.1877432 117.3633980   0.3178233
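
As a cross-check, Mardia’s statistics and their p-values can also be computed directly from the formulas above. The sketch below is illustrative (the helper name mardia.manual is ours); it uses the MLE covariance (divisor n), the convention behind the asymptotic distributions, and should broadly agree with MardiaTest().

# Minimal sketch: Mardia's statistics from first principles.
mardia.manual <- function(data){
    Y  <- as.matrix(data)
    n  <- nrow(Y); d <- ncol(Y)
    Yc <- scale(Y, center = TRUE, scale = FALSE)     # centred observations
    Sn <- crossprod(Yc)/n                            # MLE covariance (divisor n)
    D  <- Yc %*% solve(Sn) %*% t(Yc)                 # D[i,j] = (x_i - xbar)' Sn^{-1} (x_j - xbar)
    K3 <- mean(D^3)                                  # multivariate skewness
    K4 <- mean(diag(D)^2)                            # multivariate kurtosis
    c(K3 = K3,
      "K3 p-value" = pchisq(n*K3/6, df = d*(d+1)*(d+2)/6, lower.tail = FALSE),
      K4 = K4,
      "K4 p-value" = 2*pnorm(abs((K4 - d*(d+2))/sqrt(8*d*(d+2)/n)), lower.tail = FALSE))
}
mardia.manual(X.daily)    # compare with MardiaTest(X.daily)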
  • Conduct the Jarque-Bera test of univariate normality on each column of the periodic returns, with the test statistic given by:

    \[ T = \frac{n}{6}\left( K_3^2 + \frac{(K_4-3)^2}{4} \right) \sim \chi^2_2. \] Here \(K_3\) and \(K_4\) denote the sample skewness and kurtosis of an individual series; values of \(T\) far from 0 lead to the rejection of normality.

    As this test is univariate, we run it on each column of the dataset separately.

# To avoid repetition, we write a short function that applies the test to each column.
JBTest <- function(dataMatrix){
    ncol <- dim(dataMatrix)[2]
    pvals <- array(NA, ncol)
    for (i in 1:ncol) pvals[i] <- jarque.bera.test(dataMatrix[,i])$p.value
    if (!is.null(names(dataMatrix))) names(pvals) <- names(dataMatrix)
    return(pvals)
}
JBTest(X.daily)
##   MO   KO   EK  HWP INTC MSFT  IBM  MCD  WMT  DIS 
##    0    0    0    0    0    0    0    0    0    0
JBTest(X.weekly)
##           MO           KO           EK          HWP         INTC         MSFT          IBM          MCD          WMT 
## 0.000000e+00 1.627990e-08 0.000000e+00 3.734235e-12 0.000000e+00 0.000000e+00 2.114411e-04 3.195393e-03 1.968471e-03 
##          DIS 
## 1.037249e-08
JBTest(X.monthly)
##           MO           KO           EK          HWP         INTC         MSFT          IBM          MCD          WMT 
## 1.397525e-09 5.602451e-02 0.000000e+00 2.800529e-01 3.330669e-16 6.864336e-02 7.929228e-01 6.819415e-01 9.926543e-01 
##          DIS 
## 8.899040e-01
JBTest(X.quarterly)
##           MO           KO           EK          HWP         INTC         MSFT          IBM          MCD          WMT 
## 0.1074131679 0.0004216985 0.0919092699 0.9000812140 0.5390802680 0.5933648016 0.2917917380 0.6176245067 0.3374939877 
##          DIS 
## 0.0534219356
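
As a sanity check on the test statistic, it can also be computed by hand for a single series; the sketch below uses the "MO" column purely as an example and should agree with jarque.bera.test().

# Minimal sketch: Jarque-Bera statistic for one series, from the formula above.
x  <- as.numeric(X.daily[, "MO"])      # "MO" chosen purely as an example
n  <- length(x)
m2 <- mean((x - mean(x))^2)            # 2nd central moment
m3 <- mean((x - mean(x))^3)            # 3rd central moment
m4 <- mean((x - mean(x))^4)            # 4th central moment
K3 <- m3/m2^(3/2)                      # sample skewness
K4 <- m4/m2^2                          # sample kurtosis
T.stat <- n/6*(K3^2 + (K4 - 3)^2/4)    # the statistic T from the formula above
pchisq(T.stat, df = 2, lower.tail = FALSE)   # p-value; compare with JBTest(X.daily)["MO"]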

Principal Component Analysis.

  • The 10 selected stocks have been chosen to be of two types:

    • technology-related titles like Hewlett-Packard, Intel, Microsoft and IBM; and
    • food/consumer-related titles like Philip Morris, Coca-Cola, Eastman Kodak, McDonald’s, Wal-Mart and Disney.
  • Prepare the data for the benchmark returns \(\mu\), i.e., the daily log-returns of the Dow Jones Index.
# Load the index level data.
data(dji)
# Calculate the index returns corresponding to the same window.
X.ind <- diff(log(dji))
X.ind <- window(X.ind,start.date,end.date)

Consider returns relative to the benchmark: subtract the index returns from the stock returns.

# We repeat the index returns once per selected stock before subtracting.
X <- X.daily - X.ind[,rep(1,ncol(X.daily))]
head(X, 3)
## GMT
##                     MO            KO          EK         HWP         INTC         MSFT          IBM          MCD
## 1993-01-04 -0.01222473  0.0005267335 0.006765354 -0.02782189 -0.008238601 -0.005381612 -0.007427496  0.002662123
## 1993-01-05 -0.03796076 -0.0206442482 0.024579490  0.02756381  0.051128440  0.016424785 -0.024846983 -0.004707825
## 1993-01-06 -0.02153796  0.0068798919 0.015633421  0.03934867  0.081275547  0.029320798 -0.017244577  0.005935453
##                     WMT          DIS
## 1993-01-04 -0.020185464 -0.008288715
## 1993-01-05 -0.003573918  0.007460505
## 1993-01-06 -0.025470756  0.023804420
  • Apply spectral decomposition.
# Note: var() computes the unbiased sample covariance (divisor n-1).
S <- var(X)
# Get the eigenvalues/vectors.
tmp.eigen <- eigen(S)
names(tmp.eigen)
## [1] "values"  "vectors"
L <- diag(tmp.eigen$values) # eigenvalues
G <- tmp.eigen$vectors      # eigenvectors
# Check the spectral decomposition by reconstructing S from G and L.
check <- G %*% L %*% t(G)
max(abs(S-check))           # maximum discrepancy, due to rounding errors only
## [1] 4.87891e-19
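
The principal components themselves can be obtained directly from this decomposition: centre the relative returns and rotate them by G. The sketch below is illustrative; the column variances of the scores should reproduce the eigenvalues, anticipating the princomp() output in the next step.

# Minimal sketch: principal component scores straight from the spectral decomposition.
Xc <- scale(as.matrix(X), center = TRUE, scale = FALSE)   # centred relative returns
Y  <- Xc %*% G                                            # PC scores, one column per component
diag(var(Y))                                              # should equal tmp.eigen$values
cumsum(tmp.eigen$values)/sum(tmp.eigen$values)            # cumulative contribution to total variance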
  • Calculate principal components.

    Now all the ingredients are in place to conduct the PCA. First, princomp() computes the principal components, and the summary() function displays the result.

out <- princomp(X)
summary(out)
## Importance of components:
##                            Comp.1    Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7     Comp.8
## Standard deviation     0.03094445 0.0214188 0.02078997 0.01930529 0.01840345 0.01794285 0.01737690 0.01692016
## Proportion of Variance 0.24388101 0.1168431 0.11008301 0.09492164 0.08626035 0.08199651 0.07690541 0.07291574
## Cumulative Proportion  0.24388101 0.3607241 0.47080707 0.56572871 0.65198906 0.73398557 0.81089098 0.88380672
##                            Comp.9    Comp.10
## Standard deviation     0.01593690 0.01422073
## Proportion of Variance 0.06468744 0.05150585
## Cumulative Proportion  0.94849415 1.00000000
# Extract the loadings
out$loadings
## 
## Loadings:
##      Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## MO           0.192  0.916         0.249  0.156                0.146        
## KO    0.106                0.213         0.113 -0.122        -0.574 -0.758 
## EK                  0.183 -0.186 -0.804  0.277  0.418 -0.148 -0.106        
## HWP  -0.518  0.739 -0.122  0.258 -0.156  0.161 -0.215                      
## INTC -0.672 -0.445  0.138                              0.563               
## MSFT -0.431 -0.336  0.150  0.189               -0.186 -0.765 -0.143        
## IBM  -0.270  0.262        -0.376  0.424 -0.123  0.646 -0.175 -0.231 -0.119 
## MCD                        0.187         0.110         0.170 -0.711  0.638 
## WMT         -0.169 -0.191  0.688  0.150  0.355  0.510         0.226        
## DIS                 0.152  0.392 -0.241 -0.839  0.219                      
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## SS loadings       1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0     1.0
## Proportion Var    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1     0.1
## Cumulative Var    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9     1.0
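
Note that the loading vectors are the eigenvectors obtained from the spectral decomposition above, up to an arbitrary sign flip per component. A quick illustrative check:

# The first loading vector should equal the first eigenvector up to sign.
cbind(princomp = out$loadings[, 1], eigen = G[, 1])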

Next, we check that the sum of the variances of the PCs equals the sum of the variances of the original data, i.e., the trace of S. The two totals below differ slightly because princomp() divides by n whereas var() divides by n-1.

PC.variances <- out$sdev^2
sum.PCvariances <- sum(PC.variances)
sum.PCvariances
## [1] 0.003926336
traceS <- sum(diag(S))
traceS
## [1] 0.003928281
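# The small gap is just the divisor: princomp() uses n while var() uses n-1.
# Rescaling the PC total by n/(n-1) should reproduce traceS (illustrative check).
n <- nrow(X)
sum.PCvariances*n/(n - 1)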
# Cumulative contribution to the total variance
ratio <- cumsum(PC.variances)/sum(PC.variances)
ratio
##    Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6    Comp.7    Comp.8    Comp.9   Comp.10 
## 0.2438810 0.3607241 0.4708071 0.5657287 0.6519891 0.7339856 0.8108910 0.8838067 0.9484942 1.0000000
  • Visualization of the principal components.

    To visualize the importance of each principal component and to decide how many components to retain, we use a scree plot produced with the fviz_eig() function. This plot shows the eigenvalues as a downward curve, from largest to smallest.

    To visualize how the variables (stocks) relate to one another and how strongly each contributes to the principal components, we can use the fviz_pca_var() function.

# Plot eigenvalues. 
fviz_eig(out, addlabels = TRUE)

fviz_pca_var(out,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
             )
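
If factoextra is not installed, a basic scree plot can also be drawn with base R; a minimal alternative:

# Alternative (base R): scree plot of the PCA without factoextra.
screeplot(out, type = "lines", main = "Scree plot")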