# 2 Financial data and portfolio allocation

## 2.1 Download and work with stock market data

The main aim of this exercise is to familiarize yourself with the `tidyverse`. To download price data you can make use of the convenient `tidyquant` package. If you have trouble using `tidyquant`, check out the documentation. In case you have not installed *R* and *RStudio* yet, follow the instructions provided on Absalon. Start the session by loading the `tidyverse` package as shown below.

```
# install.packages("tidyverse")
# install.packages("tidyquant")
library(tidyverse)
library(tidyquant)
```

### 2.1.1 Exercises

- Download daily prices for one stock market ticker of your choice (e.g. *AAPL*) from Yahoo!Finance. Plot the time series of adjusted closing prices. To download the data you can use the command `tq_get`. If you do not know how to use it, make sure you read the help file by calling `?tq_get`. I especially recommend taking a look at the examples section of the documentation.
- Compute daily returns for the asset and visualize the distribution of daily returns in a histogram. Also, use `geom_vline()` to add a dashed red line that indicates the 5% quantile of the daily returns within the histogram.
- Compute summary statistics (mean, standard deviation, minimum and maximum) for the daily returns.
- Now the tidyverse magic starts: Take your code from before and generalize it such that all the computations are performed for an arbitrary vector of tickers (e.g., `ticker <- c("AAPL", "MMM", "BA")`). Automate the download, the plot of the price time series and the table of summary statistics for this arbitrary number of assets.
- Are days with high aggregate trading volume often followed by high aggregate trading volume days? Compute the aggregate daily trading volume (in USD) and find an appropriate visualization to analyze the question.

### 2.1.2 Solutions

```
prices <- tq_get("AAPL", get = "stock.prices")
prices %>% head() # What does the data look like?
```

```
## # A tibble: 6 x 8
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2011-01-03 11.6 11.8 11.6 11.8 445138400 10.1
## 2 AAPL 2011-01-04 11.9 11.9 11.7 11.8 309080800 10.2
## 3 AAPL 2011-01-05 11.8 11.9 11.8 11.9 255519600 10.3
## 4 AAPL 2011-01-06 12.0 12.0 11.9 11.9 300428800 10.2
## 5 AAPL 2011-01-07 11.9 12.0 11.9 12.0 311931200 10.3
## 6 AAPL 2011-01-10 12.1 12.3 12.0 12.2 448560000 10.5
```

`tq_get` downloads stock market data from Yahoo!Finance if you do not specify another data source. The function returns a tibble with 8 self-explanatory columns: *symbol*, *date*, the market prices at the *open, high, low* and *close*, the daily *volume* (in number of shares) and the *adjusted* price in USD, which factors in anything that might affect the stock price after the market closes, e.g. stock splits, repurchases and dividends.
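To build intuition for why the *adjusted* price matters, consider a minimal base-R sketch of a hypothetical 4:1 stock split (all numbers are made up for illustration): the raw closing price collapses at the split although no wealth was lost, while the adjusted series gives the true return.

```r
close <- c(100, 104, 26.5) # hypothetical closing prices around a 4:1 split after day 2
adjust <- c(1/4, 1/4, 1)   # hypothetical adjustment factors scaling down pre-split prices
adjusted <- close * adjust # c(25, 26, 26.5)

diff(log(close))    # raw closes suggest a huge "loss" at the split date
diff(log(adjusted)) # adjusted prices reveal the true, small positive return
```

Returns should therefore always be computed from the *adjusted* column.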

```
prices %>% # Simple visualization of the downloaded price time series
  ggplot(aes(x = date, y = adjusted)) +
  geom_line() +
  labs(x = "Date", y = "Price") +
  scale_y_log10() # Change the y-axis scale
```

Figure 2.1 illustrates the time series of downloaded *adjusted* prices. Make sure you understand every single line of code! (What is the purpose of `%>%`? What are the arguments of `aes()`? Which alternative *geoms* could you use to visualize the time series? Hint: if you do not know the answers, try to change the code and see what difference your intervention causes.)

Next, we compute daily (log)-returns defined as \(\log(p_t) - \log(p_{t-1})\) where \(p_t\) is the adjusted day \(t\) price.

```
returns <- prices %>%
  mutate(log_price = log(adjusted),
         return = log_price - lag(log_price)) %>% # Compute log returns
  select(symbol, date, return)
returns %>% head()
```

```
## # A tibble: 6 x 3
## symbol date return
## <chr> <date> <dbl>
## 1 AAPL 2011-01-03 NA
## 2 AAPL 2011-01-04 0.00521
## 3 AAPL 2011-01-05 0.00815
## 4 AAPL 2011-01-06 -0.000809
## 5 AAPL 2011-01-07 0.00714
## 6 AAPL 2011-01-10 0.0187
```

The resulting tibble contains three columns, the last of which holds the daily log returns. Note that the first entry is naturally `NA` because there is no leading price. Note also that the computations require the time series to be ordered by date - otherwise, `lag` would be meaningless.
For the upcoming exercises we remove missing values, as these would require careful treatment when computing, e.g., sample averages. In general, however, make sure you understand why `NA` values occur and whether you can simply get rid of these observations.
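To see why the missing values require explicit handling, consider a minimal base-R sketch (the price vector `p` is hypothetical): a single `NA` poisons any sample average unless it is dropped.

```r
p <- c(10, 10.2, 10.1, 10.4) # hypothetical adjusted prices
r <- c(NA, diff(log(p)))     # the first log return is NA: there is no leading price

mean(r)               # NA - one missing value poisons the mean
mean(r, na.rm = TRUE) # dropping the NA first, as drop_na() does for the tibble
```

This is exactly the reason for the `drop_na()` step below.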

```
returns <- returns %>% drop_na()
```

```
q05 <- quantile(returns %>% pull(return), 0.05) # Compute the 5% quantile of the returns

returns %>% # Create a histogram of the daily returns
  ggplot(aes(x = return)) +
  geom_histogram(bins = 100) +
  labs(x = "Return", y = "") +
  geom_vline(aes(xintercept = q05),
             color = "red",
             linetype = "dashed")
```

Here, `bins = 100` determines the number of bins (and thereby their width) in the illustration. Also, make sure you understand how to use `geom_vline()` to add a dashed red line that indicates the 5% quantile of the daily returns. Finally, I compute the summary statistics for the return time series.

```
returns %>%
  summarise_at(vars(return),
               list(daily_mean = mean,
                    daily_sd = sd,
                    daily_min = min,
                    daily_max = max))
```

```
## # A tibble: 1 x 4
## daily_mean daily_sd daily_min daily_max
## <dbl> <dbl> <dbl> <dbl>
## 1 0.000968 0.0180 -0.138 0.113
```

```
# Alternatively: compute summary statistics for each year
returns %>%
  group_by(year = year(date)) %>%
  summarise_at(vars(return),
               list(daily_mean = mean,
                    daily_sd = sd,
                    daily_min = min,
                    daily_max = max))
```

```
## # A tibble: 11 x 5
## year daily_mean daily_sd daily_min daily_max
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2011 0.000821 0.0165 -0.0576 0.0572
## 2 2012 0.00113 0.0185 -0.0665 0.0850
## 3 2013 0.000308 0.0182 -0.132 0.0501
## 4 2014 0.00135 0.0136 -0.0833 0.0788
## 5 2015 -0.000121 0.0169 -0.0631 0.0558
## 6 2016 0.000467 0.0147 -0.0680 0.0629
## 7 2017 0.00157 0.0110 -0.0395 0.0592
## 8 2018 -0.000221 0.0181 -0.0686 0.0681
## 9 2019 0.00253 0.0166 -0.105 0.0661
## 10 2020 0.00237 0.0294 -0.138 0.113
## 11 2021 -0.000553 0.0192 -0.0426 0.0525
```

Now the `tidyverse` magic starts: Tidy data and a tidy workflow make it extremely easy to generalize the computations from before to as many assets as you like. The following code takes a vector of tickers, `ticker <- c("AAPL", "MMM", "BA")`, and automates the download as well as the plot of the price time series. At the end, we create the table of summary statistics for an arbitrary number of assets.

```
ticker <- c("AAPL", "MMM", "BA")
all_prices <- tq_get(ticker, get = "stock.prices") # Exactly the same code as in the first exercise

all_prices %>%
  ggplot(aes(x = date,
             y = adjusted,
             color = symbol)) +
  geom_line() +
  labs(x = "Date", y = "Price") +
  scale_y_log10()
```

Do you note the tiny difference relative to the code we used before? `tq_get(ticker)` is able to return a tibble for several symbols as well. All we need to do to illustrate all tickers instead of only one is to include `color = symbol` in the `ggplot` aesthetics. That way, a separate line is generated for each ticker.

```
all_returns <- all_prices %>%
  group_by(symbol) %>% # We perform the computations per symbol
  mutate(return = log(adjusted) - log(lag(adjusted))) %>%
  select(symbol, date, return) %>%
  drop_na()

all_returns %>%
  group_by(symbol) %>%
  summarise_at(vars(return),
               list(daily_mean = mean, daily_sd = sd, daily_min = min, daily_max = max))
```

```
## # A tibble: 3 x 5
## symbol daily_mean daily_sd daily_min daily_max
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 BA 0.000561 0.0228 -0.272 0.218
## 2 MMM 0.000434 0.0137 -0.139 0.119
## 3 AAPL 0.000968 0.0180 -0.138 0.113
```

The same holds for returns as well. Before computing the returns as before, we use `group_by(symbol)` such that the `mutate` command is performed for each symbol individually. Exactly the same logic applies to the computation of summary statistics: `group_by(symbol)` is the key to aggregating the time series into ticker-specific variables of interest.

Lastly: Don't try this if you are not prepared for substantial waiting times, but you are now equipped with all the tools to download price data for *each* ticker listed in the S&P 500 index with exactly the same number of lines of code. Just use `ticker <- tq_index("SP500")`, which provides you with a tibble that contains each symbol that is (currently) part of the S&P 500.

Sometimes, aggregation across variables other than `symbol` makes sense as well, as e.g. in the last exercise: We take the downloaded tibble with prices and compute the aggregate daily trading volume in USD. Recall that the column *volume* is denominated in traded shares, so I multiply the trading volume by the daily closing price to get a measure of the aggregate trading volume in USD. Scaling by `1e9` denotes trading volume in billion USD.

```
volume <- all_prices %>%
  mutate(volume_usd = volume * close / 1e9) %>% # I denote trading volume in billion USD
  group_by(date) %>%
  summarise(volume = sum(volume_usd))

volume %>% # Plot the time series of aggregate trading volume
  ggplot(aes(x = date, y = volume)) +
  geom_line() +
  labs(x = "Date", y = "Aggregate trading volume (BUSD)")
```

One way to illustrate the persistence of trading volume would be to plot volume on day \(t\) against volume on day \(t-1\) as in the example below:

```
all_prices %>%
  group_by(symbol) %>%
  mutate(volume = volume * close / 1e9) %>%
  ggplot(aes(x = lag(volume), y = volume, color = symbol)) +
  geom_point() +
  labs(x = "Lagged aggregate trading volume (BUSD)", y = "Aggregate trading volume (BUSD)")
```

```
## Warning: Removed 1 rows containing missing values (geom_point).
```

Do you understand where this warning comes from and what it means? Pure eye-balling reveals that days with high trading volume are often followed by similarly high trading volume days.
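Beyond eye-balling, the scatter plot's message can be quantified with the lag-1 autocorrelation of the volume series. A minimal base-R sketch with a simulated persistent series (the AR(1) process and its parameters are purely illustrative, standing in for the downloaded volume data):

```r
set.seed(42)
n <- 500
volume <- numeric(n)
volume[1] <- 1
for (t in 2:n) {
  # Simulated AR(1) dynamics: today's volume loads on yesterday's volume
  volume[t] <- 0.2 + 0.8 * volume[t - 1] + rnorm(1, sd = 0.1)
}

# Correlation between volume on day t and day t - 1
cor(volume[-1], volume[-n])
```

A value close to the persistence parameter (here 0.8) confirms what the scatter plot suggests visually.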

## 2.2 Compute and visualize the efficient frontier

This exercise is closely aligned to the slides on optimal portfolio choice and asks you to compute and visualize the efficient frontier for a number of stocks. The solution code replicates the figure used in the slides.

### 2.2.1 Exercises

- I prepared the dataset `exercise_clean_crsp_returns.rds`, which contains monthly returns of *all* US-listed stocks that have been continuously traded during the last 50 years. The dataset is provided in Absalon. Download the file and read it in using `read_rds()`. How many distinct tickers are listed? How many return observations are recorded per year? (The code used to prepare the dataset is based on the cleaned CRSP data later in Exercise 2.5 and is available at the end of the solution in 2.5.)
- Compute the vector of sample average returns and the sample variance-covariance matrix. Which ticker exhibited the highest (lowest) monthly return among all stocks? Are there any stocks that exhibit *negative* correlation?
- Compute the minimum variance portfolio weights as well as the expected return and volatility of this portfolio.
- Compute the efficient portfolio weights which achieve 3 times the expected return of the minimum variance portfolio.
- Make use of the two mutual fund theorem and compute the expected return and volatility for a range of combinations of the minimum variance portfolio and the efficient portfolio.
- Plot the risk-return characteristics of the individual assets in a diagram (x-axis: \(\sigma\), y-axis: \(\mu\)). Also, add the efficient frontier and the two efficient portfolios from question 3 to the illustration.

### 2.2.2 Solutions

```
# Load required packages
library(tidyverse)

# Read in the data
crsp <- read_rds("exercise_clean_crsp_returns.rds") # You may have to adjust your path
head(crsp)
```

The provided file contains three columns. `permno` is the unique ticker identifier in the CRSP universe, `date` corresponds to the monthly timestamp and `ret.adj` are the adjusted monthly returns (for more details, take a look at the code which I used to prepare the file). I performed some cleaning steps beforehand, so we are ready to go.

```
crsp %>% count(permno) # 78 unique permnos
```

```
## # A tibble: 78 x 2
## permno n
## <int> <int>
## 1 10145 840
## 2 10516 840
## 3 11308 840
## 4 11404 840
## 5 11674 840
## 6 11850 840
## 7 12036 840
## 8 12052 840
## 9 12060 840
## 10 12490 840
## # ... with 68 more rows
```

```
crsp %>%
  count(year = lubridate::year(date)) # lubridate is a powerful tool to work with dates and times
```

```
## # A tibble: 70 x 2
## year n
## <dbl> <int>
## 1 1950 936
## 2 1951 936
## 3 1952 936
## 4 1953 936
## 5 1954 936
## 6 1955 936
## 7 1956 936
## 8 1957 936
## 9 1958 936
## 10 1959 936
## # ... with 60 more rows
```

For each of the 70 years of data we observe 936 month-ticker return observations.

Next, I transform the file from its tidy format into a \((T \times N)\) matrix with one column per ticker in order to compute the covariance matrix \(\Sigma\) and the expected return vector \(\mu\).

```
returns <- crsp %>%
  pivot_wider(names_from = permno, values_from = ret.adj) %>%
  select(-date)

sigma <- returns %>%
  cov(use = "pairwise.complete.obs") # Compute the sample covariance matrix of the returns
N <- ncol(sigma)

mu <- returns %>%
  colMeans() %>%
  as.matrix()

returns %>%
  summarise_all(max) %>%
  which.max() # Which ticker exhibits the maximum return?
```

```
## 21135
## 53
```

```
sigma[sigma < 0] # Are there any negative entries in Sigma?
```

```
## numeric(0)
```

We see that permno `21135` (column 53 of the return matrix) exhibited the highest monthly return. Interestingly, there is not a single negative entry in the estimated variance-covariance matrix.

The procedure to compute the minimum variance portfolio weights has been introduced in class already. The next step requires computing the efficient portfolio weights for a given level of expected return. The naming convention is aligned with the lecture slides.

```
iota <- rep(1, N)
wmvp <- solve(sigma) %*% iota
wmvp <- wmvp / sum(wmvp)

c(t(wmvp) %*% mu, sqrt(t(wmvp) %*% sigma %*% wmvp))
```

```
## [1] 0.9237972 4.4960914
```

```
# Compute efficient portfolio weights for a given level of expected return
mu_bar <- 3 * t(wmvp) %*% mu # Some benchmark return

C <- as.numeric(t(iota) %*% solve(sigma) %*% iota)
D <- as.numeric(t(iota) %*% solve(sigma) %*% mu)
E <- as.numeric(t(mu) %*% solve(sigma) %*% mu)

lambda_tilde <- as.numeric(2 * (mu_bar - D/C) / (E - D^2/C))
weff <- wmvp + lambda_tilde/2 * (solve(sigma) %*% mu - D/C * solve(sigma) %*% iota)

# Merge the weight vectors together and illustrate the differences in portfolio weights
full_join(as_tibble(weff, rownames = "ticker") %>% rename("Efficient Portfolio" = "V1"),
          as_tibble(wmvp, rownames = "ticker") %>% rename("Min. Variance Portfolio" = "V1")) %>%
  left_join(as_tibble(mu, rownames = "ticker") %>% rename("mu" = "V1")) %>%
  pivot_longer(-c(ticker, "mu"), names_to = "Portfolio") %>%
  ggplot(aes(x = mu, y = value, color = Portfolio)) +
  geom_point() +
  labs(x = "Expected return", y = "Portfolio weight")
```

```
## Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
## Using compatibility `.name_repair`.
```

```
## Joining, by = "ticker"
## Joining, by = "ticker"
```

The figure above illustrates the two different wealth allocations. It is evident that, in order to achieve higher expected returns, the efficient portfolio takes more aggressive positions in the individual assets. More specifically, when plotting the sample means of the individual assets on the \(x\)-axis, it becomes clear that the minimum variance portfolio does not take \(\hat\mu\) into account, while the efficient portfolio puts more weight on assets with high expected returns.

The two mutual fund theorem states that as soon as we have two efficient portfolios (such as the minimum variance portfolio and the efficient portfolio for another required level of expected return as above), we can characterize the entire efficient frontier by combining these two portfolios. This is done in the code below. Familiarize yourself with the inner workings of the `for` loop!

```
# Use the two mutual fund theorem
c <- seq(from = -0.4, to = 1.2, by = 0.01)
res <- tibble(c = c,
              mu = NA,
              sd = NA)

for(i in seq_along(c)){ # For loop
  w <- (1 - c[i]) * wmvp + c[i] * weff # Portfolio of mvp and efficient portfolio
  res$mu[i] <- t(w) %*% mu # Portfolio expected return
  res$sd[i] <- sqrt(t(w) %*% sigma %*% w) # Portfolio volatility
}
```

Finally, it is easy to visualize everything within one powerful figure using `ggplot2`.

```
# Visualize the efficient frontier
ggplot(res, aes(x = sd, y = mu)) +
geom_point() + # Plot all sd/mu portfolio combinations
geom_point(data = res %>% filter(c %in% c(0,1)),
color = "red",
size = 4) + # locate the mvp and efficient portfolio
geom_point(data = tibble(mu = mu, sd = sqrt(diag(sigma))),
aes(y = mu, x = sd), color = "blue", size = 1) + # locate the individual assets
theme_minimal() # make the plot a bit nicer
```

If you want to replicate the figure in the lecture notes, you have to download price data for all constituents of the Dow Jones index first. After computing the returns, you can follow up with the code chunks provided above.

```
library(tidyquant)
ticker <- tq_index("DOW") # Retrieve ALL tickers in the Dow Jones index

index_prices <- ticker %>%
  tq_get(get = "stock.prices") # Download price time series from Yahoo!Finance

returns <- index_prices %>%
  group_by(symbol) %>% # Compute returns for each symbol
  tq_transmute(select = adjusted,
               mutate_fun = to.monthly,
               indexAt = "lastof") %>% # This cryptic line aggregates to monthly (dividend-)adjusted prices
  mutate(return = 100 * (log(adjusted) - log(lag(adjusted)))) %>% # Compute log returns in percent
  na.omit() # Adjusted prices take dividend payments into account

# We need: sample mean return mu, sample covariance matrix Sigma
sigma <- returns %>%
  select(-adjusted) %>%
  pivot_wider(names_from = symbol, values_from = return) %>% # Reshape the data to a T x N matrix
  select(-date) %>%
  cov(use = "pairwise.complete.obs") # Compute the return sample covariance matrix

mu <- returns %>%
  group_by(symbol) %>%
  summarise(mean = mean(return)) %>%
  pull(mean) # Compute the sample averages
```

## 2.3 Simple portfolio analysis

In this exercise you will familiarize yourself with functions, numerical optimization and loops. You will compute efficient portfolio weights (that is, portfolios with the lowest possible volatility for a given level of expected return). To make things more interesting, we will also look at some potential frictions, such as short-selling constraints, which prevent investors from choosing the optimal portfolio. After that, we implement backtesting procedures to analyze the performance of different allocation strategies.

### 2.3.1 Exercises

- Below you find code to download and prepare return time series for 5 different assets. Use this tibble and compute the sample covariance matrix \(\hat{\Sigma}\) of the returns based on the entire sample.
- Use \(\hat{\Sigma}\) to compute the minimum variance portfolio weights. That is, given the number of assets \(N\), choose a weight vector \(w\) of length \(N\) that minimizes the portfolio volatility \[\arg\min_{w} w'\hat\Sigma w\text{, such that } w'\iota = 1\] where \(\iota\) is a vector of ones. You can do this by deriving the closed-form solution of this optimization problem.
- Suppose now that short-selling is prohibited. Compute the restricted minimum variance portfolio weights as the solution to the minimization problem \[\arg\min_{w} w'\hat\Sigma w\text{, such that } w'\iota = 1 \text{ and } w_i\geq0 \,\forall\, i=1,\ldots,N.\] You can use the package `quadprog` for the numerical optimization of quadratic programming problems.
- What is the average monthly (in-sample) return and standard deviation of the minimum variance portfolio, \(\text{Var}(r_p) =\text{Var}(w'r)\)?
- Implement a simple backtesting strategy: Adjust your code from before and re-estimate the weights based on a rolling window of length 100 days using a `for` loop. Before performing the out-of-sample exercise: What are your expectations? Which of the two strategies should deliver the lowest return volatility?
- What is the out-of-sample standard deviation of this portfolio's returns? What about a portfolio with short-sale constraints? What about a *naive* portfolio that invests the same amount in every asset (thus, \(w_\text{Naive} = 1/N\iota\))? Do the results reflect your initial thoughts? Why could there be deviations between the theoretically optimal portfolio and the best performing out-of-sample strategy?

### 2.3.2 Solutions

```
# Use the following lines of code as the underlying asset universe
library(tidyverse)
library(tidyquant)

ticker <- c("AAPL", "MMM", "AXP", "MSFT", "GS") # Apple, 3M, American Express,
# Microsoft and Goldman Sachs
returns <- tq_get(ticker, get = "stock.prices", from = "2015-01-01") %>%
  group_by(symbol) %>%
  mutate(return = log(adjusted) - log(lag(adjusted))) %>% # Compute log returns
  select(symbol, date, return) %>%
  pivot_wider(names_from = symbol,
              values_from = return) %>%
  drop_na() %>%
  select(-date) %>%
  as.matrix()
returns %>% head()
```

```
## AAPL MMM AXP MSFT GS
## [1,] -2.857596e-02 -0.022810868 -0.02680185 -0.009238249 -0.03172061
## [2,] 9.431544e-05 -0.010720599 -0.02154240 -0.014786082 -0.02043667
## [3,] 1.392464e-02 0.007222566 0.02160511 0.012624946 0.01479266
## [4,] 3.770266e-02 0.023684663 0.01407560 0.028993874 0.01583961
## [5,] 1.071732e-03 -0.012359980 -0.01274760 -0.008440374 -0.01546565
## [6,] -2.494925e-02 -0.005459671 -0.01022680 -0.012581783 -0.01224467
```

The estimated variance-covariance matrix \(\hat\Sigma\) is a symmetric \(N \times N\) matrix with full rank. It can easily be shown that the minimum variance portfolio weights are of the form \(w\propto \hat\Sigma^{-1}\iota\). Therefore, we compute the inverse of \(\hat\Sigma\) (which is done in R with `solve()`) and normalize the weight vector \(w\) such that it sums up to 1.

```
sigma <- cov(returns) # Computes the sample variance-covariance matrix. Questions? Use ?cov
sigma
```

```
## AAPL MMM AXP MSFT GS
## AAPL 0.0003501208 0.0001241169 0.0001523432 0.0002237342 0.0001792109
## MMM 0.0001241169 0.0002197898 0.0001556313 0.0001205125 0.0001617816
## AXP 0.0001523432 0.0001556313 0.0003997009 0.0001629901 0.0002720981
## MSFT 0.0002237342 0.0001205125 0.0001629901 0.0003023027 0.0001756623
## GS 0.0001792109 0.0001617816 0.0002720981 0.0001756623 0.0003636318
```

```
# Closed-form solution for the minimum variance portfolio weights
N <- length(ticker)
w <- solve(sigma) %*% rep(1, N) # %*% denotes matrix multiplication in R
w <- w / sum(w) # Scale w such that it sums up to 1
w
```

The weights above are the minimum variance portfolio weights for an investment universe which consists of the chosen assets only.
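As a sanity check on the closed-form formula, the following self-contained base-R sketch (with a hypothetical 3-asset covariance matrix) verifies that the weights \(w \propto \hat\Sigma^{-1}\iota\) attain a lower variance than random fully-invested alternatives:

```r
sigma_toy <- matrix(c(4, 1, 1,
                      1, 3, 1,
                      1, 1, 2), nrow = 3) / 1000 # hypothetical covariance matrix

w_mvp <- solve(sigma_toy) %*% rep(1, 3)
w_mvp <- w_mvp / sum(w_mvp) # normalize so the weights sum to one

var_mvp <- as.numeric(t(w_mvp) %*% sigma_toy %*% w_mvp)

# Compare against 1,000 random portfolios whose weights also sum to one
set.seed(1)
random_vars <- replicate(1000, {
  w <- rnorm(3)
  w <- w / sum(w)
  t(w) %*% sigma_toy %*% w
})
all(var_mvp <= random_vars) # TRUE: the closed-form weights are never beaten
```

Since the minimum variance portfolio is the global minimizer among all fully-invested portfolios, no random draw can achieve a lower variance.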

To compute the optimal portfolio weights in the presence of, e.g., short-selling constraints, analytic solutions to the optimization problem above are not feasible anymore. However, note that we are faced with a quadratic optimization problem with one binding constraint (\(w'\iota = 1\)) and \(N\) inequality constraints \((w_i\geq 0)\). Numerical optimization of such problems are well-established and we can rely on routines readily available in R, for instance with the package `quadprog`

.

```
# install.packages("quadprog")
library(quadprog)
```

Make sure to familiarize yourself with the documentation of the package's main function, `?solve.QP`, such that you understand the following notation. In general, quadratic programming problems are of the form
\[\arg\min_w -\mu'w + 1/2w'Dw \text{ such that } A'w \geq b_0\]
where \(A\) is a (\(\tilde N\times k\)) matrix with the first \(m\) columns denoting equality constraints and the remaining \(k-m\) columns denoting inequality constraints. To get started, the following code computes the minimum variance portfolio weights using numerical procedures instead of the analytic solution above.

```
# Numerical solution of the minimum variance portfolio problem
w_numerical <- solve.QP(Dmat = sigma,
                        dvec = rep(0, N), # For now we do not include mean returns, thus mu = 0
                        Amat = cbind(rep(1, N)), # A has one column, a vector of ones
                        bvec = 1, # bvec = 1 enforces that the weights sum up to one
                        meq = 1) # There is one (out of one) equality constraint

# Check that w and w_numerical are the same (up to numerical instabilities)
cbind(w, w_numerical$solution)
```

```
## [,1] [,2]
## AAPL 0.13036800 0.13036800
## MMM 0.54885951 0.54885951
## AXP 0.08478301 0.08478301
## MSFT 0.22394675 0.22394675
## GS 0.01204273 0.01204273
```

We see that the optimizer backs out the *correct* portfolio weights (that is, they are the same as the ones from the analytic solution). Now we extend the code above to include short-sale constraints.

```
# Make sure you understand what the matrix t(A) is doing:
A <- cbind(1, diag(N))
t(A)
```
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 1 1 1 1
## [2,] 1 0 0 0 0
## [3,] 0 1 0 0 0
## [4,] 0 0 1 0 0
## [5,] 0 0 0 1 0
## [6,] 0 0 0 0 1
```

```
# Now: introduce inequality constraints: no element of w is allowed to be negative
solve.QP(Dmat = sigma,
         dvec = rep(0, N),
         Amat = A,
         bvec = c(1, rep(0, N)),
         meq = 1)
```
```

```
## $solution
## [1] 0.13036800 0.54885951 0.08478301 0.22394675 0.01204273
##
## $value
## [1] 8.947308e-05
##
## $unconstrained.solution
## [1] 0 0 0 0 0
##
## $iterations
## [1] 2 0
##
## $Lagrangian
## [1] 0.0001789462 0.0000000000 0.0000000000 0.0000000000 0.0000000000
## [6] 0.0000000000
##
## $iact
## [1] 1
```

It seems like not much changed in the code: We added \(N\) inequality constraints which make sure that no weight is negative. This condition rules out short-selling, while the equality constraint makes sure that the weights sum up to one. The output chunk above also illustrates everything that is returned to you when calling the function `solve.QP`. Familiarize yourself with the documentation, but what is crucial to understand is that after running `w <- solve.QP(...)` as shown above, you can access the vector that minimizes the objective function with `w$solution`. Similarly, `w$value` returns the value of the objective function evaluated at the minimum. To be precise, the output of `solve.QP` is a list with 6 elements, and list elements can be accessed with `$`. If in doubt, read Chapter 20.5 of R for Data Science. By changing `A` you can impose additional or other restrictions. For instance, how would you compute portfolio weights that minimize the portfolio volatility under the constraint that you cannot invest more than 30% of your wealth in any individual asset?
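One possible answer (a sketch, not part of the original solution): the cap \(w_i \leq 0.3\) can be rewritten as \(-w_i \geq -0.3\), so we append \(N\) further columns, \(-I_N\), to `Amat` and the corresponding bounds \(-0.3\) to `bvec`. The covariance matrix below is hypothetical, standing in for the estimated `sigma`:

```r
library(quadprog)

# Hypothetical diagonal 4-asset covariance matrix: asset 1 has the lowest variance
N <- 4
sigma <- diag(c(1, 4, 4, 4)) / 1000

# Columns of Amat: sum-to-one (equality), w_i >= 0, and -w_i >= -0.3 (i.e. w_i <= 0.3)
A_capped <- cbind(rep(1, N), diag(N), -diag(N))
b_capped <- c(1, rep(0, N), rep(-0.3, N))

w_capped <- solve.QP(Dmat = sigma,
                     dvec = rep(0, N),
                     Amat = A_capped,
                     bvec = b_capped,
                     meq = 1)$solution
round(w_capped, 4)
```

Without the cap, the low-variance asset would receive a weight of 4/7; with the cap, its weight is pushed down to exactly 0.3 and the remaining budget is spread over the other assets.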

```
r_p <- returns %*% w # Portfolio returns
list(sd = sd(r_p), mean = mean(r_p)) # Realized volatility and mean return of the portfolio
```

```
## $sd
## [1] 0.01337708
##
## $mean
## [1] 0.0005556903
```

The code snippet above shows how to compute the realized portfolio return volatility and how to generate some summary statistics for these values which can be used for backtesting purposes. However, as \(\hat\Sigma\) is computed using the entire history of the assets, we benefit from information that no real investor can actually use to base her decisions on. Instead, in the following we focus on out-of-sample computation of the portfolio weights based on a rolling window. Therefore, every month (day) we update the available set of information, recompute our concurrent estimate of \(\hat\Sigma_t\) and rebalance our portfolio to hold the efficient minimum variance portfolio. The next day, we then compute the realized return of this portfolio. That way, we never use information which is only available in the future to choose the optimal portfolio.

```
# Small function that computes the minimum variance portfolio weights for a given matrix of returns
mvp_weights <- function(tmp){
  sigma <- cov(tmp)
  # Closed-form solution
  w <- solve(sigma) %*% rep(1, ncol(tmp))
  w <- w / sum(w)
  return(w)
}

# Small function that computes the no-short-selling minimum variance portfolio weights
mvp_weights_ns <- function(tmp){
  sigma <- cov(tmp)
  w <- solve.QP(Dmat = sigma,
                dvec = rep(0, N),
                Amat = cbind(1, diag(N)),
                bvec = c(1, rep(0, N)),
                meq = 1)
  return(w$solution)
}

# Define out-of-sample periods
window_length <- 100
periods <- nrow(returns) - window_length # Total number of out-of-sample periods

all_returns <- matrix(NA, nrow = periods, ncol = 3) # A matrix to collect all returns
colnames(all_returns) <- c("mvp", "mvp_ns", "naive") # We implement 3 strategies

for(i in 1:periods){ # Rolling window
  return_window <- returns[i : (i + window_length - 1), ] # The last window_length returns available up to date t

  # The three portfolio strategies
  mvp <- mvp_weights(return_window)
  mvp_ns <- mvp_weights_ns(return_window)
  mvp_naive <- rep(1/N, N) # Naive: simply invest the same in each asset

  # Store the realized returns w'r_t+1 in the matrix all_returns
  all_returns[i, 1] <- returns[i + window_length, ] %*% mvp # Realized mvp return
  all_returns[i, 2] <- returns[i + window_length, ] %*% mvp_ns # Realized constrained mvp return
  all_returns[i, 3] <- returns[i + window_length, ] %*% mvp_naive # Realized naive return
}
```

The framework above allows you to add additional portfolio strategies and to compare their out-of-sample performance to the benchmarks above. How do the portfolios actually perform? To make the values easier to read, I convert everything to annualized measures (assuming 250 trading days).

```
all_returns <- all_returns %>% as_tibble()
all_returns %>%
  pivot_longer(everything()) %>% # Tidy up the tibble for easier summary statistics
  group_by(name) %>%
  summarise_at(vars(value),
               list(Mean = ~250 * mean(.),
                    Volatility = ~sqrt(250) * sd(.),
                    Sharpe = ~sqrt(250) * mean(.) / sd(.))) %>%
  arrange(Volatility)
```

```
## # A tibble: 3 x 4
## name Mean Volatility Sharpe
## <chr> <dbl> <dbl> <dbl>
## 1 mvp_ns 0.131 0.213 0.614
## 2 mvp 0.124 0.217 0.571
## 3 naive 0.166 0.229 0.725
```

## 2.4 Equivalence between certainty equivalent maximization and minimum variance optimization

Slide 42 (Parameter uncertainty) argues that an investor with a quadratic utility function who maximizes the certainty equivalent \[\max_\omega CE(\omega) = \omega'\mu - \frac{\gamma}{2} \omega'\Sigma \omega \text{ s.t. } \iota'\omega = 1\] faces an optimization problem equivalent to a framework where portfolio weights are chosen with the aim of minimizing volatility given a pre-specified level of expected returns \[\min_\omega \omega'\Sigma \omega \text{ s.t. } \omega'\mu = \bar\mu \text{ and } \iota'\omega = 1.\] Note the differences: In the first case, the investor has a (known) risk aversion \(\gamma\) which determines her optimal balance between risk (\(\omega'\Sigma\omega\)) and return (\(\mu'\omega\)). In the second case, the investor has a target return she wants to achieve while minimizing the volatility. Intuitively, both approaches are closely connected if we consider that the risk aversion \(\gamma\) determines the desired return \(\bar\mu\): more risk-averse investors (higher \(\gamma\)) will choose a lower target return to keep their volatility down. The efficient frontier then spans all possible portfolios depending on the risk aversion \(\gamma\), starting from the minimum variance portfolio (\(\gamma = \infty\)).

### 2.4.1 Exercises

- Prove that the optimal portfolio weights are equivalent in both cases.

### 2.4.2 Solution

In the following I solve for the optimal portfolio weights of a certainty equivalent maximizing investor. The first-order condition reads \[ \begin{aligned} \mu - \lambda \iota &= \gamma \Sigma \omega \\ \Leftrightarrow \omega &= \frac{1}{\gamma}\Sigma^{-1}\left(\mu - \lambda\iota\right). \end{aligned} \] Next, we make use of the constraint \(\iota'\omega = 1\): \[ \begin{aligned} \iota'\omega &= 1 = \frac{1}{\gamma}\left(\iota'\Sigma^{-1}\mu - \lambda\iota'\Sigma^{-1}\iota\right)\\ \Rightarrow \lambda &= \frac{1}{\iota'\Sigma^{-1}\iota}\left(\iota'\Sigma^{-1}\mu - \gamma \right). \end{aligned} \] Plugging in the value of \(\lambda\) reveals \[ \begin{aligned} \omega &= \frac{1}{\gamma}\Sigma^{-1}\left(\mu - \frac{1}{\iota'\Sigma^{-1}\iota}\left(\iota'\Sigma^{-1}\mu - \gamma \right)\iota\right) \\ \Rightarrow \omega &= \frac{\Sigma^{-1}\iota}{\iota'\Sigma^{-1}\iota} + \frac{1}{\gamma}\left(\Sigma^{-1} - \frac{\Sigma^{-1}\iota}{\iota'\Sigma^{-1}\iota}\iota'\Sigma^{-1}\right)\mu\\ &= \omega_\text{mvp} + \frac{1}{\gamma}\left(\Sigma^{-1}\mu - \frac{\iota'\Sigma^{-1}\mu}{\iota'\Sigma^{-1}\iota}\Sigma^{-1}\iota\right). \end{aligned} \] As shown in the slides, this corresponds to the efficient portfolio with desired return \(\bar\mu\) such that (in the notation of the slides) \[\frac{1}{\gamma} = \frac{\tilde\lambda}{2} = \frac{\bar\mu - D/C}{E - D^2/C},\] which implies that the desired return is just \[\bar\mu = \frac{D}{C} + \frac{1}{\gamma}\left(E - D^2/C\right),\] which approaches \(\frac{D}{C} = \mu'\omega_\text{mvp}\) for \(\gamma\rightarrow \infty\), as expected.
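The equivalence can also be checked numerically. The following self-contained base-R sketch (with a hypothetical covariance matrix, mean vector and risk aversion) confirms that the certainty equivalent maximizer's weights coincide with \(\omega_\text{mvp} + \frac{1}{\gamma}\left(\Sigma^{-1}\mu - \frac{D}{C}\Sigma^{-1}\iota\right)\):

```r
sigma <- matrix(c(4, 1, 1,
                  1, 3, 1,
                  1, 1, 2), nrow = 3) / 1000 # hypothetical covariance matrix
mu <- c(0.5, 0.8, 0.6) / 100                 # hypothetical mean returns
gamma <- 4                                   # hypothetical risk aversion
iota <- rep(1, 3)

C <- as.numeric(t(iota) %*% solve(sigma) %*% iota)
D <- as.numeric(t(iota) %*% solve(sigma) %*% mu)

# CE maximizer: omega = (1/gamma) * Sigma^-1 (mu - lambda * iota), lambda from iota'omega = 1
lambda <- (D - gamma) / C
w_ce <- (1 / gamma) * solve(sigma) %*% (mu - lambda * iota)

# Efficient-frontier representation: omega_mvp + (1/gamma) * (Sigma^-1 mu - D/C * Sigma^-1 iota)
w_mvp <- solve(sigma) %*% iota / C
w_eff <- w_mvp + (1 / gamma) * (solve(sigma) %*% mu - D / C * solve(sigma) %*% iota)

max(abs(w_ce - w_eff)) # numerically zero: the two solutions coincide
```

Varying `gamma` traces out the same frontier of portfolios in both representations.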