Chapter 2 Managing Data
In this chapter we will learn how to download/import data from various sources. Most importantly, we will use the quantmod library through tidyquant to download financial data from a variety of sources. We will also learn how to import ‘.xlsx’ (Excel) files.
2.1 Getting Data
2.1.1 Downloading from Online Datasources
The tidyquant package comes with a variety of readily compiled datasets/datasources. For whole collections of data, the following commands are available:
tq_exchange_options() # find all exchanges available
## [1] "AMEX" "NASDAQ" "NYSE"
tq_index_options() # find all indices available
## [1] "RUSSELL1000" "RUSSELL2000" "RUSSELL3000" "DOW" "DOWGLOBAL"
## [6] "SP400" "SP500" "SP600" "SP1000"
tq_get_options() # find all data sources available
## [1] "stock.prices" "stock.prices.google" "stock.prices.japan"
## [4] "financials" "key.ratios" "dividends"
## [7] "splits" "economic.data" "exchange.rates"
## [10] "metal.prices" "quandl" "quandl.datatable"
## [13] "alphavantager" "rblpapi"
The commands tq_exchange() and tq_index() will now get you all symbols and some additional information on the stocks listed at that exchange or contained in that index.1
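The glimpses below assume that the index and exchange datasets were created beforehand; a minimal sketch of how they might be obtained (the object names match the glimpse() calls that follow):

```r
library(tidyquant)
# download the index composition and the exchange listings
sp500  <- tq_index("SP500")
nyse   <- tq_exchange("NYSE")
nasdaq <- tq_exchange("NASDAQ")
```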
glimpse(sp500)
## Observations: 504
## Variables: 5
## $ symbol <chr> "AAPL", "MSFT", "AMZN", "BRK.B", "FB", "JPM", "JNJ...
## $ company <chr> "Apple Inc.", "Microsoft Corporation", "Amazon.com...
## $ weight <dbl> 0.044387857, 0.035053855, 0.032730459, 0.016868330...
## $ sector <chr> "Information Technology", "Information Technology"...
## $ shares_held <dbl> 53939268, 84297440, 4418447, 21117048, 26316160, 3...
glimpse(nyse)
## Observations: 3,139
## Variables: 7
## $ symbol <chr> "DDD", "MMM", "WBAI", "WUBA", "EGHT", "AHC", "...
## $ company <chr> "3D Systems Corporation", "3M Company", "500.c...
## $ last.sale.price <dbl> 18.4800, 206.7100, 11.6400, 68.1800, 23.2000, ...
## $ market.cap <chr> "$2.11B", "$121.26B", "$491.85M", "$10.06B", "...
## $ ipo.year <dbl> NA, NA, 2013, 2013, NA, NA, 2014, 2014, NA, NA...
## $ sector <chr> "Technology", "Health Care", "Consumer Service...
## $ industry <chr> "Computer Software: Prepackaged Software", "Me...
glimpse(nasdaq)
## Observations: 3,405
## Variables: 7
## $ symbol <chr> "YI", "PIH", "PIHPP", "TURN", "FLWS", "FCCY", ...
## $ company <chr> "111, Inc.", "1347 Property Insurance Holdings...
## $ last.sale.price <dbl> 13.800, 6.350, 25.450, 2.180, 11.550, 20.150, ...
## $ market.cap <chr> NA, "$38M", NA, "$67.85M", "$746.18M", "$168.8...
## $ ipo.year <dbl> 2018, 2014, NA, NA, 1999, NA, NA, 2011, 2014, ...
## $ sector <chr> NA, "Finance", "Finance", "Finance", "Consumer...
## $ industry <chr> NA, "Property-Casualty Insurers", "Property-Ca...
The dataset we will be using consists of the ten largest stocks within the S&P500 that had an IPO before January 2000. Therefore we merge the datasets using inner_join(), because we only want to keep symbols from the S&P500 that are also traded on the NYSE or NASDAQ:
stocks.selection <- sp500 %>%
  inner_join(rbind(nyse,nasdaq) %>% select(symbol,last.sale.price,market.cap,ipo.year), by=c("symbol")) %>% # join datasets
  filter(ipo.year<2000 & !is.na(market.cap)) %>% # keep stocks with an IPO before 2000 and available market cap
  arrange(desc(weight)) %>% # sort by index weight in descending order
  slice(1:10) # keep the ten largest stocks
symbol | company | weight | sector | shares_held | last.sale.price | market.cap | ipo.year |
---|---|---|---|---|---|---|---|
AAPL | Apple Inc. | 0.044 | Information Technology | 53939268 | 221.07 | $1067.75B | 1980 |
MSFT | Microsoft Corporation | 0.035 | Information Technology | 84297440 | 111.71 | $856.62B | 1986 |
AMZN | Amazon.com Inc. | 0.033 | Consumer Discretionary | 4418447 | 1990.00 | $970.6B | 1997 |
CSCO | Cisco Systems Inc. | 0.009 | Information Technology | 51606584 | 46.89 | $214.35B | 1990 |
NVDA | NVIDIA Corporation | 0.007 | Information Technology | 6659463 | 268.20 | $163.07B | 1999 |
ORCL | Oracle Corporation | 0.006 | Information Technology | 32699620 | 49.34 | $196.43B | 1986 |
AMGN | Amgen Inc. | 0.005 | Health Care | 7306144 | 199.50 | $129.13B | 1983 |
ADBE | Adobe Systems Incorporated | 0.005 | Information Technology | 5402625 | 267.79 | $131.13B | 1986 |
QCOM | QUALCOMM Incorporated | 0.004 | Information Technology | 15438597 | 71.75 | $105.41B | 1991 |
GILD | Gilead Sciences Inc. | 0.004 | Health Care | 14310276 | 73.97 | $95.89B | 1992 |
In the next step, we will download stock prices from Yahoo Finance. Data from that source usually comes in the OHLC format (open, high, low, close) with additional information (volume, adjusted price). We will additionally download data for the S&P500 index itself. Note that we get daily prices:
stocks.prices <- stocks.selection$symbol %>%
tq_get(get = "stock.prices",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
index.prices <- "^GSPC" %>%
tq_get(get = "stock.prices",from = "2000-01-01",to = "2017-12-31")
stocks.prices %>% slice(1:2) # show the first two entries of each group
## # A tibble: 20 x 8
## # Groups: symbol [10]
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2000-01-03 3.75 4.02 3.63 4.00 133949200 2.69
## 2 AAPL 2000-01-04 3.87 3.95 3.61 3.66 128094400 2.46
## 3 ADBE 2000-01-03 16.8 16.9 16.1 16.4 7384400 16.1
## 4 ADBE 2000-01-04 15.8 16.5 15.0 15.0 7813200 14.8
## 5 AMGN 2000-01-03 70 70 62.9 62.9 22914900 53.5
## 6 AMGN 2000-01-04 62 64.1 57.7 58.1 15052600 49.4
## 7 AMZN 2000-01-03 81.5 89.6 79.0 89.4 16117600 89.4
## 8 AMZN 2000-01-04 85.4 91.5 81.8 81.9 17487400 81.9
## 9 CSCO 2000-01-03 55.0 55.1 51.8 54.0 53076000 43.6
## 10 CSCO 2000-01-04 52.8 53.5 50.9 51 50805600 41.2
## 11 GILD 2000-01-03 1.79 1.80 1.72 1.76 54070400 1.61
## 12 GILD 2000-01-04 1.70 1.72 1.66 1.68 38960000 1.54
## 13 MSFT 2000-01-03 58.7 59.3 56 58.3 53228400 42.5
## 14 MSFT 2000-01-04 56.8 58.6 56.1 56.3 54119000 41.0
## 15 NVDA 2000-01-03 3.94 3.97 3.68 3.90 7522800 3.61
## 16 NVDA 2000-01-04 3.83 3.84 3.60 3.80 7512000 3.51
## 17 ORCL 2000-01-03 31.2 31.3 27.9 29.5 98114800 26.4
## 18 ORCL 2000-01-04 28.9 29.7 26.2 26.9 116824800 24.0
## 19 QCOM 2000-01-03 99.6 100 87 89.7 91334000 65.7
## 20 QCOM 2000-01-04 86.3 87.7 80 81.0 63567400 59.4
Dividends and stock splits can also be downloaded:
stocks.dividends <- stocks.selection$symbol %>%
tq_get(get = "dividends",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
stocks.splits <- stocks.selection$symbol %>%
tq_get(get = "splits",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
We can additionally download financial information for the different stocks, namely key ratios (financials, profitability, growth, cash flow, financial health, efficiency ratios and valuation ratios). These ratios come from Morningstar in a nested form that we will have to ‘dig out’ using unnest().
stocks.ratios <- stocks.selection$symbol %>%
tq_get(get = "key.ratios",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
## # A tibble: 56 x 3
## # Groups: symbol [8]
## symbol section data
## <chr> <chr> <list>
## 1 AAPL Financials <tibble [150 x 5]>
## 2 AAPL Profitability <tibble [170 x 5]>
## 3 AAPL Growth <tibble [160 x 5]>
## 4 AAPL Cash Flow <tibble [50 x 5]>
## 5 AAPL Financial Health <tibble [240 x 5]>
## 6 AAPL Efficiency Ratios <tibble [80 x 5]>
## 7 AAPL Valuation Ratios <tibble [40 x 5]>
## 8 MSFT Financials <tibble [150 x 5]>
## 9 MSFT Profitability <tibble [170 x 5]>
## 10 MSFT Growth <tibble [160 x 5]>
## # ... with 46 more rows
We find that financial ratios are only available for a subset of the ten stocks. We first filter() for the ‘Growth’ information, then unnest() the nested tibbles and filter again for ‘EPS %’ and the ‘Year over Year’ information. Then we use ggplot() to plot the time series of earnings per share for the different companies.
stocks.ratios %>% filter(section=="Growth") %>% unnest() %>%
filter(sub.section=="EPS %",category=="Year over Year") %>%
ggplot(aes(x=date,y=value,color=symbol)) + geom_line(lwd=1.1) +
labs(title="Year over Year EPS in %", x="",y="") +
theme_tq() + scale_color_tq()
A variety of other (professional) data services are integrated into tidyquant, which I will list in the following subsections:
2.1.1.1 Quandl
Quandl provides access to many different financial and economic databases. To use it, one should acquire an API key by creating a Quandl account.2 Searches can be done using quandl_search() (I personally would use their homepage for that). Data can be downloaded as before with tq_get(); be aware that you can download either single time series or entire datatables with the arguments get = "quandl" and get = "quandl.datatable", respectively. Note that in the example for ‘Apple’ below, the adjusted close prices differ from the ones from Yahoo. An example for a datatable is Zacks Fundamentals Collection B.
quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
  tq_get(get = "quandl",
         from = "2000-01-01",
         to = "2017-12-31",
         column_index = 11, # numeric column number (e.g. 1)
         collapse = "daily", # can be "none", "daily", "weekly", "monthly", "quarterly", "annual"
         transform = "none") # for summarizing data: "none", "diff", "rdiff", "cumul", "normalize"
## Oil India Limited
## Code: NSE/OIL
## Desc: Historical prices for Oil India Limited<br><br>National Stock Exchange of India<br><br>Ticker: OIL<br><br>ISIN: INE274J01014
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Turnover (Lacs)
##
## Oil Country Tubular Limited
## Code: NSE/OILCOUNTUB
## Desc: Historical prices for Oil Country Tubular Limited<br><br>National Stock Exchange of India<br><br>Ticker: OILCOUNTUB<br><br>ISIN: INE591A01010
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Turnover (Lacs)
##
## Essar Oil Limited
## Code: NSE/ESSAROIL
## Desc: Historical prices for Essar Oil Limited<br><br>National Stock Exchange of India<br><br>Ticker: ESSAROIL<br><br>ISIN: INE011A01019
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Turnover (Lacs)
## # A tibble: 3 x 13
## id dataset_code database_code name description refreshed_at
## * <int> <chr> <chr> <chr> <chr> <chr>
## 1 6668 OIL NSE Oil ~ Historical~ 2018-09-13T~
## 2 6669 OILCOUNTUB NSE Oil ~ Historical~ 2018-09-13T~
## 3 6041 ESSAROIL NSE Essa~ Historical~ 2016-02-09T~
## # ... with 7 more variables: newest_available_date <chr>,
## # oldest_available_date <chr>, column_names <list>, frequency <chr>,
## # type <chr>, premium <lgl>, database_id <int>
## # A tibble: 5 x 12
## date open high low close volume ex.dividend split.ratio
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2000-01-03 105. 112. 102. 112. 4.78e6 0 1
## 2 2000-01-04 108. 111. 101. 102. 4.57e6 0 1
## 3 2000-01-05 104. 111. 103 104 6.95e6 0 1
## 4 2000-01-06 106. 107 95 95 6.86e6 0 1
## 5 2000-01-07 96.5 101 95.5 99.5 4.11e6 0 1
## # ... with 4 more variables: adj.open <dbl>, adj.high <dbl>,
## # adj.low <dbl>, adj.close <dbl>
2.1.1.2 Alpha Vantage
Alpha Vantage provides access to real-time and historical financial data. Here we also need to get and set an API key (available for free).
av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_DAILY_ADJUSTED") # for daily data
alpha.aapl.id <- c("AAPL") %>%
tq_get(get = "alphavantager",
av_fun="TIME_SERIES_INTRADAY", # for intraday data
interval="5min") # 5 minute intervals
## # A tibble: 5 x 9
## timestamp open high low close adjusted_close volume dividend_amount
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 2018-04-24 166. 166. 161. 163. 162. 3.37e7 0
## 2 2018-04-25 163. 165. 162. 164. 162. 2.84e7 0
## 3 2018-04-26 164. 166. 163. 164. 163. 2.80e7 0
## 4 2018-04-27 164 164. 161. 162. 161. 3.57e7 0
## 5 2018-04-30 162. 167. 162. 165. 164. 4.24e7 0
## # ... with 1 more variable: split_coefficient <dbl>
## # A tibble: 5 x 6
## timestamp open high low close volume
## <dttm> <dbl> <dbl> <dbl> <dbl> <int>
## 1 2018-09-11 14:20:00 224. 224. 224. 224. 489378
## 2 2018-09-11 14:25:00 224. 224. 224. 224. 261968
## 3 2018-09-11 14:30:00 224. 224. 224. 224. 334069
## 4 2018-09-11 14:35:00 224. 224. 224. 224. 285138
## 5 2018-09-11 14:40:00 224. 224. 224. 224. 229329
2.1.1.3 FRED (Economic Data)
A large quantity of economic data can be extracted from the Federal Reserve Economic Data (FRED) database. Below we download the 1-year and 3-month Treasury bill rates for the US (FRED series TB1YR and TB3MS). Note that these are annualized rates!
ir <- tq_get(c("TB1YR","TB3MS"), get = "economic.data") %>%
group_by(symbol)
## # A tibble: 6 x 3
## # Groups: symbol [2]
## symbol date price
## <chr> <date> <dbl>
## 1 TB1YR 2018-08-01 2.36
## 2 TB1YR 2018-07-01 2.31
## 3 TB1YR 2018-06-01 2.25
## 4 TB3MS 2018-08-01 2.03
## 5 TB3MS 2018-07-01 1.96
## 6 TB3MS 2018-06-01 1.9
2.1.1.4 OANDA (Exchange Rates and Metal Prices)
Oanda provides a large quantity of exchange rates (currently only for the last 180 days). Enter them as currency pairs using the “/” notation (e.g. “EUR/USD”), and set get = "exchange.rates". Note that most of the data (with a much longer history) is also available on FRED.
eur_usd <- tq_get("EUR/USD",
get = "exchange.rates",
from = Sys.Date() - lubridate::days(10))
plat_price_eur <- tq_get("plat", get = "metal.prices",
from = Sys.Date() - lubridate::days(10),
base.currency = "EUR")
eur_usd %>% arrange(desc(date)) %>% slice(1:3)
## # A tibble: 3 x 2
## date exchange.rate
## <date> <dbl>
## 1 2018-09-12 1.16
## 2 2018-09-11 1.16
## 3 2018-09-10 1.16
plat_price_eur %>% arrange(desc(date)) %>% slice(1:3)
## # A tibble: 3 x 2
## date price
## <date> <dbl>
## 1 2018-09-12 681.
## 2 2018-09-11 681.
## 3 2018-09-10 680.
2.1.1.5 Bloomberg and Datastream
Bloomberg is officially integrated into the tidyquant package, but one needs to have Bloomberg running on the terminal one is using. Datastream is not integrated, but has a nice R interface in the package rdatastream. However, you need to have the (non-free) Thomson Dataworks Enterprise SOAP API licensed; then the package allows for convenient retrieval of data. If this is not the case, you have to manually retrieve your data and save it as an ‘.xlsx’ Excel file, which we can then import using readxl::read_xlsx() from the readxl package.
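A minimal sketch of such an import (the file name and sheet are hypothetical):

```r
library(readxl)
# read the first sheet of a manually exported workbook (hypothetical file)
ds.data <- read_xlsx("datastream_export.xlsx", sheet = 1)
```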
2.1.1.6 Fama-French Data (Kenneth French’s Data Library)
To download Fama-French data in batch, there is the package FFdownload that I updated and that can now be installed via devtools::install_bitbucket("sstoeckl/FFdownload"). Currently you can either download all data or skip the (large) daily files using the argument exclude_daily=TRUE. The result is a list of data.frames that has to be cleaned up somewhat but is nonetheless quite usable.
FFdownload(output_file = "FFdata.RData", # output file for the final dataset
           tempdir = NULL, # where the temporary downloads go (created automatically if NULL)
           exclude_daily = TRUE, # exclude daily data
           download = FALSE) # if FALSE, data already in the temp directory will be used
load(file = "FFdata.RData")
factors <- FFdownload$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>%
tk_tbl(rename_index="date") %>% # make tibble
mutate(date=as.Date(date, frac=1)) %>% # make proper month-end date format
gather(key=FFvar,value = price,-date) # gather into tidy format
factors %>% group_by(FFvar) %>% slice(1:2)
## # A tibble: 8 x 3
## # Groups: FFvar [4]
## date FFvar price
## <date> <chr> <dbl>
## 1 1926-07-31 HML -2.87
## 2 1926-08-31 HML 4.19
## 3 1926-07-31 Mkt.RF 2.96
## 4 1926-08-31 Mkt.RF 2.64
## 5 1926-07-31 RF 0.22
## 6 1926-08-31 RF 0.25
## 7 1926-07-31 SMB -2.3
## 8 1926-08-31 SMB -1.4
2.1.2 Manipulate Data
A variety of transformations can be applied to (financial) timeseries data. We will present some examples merging together our stock file with the index, the risk free rate from FRED and the Fama-French-Factors.
Doing data transformations in tidy datasets is either called a transmute (change the variable/dataset, only return the calculated column) or a mutate() (add the transformed variable). In the tidyquant package these functions are called tq_transmute() and tq_mutate(), because they simultaneously allow changes of periodicity (e.g. daily to monthly), so the returned dataset can have fewer rows than before. The core of these functions is the mutate_fun argument, which can come from the xts/zoo, quantmod (Quantitative Financial Modelling & Trading Framework for R) and TTR (Technical Trading Rules) packages.
In the examples below, we show how to change the periodicity of the data (keeping the adjusted close price and the volume information) and calculate monthly (arithmetic) returns for the ten stocks and the index. We then merge the price and return information for each stock, and at each point in time add the return of the S&P500 index and the three Fama-French factors.
stocks.prices.monthly <- stocks.prices %>%
  tq_transmute(select = c(adjusted,volume), # which columns to choose
               mutate_fun = to.monthly, # function: make monthly series
               indexAt = "lastof") %>% # 'yearmon', 'yearqtr', 'firstof', 'lastof', 'startof', or 'endof'
  ungroup() %>% mutate(date=as.yearmon(date))
stocks.returns <- stocks.prices %>%
tq_transmute(select = adjusted,
mutate_fun = periodReturn, # create monthly returns
period="monthly",
type="arithmetic") %>%
ungroup() %>% mutate(date=as.yearmon(date))
index.returns <- index.prices %>%
tq_transmute(select = adjusted,mutate_fun = periodReturn,
period="monthly", type="arithmetic") %>%
mutate(date=as.yearmon(date))
factors.returns <- factors %>% mutate(price=price/100) %>% # already is monthly
mutate(date=as.yearmon(date))
stocks.prices.monthly %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 4
## symbol date adjusted volume
## <chr> <S3: yearmon> <dbl> <dbl>
## 1 AAPL Jan 2000 2.49 175420000
## 2 AAPL Feb 2000 2.75 92240400
## 3 AAPL Mar 2000 3.26 101158400
## 4 AAPL Apr 2000 2.98 62395200
## 5 AAPL May 2000 2.02 108376800
stocks.returns %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 3
## symbol date monthly.returns
## <chr> <S3: yearmon> <dbl>
## 1 AAPL Jan 2000 -0.0731
## 2 AAPL Feb 2000 0.105
## 3 AAPL Mar 2000 0.185
## 4 AAPL Apr 2000 -0.0865
## 5 AAPL May 2000 -0.323
index.returns %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 2
## date monthly.returns
## <S3: yearmon> <dbl>
## 1 Jan 2000 -0.0418
## 2 Feb 2000 -0.0201
## 3 Mar 2000 0.0967
## 4 Apr 2000 -0.0308
## 5 May 2000 -0.0219
factors.returns %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 3
## date FFvar price
## <S3: yearmon> <chr> <dbl>
## 1 Jul 1926 Mkt.RF 0.0296
## 2 Aug 1926 Mkt.RF 0.0264
## 3 Sep 1926 Mkt.RF 0.0036
## 4 Oct 1926 -0.0324
## 5 Nov 1926 Mkt.RF 0.0253
Now we merge all the information together:
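The merging code itself is not shown; one possible way to assemble stocks.final that is consistent with the output below (a sketch; the exact join order is an assumption, and the column names are taken from the output):

```r
library(tidyverse)
stocks.final <- stocks.returns %>%
  rename(return = monthly.returns) %>%                           # monthly stock returns
  left_join(stocks.prices.monthly,
            by = c("symbol", "date")) %>%                        # add price/volume
  left_join(index.returns %>% rename(sp500 = monthly.returns),
            by = "date") %>%                                     # add index return
  left_join(factors.returns %>% spread(key = FFvar, value = price),
            by = "date")                                         # add the FF factors
```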
## # A tibble: 5 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 2.49 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 2.75 9.22e7 -0.0201 0.0245 0.221 -0.106
## 3 AAPL Mar ~ 0.185 3.26 1.01e8 0.0967 0.052 -0.173 0.0794
## 4 AAPL Apr ~ -0.0865 2.98 6.24e7 -0.0308 -0.064 -0.0771 0.0856
## 5 AAPL May ~ -0.323 2.02 1.08e8 -0.0219 -0.0442 -0.0501 0.0243
## # ... with 1 more variable: RF <dbl>
Now we can calculate and add additional information, such as the MACD (Moving Average Convergence/Divergence) and its driving signal. Be aware that you have to group_by() symbol, or the signal would just be calculated for one large stacked time series:
stocks.final %>% group_by(symbol) %>%
tq_mutate(select = adjusted,
mutate_fun = MACD,
col_rename = c("MACD", "Signal")) %>%
select(symbol,date,adjusted,MACD,Signal) %>%
tail() # show last part of the dataset
## # A tibble: 6 x 5
## # Groups: symbol [1]
## symbol date adjusted MACD Signal
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GILD 2018. 73.4 -5.40 -4.38
## 2 GILD 2018. 80.8 -3.86 -4.27
## 3 GILD 2018. 78.7 -2.85 -3.99
## 4 GILD 2018. 72.8 -2.68 -3.73
## 5 GILD 2018. 72.6 -2.52 -3.49
## 6 GILD 2018. 70.0 -2.66 -3.32
save(stocks.final,file="stocks.RData")
2.1.2.1 Rolling functions
One of the most important tools you will need in practice is a rolling analysis. One example would be a rolling regression to get the time-varying \(\alpha\) and \(\beta\) of each stock with respect to the index or the Fama-French factors. To do that, we need to create a function that does everything we want in one step:
regr_fun <- function(data,formula) {
coef(lm(formula, data = timetk::tk_tbl(data, silent = TRUE)))
}
This function takes a dataset and a regression formula as input, performs the regression and returns the coefficients.
Now we can use tq_mutate() to apply this custom regression function over a rolling window using rollapply from the zoo package. Since we leave select = NULL, the data frame is passed automatically to the data argument of rollapply. All we need to specify is mutate_fun = rollapply plus any additional arguments of rollapply: the window size via width (e.g. 12 months), FUN = regr_fun, and the regression formula, which is passed through to our function. It is extremely important to specify by.column = FALSE, which tells rollapply to perform the computation using the data as a whole rather than applying the function to each column independently. The col_rename argument renames the added columns.
stocks.final %>% group_by(symbol) %>%
  select(symbol, date, return, Mkt.RF) %>% # keep only the columns used in the regression
  tq_mutate(mutate_fun = rollapply,
            width = 12, # rolling 12-month window
            FUN = regr_fun,
            formula = return ~ Mkt.RF, # passed through to regr_fun
            by.column = FALSE,
            col_rename = c("alpha", "beta"))
This adds the rolling regression coefficients (alpha and beta) to the data frame.
Also check out the rollify() function from the tibbletime package for this task!
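For example, rollify() wraps an ordinary function into a rolling version (a sketch; the window length and column choice are illustrative):

```r
library(tibbletime)
# turn a simple beta estimate into a rolling 12-month version
roll_beta <- rollify(function(ret, mkt) coef(lm(ret ~ mkt))[[2]],
                     window = 12)
# usage (hypothetical): stocks.final %>% group_by(symbol) %>%
#   mutate(beta = roll_beta(return, Mkt.RF))
```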