Chapter 2 Managing Data

In this chapter we will learn how to download and import data from various sources. Most importantly, we will use the quantmod library through tidyquant to download financial data from a variety of sources. We will also learn how to import ‘.xlsx’ (Excel) files.

2.1 Getting Data

2.1.1 Downloading from Online Datasources

The tidyquant package comes with a variety of readily compiled datasets/datasources. For whole collections of data, the following commands are available:

tq_exchange_options() # find all exchanges available
## [1] "AMEX"   "NASDAQ" "NYSE"
tq_index_options() # find all indices available
## [1] "RUSSELL1000" "RUSSELL2000" "RUSSELL3000" "DOW"         "DOWGLOBAL"  
## [6] "SP400"       "SP500"       "SP600"       "SP1000"
tq_get_options() # find all data sources available
##  [1] "stock.prices"        "stock.prices.google" "stock.prices.japan" 
##  [4] "financials"          "key.ratios"          "dividends"          
##  [7] "splits"              "economic.data"       "exchange.rates"     
## [10] "metal.prices"        "quandl"              "quandl.datatable"   
## [13] "alphavantager"       "rblpapi"

The commands tq_exchange() and tq_index() will now get you all symbols and some additional information on the stocks listed on that exchange or contained in that index.1
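The objects sp500, nyse and nasdaq inspected below were obtained along these lines (a sketch; the index and exchange names are passed as strings):

```r
library(tidyquant)

sp500  <- tq_index("SP500")      # constituents of the S&P500 (with weights)
nyse   <- tq_exchange("NYSE")    # all stocks listed on the NYSE
nasdaq <- tq_exchange("NASDAQ")  # all stocks listed on the NASDAQ
```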

glimpse(sp500)
## Observations: 504
## Variables: 5
## $ symbol      <chr> "AAPL", "MSFT", "AMZN", "BRK.B", "FB", "JPM", "JNJ...
## $ company     <chr> "Apple Inc.", "Microsoft Corporation", "Amazon.com...
## $ weight      <dbl> 0.044387857, 0.035053855, 0.032730459, 0.016868330...
## $ sector      <chr> "Information Technology", "Information Technology"...
## $ shares_held <dbl> 53939268, 84297440, 4418447, 21117048, 26316160, 3...
glimpse(nyse)
## Observations: 3,139
## Variables: 7
## $ symbol          <chr> "DDD", "MMM", "WBAI", "WUBA", "EGHT", "AHC", "...
## $ company         <chr> "3D Systems Corporation", "3M Company", "500.c...
## $ last.sale.price <dbl> 18.4800, 206.7100, 11.6400, 68.1800, 23.2000, ...
## $ market.cap      <chr> "$2.11B", "$121.26B", "$491.85M", "$10.06B", "...
## $ ipo.year        <dbl> NA, NA, 2013, 2013, NA, NA, 2014, 2014, NA, NA...
## $ sector          <chr> "Technology", "Health Care", "Consumer Service...
## $ industry        <chr> "Computer Software: Prepackaged Software", "Me...
glimpse(nasdaq)
## Observations: 3,405
## Variables: 7
## $ symbol          <chr> "YI", "PIH", "PIHPP", "TURN", "FLWS", "FCCY", ...
## $ company         <chr> "111, Inc.", "1347 Property Insurance Holdings...
## $ last.sale.price <dbl> 13.800, 6.350, 25.450, 2.180, 11.550, 20.150, ...
## $ market.cap      <chr> NA, "$38M", NA, "$67.85M", "$746.18M", "$168.8...
## $ ipo.year        <dbl> 2018, 2014, NA, NA, 1999, NA, NA, 2011, 2014, ...
## $ sector          <chr> NA, "Finance", "Finance", "Finance", "Consumer...
## $ industry        <chr> NA, "Property-Casualty Insurers", "Property-Ca...

The dataset we will be using consists of the ten largest stocks within the S&P500 that had their IPO before January 2000. Therefore we need to merge both datasets using inner_join(), because we only want to keep symbols from the S&P500 that are also traded on the NYSE or NASDAQ:

stocks.selection <- sp500 %>% 
  inner_join(rbind(nyse,nasdaq) %>% select(symbol,last.sale.price,market.cap,ipo.year),by=c("symbol")) %>% # join datasets
  filter(ipo.year<2000&!is.na(market.cap)) %>% # keep stocks with an IPO before 2000 and a known market cap
  arrange(desc(weight)) %>% # sort in descending order
  slice(1:10)
TABLE 2.1: The ten largest stocks in the S&P500 with a history longer than January 2000.
symbol company weight sector shares_held last.sale.price market.cap ipo.year
AAPL Apple Inc. 0.044 Information Technology 53939268 221.07 $1067.75B 1980
MSFT Microsoft Corporation 0.035 Information Technology 84297440 111.71 $856.62B 1986
AMZN Amazon.com Inc. 0.033 Consumer Discretionary 4418447 1990.00 $970.6B 1997
CSCO Cisco Systems Inc. 0.009 Information Technology 51606584 46.89 $214.35B 1990
NVDA NVIDIA Corporation 0.007 Information Technology 6659463 268.20 $163.07B 1999
ORCL Oracle Corporation 0.006 Information Technology 32699620 49.34 $196.43B 1986
AMGN Amgen Inc. 0.005 Health Care 7306144 199.50 $129.13B 1983
ADBE Adobe Systems Incorporated 0.005 Information Technology 5402625 267.79 $131.13B 1986
QCOM QUALCOMM Incorporated 0.004 Information Technology 15438597 71.75 $105.41B 1991
GILD Gilead Sciences Inc. 0.004 Health Care 14310276 73.97 $95.89B 1992

In the next step, we will download stock prices from Yahoo Finance. Data from that source usually comes in OHLC format (open, high, low, close) with additional information (volume, adjusted). We will additionally download data for the S&P500 index itself. Note that we get daily prices:

stocks.prices <- stocks.selection$symbol %>% 
    tq_get(get  = "stock.prices",from = "2000-01-01",to = "2017-12-31") %>%
    group_by(symbol)
index.prices <- "^GSPC" %>% 
    tq_get(get  = "stock.prices",from = "2000-01-01",to = "2017-12-31") 
stocks.prices %>% slice(1:2) # show the first two entries of each group
## # A tibble: 20 x 8
## # Groups:   symbol [10]
##    symbol date        open   high   low close    volume adjusted
##    <chr>  <date>     <dbl>  <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1 AAPL   2000-01-03  3.75   4.02  3.63  4.00 133949200     2.69
##  2 AAPL   2000-01-04  3.87   3.95  3.61  3.66 128094400     2.46
##  3 ADBE   2000-01-03 16.8   16.9  16.1  16.4    7384400    16.1 
##  4 ADBE   2000-01-04 15.8   16.5  15.0  15.0    7813200    14.8 
##  5 AMGN   2000-01-03 70     70    62.9  62.9   22914900    53.5 
##  6 AMGN   2000-01-04 62     64.1  57.7  58.1   15052600    49.4 
##  7 AMZN   2000-01-03 81.5   89.6  79.0  89.4   16117600    89.4 
##  8 AMZN   2000-01-04 85.4   91.5  81.8  81.9   17487400    81.9 
##  9 CSCO   2000-01-03 55.0   55.1  51.8  54.0   53076000    43.6 
## 10 CSCO   2000-01-04 52.8   53.5  50.9  51     50805600    41.2 
## 11 GILD   2000-01-03  1.79   1.80  1.72  1.76  54070400     1.61
## 12 GILD   2000-01-04  1.70   1.72  1.66  1.68  38960000     1.54
## 13 MSFT   2000-01-03 58.7   59.3  56    58.3   53228400    42.5 
## 14 MSFT   2000-01-04 56.8   58.6  56.1  56.3   54119000    41.0 
## 15 NVDA   2000-01-03  3.94   3.97  3.68  3.90   7522800     3.61
## 16 NVDA   2000-01-04  3.83   3.84  3.60  3.80   7512000     3.51
## 17 ORCL   2000-01-03 31.2   31.3  27.9  29.5   98114800    26.4 
## 18 ORCL   2000-01-04 28.9   29.7  26.2  26.9  116824800    24.0 
## 19 QCOM   2000-01-03 99.6  100    87    89.7   91334000    65.7 
## 20 QCOM   2000-01-04 86.3   87.7  80    81.0   63567400    59.4

Dividends and stock splits can also be downloaded:

stocks.dividends <- stocks.selection$symbol %>% 
    tq_get(get  = "dividends",from = "2000-01-01",to = "2017-12-31") %>%
    group_by(symbol)
stocks.splits <- stocks.selection$symbol %>% 
    tq_get(get  = "splits",from = "2000-01-01",to = "2017-12-31") %>%
    group_by(symbol)

We can additionally download financial ratios for the different stocks. These key ratios cover financials, profitability, growth, cash flow, financial health, efficiency and valuation. The ratios come from Morningstar and arrive in a nested form that we will have to ‘dig out’ using unnest().

stocks.ratios <- stocks.selection$symbol %>% 
    tq_get(get  = "key.ratios",from = "2000-01-01",to = "2017-12-31") %>%
    group_by(symbol)
## # A tibble: 56 x 3
## # Groups:   symbol [8]
##    symbol section           data              
##    <chr>  <chr>             <list>            
##  1 AAPL   Financials        <tibble [150 x 5]>
##  2 AAPL   Profitability     <tibble [170 x 5]>
##  3 AAPL   Growth            <tibble [160 x 5]>
##  4 AAPL   Cash Flow         <tibble [50 x 5]> 
##  5 AAPL   Financial Health  <tibble [240 x 5]>
##  6 AAPL   Efficiency Ratios <tibble [80 x 5]> 
##  7 AAPL   Valuation Ratios  <tibble [40 x 5]> 
##  8 MSFT   Financials        <tibble [150 x 5]>
##  9 MSFT   Profitability     <tibble [170 x 5]>
## 10 MSFT   Growth            <tibble [160 x 5]>
## # ... with 46 more rows

We find that financial ratios are only available for a subset of the ten stocks. We first filter for the ‘Growth’-information, then unnest() the nested tibbles and filter again for ‘EPS %’ and the ‘Year over Year’ information. Then we use ggplot() to plot the timeseries of Earnings per Share for the different companies.

stocks.ratios %>% filter(section=="Growth") %>% unnest() %>% 
  filter(sub.section=="EPS %",category=="Year over Year") %>% 
  ggplot(aes(x=date,y=value,color=symbol)) + geom_line(lwd=1.1) +
  labs(title="Year over Year EPS in %", x="",y="") +
  theme_tq() + scale_color_tq()

A variety of other (professional) data services that are integrated into tidyquant are available; I list them in the following subsections:

2.1.1.1 Quandl

Quandl provides access to many different financial and economic databases. To use it, one should acquire an API key by creating a Quandl account.2 Searches can be done using quandl_search() (I would personally use their homepage for that). Data can be downloaded as before with tq_get(); be aware that you can download either single timeseries (get = "quandl") or entire datatables (get = "quandl.datatable"). Note that in the example for ‘Apple’ below, the adjusted close prices are different from the ones from Yahoo. An example for a datatable is Zacks Fundamentals Collection B.

quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
    tq_get(get          = "quandl",
           from         = "2000-01-01",
           to           = "2017-12-31",
           column_index = 11, # numeric column number (e.g. 1)
           collapse     = "daily",   # can be "none", "daily", "weekly", "monthly", "quarterly", "annual"
           transform    = "none")    # for summarizing data: "none", "diff", "rdiff", "cumul", "normalize"
## Oil India Limited
## Code: NSE/OIL
## Desc: Historical prices for Oil India Limited<br><br>National Stock Exchange of India<br><br>Ticker: OIL<br><br>ISIN: INE274J01014
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Turnover (Lacs)
## 
## Oil Country Tubular Limited
## Code: NSE/OILCOUNTUB
## Desc: Historical prices for Oil Country Tubular Limited<br><br>National Stock Exchange of India<br><br>Ticker: OILCOUNTUB<br><br>ISIN: INE591A01010
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Turnover (Lacs)
## 
## Essar Oil Limited
## Code: NSE/ESSAROIL
## Desc: Historical prices for Essar Oil Limited<br><br>National Stock Exchange of India<br><br>Ticker: ESSAROIL<br><br>ISIN: INE011A01019
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Turnover (Lacs)
## # A tibble: 3 x 13
##      id dataset_code database_code name  description refreshed_at
## * <int> <chr>        <chr>         <chr> <chr>       <chr>       
## 1  6668 OIL          NSE           Oil ~ Historical~ 2018-09-13T~
## 2  6669 OILCOUNTUB   NSE           Oil ~ Historical~ 2018-09-13T~
## 3  6041 ESSAROIL     NSE           Essa~ Historical~ 2016-02-09T~
## # ... with 7 more variables: newest_available_date <chr>,
## #   oldest_available_date <chr>, column_names <list>, frequency <chr>,
## #   type <chr>, premium <lgl>, database_id <int>
## # A tibble: 5 x 12
##   date        open  high   low close volume ex.dividend split.ratio
##   <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>       <dbl>       <dbl>
## 1 2000-01-03 105.   112. 102.  112.  4.78e6           0           1
## 2 2000-01-04 108.   111. 101.  102.  4.57e6           0           1
## 3 2000-01-05 104.   111. 103   104   6.95e6           0           1
## 4 2000-01-06 106.   107   95    95   6.86e6           0           1
## 5 2000-01-07  96.5  101   95.5  99.5 4.11e6           0           1
## # ... with 4 more variables: adj.open <dbl>, adj.high <dbl>,
## #   adj.low <dbl>, adj.close <dbl>

2.1.1.2 Alpha Vantage

Alpha Vantage provides access to real-time and historical financial data. Here we also need to get and set an API key (for free).

av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
    tq_get(get          = "alphavantager",
           av_fun="TIME_SERIES_DAILY_ADJUSTED") # for daily data
alpha.aapl.id <- c("AAPL") %>%
    tq_get(get          = "alphavantager",
           av_fun="TIME_SERIES_INTRADAY",  # for intraday data
           interval="5min") # 5 minute intervals
## # A tibble: 5 x 9
##   timestamp   open  high   low close adjusted_close volume dividend_amount
##   <date>     <dbl> <dbl> <dbl> <dbl>          <dbl>  <int>           <dbl>
## 1 2018-04-24  166.  166.  161.  163.           162. 3.37e7               0
## 2 2018-04-25  163.  165.  162.  164.           162. 2.84e7               0
## 3 2018-04-26  164.  166.  163.  164.           163. 2.80e7               0
## 4 2018-04-27  164   164.  161.  162.           161. 3.57e7               0
## 5 2018-04-30  162.  167.  162.  165.           164. 4.24e7               0
## # ... with 1 more variable: split_coefficient <dbl>
## # A tibble: 5 x 6
##   timestamp            open  high   low close volume
##   <dttm>              <dbl> <dbl> <dbl> <dbl>  <int>
## 1 2018-09-11 14:20:00  224.  224.  224.  224. 489378
## 2 2018-09-11 14:25:00  224.  224.  224.  224. 261968
## 3 2018-09-11 14:30:00  224.  224.  224.  224. 334069
## 4 2018-09-11 14:35:00  224.  224.  224.  224. 285138
## 5 2018-09-11 14:40:00  224.  224.  224.  224. 229329

2.1.1.3 FRED (Economic Data)

A large quantity of economic data can be extracted from the Federal Reserve Economic Data (FRED) database. Below we download the 1-year (TB1YR) and 3-month (TB3MS) Treasury bill rates for the US. Note that these are annualized rates!

ir <- tq_get(c("TB1YR","TB3MS"), get = "economic.data") %>%
  group_by(symbol)
## # A tibble: 6 x 3
## # Groups:   symbol [2]
##   symbol date       price
##   <chr>  <date>     <dbl>
## 1 TB1YR  2018-08-01  2.36
## 2 TB1YR  2018-07-01  2.31
## 3 TB1YR  2018-06-01  2.25
## 4 TB3MS  2018-08-01  2.03
## 5 TB3MS  2018-07-01  1.96
## 6 TB3MS  2018-06-01  1.9

2.1.1.4 OANDA (Exchange Rates and Metal Prices)

Oanda provides a large quantity of exchange rates (currently only for the last 180 days). Enter them as currency pairs using the “/” notation (e.g. “EUR/USD”) and set get = "exchange.rates". Note that most of the data (with a much longer history) is also available on FRED.

eur_usd <- tq_get("EUR/USD", 
                  get  = "exchange.rates", 
                  from = Sys.Date() - lubridate::days(10))
plat_price_eur <- tq_get("plat", get = "metal.prices", 
                   from = Sys.Date() - lubridate::days(10),
                   base.currency = "EUR")
eur_usd %>% arrange(desc(date)) %>% slice(1:3)
## # A tibble: 3 x 2
##   date       exchange.rate
##   <date>             <dbl>
## 1 2018-09-12          1.16
## 2 2018-09-11          1.16
## 3 2018-09-10          1.16
plat_price_eur %>% arrange(desc(date)) %>% slice(1:3)
## # A tibble: 3 x 2
##   date       price
##   <date>     <dbl>
## 1 2018-09-12  681.
## 2 2018-09-11  681.
## 3 2018-09-10  680.

2.1.1.5 Bloomberg and Datastream

Bloomberg is officially integrated into the tidyquant-package, but one needs to have Bloomberg running on the terminal one is using. Datastream is not integrated, but has a nice R interface in the package rdatastream. However, you need a (non-free) license for the Thomson Dataworks Enterprise SOAP API; then the package allows for convenient retrieval of data. If this is not the case, you have to retrieve your data manually and save it as an “.xlsx” Excel file, which we can import using readxl::read_xlsx() from the readxl-package.
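Such a manual import could look like the following sketch (the file name and sheet are assumptions; adapt them to your own export):

```r
library(readxl)

# hypothetical export file; replace with the path to your own file
ds.data <- read_xlsx("datastream_export.xlsx",
                     sheet = 1,   # which worksheet to read
                     skip  = 0)   # rows to skip before the header row
```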

2.1.1.6 Fama-French Data (Kenneth French’s Data Library)

To download Fama-French data in batch, there is the package FFdownload, which I updated and which can now be installed via devtools::install_bitbucket("sstoeckl/FFdownload"). Currently you can either download all data or skip the (large) daily files using the argument exclude_daily = TRUE. The result is a list of data.frames that has to be cleaned up somewhat, but is nonetheless quite usable.

FFdownload(output_file = "FFdata.RData", # output file for the final dataset
           tempdir = NULL,        # where the temporary downloads go (created automatically)
           exclude_daily = TRUE,  # exclude daily data
           download = FALSE)      # if FALSE, data already in the temp directory will be used
load(file = "FFdata.RData")
factors <- FFdownload$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>% 
          tk_tbl(rename_index="date") %>% # make tibble
          mutate(date=as.Date(date, frac=1)) %>% # make proper month-end date format
          gather(key=FFvar,value = price,-date) # gather into tidy format
factors %>% group_by(FFvar) %>% slice(1:2)
## # A tibble: 8 x 3
## # Groups:   FFvar [4]
##   date       FFvar  price
##   <date>     <chr>  <dbl>
## 1 1926-07-31 HML    -2.87
## 2 1926-08-31 HML     4.19
## 3 1926-07-31 Mkt.RF  2.96
## 4 1926-08-31 Mkt.RF  2.64
## 5 1926-07-31 RF      0.22
## 6 1926-08-31 RF      0.25
## 7 1926-07-31 SMB    -2.3 
## 8 1926-08-31 SMB    -1.4

2.1.2 Manipulate Data

A variety of transformations can be applied to (financial) timeseries data. We will present some examples, merging together our stock file with the index, the risk-free rate from FRED and the Fama-French factors.

Doing data transformations in tidy datasets is called either a transmute (change the variable/dataset, returning only the calculated columns) or a mutate() (add the transformed variable). In the tidyquant-package these functions are called tq_transmute() and tq_mutate(), because they simultaneously allow changes of periodicity (e.g. daily to monthly), so the returned dataset can have fewer rows than before. The core of these functions is the provision of a mutate_fun that can come from the xts/zoo, quantmod (Quantitative Financial Modelling & Trading Framework for R) and TTR (Technical Trading Rules) packages.

In the examples below, we show how to change the periodicity of the data (where we keep the adjusted close price and the volume information) and calculate monthly (arithmetic) returns for the ten stocks and the index. We then merge the price and return information for each stock, and at each point in time add the return of the S&P500 index and the three Fama-French factors.

stocks.prices.monthly <- stocks.prices %>% 
                  tq_transmute(select = c(adjusted,volume), # which column to choose
                               mutate_fun = to.monthly,     # function: make monthly series
                               indexAt = "lastof") %>%      # ‘yearmon’, ‘yearqtr’, ‘firstof’, ‘lastof’, ‘startof’, or ‘endof’
                  ungroup() %>% mutate(date=as.yearmon(date)) 
stocks.returns <- stocks.prices %>% 
                  tq_transmute(select = adjusted,
                               mutate_fun = periodReturn,   # create monthly  returns
                               period="monthly", 
                               type="arithmetic") %>% 
                  ungroup() %>% mutate(date=as.yearmon(date)) 
index.returns <- index.prices %>% 
                  tq_transmute(select = adjusted,mutate_fun = periodReturn, 
                               period="monthly", type="arithmetic") %>% 
                  mutate(date=as.yearmon(date))
factors.returns <- factors %>% mutate(price=price/100) %>%  # already is monthly
                  mutate(date=as.yearmon(date)) 
stocks.prices.monthly %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 4
##   symbol date          adjusted    volume
##   <chr>  <S3: yearmon>    <dbl>     <dbl>
## 1 AAPL   Jan 2000          2.49 175420000
## 2 AAPL   Feb 2000          2.75  92240400
## 3 AAPL   Mrz 2000          3.26 101158400
## 4 AAPL   Apr 2000          2.98  62395200
## 5 AAPL   Mai 2000          2.02 108376800
stocks.returns %>% ungroup() %>% slice(1:5)        # show first 5 entries
## # A tibble: 5 x 3
##   symbol date          monthly.returns
##   <chr>  <S3: yearmon>           <dbl>
## 1 AAPL   Jan 2000              -0.0731
## 2 AAPL   Feb 2000               0.105 
## 3 AAPL   Mrz 2000               0.185 
## 4 AAPL   Apr 2000              -0.0865
## 5 AAPL   Mai 2000              -0.323
index.returns %>% ungroup() %>% slice(1:5)         # show first 5 entries
## # A tibble: 5 x 2
##   date          monthly.returns
##   <S3: yearmon>           <dbl>
## 1 Jan 2000              -0.0418
## 2 Feb 2000              -0.0201
## 3 Mrz 2000               0.0967
## 4 Apr 2000              -0.0308
## 5 Mai 2000              -0.0219
factors.returns %>% ungroup() %>% slice(1:5)       # show first 5 entries
## # A tibble: 5 x 3
##   date          FFvar    price
##   <S3: yearmon> <chr>    <dbl>
## 1 Jul 1926      Mkt.RF  0.0296
## 2 Aug 1926      Mkt.RF  0.0264
## 3 Sep 1926      Mkt.RF  0.0036
## 4 Okt 1926      Mkt.RF -0.0324
## 5 Nov 1926      Mkt.RF  0.0253

Now we merge all the information together:
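One way to build the merged dataset stocks.final is to successively left_join() the pieces by symbol and date (a sketch; renaming the return columns to return and sp500 is my choice):

```r
stocks.final <- stocks.returns %>%
  rename(return = monthly.returns) %>%
  left_join(stocks.prices.monthly,
            by = c("symbol", "date")) %>%          # add adjusted price and volume
  left_join(index.returns %>% rename(sp500 = monthly.returns),
            by = "date") %>%                       # add the index return
  left_join(factors.returns %>% spread(key = FFvar, value = price),
            by = "date")                           # add the Fama-French factors
```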

## # A tibble: 5 x 10
##   symbol date   return adjusted volume   sp500  Mkt.RF     SMB     HML
##   <chr>  <S3:>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 AAPL   Jan ~ -0.0731     2.49 1.75e8 -0.0418 -0.0474  0.0505 -0.0045
## 2 AAPL   Feb ~  0.105      2.75 9.22e7 -0.0201  0.0245  0.221  -0.106 
## 3 AAPL   Mrz ~  0.185      3.26 1.01e8  0.0967  0.052  -0.173   0.0794
## 4 AAPL   Apr ~ -0.0865     2.98 6.24e7 -0.0308 -0.064  -0.0771  0.0856
## 5 AAPL   Mai ~ -0.323      2.02 1.08e8 -0.0219 -0.0442 -0.0501  0.0243
## # ... with 1 more variable: RF <dbl>

Now we can calculate and add additional information, such as the MACD (Moving Average Convergence/Divergence) and its signal line. Be aware that you have to group_by() symbol, or the indicator would be calculated on one large stacked timeseries:

stocks.final %>% group_by(symbol) %>%
    tq_mutate(select     = adjusted, 
              mutate_fun = MACD, 
              col_rename = c("MACD", "Signal")) %>%
    select(symbol,date,adjusted,MACD,Signal) %>%
    tail() # show last part of the dataset
## # A tibble: 6 x 5
## # Groups:   symbol [1]
##   symbol  date adjusted  MACD Signal
##   <chr>  <dbl>    <dbl> <dbl>  <dbl>
## 1 GILD   2018.     73.4 -5.40  -4.38
## 2 GILD   2018.     80.8 -3.86  -4.27
## 3 GILD   2018.     78.7 -2.85  -3.99
## 4 GILD   2018.     72.8 -2.68  -3.73
## 5 GILD   2018.     72.6 -2.52  -3.49
## 6 GILD   2018.     70.0 -2.66  -3.32
save(stocks.final,file="stocks.RData")

2.1.2.1 Rolling functions

One of the most important tools you will need in practice is the ability to perform a rolling analysis. One example would be a rolling regression to get the time-varying \(\alpha\) and \(\beta\) of each stock with respect to the index or the Fama-French factors. To do that, we create a function that does everything we want in one step:

regr_fun <- function(data,formula) {
    coef(lm(formula, data = timetk::tk_tbl(data, silent = TRUE)))
}

This function takes a dataset and a regression formula as input, performs the regression and returns the coefficients. An important point is that the data will be passed to the function as an xts object; timetk::tk_tbl() converts it to a data frame so that lm() can work with the column names.

Now we can use tq_mutate() to apply the custom regression function over a rolling window using rollapply() from the zoo package. Since we leave select = NULL, the whole dataset is passed to the data argument of rollapply(). All we need to specify is mutate_fun = rollapply plus the additional arguments of rollapply(): a 12-month window via width = 12 and our custom function via FUN = regr_fun; the regression formula is handed through to regr_fun as an extra argument. It is extremely important to specify by.column = FALSE, which tells rollapply() to perform the computation using the data as a whole rather than applying the function to each column independently. The col_rename argument renames the added columns.

stocks.final %>% group_by(symbol) %>%
    tq_mutate(mutate_fun = rollapply,
              width      = 12,
              FUN        = regr_fun,
              by.column  = FALSE,
              formula    = return ~ sp500, # passed on to regr_fun
              col_rename = c("alpha", "beta"))

As a result, the rolling regression coefficients (the time-varying alpha and beta of each stock with respect to the index) are added to the data frame.

Also check out the functionality of tibbletime for that task (rollify)!
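For illustration, a 12-month rolling beta using rollify() could be sketched as follows (assuming the return and sp500 columns of stocks.final):

```r
library(tibbletime)

# rollify() turns an ordinary function (here a purrr-style lambda) into a rolling version
rolling_beta <- rollify(~ coef(lm(.x ~ .y))[[2]], window = 12)

stocks.final %>%
  group_by(symbol) %>%
  mutate(beta = rolling_beta(return, sp500))  # NA for the first 11 months of each stock
```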


  1. Note that tq_index() unfortunately makes use of the package XLConnect that requires Java to be installed on your system.

  2. If you do not use an API key, you are limited to 50 calls per day.