Chapter 3 Exploring Data

In this chapter we show how to explore and analyze data using the dataset created in Chapter @ref(#s_2Data):

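library(tidyverse)  # provides glimpse(), %>% and the dplyr/tidyr verbs used below
library(zoo)        # provides the yearmon class used by the date column
load("stocks.RData")
glimpse(stocks.final)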

## Observations: 2,160
## Variables: 10
## $ symbol   <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date     <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return   <dbl> -0.07314358, 0.10481940, 0.18484208, -0.08651642, -0....
## $ adjusted <dbl> 2.489997, 2.750997, 3.259497, 2.977497, 2.015998, 2.5...
## $ volume   <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500    <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF   <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB      <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML      <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF       <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...

stocks.final %>% slice(1:2)

## # A tibble: 2 x 10
##   symbol date   return adjusted volume   sp500  Mkt.RF    SMB     HML
##   <chr>  <S3:>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
## 1 AAPL   Jan ~ -0.0731     2.49 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL   Feb ~  0.105      2.75 9.22e7 -0.0201  0.0245 0.221  -0.106 
## # ... with 1 more variable: RF <dbl>

3.1 Plotting Data

In this section we show how to create various graphs of financial time series. These plots should give us a better understanding of the series' properties before we go on to calculate and test their statistics.

3.1.1 Time-Series Plots
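
As a minimal sketch of a time-series plot of monthly returns, the block below draws one line per stock with ggplot2. The data frame `toy` and the symbols "AAA" and "BBB" are hypothetical stand-ins with simulated returns, not the contents of `stocks.final`. With the real data, the `yearmon` column `date` would probably first have to be converted, for example with `as.Date()` from zoo.

```r
library(ggplot2)

# Toy stand-in for stocks.final: simulated monthly returns, NOT real market data
set.seed(1)
toy <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 24),
  date   = rep(seq(as.Date("2000-01-01"), by = "month", length.out = 24), times = 2),
  return = rnorm(48, mean = 0.01, sd = 0.08)
)

# One line per stock: monthly return over time
p_ts <- ggplot(toy, aes(x = date, y = return, colour = symbol)) +
  geom_line() +
  labs(x = "Month", y = "Monthly return", colour = "Stock")
p_ts
```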

3.1.2 Box Plots
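
A box plot summarizes the return distribution of each stock by its median, quartiles and outliers. The following sketch assumes a long data frame with the columns `symbol` and `return`, as in `stocks.final`; the data frame `toy` and its simulated values are made up for illustration.

```r
library(ggplot2)

# Toy stand-in for stocks.final: simulated monthly returns, NOT real market data
set.seed(1)
toy <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 24),
  return = rnorm(48, mean = 0.01, sd = 0.08)
)

# One box per stock, showing median, interquartile range and outliers
p_box <- ggplot(toy, aes(x = symbol, y = return)) +
  geom_boxplot() +
  labs(x = "Stock", y = "Monthly return")
p_box
```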

3.1.3 Histogram and Density Plots
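
One way to look at the shape of the return distribution is to overlay a kernel density estimate on a histogram that is scaled to density units. The sketch below uses the hypothetical data frame `toy` with simulated returns. The choice of `bins = 20` is an arbitrary assumption; setting it explicitly avoids the `stat_bin()` message about the default of 30 bins.

```r
library(ggplot2)

# Toy stand-in for stocks.final: simulated monthly returns, NOT real market data
set.seed(1)
toy <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 24),
  return = rnorm(48, mean = 0.01, sd = 0.08)
)

# Histogram on the density scale with a kernel density estimate on top,
# drawn in a separate panel for each stock
p_hist <- ggplot(toy, aes(x = return)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20, fill = "grey70") +
  geom_density() +
  facet_wrap(~ symbol) +
  labs(x = "Monthly return", y = "Density")
p_hist
```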

3.1.4 Quantile Plots
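
A quantile-quantile plot compares the sample quantiles of the returns with the theoretical quantiles of a reference distribution. The sketch below uses the normal distribution, which is the default of `stat_qq()`, and again relies on the hypothetical data frame `toy` with simulated returns. Points that bend away from the reference line in the tails would hint at fat-tailed returns.

```r
library(ggplot2)

# Toy stand-in for stocks.final: simulated monthly returns, NOT real market data
set.seed(1)
toy <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 24),
  return = rnorm(48, mean = 0.01, sd = 0.08)
)

# Sample quantiles against normal quantiles, with a reference line per panel
p_qq <- ggplot(toy, aes(sample = return)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~ symbol) +
  labs(x = "Theoretical normal quantiles", y = "Sample quantiles")
p_qq
```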

Putting it all together:

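library(plotly)  # provides ggplotly()
# `output` is assumed to hold the target format of the rendered document,
# e.g. as returned by knitr::opts_knit$get("rmarkdown.pandoc.to")
pm <- GGally::ggpairs(iris)
if (output %in% c("latex", "docx")) {
  pm                 # static plot for PDF and Word output
} else if (output == "html") {
  ggplotly(pm)       # interactive plot for HTML output
} else {
  print("No format defined for this output filetype")
}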

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Can only have one: highlight

## Warning in mapply(FUN = f, ..., SIMPLIFY = FALSE): longer argument not a
## multiple of length of shorter

3.2 Analyzing Data

3.2.1 Calculating Statistics
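
A natural first step is a table of descriptive statistics per stock. The sketch below assumes a long data frame with the columns `symbol` and `return`, as in `stocks.final`; the data frame `toy`, its simulated values and the selection of statistics are illustrative assumptions.

```r
library(dplyr)

# Toy stand-in for stocks.final: simulated monthly returns, NOT real market data
set.seed(1)
toy <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 24),
  return = rnorm(48, mean = 0.01, sd = 0.08)
)

# Descriptive statistics of the monthly returns, one row per stock
stats <- toy %>%
  group_by(symbol) %>%
  summarise(
    mean = mean(return),
    sd   = sd(return),
    min  = min(return),
    max  = max(return),
    n    = n()
  )
stats
```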

3.2.2 Testing Data
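
Two tests that are commonly applied to return series are a t-test of the hypothesis that the mean return is zero and the Shapiro-Wilk test of normality. The sketch below runs both per stock with the base R functions `t.test()` and `shapiro.test()`. The choice of tests is an assumption made for illustration, and the data frame `toy` contains simulated returns only.

```r
library(dplyr)

# Toy stand-in for stocks.final: simulated monthly returns, NOT real market data
set.seed(1)
toy <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 24),
  return = rnorm(48, mean = 0.01, sd = 0.08)
)

# p-values per stock:
#   t_pvalue  - H0: the mean monthly return equals zero
#   sw_pvalue - H0: the monthly returns are normally distributed
tests <- toy %>%
  group_by(symbol) %>%
  summarise(
    t_pvalue  = t.test(return, mu = 0)$p.value,
    sw_pvalue = shapiro.test(return)$p.value
  )
tests
```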

3.2.3 Exposure to Factors

The stocks in our example all have a certain exposure to risk factors (e.g. the Fama-French factors we have added to our dataset). Let us estimate these exposures by regressing each stock's return on the factors Mkt.RF, SMB and HML:

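library(broom)  # provides tidy() for extracting model coefficients

# Written for tidyr >= 1.0 (nest(data = ...), unnest() without .drop, pivot_wider())
stocks.factor_exposure <- stocks.final %>%
    nest(data = -symbol) %>%              # one row, holding one data frame, per stock
    mutate(model = map(data, ~ lm(return ~ Mkt.RF + SMB + HML, data = .x)),
           tidied = map(model, tidy)) %>%
    unnest(tidied) %>%                    # one row per stock and coefficient
    filter(term != "(Intercept)") %>%     # keep only the factor loadings
    select(symbol, term, estimate) %>%
    pivot_wider(names_from = term, values_from = estimate) %>%
    select(symbol, Mkt.RF, SMB, HML)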
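
To see what such a regression recovers, the following sketch simulates two hypothetical stocks with known factor loadings and re-estimates them. It deliberately swaps the tidyverse pipeline for base R (`split()`, `lm()` and `coef()`), which can serve as a cross-check. The factor values, the symbols "AAA" and "BBB", the loadings and the helper `make_stock()` are all made up for illustration.

```r
# Simulated factor returns, NOT the Fama-French data
set.seed(42)
n <- 120
factors <- data.frame(
  Mkt.RF = rnorm(n, mean = 0.005, sd = 0.04),
  SMB    = rnorm(n, mean = 0,     sd = 0.03),
  HML    = rnorm(n, mean = 0,     sd = 0.03)
)

# Hypothetical helper: returns generated from known loadings plus noise
make_stock <- function(symbol, b_mkt, b_smb, b_hml) {
  data.frame(
    symbol = symbol,
    factors,
    return = b_mkt * factors$Mkt.RF + b_smb * factors$SMB +
      b_hml * factors$HML + rnorm(n, sd = 0.01)
  )
}
toy <- rbind(
  make_stock("AAA", 1.2,  0.5, -0.3),
  make_stock("BBB", 0.8, -0.2,  0.4)
)

# One regression per stock; drop the intercept and keep the three loadings
exposure <- t(sapply(split(toy, toy$symbol), function(d) {
  coef(lm(return ~ Mkt.RF + SMB + HML, data = d))[-1]
}))
round(exposure, 2)
```

The rows of `exposure` correspond to the stocks and the columns to the factors, so the estimates can be compared directly with the loadings passed to `make_stock()`.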