Chapter 3 Exploring Data
In this chapter we show how to explore and analyze data using the dataset created in Chapter @ref(#s_2Data):
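library(tidyverse) # glimpse(), slice() and the pipe %>%
library(zoo) # provides the yearmon class used for the date column
load("stocks.RData") # loads the tibble stocks.final
glimpse(stocks.final)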
## Observations: 2,160
## Variables: 10
## $ symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return <dbl> -0.07314358, 0.10481940, 0.18484208, -0.08651642, -0....
## $ adjusted <dbl> 2.489997, 2.750997, 3.259497, 2.977497, 2.015998, 2.5...
## $ volume <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500 <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...
stocks.final %>% slice(1:2)
## # A tibble: 2 x 10
## symbol date return adjusted volume sp500 Mkt.RF SMB HML
## <chr> <S3:> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL Jan ~ -0.0731 2.49 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL Feb ~ 0.105 2.75 9.22e7 -0.0201 0.0245 0.221 -0.106
## # ... with 1 more variable: RF <dbl>
3.1 Plotting Data
In this section we show how to create various graphs of financial time series. These plots should give us a better understanding of the series' properties before we go on to calculate and test their statistics.
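As a first orientation, the four plot types covered in the subsections below can be sketched with base R graphics. This is only a minimal sketch under stated assumptions: the return series `ret` and the vector `dates` are simulated and hypothetical, not taken from `stocks.final`, and base graphics are used here only to keep the example free of package dependencies.

```r
set.seed(1)
# Hypothetical monthly returns for one made-up stock (NOT the book's data)
ret <- rnorm(216, mean = 0.01, sd = 0.08)
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = length(ret))

op <- par(mfrow = c(2, 2))  # arrange the four plots in a 2 x 2 grid

# Time-series plot: returns over time
plot(dates, ret, type = "l", main = "Time series", xlab = "", ylab = "Return")

# Box plot: median, quartiles and outliers
boxplot(ret, main = "Box plot", ylab = "Return")

# Histogram with a kernel density estimate drawn on top
hist(ret, breaks = 30, freq = FALSE, main = "Histogram and density",
     xlab = "Return")
lines(density(ret))

# Quantile plot: sample quantiles against normal quantiles
qqnorm(ret, main = "Normal Q-Q plot")
qqline(ret)

par(op)  # restore the previous graphics settings
```

With real data, `ret` would be replaced by the `return` column of a single symbol.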
3.1.1 Time-series plots
3.1.2 Box-plots
3.1.3 Histogram and Density Plots
3.1.4 Quantile Plots
Putting it all together:
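library(plotly) # ggplotly()
# `output` is assumed to hold the current output format and to be set elsewhere
pm <- GGally::ggpairs(iris) # pairwise plot matrix of the built-in iris data
if (output %in% c("latex", "docx")) {
  pm # static plot for print formats
} else if (output == "html") {
  ggplotly(pm) # interactive plot for HTML
} else {
  print("No format defined for this output filetype")
}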
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Can only have one: highlight
## Warning in mapply(FUN = f, ..., SIMPLIFY = FALSE): longer argument not a
## multiple of length of shorter
3.2 Analyzing Data
3.2.1 Calculating Statistics
3.2.2 Testing Data
3.2.3 Exposure to Factors
The stocks in our example all have a certain exposure to risk factors (e.g. the Fama-French factors we have added to our dataset). Let us estimate these exposures by regressing each stock's return on the factors Mkt.RF, SMB and HML:
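library(tidyverse) # dplyr, tidyr and purrr
library(broom) # tidy()
# Uses the tidyr >= 1.0 interface (nest(data = ...), pivot_wider())
stocks.factor_exposure <- stocks.final %>%
  nest(data = -symbol) %>% # one row per symbol, data in a list-column
  mutate(model = map(data, ~ lm(return ~ Mkt.RF + SMB + HML, data = .x)),
         tidied = map(model, tidy)) %>%
  select(symbol, tidied) %>% # drop the data and model list-columns
  unnest(cols = tidied) %>%
  filter(term != "(Intercept)") %>% # keep the factor loadings only
  select(symbol, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  select(symbol, Mkt.RF, SMB, HML)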
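To see that a per-symbol regression of this kind recovers known exposures, here is a minimal sketch in base R on simulated data. Everything in it is an assumption made for illustration: the symbols `AAA` and `BBB`, the "true" betas in `true_betas`, the factor volatilities and the data frame `sim` are hypothetical and unrelated to `stocks.final`. It uses `split()` and `sapply()` instead of the tidyverse pipeline so that it runs without extra packages.

```r
set.seed(42)
n <- 120  # hypothetical number of monthly observations per stock

# Hypothetical true factor exposures for two made-up symbols
true_betas <- list(
  AAA = c(Mkt.RF = 1.2, SMB = 0.5, HML = -0.3),
  BBB = c(Mkt.RF = 0.8, SMB = -0.2, HML = 0.6)
)

# Simulate factor realisations and returns for each symbol
sim <- do.call(rbind, lapply(names(true_betas), function(s) {
  f <- data.frame(Mkt.RF = rnorm(n, 0, 0.04),
                  SMB    = rnorm(n, 0, 0.03),
                  HML    = rnorm(n, 0, 0.03))
  b <- true_betas[[s]]
  # return = factor realisations times exposures, plus idiosyncratic noise
  f$return <- as.vector(as.matrix(f) %*% b) + rnorm(n, 0, 0.01)
  f$symbol <- s
  f
}))

# One regression per symbol; keep only the slope coefficients
exposure <- t(sapply(split(sim, sim$symbol), function(d) {
  coef(lm(return ~ Mkt.RF + SMB + HML, data = d))[c("Mkt.RF", "SMB", "HML")]
}))

print(round(exposure, 2))  # one row per symbol, one column per factor
```

The resulting matrix has the same layout as `stocks.factor_exposure`: one row per symbol and one column per factor. The estimates should lie close to the values in `true_betas`, with the gap shrinking as `n` grows.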