5.2 Realized volatility in R
- In R various realized volatility measures are supported by the
highfrequency
package
Estimator | Command | Description |
---|---|---|
RV | rRVar() or rCov() |
daily realized variance |
TRV | rThresholdCov() |
threshold version of RV |
MedRV | rMedRVar() |
jump-robust realized variance using nearest neighbor truncation |
MinRV | rMinRVar() |
jump-robust realized variance using nearest neighbor truncation |
RVave | rAVGCov |
daily realized variance by subsampling and averaging |
BPV | rBPCOV() |
daily realized bipower variance |
TTSRV | rTSCov() |
two-times scaled realized variance |
RTTSRV | rRTSCov() |
robust version of TTSRV using jump detection rule for truncation |
- Prior to calculating any of the realized volatility measure, high–frequency prices should be cleaned and properly filtered: (a) remove observations from non–official trading hours, (b) delete zero prices, (c) select the highest possible sampling frequency (fast time scale) to obtain regularly spaced intervals which are non–overlapping and non–empty
tesla
and nasdaq
within official–trading of 6.5 hours (from 09:30 to 16:00 by Eastern time zone). Check the number of intraday prices over multiple days by employing apply.daily()
command from the package xts
. Install and load highfrequency
package to calculate daily realized variance (or realized volatility) at the highest sampling frequency by rRVar()
and rCov()
commands. Do the same for sparse sampling frequencies of 5 and 10 minutes and compare them visually using ggplot()
. Make a signature plot to determine the optimal sampling frequency from 1 to 20 minutes.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them.Tesla intraday closing prices, sampled every 1 minute, covers the period from
2024-04-01
to 2025-03-28
, i.e. 245 trading days. For each trading day there is 390 intraday observations that corresponds to 6.5 official trading hours (60×6.5). Note:
rRVar()
calculates daily realized variance (RV) for a single asset, while rCov()
computes the covariance between multiple assets, but when applied to a single asset it returns the same RV as rRVar()
. The optimal sampling frequency, at which microstructure noise is reduced, can be determined by so called
signature plot
. Signature plot help identify the optimal frequency by observing where the RV stabilizes (exhibits no longer significant changes as sampling frequency decreases). # Change default time zone for the session so that timestapms matches to prepared data
Sys.setenv(TZ = "Europe/Zagreb")
library(xts) # loading xts package from the library
# Loading R data file (.R) from Google Drive using tweaked URL
load(url("https://drive.google.com/uc?export=download&id=1X0uwaGECIsrz9uZZXvfZjLcYCiTF98mi"))
# Making sure that both objects are structured as xts
<- as.xts(tesla)
tesla <- as.xts(nasdaq)
nasdaq
# Only Tesla's high-frequency data will be used in continuation of this example
str(tesla) # checking the structure of object tesla
head(tesla) # displaying first few rows
tail(tesla) # displaying last few rows
View(tesla) # opening xts object as a spreadsheet of intraday prices
# Checking the number of intraday closing prices sampled every minute
apply.daily(tesla,length)
# Installing and loading highfrequency package
install.packages("highfrequency")
library(highfrequency)
# Calculating realized variance at the highest sampling frequency
<- rRVar(tesla, # input as intraday prices for multiple days
RV1min alignBy="minutes", # depends on the data frequency
alignPeriod=1, # the highest sampling frequency is 1 minute as given
makeReturns = TRUE) # calculates returns if the inputs are prices
head(RV1min)
tail(RV1min)
length(RV1min) # one RV per day (total 245 realized variances)
# Alternative to rRVar() is rCov() if the input is a single xts object (one time-series)
<- rCov(tesla,alignBy="minutes",alignPeriod=1,makeReturns = TRUE)
RV1min
# Calculating realized variance with sparse sampling frequencies of 5 and 10 minutes
<- rCov(tesla,alignBy="minutes",alignPeriod=5,makeReturns = TRUE)
RV5min <- rCov(tesla,alignBy="minutes",alignPeriod=10,makeReturns = TRUE)
RV10min
# Comparing RV as time-series using ggplot() after combining them together
<- cbind(RV1min, RV5min, RV10min)
combined library(ggplot2)
ggplot(combined,aes(x=Index))+
geom_line(aes(y = RV1min, color = "RV (1-min)"))+
geom_line(aes(y = RV5min, color = "RV (5-min)"))+
geom_line(aes(y = RV10min, color = "RV (10-min)"))+
labs(x="", y="realized variance") +
scale_color_manual(values = c("RV (1-min)" = "coral", "RV (5-min)" = "orchid", "RV (10-min)" = "steelblue"))+
theme_minimal()+
theme(panel.background = element_rect(fill = "white", color = NA),
legend.position = c(0.8, 0.4),
legend.background = element_blank(),
legend.key = element_blank(),
legend.title = element_blank())
# Determination of optimal sampling frequency from 1 to 20 requires several steps
# Step 1: define a vector of sampling frequencies
<- 1:20
frequencies
# Step 2: Calculate daily realized variance (RV) for each frequency and trading day
<- as.data.frame(sapply(frequencies, function(frequency) {
RV_freq rCov(tesla, alignBy = "minutes", alignPeriod = frequency, makeReturns = TRUE)}))
colnames(RV_freq) <- paste0("RV", frequencies, "min")
# Step 3: Averaging out RV across days against sampling frequency
<- data.frame("AvgRV"=colMeans(RV_freq))
RV_average $frequency <- frequencies
RV_average
# Step 4: Signature plot
ggplot(RV_average, aes(x = frequency, y = AvgRV)) +
geom_line(color = "coral") +
geom_point(color = "coral", size = 2) +
geom_smooth(color="royalblue",method = "loess", span = 0.8, se = FALSE, linewidth = 1) +
labs(x = "Sampling frequency (minutes)", y = "Average realized variance",title="Signature plot") +
theme_minimal()
ggplot()
and conclude which RV measure is most robust to both microstructure noise and price jumps, respectively.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them.Keep in mind that some realized measures are robust to price jumps only, and some of them are robust to both microstructure noise and price jumps when sampling sparsely.
Note: commands
rTSCov()
and rRTSCov()
do not support having an xts object of multiple days as input, but can be applied as functions to each day (options are sapply()
or apply.daily()
). TTSRV and it’s robust version RTTSRV are designed to work directly with the price data, and they don’t allow returns as inputs (there is no makeReturns
argument). Additionally, parameter η is a threshold used to detect price jumps, i.e. when η=9 squared standardized intraday returns that exceed 9 are detected as jumps which are then truncated or filtered out. In this example RTTSRV is the most robust measure, while it keeps all the data.
# Different RV measures calculated at optimal sampling frequency of 10 minutes
<- rCov(tesla,alignBy="minutes",alignPeriod=10,makeReturns = TRUE)
RV10min <- rBPCov(tesla,alignBy="minutes",alignPeriod=10,makeReturns = TRUE)
BP10min <- apply.daily(tesla, function(x) rTSCov(x, K = 10, J = 1))
TTSRV10min <- apply.daily(tesla, function(x) rRTSCov(x, K = 10, J = 1, eta=9))
RTTSRV10min
# Combining multiple xts objects into one xts object "combined2" and plotting them
=cbind(RV10min,BP10min,TTSRV10min,RTTSRV10min)
combined2ggplot(combined2,aes(x=Index))+
geom_line(aes(y = RV10min, color = "RV (10-min)"))+
geom_line(aes(y = BP10min, color = "BP (10-min)"))+
geom_line(aes(y = TTSRV10min, color = "TTSRV (10-min)"))+
geom_line(aes(y = RTTSRV10min, color = "RTTSRV (10-min)"))+
labs(x="", y="realized variance") +
scale_color_manual(values = c("RV (10-min)" = "coral", "BP (10-min)" = "orchid", "TTSRV (10-min)" = "mediumseagreen", "RTTSRV (10-min)" = "black"))+
theme_minimal()+
theme(panel.background = element_rect(fill = "white", color = NA),
legend.position = c(0.5, 0.7),
legend.background = element_blank(),
legend.key = element_blank(),
legend.title = element_blank())
- Once the realized variance is calculated using an appropriate estimator, it can be forecasted using Heterogeneous AutoRegressive (HAR) model and High frEquency bAsed VolatilitY (HEAVY) model
Command | Description |
---|---|
noZeroPrices() |
deletes observations with zero prices |
exchangeHoursOnly() |
filters raw data between market open and market close |
BNSjumpTest() |
Barndorff–Nielsen and Shephard test for the presence of jumps |
AJjumpTest() |
Ait–Sahalia and Jacod test for the presence of jumps |
HARmodel() |
estimates heterogeneous autoregressive model (HAR) and it’s variants |
HEAVYmodel() |
estimates High frEquency bAsed VolatilitY model (HEAVY) |
- HAR model is used to forecast today’s realized volatility based on realized volatility in the past (previous day RV, previous week RV and previous month RV), and therefore it is typically estimated using pre–computed realized measures (argument
inputType="RM"
)
RVΔ,dt=β0+β1RVΔ,dt−1+β2RVΔ,wt−1+β3RVΔt−1,m+ut
where RVΔ,dt and RVΔ,dt−1 are current and lagged daily realized variances, computed at sampling interval Δ, whereas RVΔ,wt−1 is a previous week realized variance, and RVΔ,mt−1 is a previous month realized variance
RVΔ,wt−1=RVΔ,dt−1+RVΔ,dt−2+⋯+RVΔ,dt−55RVΔ,mt−1=RVΔ,dt−1+RVΔ,dt−2+⋯+RVΔ,dt−2222
Equation (5.14) is basic HAR specification, which can be extended by additional features like price jumps (HARJ, HARCJ, etc.)
In the HARJ model, the jump component is added as a regressor
RVΔ,dt=β0+β1RVΔ,dt−1+β2RVΔ,wt−1+β3RVΔt−1,m+β4Jt1+ut
Jump component is estimated as a an estimate of the quadratic variation minus an estimate of the integrated variance, i.e. the realized variance minus the bipower variation, and thus Jt=RVΔt−BPVΔt
Accordingly, the Barndorff–Nielsen and Shephard (BNS) z–statistic can be computed to test if jumps are significant under alternative hypothesis (jump variation is not equal to zero)
z=RVΔt−BPVΔt√(θ−2)1mRQΔt∼N(0, 1)
- Another model is so called HEAVY model that incorporates high–frequency data (for computing realized measures) and low–frequency data (for computing daily returns) and therefore it is a two–components model (it’s structure resembles a GARCH type model)
σ2t=ω+αRMt−1+βσ2t−1μt=ωR+αRRMt−1+βRμt−1
har10min
) and HAR extended with daily jump component (object harj10min
). For the same purpose pre–compute realized measures, along with testing for the presence of jumps by BNSjumpTest()
command. All pre–comped measures should be based on the same sampling frequency of 10 minutes (as determine in the previous example). Compare estimated models using modelsummary()
command. According to best fit model, predict realized variance for 1−steps ahead by predict()
command. Provide 1− step ahead forecasting using rolling window.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them.In this example significant jumps occur in 62 days out of total 245 observed days. The jump component lagged for 1 period does not improve basic HAR model (for a better fit number of periods should be calibrated). Rolling window forecasting is considered to mimic 1−step ahead out–of–sample forecast by re–estimating basic HAR model 45 times in the window of 200 observations.
# Pre-computing realized measures
<- rCov(tesla,alignBy="minutes",alignPeriod=10,makeReturns = TRUE) # non-robust
RV10min <- rBPCov(tesla,alignBy="minutes",alignPeriod=10,makeReturns = TRUE) # jump-robust
BP10min
# Testing for the presence of jumps
<- BNSjumpTest(tesla, alignBy="minutes", alignPeriod=10, makeReturns=TRUE)
bns sum(bns$p.value < 0.05) # numeber of day with significnt jumps
# Plotting z-statistics of significant jumps
$significant <- ifelse(bns$p.value < 0.05, bns$ztest, 0)
bns
ggplot(bns, aes(x = Index)) +
geom_segment(aes(xend = Index, y = 0, yend = significant), color = "darkred", linewidth = 0.6) +
labs(title = "Significant price jumps z-statistic (BNS Test)",
x = "Day",
y = "z-statistic") +
theme_minimal()
# Estimating basic HAR model, printing the results and plotting in-the sample
<- HARmodel(RV10min, periods = c(1, 5, 22),type = "HAR", inputType = "RM")
har10min summary(har10min)
plot(har10min)
# Forecasting for 1-steps ahead
predict(har10min, stepsAhead=1)
# Estimating HARJ model
<- HARmodel(cbind(RV10min, BP10min), periods = c(1, 5, 22),periodsJ= c(1), type = "HARJ", inputType = "RM")
harj10min
# Comparing the results of both models
modelsummary(list("Basic HAR"=har10min,"HARJ model"=harj10min),stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),fmt=4)
# Rolling window forecasting
<- RV10min
input colnames(input) <- c("RV10min")
$forecast <- NA
inputfor (i in 1:45) {
<- HARmodel(data=input[i:(i+199),"RV10min"],periods = c(1, 5, 22),type = "HAR", inputType = "RM")
estimated $forecast[i+200] <- predict(estimated,stepsAhead=1)
input
}
plot(input[201:245,1:2], main="Rolling window forecasts")