5.3 Realized volatility in R
- In R various realized volatility measures are supported by the
highfrequency
package
Estimator | Command | Description |
---|---|---|
RV | rRVar() or rCov() |
daily realized variance |
MedRV | rMedRVar() |
jump-robust realized variance using nearest neighbor truncation |
MinRV | rMinRVar() |
jump-robust realized variance using nearest neighbor truncation |
RVave | rAVGCov |
daily realized variance by subsampling and averaging |
BPV | rBPCOV() |
daily realized bipower variance |
TTSRV | rTSCov() |
two-times scale realized variance |
RTTSRV | rRTSCov() |
robust version of TTSRV using jump detection rule for truncation |
- Prior to calculating any of the realized volatility measure, high–frequency prices should be cleaned and properly filtered: (a) remove observations from non–official trading hours, (b) delete zero prices, (c) select the highest possible sampling frequency (fast time scale) to obtain regularly spaced intervals which are non–overlapping and non–empty
Example 15. Load high–frequency data file (.R) from
Google Drive URL, consisting of 1
minute closing prices for
Tesla intraday closing prices, sampled every 1 minute, covers the period from
Note:
The optimal sampling frequency, at which microstructure noise is reduced, can be determined by so called
tesla
and nasdaq
within official–trading of 6.5 hours (from 09:30 to 16:00 by Eastern
time zone). Check the number of intraday prices over multiple days by
employing apply.daily()
command from the package
xts
. Install and load highfrequency
package to
calculate daily realized variance (or realized volatillity) at the
highest sampling frequency by rRVar()
and
rCov()
commands. Do the same for sparse sampling
frequencies of 5 and 10 minutes and compare them visually
using ggplot()
. Make a signature plot to determine the
optimal sampling frequency from 1
to 20 minutes.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them.Tesla intraday closing prices, sampled every 1 minute, covers the period from
2024-04-01
to 2025-03-28
, i.e. 245 trading days. For each trading day
there is 390 intraday observations
that corresponds to 6.5 official trading hours (60×6.5). Note:
rRVar()
calculates daily realized variance (RV) for a
single asset, while rCov()
computes the covariance between
multiple assets, but when applied to a single asset, it returns the same
RV as rRVar()
. The optimal sampling frequency, at which microstructure noise is reduced, can be determined by so called
signature plot
. Signature plot help identify the optimal
frequency by observing where the RV stabilizes (exhibits no longer
significant changes as sampling frequency increases). # Loading R data file (.R) from Google Drive using tweaked URL
load(url("https://drive.google.com/uc?export=download&id=1X0uwaGECIsrz9uZZXvfZjLcYCiTF98mi"))
# Making sure that both objects are structured as xts
library(xts) # loading xts package from the library
tesla <- as.xts(tesla) # structured as xts object
nasdaq <- as.xts(nasdaq) # structured as xts object
# Only Tesla's high-frequency data will be used in continuation of this example
str(tesla) # checking the structure of object tesla
head(tesla) # displaying first few rows
tail(tesla) # displaying last few rows
View(tesla) # opening xts object as a spreadsheet of intraday prices
# Checking the number of intraday closing prices sampled every minute
apply.daily(tesla,length)
# Calculating realized variance at the highest sampling frequency
RV1min <- rRVar(tesla, # input as intraday prices for multiple days
alignBy="minutes", # depends on the data frequency
alignPeriod=1, # prices will be aligned every 1 minute
makeReturns = TRUE) # calculates returns if the inputs are prices
head(RV1min)
tail(RV1min)
length(RV1min) # one RV per day
# Alternative to rRVar() is rCov() if the input is a single xts object (one time-series)
RV1min <- rCov(tesla,alignBy="minutes",alignPeriod=1,makeReturns = TRUE)
# Calculating realized variance with sparse sampling frequencies of 5 and 10 minutes
RV5min <- rCov(tesla,alignBy="minutes",alignPeriod=5,makeReturns = TRUE)
RV10min <- rCov(tesla,alignBy="minutes",alignPeriod=10,makeReturns = TRUE)
# Comparing RV as time-series using ggplot() after combining them together
combined <- cbind(RV1min, RV5min, RV10min)
library(ggplot2)
ggplot(combined,aes(x=Index))+
geom_line(aes(y = RV1min, color = "RV (1-min)"))+
geom_line(aes(y = RV5min, color = "RV (5-min)"))+
geom_line(aes(y = RV10min, color = "RV (10-min)"))+
labs(x="", y="realized variance") +
scale_color_manual(values = c("RV (1-min)" = "coral", "RV (5-min)" = "orchid", "RV (10-min)" = "steelblue"))+
theme_minimal()+
theme(panel.background = element_rect(fill = "white", color = NA),
legend.position = c(0.8, 0.4),
legend.background = element_blank(),
legend.key = element_blank(),
legend.title = element_blank())
# Determination of optimal sampling frequency from 1 to 20 requires several steps
# Step 1: define a vector of sampling frequencies
frequencies <- 1:20
# Step 2: Calculate daily realized variance (RV) for each frequency and trading day
RV_freq <- as.data.frame(sapply(frequencies, function(frequency) {
rCov(tesla, alignBy = "minutes", alignPeriod = frequency, makeReturns = TRUE)}))
colnames(RV_freq) <- paste0("RV", frequencies, "min")
# Step 3: Averaging out RV across days against sampling frequency
RV_average <- data.frame("AvgRV"=colMeans(RV_freq))
RV_average$frequency <- frequencies
# Step 4: Singanture plot
ggplot(RV_average, aes(x = frequency, y = AvgRV)) +
geom_line(color = "coral") +
geom_point(color = "coral", size = 2) +
labs(x = "Sampling frequency (minutes)", y = "Average realized variance",title="Signature plot") +
theme_minimal()
Example 16. Assuming that optimal sampling frequency is
8 minutes compute various realized
variance measures: RV, BPV, TTSRV and RTTSRV. Compare them by
Keep in mind that some RV measures are robust to price jumps only, and some of them are robust to both microstructure noise and price jumps.
Note: commands
In this example RTTSRV is the most robust measure, while it keeps all the data.
ggplot()
and conclude which RV measure is most robust to
both microstructure noise and price jumps, respectively.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them.Keep in mind that some RV measures are robust to price jumps only, and some of them are robust to both microstructure noise and price jumps.
Note: commands
rTSCov()
and
rRTSCov()
do not support having an xts object of multiple
days as input, but can be applied a functions to each day (options are
sapply()
or apply.daily()
). TTSRV and it’s
robust version RTTSRV are designed to work directly with the price data,
and they don’t allow returns as inputs (there is no
makeReturns
argument). Additionally, parameter η is a threshold used to detect price
jumps, i.e. when η=9 squared
standardized intraday returns that exceed 9 are detected as jumps which are then
truncated or filtered out. In this example RTTSRV is the most robust measure, while it keeps all the data.
# Different RV measures calculated at optimal sampling frequency of 8 minutes
RV8min <- rCov(tesla,alignBy="minutes",alignPeriod=8,makeReturns = TRUE)
BP8min <- rBPCov(tesla,alignBy="minutes",alignPeriod=8,makeReturns = TRUE)
TTSRV8min <- apply.daily(tesla, function(x) rTSCov(x, K = 8, J = 1))
RTTSRV8min <- apply.daily(tesla, function(x) rRTSCov(x, K = 8, J = 1, eta=9))
# Combining multiple xts objects into one xts object "combined2" and plotting them
combined2=cbind(RV8min,BP8min,TTSRV8min,RTTSRV8min)
ggplot(combined2,aes(x=Index))+
geom_line(aes(y = RV8min, color = "RV (8-min)"))+
geom_line(aes(y = BP8min, color = "BP (8-min)"))+
geom_line(aes(y = TTSRV8min, color = "TTSRV (8-min)"))+
geom_line(aes(y = RTTSRV8min, color = "RTTSRV (8-min)"))+
labs(x="", y="realized variance") +
scale_color_manual(values = c("RV (8-min)" = "coral", "BP (8-min)" = "orchid", "TTSRV (8-min)" = "mediumseagreen", "RTTSRV (8-min)" = "black"))+
theme_minimal()+
theme(panel.background = element_rect(fill = "white", color = NA),
legend.position = c(0.5, 0.7),
legend.background = element_blank(),
legend.key = element_blank(),
legend.title = element_blank())
- Once the realized variance is calculated using an appropriate estimator, it can be forecasted using Heterogeneous AutoRegressive (HAR) model and High frEquency bAsed VolatilitY (HEAVY) model
Command | Description |
---|---|
noZeroPrices() |
deletes observations with zero prices |
exchangeHoursOnly() |
filters raw data between market open and market close |
BNSjumpTest() |
Barndorff–Nielsen and Shephard test for the presence of jumps |
AJjumpTest() |
Ait–Sahalia and Jacod test for the presence of jumps |
HARmodel() |
estimates heterogeneous autoregressive model (HAR) and it’s variants |
HEAVYmodel() |
estimates High frEquency bAsed VolatilitY model (HEAVY) |
- HAR model is used to forecast today’s realized volatility based on realized volatility in the past (previous day RV, previous week RV and previous month RV), and therefore it is typically estimated using pre-computed realized measures (argument
inputType="RM"
) using rCov.
RVΔ,dt=β0+β1RVΔ,dt−1+β2RVΔ,wt−1+β3RVΔt−1,m
where RVΔ,dt and RVΔ,dt−1 are daily realized variances at time t and t−1, whereas RVΔ,wt−1=RVΔ,dt−1+RVΔ,dt−2+⋯+RVΔ,dt−55 is previous week realized variance, and RVΔ,mt−1=RVΔ,dt−1+RVΔ,dt−2+⋯+RVΔ,dt−2222 is previous month realized variance
Equation xx is the basic specification, which can be extended by additional features like price jumps
Another model is so called HEAVY model that incorporates high–frequency data (for computing realized measures ) and low–frequency data (for computing daily returns) and therefore it is a two–equations model (it’s structure resembles a GARCH type model)
σ2t=ω+αRMt−1+βσ2t−1μt=ωR+αRRMt−1+βRμt−1
Example 17. Estimate three heterogeneous autoregressive
models: basic HAR model (object
har8min
), HAR extended with
daily jump component (object harj8min
) and HAR extended
with continuous and daily jump components (harcjl8min
). For
the same purpose pre-compute daily jump component jump
as a
difference between realized variance and bipower variation, along with
testing for the presence of jumps by BNSjumpTest()
command.
All pre-compued measures should be based on the same sampling frequency
of 8 minutes (as determine in the
previous example). Comapre estimated models using
modelsummary()
command. According to best fit model,
predict realized variance for 10
steps ahead by predict()
command.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them.# Different RV measures calculated at optimal sampling frequency of 8 minutes
# Force conversion of index to POSIXct with a valid timezone (e.g., UTC)
returns_8min <- Cl(makeReturns(to.period(tesla, period = "minutes", k = 8)))
har8min <- HARmodel(data = returns_8min, periods = c(1, 5, 22),type = "HAR", inputType = "returns")
harj8min <- HARmodel(data = returns_8min, periods = c(1, 5, 22),periodsJ= c(1), type = "HARJ", inputType = "returns")
harcj8min <- HARmodel(data = returns_8min, periods = c(1, 5, 22),periodsJ= c(1), type = "HARCJ", inputType = "returns")
modelsummary(list("Basic HAR"=har8min,"HARJ model"=harj8min,"HARCJ model"=harcj8min),stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),fmt=4)
# HEAVY model estimation requires daily returns and realized measure as inputs
RV8min <- rCov(tesla,alignBy="minutes",alignPeriod=8,makeReturns = TRUE)
daily_returns <- apply.daily(makeReturns(tesla), sum) # aggregated from intraday returns
length(daily_returns)==length(RV8min) # checking if both inputs are of the same size
inputs <- cbind(daily_returns*100,RV8min*10000) # returns in the first column and realized measure in the secon column
sum(is.na(inputs))
nrow(inputs)
heavy <- HEAVYmodel(data=inputs)
summary(heavy)
plot(heavy)
# Rolling window
inputs$heavy.forecast <- NA
for (i in 1:45) {
estimated <- HEAVYmodel(data=inputs[i:(i+199),1:2])
inputs$heavy.forecast[i+200] <- predict(estimated,stepsAhead=1)[,"CondRM"]
}
plot(inputs[201:245,2:3])