## 2.6 Extensible time series

Financial data are often indexed by date, and sometimes by time of day as well, so we usually want to associate each observation with a time stamp.

The xts package is useful for handling time-stamped financial data.

install.packages("xts")  # only needed once, to install the package
library(xts)

Create an xts object using the xts() function:

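# Three daily time stamps and one price per day
dates <- as.Date(c("2016-01-01", "2016-01-02", "2016-01-03"))
prices <- c(1.1, 2.2, 3.3)
# order.by supplies the time index for the observations
x <- xts(prices, order.by = dates)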

Each observation has a time stamp.

x[1]
##            [,1]
## 2016-01-01  1.1

It is often more useful to select observations by time rather than by row number. To do so, we supply the time stamp directly. Note that a slash separates the starting time from the ending time:

x['2016-01-02/2016-01-03']
##            [,1]
## 2016-01-02  2.2
## 2016-01-03  3.3
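
The slash notation also accepts open-ended ranges and partial dates. The following is a minimal sketch that rebuilds the same three-day series so that it runs on its own; the names from_second, up_to_second and january are illustrative only.

```r
library(xts)

# Rebuild the three-day example series
dates <- as.Date(c("2016-01-01", "2016-01-02", "2016-01-03"))
x <- xts(c(1.1, 2.2, 3.3), order.by = dates)

# Everything from 2016-01-02 onwards (open end)
from_second <- x["2016-01-02/"]

# Everything up to and including 2016-01-02 (open start)
up_to_second <- x["/2016-01-02"]

# A partial date selects the whole period, here all of January 2016
january <- x["2016-01"]
```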

With time stamps, we can find the first and last observations:

first(x)
##            [,1]
## 2016-01-01  1.1
last(x)
##            [,1]
## 2016-01-03  3.3

To get the time stamp, we use time(), and to get the value, we use as.numeric():

time(x[1])
## [1] "2016-01-01"
as.numeric(x[1])
## [1] 1.1
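
For the whole series rather than a single row, index() and coredata() return all time stamps and all values at once. A small sketch, assuming the same three-day series; the names stamps and values are illustrative only.

```r
library(xts)

# Rebuild the three-day example series
dates <- as.Date(c("2016-01-01", "2016-01-02", "2016-01-03"))
x <- xts(c(1.1, 2.2, 3.3), order.by = dates)

# All time stamps as a Date vector
stamps <- index(x)

# All values as a plain matrix, with the time index stripped
values <- coredata(x)
```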

A very common time series operation is lag(). It shifts the data forward in time, so lag(x,1) places each value at the following time stamp:

lag1 <- lag(x,1)
lag1
##            [,1]
## 2016-01-01   NA
## 2016-01-02  1.1
## 2016-01-03  2.2
lag2 <- lag(x,2)
lag2
##            [,1]
## 2016-01-01   NA
## 2016-01-02   NA
## 2016-01-03  1.1

One reason to use lag() is that the first difference cannot be calculated as x[i]-x[i-1]: the two values carry different time stamps, and xts arithmetic matches observations by time stamp. Instead, we need to use x[i]-lag(x,1)[i]. Alternatively, we can extract the numeric values and compute as.numeric(x[i])-as.numeric(x[i-1]).
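
The alignment issue can be checked directly. The sketch below rebuilds the same series, shows that subtracting two rows with different time stamps leaves nothing behind, and then computes the first difference with lag(). It also calls diff(), which has a method for xts objects and is used here as a shortcut not covered in the text above.

```r
library(xts)

# Rebuild the three-day example series
dates <- as.Date(c("2016-01-01", "2016-01-02", "2016-01-03"))
x <- xts(c(1.1, 2.2, 3.3), order.by = dates)

# Rows 2 and 1 have different time stamps, so no observations line up
misaligned <- x[2] - x[1]
length(misaligned)  # zero: nothing survives the alignment

# Lag first, then subtract: both operands now share the same time stamps
d_lag <- x - lag(x, 1)

# diff() computes the same first difference in a single call
d_diff <- diff(x)
```

In both d_lag and d_diff the first entry is NA, because there is no earlier observation to subtract.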

To transfer time stamps to another vector, we use reclass(). The following code copies the time stamps from x to y:

y <- c(1,0,-1)
y <- reclass(y,x)
y
##            [,1]
## 2016-01-01    1
## 2016-01-02    0
## 2016-01-03   -1

Two xts objects with the same time stamps can be combined using cbind():

z <- cbind(x,y)
z
##              x  y
## 2016-01-01 1.1  1
## 2016-01-02 2.2  0
## 2016-01-03 3.3 -1
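
cbind() on xts objects aligns rows by time stamp, so the inputs do not have to cover exactly the same dates. A minimal sketch with two hypothetical series, a and b, that overlap on one day only; merge() with join = "inner" is shown as one way to keep just the shared dates.

```r
library(xts)

# Two short series that share only 2016-01-02
a <- xts(c(1, 2), order.by = as.Date(c("2016-01-01", "2016-01-02")))
b <- xts(c(20, 30), order.by = as.Date(c("2016-01-02", "2016-01-03")))

# Outer join by default: every date appears, gaps are filled with NA
both <- cbind(a, b)

# Inner join keeps only the dates present in both series
shared <- merge(a, b, join = "inner")
```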

Sometimes entries in the data are missing, which causes problems for time series calculations (e.g. the lag operator lag()). There are two usual ways to deal with the problem:

1. na.omit() to remove those data points, and
2. na.approx() to fill them in by linear interpolation.

Note that na.approx() comes from the zoo package, which is loaded together with xts. It interpolates along the time index, so it is mainly meaningful for time series data.
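
Both approaches can be compared on a small series with one missing value. A sketch, assuming the three-day series from earlier with the middle price set to NA; the names gappy, dropped and filled are illustrative only.

```r
library(xts)  # also attaches zoo, which provides na.approx()

# Three-day series with a missing price on the second day
dates <- as.Date(c("2016-01-01", "2016-01-02", "2016-01-03"))
gappy <- xts(c(1.1, NA, 3.3), order.by = dates)

# Option 1: drop the observation with the missing value
dropped <- na.omit(gappy)

# Option 2: fill the gap by linear interpolation along the time index
filled <- na.approx(gappy)
```

Because the dates are evenly spaced, the interpolated value for 2016-01-02 is the midpoint of its neighbours, about 2.2.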