# 11 Time Series Visualization & Analysis

`plot(stl(co2, s.window = "periodic"))`

Time series data are widely used in the financial sector for obvious reasons, and
these methods can be very useful in environmental data science, whether for
long-term climate change variables such as the Mauna Loa CO₂
data above or, at a much finer temporal scale, for environmental data from data loggers such as eddy covariance
flux towers.

(Blackburn, Oliphant, and Davis 2021)

## 11.1 Creation of time series (ts) data

A time series (ts) is created with the ts() function.

- the time unit can be anything
- observations must be a regularly spaced series
- time values are not normally included as a field in the ts

```
library(tidyverse)
SFhighF <- c(58,61,62,63,64,67,67,68,71,70,64,58)
SFlowF <- c(47,48,49,50,51,53,54,55,56,55,51,47)
SFhighC <- (SFhighF-32)*5/9
SFlowC <- (SFlowF-32)*5/9
SFtempC <- bind_cols(high=SFhighC, low=SFlowC)
plot(SFtempC); plot(ts(SFtempC))
```

### 11.1.1 frequency setting

Frequency setting is a key parameter for ts()

- sets how many observations per time unit
- ts() mostly doesn’t care what the time unit is, but some functions infer it, at least for an annual time unit, e.g. that 1–12 means months when the frequency is 12

`plot(ts(SFtempC, frequency=1), main="monthly time unit")`

`plot(ts(SFtempC, frequency=12), main="yearly time unit")`

**frequency < 1**

If you have data at a frequency lower than one observation per time unit:

- e.g. greenhouse gas values every 20 years, starting in year 20: frequency = 1/20 = 0.05

```
library(dslabs)
data("greenhouse_gases")
GHGwide <- pivot_wider(greenhouse_gases, names_from = gas,
                       values_from = concentration)
GHG <- ts(GHGwide, frequency=0.05, start=20)
plot(GHG)
```

– greenhouse gas data from Irizarry (2019)

**start and end parameters**

- the time of the first (start) and last (end) observations. Either a single number or a vector of two numbers (the second of which is an integer), which specify a natural time unit and a (1-based) number of samples into the time unit.
- Example with year as the time unit and monthly data, starting July 2019 and ending June 2020:

`frequency=12, start=c(2019,7), end=c(2020,6)`
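A quick sketch (with made-up values, not from the original) confirming how these parameters behave:

```r
# Twelve hypothetical monthly values, July 2019 through June 2020
x <- ts(1:12, frequency = 12, start = c(2019, 7), end = c(2020, 6))
start(x)   # returns c(2019, 7)
end(x)     # returns c(2020, 6)
```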

### 11.1.2 Data logger data from Marble Mountains resurgence


For a study of chemical water quality and hydrology of a karst system in the Marble Mountains of California,
a spring *resurgence* was instrumented to measure water level, temperature, and specific conductance (a surrogate for total dissolved solids) over a 4-year period (J. D. Davis and Davis 2001).

```
library(tidyverse); library(lubridate)
resurg <- read_csv(system.file("extdata","resurgenceData.csv",package="iGIScData")) %>%
  mutate(date_time = mdy_hm(paste(date, time))) %>%
  dplyr::select(date_time, ATemp, BTemp, wlevelm, EC, InputV)
resurgTS <- ts(resurg, frequency = 12*365, start = c(1994, 266*12+7))
plot(resurgTS)
```

## 11.2 Data smoothing methods

### 11.2.1 moving average (ma)

- Simple generalization of sequential data
- The `order` parameter is how many values are averaged in the moving window
- `order` should be an odd number, so the window is centered on each observation

```
library(forecast)
ts(SFtempC, frequency=12)
```

```
## high low
## Jan 1 14.44444 8.333333
## Feb 1 16.11111 8.888889
## Mar 1 16.66667 9.444444
## Apr 1 17.22222 10.000000
## May 1 17.77778 10.555556
## Jun 1 19.44444 11.666667
## Jul 1 19.44444 12.222222
## Aug 1 20.00000 12.777778
## Sep 1 21.66667 13.333333
## Oct 1 21.11111 12.777778
## Nov 1 17.77778 10.555556
## Dec 1 14.44444 8.333333
```

`ma(ts(SFtempC,frequency=12),order=3)`

```
## [,1] [,2]
## Jan 1 NA NA
## Feb 1 15.74074 8.888889
## Mar 1 16.66667 9.444444
## Apr 1 17.22222 10.000000
## May 1 18.14815 10.740741
## Jun 1 18.88889 11.481481
## Jul 1 19.62963 12.222222
## Aug 1 20.37037 12.777778
## Sep 1 20.92593 12.962963
## Oct 1 20.18519 12.222222
## Nov 1 17.77778 10.555556
## Dec 1 NA NA
```
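As a quick check on what `ma` is doing (not part of the original analysis): the order-3 February value is simply the mean of the January–March highs converted to Celsius:

```r
# Mean of the Jan-Mar high temperatures (deg F), converted to deg C
mean((c(58, 61, 62) - 32) * 5/9)   # 15.74074, the "Feb 1" high above
```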

**moving average of CO2 data**

Subtracting the moving average from the original series shows the time-local fluctuations, one component of the data; greenhouse gas data from Irizarry (2019).

```
library(dslabs)
data("greenhouse_gases")
GHGwide <- pivot_wider(greenhouse_gases,
                       names_from = gas,
                       values_from = concentration)
CO2 <- ts(GHGwide$CO2,
          frequency = 0.05)
library(forecast)
CO2ma <- ma(CO2, order=7)
plot(CO2)
```

`plot(CO2ma)`

`plot(CO2-CO2ma)`

### 11.2.2 loess (local regression) smoothing

Loess (local polynomial regression fitting) fits a polynomial surface determined by one or more numerical predictors, using local fitting (see https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/loess).

It is a non-parametric approach that fits simple regressions within local neighborhoods of the data, working over the range spanned by the X variables (see http://r-statistics.co/Loess-Regression-With-R.html).
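As a sketch of what this might look like (using base R’s `stats::loess` on the built-in `co2` series; the `span` value here is an arbitrary choice):

```r
# Smooth the Mauna Loa CO2 series against time with local regression;
# span sets the fraction of the data used in each local fit.
co2df <- data.frame(t = as.numeric(time(co2)), ppm = as.numeric(co2))
co2loess <- loess(ppm ~ t, data = co2df, span = 0.3)
plot(co2df$t, co2df$ppm, type = "l", xlab = "year", ylab = "CO2 (ppm)")
lines(co2df$t, predict(co2loess), col = "red", lwd = 2)
```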

## 11.3 Decomposing time series

Decomposition separates a time series into its constituent components:

- the original data
- a trend component, which removes seasonal and remainder variation
- if seasonal, a seasonal component. Note that **season** relates to the time unit: if the unit is 1 year, “seasonality” refers to the normal usage of seasons over a year, but if the unit is 1 day, seasons refer to different parts of a day, etc.
- an irregular “random” remainder (time-local variation)

There are two common decomposition methods: the classic method `decompose` using moving averages, which we’ll look at first, then the “loess” method used by `stl`.

**Mauna Loa CO₂ data with seasonality**

A good place to see the effect of seasonality is the Mauna Loa CO₂ data, which show regular annual cycles along with a steadily increasing trend over the years. The decomposition shows the original observations, followed by a trend line that removes the seasonal and local (short-term) random irregularities, then a detrended *seasonal* picture that removes the trend to show just the seasonal cycles, followed by the random irregularities. Note the vertical scale: the units are all the same (parts per million), so the actual amplitude of the seasonal cycle is the same as in the observations; each panel is just scaled to the chart height, which tends to exaggerate the seasonal cycles and random irregularities.

`plot(decompose(co2))`
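A quick check (not in the original) that the additive components reconstruct the series: trend + seasonal + random equals the observations wherever the trend, a centered moving average, is defined:

```r
dc <- decompose(co2)
ok <- !is.na(dc$trend)    # the moving-average trend has NAs at both ends
recomposed <- dc$trend + dc$seasonal + dc$random
all.equal(as.numeric(co2)[ok], as.numeric(recomposed)[ok])   # TRUE
```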

**Seasonal decomposition using loess**

The `stl` function name stands for “Seasonal decomposition of Time series by Loess.” Loess is a smoothing algorithm based on local regression. If `s.window = "periodic"`, the mean is used for seasonal smoothing;
the seasonal values are removed and the remainder is smoothed to find the trend.

`plot(stl(co2, s.window="periodic"))`

**Marble Mountains karst resurgence study**

In the Marble Mountains karst resurgence study (J. D. Davis and Davis 2001), water level has an annual seasonality that might be best understood by decomposition.

```
library(tidyverse); library(lubridate)
resurg <- read_csv(system.file("extdata","resurgenceData.csv",package="iGIScData")) %>%
  mutate(date_time = mdy_hms(paste(date, time))) %>%
  dplyr::select(date_time, ATemp, BTemp, wlevelm, EC, InputV)
wlevelm <- ts(resurg$wlevelm, frequency = 12*365,
              start = c(1994, 266*12+7))
fit <- stl(wlevelm, s.window="periodic")
plot(fit)
```

### 11.3.1 lag regression

We’ll explore solar radiation and temperature data for Bugac, Hungary, and Manaus, Brazil, maintained by the European Fluxes Database (*European Fluxes Database*, n.d.). (Create a **SolarRad_Temp** project for this.)

```
library(readxl)
BugacSolstice <- read_xls(system.file("extdata", "SolarRad_Temp.xls", package="iGIScData"),
                          sheet="BugacHungary", col_types = "numeric") %>%
  filter(Year != "YYYY" & `Day of Yr` < 177 & `Day of Yr` > 168) %>%
  dplyr::select(SolarRad, Tair)
BugacSolsticeTS <- ts(BugacSolstice, frequency = 48)
plot(BugacSolstice, main="a simple scatter plot that illustrates hysteresis"); plot(BugacSolsticeTS)
```

```
mod <- function(i) {sqrt(sum(lm(Tair~lag(SolarRad,i), data=BugacSolsticeTS)$resid**2))}
for(i in 0:5){print(paste(i,":",mod(i),sep=""))}
```

```
## [1] "0:64.0135215060866"
## [1] "1:55.8980329398225"
## [1] "2:50.4934737943067"
## [1] "3:48.1321353382658"
## [1] "4:49.6489246846601"
## [1] "5:54.3634439433188"
```
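The residual sum reaches its minimum at a lag of 3, suggesting that air temperature here responds roughly 1.5 hours (three half-hour observations) after solar radiation. Note that with the tidyverse loaded, the `lag` in the model above is dplyr’s version, which simply shifts a vector and pads with NAs:

```r
library(dplyr)
lag(c(10, 20, 30, 40), 2)   # NA NA 10 20
```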

```
SolarRad_comp <- decompose(ts(BugacSolstice$SolarRad, frequency = 48))
Tair_comp <- decompose(ts(BugacSolstice$Tair, frequency = 48))
seasonals <- bind_cols(Rad = SolarRad_comp$seasonal, Temp = Tair_comp$seasonal)
ggplot(seasonals) +
  geom_line(aes(x=seq_along(Rad), y=Rad/50), col="red", size=1) +
  geom_line(aes(x=seq_along(Temp), y=Temp), col="darkgreen", size=1) +
  scale_x_continuous(breaks = seq(0,480,12))
```

`which.max(seasonals$Rad)`

`## [1] 23`

`which.max(seasonals$Temp)`

`## [1] 28`

```
library(readxl)
ManausSolstice <- read_xls(system.file("extdata", "SolarRad_Temp.xls", package="iGIScData"),
                           sheet="ManausBrazil", col_types = "numeric") %>%
  filter(Year != "YYYY" & `Day of Yr` < 177 & `Day of Yr` > 168) %>%
  dplyr::select(SolarRad, Tair)
ManausSolsticeTS <- ts(ManausSolstice, frequency = 48)
plot(ManausSolstice, main="Manaus Brazil"); plot(ManausSolsticeTS)
```

```
mod <- function(i) {sqrt(sum(lm(Tair~lag(SolarRad,i), data=ManausSolsticeTS)$resid**2))}
for(i in 0:5){print(paste(i,":",mod(i),sep=""))}
```

```
## [1] "0:36.6765871805181"
## [1] "1:33.572327106606"
## [1] "2:31.9789648893589"
## [1] "3:31.2029493549955"
## [1] "4:31.6171985286944"
## [1] "5:32.6933416047147"
```

```
SolarRad_comp <- decompose(ts(ManausSolstice$SolarRad, frequency = 48))
Tair_comp <- decompose(ts(ManausSolstice$Tair, frequency = 48))
seasonals <- bind_cols(Rad = SolarRad_comp$seasonal, Temp = Tair_comp$seasonal)
ggplot(seasonals) +
  geom_line(aes(x=seq_along(Rad), y=Rad/50), col="red", size=1) +
  geom_line(aes(x=seq_along(Temp), y=Temp), col="darkgreen", size=1) +
  scale_x_continuous(breaks = seq(0,480,12))
```

`which.max(seasonals$Rad)`

`## [1] 30`

`which.max(seasonals$Temp)`

`## [1] 30`

### 11.3.2 Ensemble Average

For any time series data with meaningful time units like days or years (or even weeks when the data varies by days of the week – this was the case with COVID-19 data), ensemble averages are a good way to visualize changes in a variable over that time unit.

As you could see from the decomposition and sequential graphics we created for the Bugac and Manaus solar radiation and temperature data, looking at what happens over one day as an “ensemble” of all of the days – where the mean value is displayed along with error bars based on standard deviations of the distribution – might be a useful figure.

```
library(cowplot)
Manaus <- read_xls(system.file("extdata","SolarRad_Temp.xls",package="iGIScData"),
                   sheet="ManausBrazil", col_types = "numeric") %>%
  dplyr::select(Year:Tair) %>% filter(Year != "YYYY")
ManausSum <- Manaus %>% group_by(Hr) %>%
  summarize(meanRad = mean(SolarRad), meanTemp = mean(Tair),
            sdRad = sd(SolarRad), sdTemp = sd(Tair))
px <- ManausSum %>% ggplot(aes(x=Hr)) + scale_x_continuous(breaks=seq(0,24,3))
p1 <- px + geom_line(aes(y=meanRad), col="blue") +
  geom_errorbar(aes(ymax = meanRad + sdRad, ymin = meanRad - sdRad)) +
  ggtitle("Manaus 2005 ensemble solar radiation")
p2 <- px + geom_line(aes(y=meanTemp),col="red") +
  geom_errorbar(aes(ymax=meanTemp+sdTemp, ymin=meanTemp-sdTemp)) +
  ggtitle("Manaus 2005 ensemble air temperature")
plot_grid(plotlist=list(p1,p2), ncol=1, align='v') # from cowplot
```

```
Bugac <- read_xls(system.file("extdata","SolarRad_Temp.xls",package="iGIScData"),
                  sheet="BugacHungary", col_types = "numeric") %>%
  dplyr::select(Year:Tair) %>% filter(Year != "YYYY")
BugacSum <- Bugac %>% group_by(Hr) %>%
  summarize(meanRad = mean(SolarRad), meanTemp = mean(Tair),
            sdRad = sd(SolarRad), sdTemp = sd(Tair))
px <- BugacSum %>% ggplot(aes(x=Hr)) + scale_x_continuous(breaks=seq(0,24,3))
p1 <- px + geom_line(aes(y=meanRad), col="blue") +
  geom_errorbar(aes(ymax = meanRad + sdRad, ymin = meanRad - sdRad)) +
  ggtitle("Bugac 2005 ensemble solar radiation")
p2 <- px + geom_line(aes(y=meanTemp),col="red") +
  geom_errorbar(aes(ymax=meanTemp+sdTemp, ymin=meanTemp-sdTemp)) +
  ggtitle("Bugac 2005 ensemble air temperature")
plot_grid(plotlist=list(p1,p2), ncol=1, align='v') # from cowplot
```

### 11.3.3 Julian Dates

Julian dates are the number of days since some start date.

- The origin (Julian date 0) can be arbitrary
- Default for julian() is 1970-01-01

For climatological work, where the year is very important, a Julian date usually means the day of a given year:

- lubridate’s yday() gives you this
- it’s the same as setting the origin for julian() to 12-31 of the previous year
- useful in time series when the year is the time unit and observations are by days of the year (e.g. for climate data)

```
library(lubridate)
julian(ymd("2020-01-01"))
```

```
## [1] 18262
## attr(,"origin")
## [1] "1970-01-01"
```

`julian(ymd("1970-01-01"))`

```
## [1] 0
## attr(,"origin")
## [1] "1970-01-01"
```

`yday(ymd("2020-01-01"))`

`## [1] 1`

`julian(ymd("2020-01-01"), origin=ymd("2019-12-31"))`

```
## [1] 1
## attr(,"origin")
## [1] "2019-12-31"
```

To create a decimal yday including fractional days, here’s a function:

```
ydayDec <- function(d) {
yday(d)-1 + hour(d)/24 + minute(d)/1440 + second(d)/86400}
```
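For example (assuming lubridate is loaded), noon on New Year’s Day is half a day into the year:

```r
library(lubridate)
ydayDec <- function(d) {   # as defined above
  yday(d) - 1 + hour(d)/24 + minute(d)/1440 + second(d)/86400}
ydayDec(ymd_hm("2020-01-01 12:00"))   # 0.5
```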

## 11.4 Learning more about time series in R

Learning more can be confusing, partly because of the major use of time series in the financial sector, so a web search quickly lands you in the world of financial forecasting…

One good source is actually at CRAN: https://cran.r-project.org/web/views/TimeSeries.html

Time series data library: https://pkg.yangzhuoranyang.com/tsdl/

Understanding time series in R: https://a-little-book-of-r-for-time-series.readthedocs.io/

### References

Blackburn, Oliphant, and Davis. 2021. *Wetlands* 41 (3): 1–17.

Davis, J. D., and Davis. 2001. *Earth Surface Processes and Landforms* 26.

*European Fluxes Database*. n.d. http://www.icos-etc.eu/home.

Irizarry. 2019. *Introduction to Data Science: Data Analysis and Prediction Algorithms with R*. CRC Press. https://cran.r-project.org/package=dslabs.