# 14 Time Series Visualization & Analysis

Time series data are widely used in the financial sector for obvious reasons, and these methods can also be very useful in environmental data science, whether for long-term climate change variables such as the Mauna Loa CO_{2} data above, or, at a much finer temporal scale, for environmental data from data loggers such as eddy covariance flux towers.

(Blackburn, Oliphant, and Davis 2021)

The time series data type has been used in R for some time, and in fact a lot of the built-in data sets are time series, like the Mauna Loa CO_{2} data or the flow of the Nile:

`plot(Nile)`

So what’s the benefit of a time series data object? Why not just use a data frame with time stamps as a variable?

You could just work with these data as data frames, and often this is all you need. In this chapter, for instance, we won’t work exclusively with time series data objects; our data will just include variation over time. However, there are advantages in having R understand actual time series in a temporal dimension, including knowing what time unit they’re in, especially if there are periodic fluctuations over that time period. This is similar to spatial data, where it’s useful to maintain the geometry of our data and the coordinate system they’re in. So we should learn how time series objects work and how they’re built, just as we did with spatial data.

## 14.1 Unit, frequency, seasonality and trend of time series

A time series can be assumed to have a **time unit** which we’ll create by assigning a **frequency** parameter, and from there comes the concept of “**seasonality**” where we want to consider values changing over the period of that time unit, like seasons of the year or “seasons” of the day or week or other time unit. There may also be a **trend** over time that we may be interested in knowing. Decomposing the time series will display all of these things, if the time series is set up right.
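To see these pieces concretely, here is a minimal sketch with invented data: a synthetic monthly series built from a known trend plus a known seasonal cycle, which `decompose()` should pull back apart.

```r
i <- 1:120                             # 10 years of monthly values
trend <- 0.05 * i                      # steady upward trend
seasonal <- 2 * sin(2 * pi * i / 12)   # annual cycle, amplitude 2
x <- ts(trend + seasonal, frequency = 12)
d <- decompose(x)                      # returns trend, seasonal, and random components
# The recovered seasonal swing should match the one we built in (range of 4)
max(d$seasonal) - min(d$seasonal)
```

With no noise added, the random component is essentially zero and the seasonal panel recovers the sinusoid exactly.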

A good place to see the effect of seasonality is to look at the Mauna Loa CO_{2} data, which shows regular annual cycles, yet with a regularly increasing trend over the years. One time-series graphic, a **decomposition**, shows the original observations, followed by a trend line that removes the seasonal and local (short-term) random irregularities, a detrended *seasonal* picture which removes that trend to just show the seasonal cycles, followed by the random irregularities.

`plot(decompose(co2))`

In reading this graph, note the vertical scale: the units are all the same – parts per million – so the actual amplitude of the seasonal cycle should be the same as the annual amplitude of the observations. It’s just scaled to the chart height, which tends to exaggerate the seasonal cycles and random irregularities.
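We can check that reading against the numbers: in the built-in `co2` series the seasonal swing is only a few ppm, while the trend climbs by tens of ppm over the record.

```r
dc <- decompose(co2)                  # built-in Mauna Loa CO2 series
diff(range(dc$seasonal))              # seasonal amplitude, in ppm
diff(range(dc$trend, na.rm = TRUE))   # trend rise over the record, in ppm
```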

## 14.2 Creation of time series (ts) data

A time series (ts) can be created with the ts() function.

- the time unit can be anything
- observations must be a regularly spaced series
- time values are not normally included as a variable in the ts

We’ll build a simple time series with monthly high and low temperatures in San Francisco that I just pulled off a Google search. We’ll start with just building a data frame, first converting the temperatures from Fahrenheit to Celsius.

```
library(tidyverse)
SFhighF <- c(58,61,62,63,64,67,67,68,71,70,64,58)
SFlowF <- c(47,48,49,50,51,53,54,55,56,55,51,47)
SFhighC <- (SFhighF-32)*5/9
SFlowC <- (SFlowF-32)*5/9
SFtempC <- bind_cols(high=SFhighC, low=SFlowC)
plot(ts(SFtempC), main="SF temperature, monthly time unit")
```

### 14.2.1 Frequency, start and end parameters for `ts()`

To convert this data frame into a time series, we’ll need to provide at least a frequency setting (the default frequency=1 was applied above), which sets how many observations there are per time unit. The ts() function doesn’t seem to care what the time unit is in reality, though some functions figure it out, at least for an annual time unit, e.g. that 1–12 means months when there’s a frequency of 12.

`plot(ts(SFtempC, frequency=12), main="yearly time unit")`

**frequency < 1**

You can have a frequency less than 1. For instance, for greenhouse gases (CO_{2}, CH_{4}, N_{2}O) captured from ice cores in Antarctica (Law Dome Ice Core, just south of Cape Poinsett, Antarctica; Irizarry 2019), there are values for every 20 years, so if we want year to be the time unit, the frequency would need to be 1/20 = 0.05.

```
library(dslabs)
data("greenhouse_gases")
GHGwide <- pivot_wider(greenhouse_gases, names_from = gas,
                       values_from = concentration)
GHG <- ts(GHGwide, frequency=0.05, start=20)
plot(GHG)
```

## 14.3 Associating times with time series

In the greenhouse gas data above, we not only specified frequency to define the time unit, we also provided a `start` value of 20 to provide an actual year to start the data with. Time series don’t normally include time stamps, so we need to specify **start** and **end** parameters. These can be either a single number (if it’s the first reading during the time unit) or a vector of two numbers (the second of which is an integer), which specify a natural time unit and a number of samples into that time unit (e.g. which month during the year, or which hour, minute, or second during the day, etc.)

Example with year as the time unit and monthly data, starting July 2019 and ending June 2020:

`frequency=12, start=c(2019,7), end=c(2020,6)`
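A quick runnable version of those settings with dummy data (the values 1:12 are just placeholders):

```r
# 12 monthly values running July 2019 through June 2020
x <- ts(1:12, frequency = 12, start = c(2019, 7))
start(x)   # 2019 7
end(x)     # 2020 6
```

Note that `end` can usually be left out, since R can work it out from the length of the data.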

### 14.3.1 Subsetting time series by times

Time series can be queried for their `frequency`, `start`, and `end` properties, and we’ll look at these for a built-in time series data set, `sunspot.month`, which has monthly values of sunspot activity from 1749 to 2013.

`plot(sunspot.month)`

`frequency(sunspot.month)`

`## [1] 12`

`start(sunspot.month)`

`## [1] 1749 1`

What are the two values?

`end(sunspot.month)`

`## [1] 2013 9`

To subset a section of a time series, we can use the `window` function.

`plot(window(sunspot.month, start = 1940, end = 1970))`

### 14.3.2 Changing the frequency to use a different period

We commonly use periods of days or years, but other options are possible. During the COVID-19 pandemic, weekly cycles of cases were commonly reported, due to the weekly work cycle and weekend activities. Lunar tidal cycles are another example.
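A sketch of a weekly period, with invented daily counts that have a built-in weekday/weekend cycle; setting frequency = 7 makes `decompose()` treat the week as the “season”:

```r
set.seed(42)
# 20 weeks of daily counts: higher mid-week, low on the 7th (weekend) day
weekday_effect <- rep(c(5, 9, 10, 10, 9, 8, 3), 20)
counts <- ts(weekday_effect + rnorm(140, sd = 0.5), frequency = 7)
plot(decompose(counts))   # the "seasonal" panel now shows the weekly cycle
```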

Sunspot activity is known to fluctuate over about an 11-year cycle, and, beyond its well-known effects on radio transmission, it is also part of short-term climate fluctuations (so must be considered when looking at climate change). We can see the 11-year cycle in the above graphs, and we can perhaps see it more clearly in a decomposition where we set the period to the 11-year cycle.

To do this, we need to change the frequency. This is pretty easy to do, but to understand the process, first realize that we can take the original time series and create an identical copy to see how we can set the same parameters that already exist:

`sunspot <- ts(sunspot.month, frequency=12, start=c(1749,1))`

To create one with an 11-year sunspot cycle would then require just setting an appropriate frequency:

`sunspotCycles <- ts(sunspot.month, frequency=11*12, start=c(1749,1))`

But we might want to figure out when to start, so let’s window it for the first 20 years:

`plot(window(sunspot.month, end=c(1769,12)))`

Looks like a low point about 1755, so let’s start there:

`sunspotCycles <- ts(window(sunspot.month, start=1755), frequency=11*12, start=1755)`

Now this is assuming that the sunspot cycle is exactly 11 years, which is probably not right, but a look at the decomposition might be useful:

`plot(decompose(sunspotCycles))`

The seasonal picture makes things look simple, but it’s deceptive. The sunspot cycle varies significantly, and since we can see the random variation is much greater than the seasonal, this suggests that we might not be able to use this periodicity reliably. But it was worth a try, and a good demonstration of an unconventional type of period.

### 14.3.3 Time stamps and extensible time series

What we’ve been looking at so far are data in regularly spaced series where we just provide the date and time as when the series starts and ends. The data don’t include what we’d call *time stamps* as a variable, and these aren’t really needed if our data set is regularly spaced. However, there are situations where there may be missing data or imperfectly spaced data. Missing data for regularly spaced data can simply be entered as `NA`

, but imperfectly spaced data may need a time stamp. One way is with *extensible time series* which we’ll look at below, but first let’s look at a couple of `lubridate`

functions for working with days of the year and Julian dates, which can help with time stamps.

#### 14.3.3.1 Lubridate and Julian Dates

As we saw earlier, the `lubridate` package makes dealing with dates a lot easier, with many routines for reading and manipulating dates in variables. *If you didn’t really get a good understanding of using dates and times with lubridate, you may want to review that section earlier in the book. Dates and times can be difficult to work with, so you’ll need to make sure you know the best tools, and lubridate has the best tools.*

One variation on dates it handles well is *Julian dates*, the number of days since some start date:

- The origin (Julian date 0) can be arbitrary
- The default origin for julian() is 1970-01-01

For climatological work where the year is very important, we usually want the day number within any given year.

- lubridate’s `yday()` gives you this.
- This is the same thing as setting the origin for `julian()` to 12-31 of the previous year.
- It’s useful in time series when the year is the time unit and observations are by days of the year (e.g. for climate data).

```
library(lubridate)
julian(ymd("2020-01-01"))
```

```
## [1] 18262
## attr(,"origin")
## [1] "1970-01-01"
```

`julian(ymd("1970-01-01"))`

```
## [1] 0
## attr(,"origin")
## [1] "1970-01-01"
```

`yday(ymd("2020-01-01"))`

`## [1] 1`

`julian(ymd("2020-01-01"), origin=ymd("2019-12-31"))`

```
## [1] 1
## attr(,"origin")
## [1] "2019-12-31"
```
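The same relationship can be cross-checked in base R without lubridate: day of year is just the number of days since 12-31 of the previous year.

```r
d <- as.Date("2020-03-01")
# days since the end of the previous year; same value lubridate's yday(d) gives
as.numeric(d - as.Date("2019-12-31"))   # 61
```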

To create a decimal `yday` including fractional days, here’s a function which simply uses the number of hours in a day, minutes in a day, and seconds in a day.

```
ydayDec <- function(d) {
  yday(d) - 1 + hour(d)/24 + minute(d)/1440 + second(d)/86400}
datetime <- now()
print(datetime)
```

`## [1] "2022-01-15 15:22:54 PST"`

`print(ydayDec(datetime), digits=12)`

`## [1] 14.6409027778`

#### 14.3.3.2 Extensible time series (`xts`)

We’ll look at an example where we need to provide time stamps. Weekly water quality data (*E. coli*, total coliform, and *Enterococcus* bacterial counts) were downloaded from San Mateo County Health Department for San Pedro Creek, collected weekly since 2012, usually on Monday but sometimes on Tuesday, with the date recorded in the variable `SAMPLE_DATE`.

First, we’ll just look at this with the data frame, and plot by `SAMPLE_DATE`:

```
library(igisci)
library(tidyverse)
SanPedroCounts <- read_csv(ex("SanPedro/SanPedroCreekBacterialCounts.csv")) %>%
  filter(!is.na(`Total Coliform`)) %>%
  rename(Total_Coliform = `Total Coliform`, Ecoli = `E. Coli`) %>%
  dplyr::select(SAMPLE_DATE, Total_Coliform, Ecoli)
```

To create a time series from the San Pedro Creek *E. coli* data, we need to use the xts (Extensible Time Series) package, using `SAMPLE_DATE` to order the data.

```
library(xts); library(forecast)
Ecoli <- xts(SanPedroCounts$Ecoli, order.by = SanPedroCounts$SAMPLE_DATE)
attr(Ecoli, 'frequency') <- 52
plot(as.ts(Ecoli))
```

`plot(decompose(as.ts(Ecoli)))`

What we can see from the above decomposition is no real signal from the annual cycle (52 weeks), since while there is some tendency for peak levels (“first-flush” events) happening late in the calendar year, the timing of those first heavy rains varies significantly from year to year.

## 14.4 Data smoothing: moving average (`ma`)

The great amount of fluctuation in bacterial counts we see above makes it difficult to interpret the significant patterns. A moving average is a simple way to generalize data, using an order parameter to define the size of the moving window, which should be an odd number.
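Under the hood, a moving average of order 3 is just a centered mean over a 3-value window; a base-R sketch with `stats::filter` shows the arithmetic (my understanding is that `forecast::ma` computes the same centered mean for odd orders):

```r
x <- c(2, 4, 6, 8, 10)
# each interior value is averaged with its two neighbors; the ends become NA
f <- stats::filter(x, rep(1/3, 3))
f   # NA 4 6 8 NA
```

This also shows why an order-k moving average loses (k-1)/2 values at each end of the series.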

To see how it works, let’s apply `ma()` to the SF data we created a few steps back. This isn’t data we need to smooth, but we can see how the function works by comparing the original and averaged data.

```
library(forecast)
ts(SFtempC, frequency=12)
```

```
## high low
## Jan 1 14.44444 8.333333
## Feb 1 16.11111 8.888889
## Mar 1 16.66667 9.444444
## Apr 1 17.22222 10.000000
## May 1 17.77778 10.555556
## Jun 1 19.44444 11.666667
## Jul 1 19.44444 12.222222
## Aug 1 20.00000 12.777778
## Sep 1 21.66667 13.333333
## Oct 1 21.11111 12.777778
## Nov 1 17.77778 10.555556
## Dec 1 14.44444 8.333333
```

`ma(ts(SFtempC,frequency=12),order=3)`

```
## [,1] [,2]
## Jan 1 NA NA
## Feb 1 15.74074 8.888889
## Mar 1 16.66667 9.444444
## Apr 1 17.22222 10.000000
## May 1 18.14815 10.740741
## Jun 1 18.88889 11.481481
## Jul 1 19.62963 12.222222
## Aug 1 20.37037 12.777778
## Sep 1 20.92593 12.962963
## Oct 1 20.18519 12.222222
## Nov 1 17.77778 10.555556
## Dec 1 NA NA
```

A moving average can also be applied to a vector or a variable in a data frame – it doesn’t have to be a time series. For the water quality data, we’ll use a large order (15) to best clarify the significant pattern of *E. coli* counts:

`ggplot(data=SanPedroCounts) + geom_line(aes(x=SAMPLE_DATE, y=ma(Ecoli, order=15)))`

**Moving average of CO_{2} data**

We’ll look at another application of a moving average, this time with CO_{2} data from Antarctic ice cores (Irizarry 2019).

```
library(dslabs)
data("greenhouse_gases")
GHGwide <- pivot_wider(greenhouse_gases,
                       names_from = gas,
                       values_from = concentration)
CO2 <- ts(GHGwide$CO2,
          frequency = 0.05)
plot(CO2)
```

```
library(forecast)
CO2ma <- ma(CO2, order=7)
plot(CO2ma)
```

For periodic data we can decompose time series into seasonal and trend signals, with the remainder left as random variation. But even for non-periodic data, we can consider random variation by simply subtracting the major signal of the moving average from the original data:

`plot(CO2-CO2ma)`

## 14.5 Seasonal decomposition of time series with loess (`stl`)

Another type of decomposition involves local regression or “loess”, using the tool **stl** (*seasonal decomposition of time series using loess*). If `s.window = "periodic"`, the mean is used for smoothing: the seasonal values are removed and the remainder smoothed to find the trend.

`plot(stl(co2, s.window="periodic"))`

As we saw above with the decomposition, the *E. coli* data don’t have a clear annual cycle; the same is apparent with the `stl` result here. The trend may however be telling us something about the general trend of bacterial counts over time.

`plot(stl(Ecoli, s.window="periodic"))`

Maybe we can use a moving average to try to see a clearer pattern? We’ll need to use `tseries::na.remove` to remove the `NA`s at the beginning and end of the data, and I’d probably recommend making sure we’re not creating any time errors doing this.

```
ecoli15 <- tseries::na.remove(as.ts(ma(Ecoli, order=15)))
plot(stl(ecoli15, s.window="periodic"))
```

For more on loess, see:

https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/loess

Another local fitting method employs a polynomial model.

## 14.6 Decomposition of data logger data: Marble Mountains

Data loggers can create a rich time series data set, with readings repeated at very short time frames (depending on the capabilities of the sensors and data logging system) and extending as far as the instrumentation allows. For a study of chemical water quality and hydrology of a karst system in the Marble Mountains of California, a spring *resurgence* was instrumented to measure water level, temperature, and specific conductance (a surrogate for total dissolved solids) over a 4-year period (J. D. Davis and Davis 2001). The frequency was quite coarse – one sample every 2 hours – but still provides a useful data set to consider some of the time series methods we’ve been exploring.

You might want to start by just reading the data, and having a look at it.

`read_csv(ex("marbles/resurgenceData.csv"))`

```
## # A tibble: 17,537 x 9
## date time wlevelm WTemp ATemp EC InputV BTemp precip_mm
## <chr> <time> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 9/23/1994 14:00 0.325 5.69 24.7 204. 12.5 24.8 NA
## 2 9/23/1994 16:00 0.325 5.69 20.4 207. 12.5 18.7 NA
## 3 9/23/1994 18:00 0.322 5.64 16.6 207. 12.5 16.3 NA
## 4 9/23/1994 20:00 0.324 5.62 16.1 207. 12.5 14.9 NA
## 5 9/23/1994 22:00 0.324 5.65 15.6 207. 12.5 14.0 NA
## 6 9/24/1994 00:00 0.323 5.66 14.4 207. 12.5 13.3 NA
## 7 9/24/1994 02:00 0.323 5.63 13.9 207. 12.5 12.8 NA
## 8 9/24/1994 04:00 0.322 5.65 13.1 207. 12.5 12.3 NA
## 9 9/24/1994 06:00 0.319 5.61 11.8 207. 12.5 11.9 NA
## 10 9/24/1994 08:00 0.321 5.66 13.1 207. 12.4 11.6 NA
## # ... with 17,527 more rows
```

What we can see is that the date-time stamps show that measurements are every 2 hours. This will help us set the frequency for an annual period. The start time is `1994-09-23 14:00:00`, so we need to provide that information as well. Note the computations used to do this (I first worked out that the `yday` for 9/23 was 266):

```
resurg <- read_csv(ex("marbles/resurgenceData.csv")) %>%
  mutate(date_time = mdy_hms(paste(date, time))) %>%
  dplyr::select(ATemp, BTemp, wlevelm, EC, InputV) # removed date_time
resurgTS <- ts(resurg, frequency = 12*365, start = c(1994, 266*12+7))
plot(resurgTS)
```

In this study, water level has an annual seasonality that might be best understood by decomposition.

```
library(tidyverse); library(lubridate)
resurg <- read_csv(ex("marbles/resurgenceData.csv")) %>%
  mutate(date_time = mdy_hms(paste(date, time))) %>%
  dplyr::select(date_time, ATemp, BTemp, wlevelm, EC, InputV)
wlevelm <- ts(resurg$wlevelm, frequency = 12*365,
              start = c(1994, 266*12+7))
fit <- stl(wlevelm, s.window="periodic")
plot(fit)
```

## 14.7 Lag regression

Lag regression is like a normal linear model, but there’s a timing difference, or lag, between the explanatory and response variables. We have used this method to compare changes in snow melt in a Sierra Nevada watershed with changes in reservoir inflows downstream (Powell et al. 2011).

Another good application of lag regression is comparing solar radiation and temperature. We’ll explore solar radiation and temperature data over an 8-day period for Bugac, Hungary, and Manaus, Brazil, maintained by the European Fluxes Database (*European Fluxes Database*, n.d.). These data are every half hour, but let’s have a look. You could navigate to the spreadsheet and open it in Excel, and I would recommend it – you should know where the extdata are stored on your computer – since this is the easiest way to see that there are two worksheets. With that worksheet naming knowledge, we can also have a quick look in R:

```
library(readxl)
read_xls(ex("ts/SolarRad_Temp.xls"), sheet="BugacHungary")
```

```
## # A tibble: 17,521 x 7
## Year `Day of Yr` Hr SolarRad Tair ...6 ...7
## <chr> <chr> <chr> <chr> <chr> <lgl> <chr>
## 1 YYYY DOY 0-23.5 W m^-2 deg C NA Site details
## 2 2006 1 0.5 0 0.55000001192092896 NA http://www.icos-~
## 3 2006 1 1 0 0.56999999284744263 NA <NA>
## 4 2006 1 1.5 0 0.82999998331069946 NA <NA>
## 5 2006 1 2 0 1.0399999618530273 NA <NA>
## 6 2006 1 2.5 0 1.1699999570846558 NA <NA>
## 7 2006 1 3 0 1.2000000476837158 NA <NA>
## 8 2006 1 3.5 0 1.2300000190734863 NA <NA>
## 9 2006 1 4 0 1.2599999904632568 NA <NA>
## 10 2006 1 4.5 0 1.2599999904632568 NA <NA>
## # ... with 17,511 more rows
```

Indeed the readings are every half hour, so for a daily period we should use a frequency of **48**. We need to remove the second line of data, which has the units of the measurements; we’ll do that by filtering on one of them (`Year != "YYYY"`). And there’s also some stuff in the columns past `Tair` that we don’t want, so a `dplyr::select` is in order.

```
library(readxl); library(tidyverse)
BugacSolstice <- read_xls(system.file("extdata", "ts/SolarRad_Temp.xls", package="igisci"),
                          sheet="BugacHungary", col_types = "numeric") %>%
  filter(Year != "YYYY" & `Day of Yr` < 177 & `Day of Yr` > 168) %>%
  dplyr::select(SolarRad, Tair)
BugacSolsticeTS <- ts(BugacSolstice, frequency = 48)
plot(BugacSolstice, main="a simple scatter plot that illustrates hysteresis")
```

**Side note**: *Hysteresis* is “the dependence of a state of a system on its history” (“Hysteresis” (n.d.)), and one place this can be seen is in periodic data, when an explanatory variable’s effect on a response variable differs on an increasing limb from what it produces on a decreasing limb. Two classic situations where this occurs are in (a) water quality, where the rising limb of the hydrograph will carry more sediment load than a falling limb; and (b) the daily or seasonal pattern of temperature in response to solar radiation, which we can see in the graph above with two apparent groups of scatter points representing the rising and falling limbs of the cycle, where temperature lags behind the increase in solar radiation.
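A synthetic sketch of hysteresis (all values here are invented): a response that simply repeats its driver a few steps later traces a loop rather than a line when plotted against that driver.

```r
tt <- 1:96
driver <- sin(2 * pi * tt / 48)             # two "days" at 48 samples/day
response <- c(rep(NA, 3), driver)[1:96]     # the same signal, delayed 3 steps
plot(driver, response, type = "l")          # opens into a loop, not a straight line
# peaks are separated by exactly the 3-step lag we built in
which.max(response) - which.max(driver)     # 3
```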

`plot(BugacSolsticeTS)`

### 14.7.1 The lag regression, using a lag function in a linear model

The lag function uses a lag setting of the number of units of observations to model against. so we compare air temperature `Tair`

to a lagged solar radiation `SolarRad`

as `lm(Tair~lag(SolarRad,i)`

where i is the number of units to lag. The following code loops through 6 values (0:5) for i, starting with assigning the `RMSE`

(root mean squared error) for no lag (0), then appending to `RMSE`

for `i`

of `1:5`

to hopefully hit the minimum value.

```
getRMSE <- function(i) {sqrt(sum(lm(Tair~lag(SolarRad,i), data=BugacSolsticeTS)$resid**2))}
RMSE <- getRMSE(0)
for(i in 1:5){RMSE <- append(RMSE, getRMSE(i))}
lagTable <- bind_cols(lags = 0:5, RMSE = RMSE)
knitr::kable(lagTable)
```

lags | RMSE |
---|---|
0 | 64.01352 |
1 | 55.89803 |
2 | 50.49347 |
3 | 48.13214 |
4 | 49.64892 |
5 | 54.36344 |

Minimum RMSE is for a lag of 3, so 1.5 hours. Let’s go ahead and use that minimum RMSE lag and look at the summarized model.

```
n <- lagTable$lags[which.min(lagTable$RMSE)]
paste("Lag: ",n)
```

`## [1] "Lag: 3"`

`summary(lm(Tair~lag(SolarRad,n), data=BugacSolsticeTS))`

```
##
## Call:
## lm(formula = Tair ~ lag(SolarRad, n), data = BugacSolsticeTS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5820 -1.7083 -0.2266 1.8972 7.9336
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.352006 0.174998 104.87 <2e-16 ***
## lag(SolarRad, n) 0.016398 0.000386 42.48 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 379 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.8265, Adjusted R-squared: 0.826
## F-statistic: 1805 on 1 and 379 DF, p-value: < 2.2e-16
```

So, not surprisingly, air temperature has a significant relationship to solar radiation. Confirmatory statistics should never be surprising, but just give us confidence in our conclusions. What we didn’t know before was just what amount of lag produces the strongest relationship, with the least error (root mean squared error, or RMSE), which turns out to be 3 observations or 1.5 hours. Interestingly, the *peak* solar radiation and temperature are separated by 5 units, so 2.5 hours, which we can see by looking at the generalized seasonals from decomposition.

```
SolarRad_comp <- decompose(ts(BugacSolstice$SolarRad, frequency = 48))
Tair_comp <- decompose(ts(BugacSolstice$Tair, frequency = 48))
seasonals <- bind_cols(Rad = SolarRad_comp$seasonal, Temp = Tair_comp$seasonal)
```

```
ggplot(seasonals) +
geom_line(aes(x=seq_along(Temp), y=Temp), col="black",size=1.5) +
geom_line(aes(x=seq_along(Rad), y=Rad/50), col="gray", size=1) +
scale_x_continuous(breaks = seq(0,480,12))
```

`which.max(seasonals$Rad)`

`## [1] 23`

`which.max(seasonals$Temp)`

`## [1] 28`
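The same lag-search idea can also be tested on synthetic data with a simple correlation-based search in base R, shifting the driver and finding the shift that lines up best (the series and its built-in 3-step lag are invented for illustration):

```r
set.seed(1)
x <- sin(2 * pi * (1:480) / 48)                      # synthetic driver
y <- c(rep(0, 3), x)[1:480] + rnorm(480, sd = 0.05)  # response lagging by 3, plus noise
# correlation between the response and the driver shifted by 0 to 5 steps
corAtLag <- sapply(0:5, function(i) cor(y[(i + 1):480], x[1:(480 - i)]))
(0:5)[which.max(corAtLag)]                           # 3: the built-in lag is recovered
```

This is the correlation analogue of the minimum-RMSE loop above: maximizing the correlation and minimizing the regression RMSE pick out the same alignment.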

There’s a similar lag for Manaus, though the peak solar radiation and temperatures coincide.

## 14.8 Ensemble summary statistics

For any time series data with meaningful time units like days or years (or even weeks when the data varies by days of the week – this has been the case with COVID-19 data), ensemble averages (along with other summary statistics) are a good way to visualize changes in a variable over that time unit.

As you could see from the decomposition and sequential graphics we created for the Bugac and Manaus solar radiation and temperature data, looking at what happens over one day as an “ensemble” of all of the days – where the mean value is displayed along with error bars based on standard deviations of the distribution – might be a useful figure.

We’ve used `group_by`-`summarize` elsewhere in this book, and this is yet another use of that handy method. Note that the grouping variable `Hr` is not categorical, not a factor, but simply a numeric variable of the hour as a decimal number – 0.0, 0.5, 1.0, etc. – representing half-hourly sampling times; since these repeat each day, they can be used as a grouping variable.

```
library(cowplot)
Manaus <- read_xls(system.file("extdata","ts/SolarRad_Temp.xls",package="igisci"),
                   sheet="ManausBrazil", col_types = "numeric") %>%
  dplyr::select(Year:Tair) %>% filter(Year != "YYYY")
ManausSum <- Manaus %>% group_by(Hr) %>%
  summarize(meanRad = mean(SolarRad), meanTemp = mean(Tair),
            sdRad = sd(SolarRad), sdTemp = sd(Tair))
px <- ManausSum %>% ggplot(aes(x=Hr)) + scale_x_continuous(breaks=seq(0,24,3))
p1 <- px + geom_line(aes(y=meanRad), col="blue") +
  geom_errorbar(aes(ymax = meanRad + sdRad, ymin = meanRad - sdRad)) +
  ggtitle("Manaus 2005 ensemble solar radiation")
p2 <- px + geom_line(aes(y=meanTemp), col="red") +
  geom_errorbar(aes(ymax=meanTemp+sdTemp, ymin=meanTemp-sdTemp)) +
  ggtitle("Manaus 2005 ensemble air temperature")
plot_grid(plotlist=list(p1,p2), ncol=1, align='v') # from cowplot
```

## 14.9 Learning more about time series in R

There has been a lot of work on time series in R, especially in the financial sector where time series are important for forecasting models. Some of these methods should have applications in environmental research. Here are some resources for learning more:

- At CRAN: https://cran.r-project.org/web/views/TimeSeries.html
- The time series data library: https://pkg.yangzhuoranyang.com/tsdl/
- https://a-little-book-of-r-for-time-series.readthedocs.io/

For an exploration of flux tower data from a study of Loney Meadow in the Sierra Nevada (Blackburn, Oliphant, and Davis (2021)), see Appendix A12.1.

## 14.10 Exercises

- Create a half-hourly ensemble plot of Bugac, Hungary, similar to the one we created for Manaus, Brazil.
- Create a ManausJan tibble for just the January data, then decompose a ts of `ManausJan$SolarRad` and plot it. Consider carefully what frequency to use to get a daily time unit. Also, remember that “seasonal” in this case means “diurnal.” Provide code and plot.
- Then do the same for air temperature (`Tair`) for Manaus in January, providing code and plot. What observations can you make from this on conditions in Manaus in January?
- Now do both of the above for Bugac, Hungary, and reflect on what it tells you, and how these places are distinct (other than the obvious).
- Now do the same for June. You’ll probably need to use the `yday` and `ymd` (or similar) functions from `lubridate` to find the date range for 2005 or 2006 (neither is a leap year, so they should be the same).
- Do a lag regression over an 8-day period over the March equinox (about 3/21), when the sun’s rays should be close to vertical since Manaus is near the Equator, though it’s very likely to be cloudy at the Intertropical Convergence Zone. As we did with the Bugac data, work out the best lag value using the minimum-RMSE model process, and produce a summary of that model.
- Create an ensemble plot of monthly air and box temperature from the Marble Mountains resurgence data. The air temperature was measured from a thermistor mounted near the ground but outside the box, and the box temperature was measured within the buried box (see the pictures to visualize this).
- How about hourly? The Marbles resurgence data were collected every 2 hours. Interpret what you see here, and whether the air vs box temperature patterns make sense considering where the variance comes from.
- Tree-ring data: Using `forecast::ma`, find a useful order value for a curve that communicates a pattern of tree rings over the last 8000 years, with high values around -3500 and 0 years. Note: the order may be pretty big, and you might want to loop through a sequence of values to see what works. I found creating a sequence in a for loop with seq(from=100, to=1000, by=100) useful; anything further created some odd effects.
- Sunspot activity is known to fluctuate over about an 11-year cycle, and, beyond its well-known effects on radio transmission, it is also part of short-term climate fluctuations (so must be considered when looking at climate change). The built-in data set sunspot.month has monthly values of sunspot activity from 1749 to 2014. Use the frequency function to understand its time unit, then create an alternative ts with an 11-year cycle as frequency (how do you define its frequency?) and plot a decomposition and stl.

### References

Blackburn, Oliphant, and Davis. 2021. *Wetlands* 41 (3): 1–17.

Davis, J. D., and Davis. 2001. *Earth Surface Processes and Landforms* 26.

*European Fluxes Database*. n.d. http://www.icos-etc.eu/home.

Irizarry. 2019. *Introduction to Data Science: Data Analysis and Prediction Algorithms with R*. CRC Press. https://cran.r-project.org/package=dslabs.

Powell et al. 2011. *Global and Planetary Change* 77 (1-2): 77–84.