4 Validation

4.1 Leave-one-out cross validation

Plot the correlation between mean annual temperature (MAT) and the lag between green-up/down frequency and pollen count.

* A positive lag means leafing phenology leads pollen phenology; a negative lag means leafing phenology lags pollen phenology.

source("code/clim_plot.R")
p_lag_clim

* At warmer sites, oak pollen tends to precede 50% green-up; at colder sites, it tends to follow.

Fit a linear regression to test the significance of the correlation. Among individual taxa, only the relationship for Quercus was statistically significant.

source("code/clim_reg.R")
p_slope
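
The regression step can be illustrated with a minimal sketch. The data frame `df_lag`, its columns, and all values below are made up for illustration; they are not taken from `code/clim_reg.R`, whose contents are not shown here. The lag decreases with MAT, matching the sign convention above (pollen ahead of green-up at warm sites).

```r
# Hypothetical per-city data for one taxon (illustrative values only):
# mat = mean annual temperature (deg C), lag = leaf-pollen lag (days).
df_lag <- data.frame(
  mat = c(8, 10, 12, 14, 16, 18, 20, 22),
  lag = c(12, 9, 8, 4, 3, -1, -2, -6)
)

# Ordinary least-squares fit of lag against MAT.
fit <- lm(lag ~ mat, data = df_lag)

# Slope estimate and its p-value from the coefficient table.
coefs <- summary(fit)$coefficients
slope <- coefs["mat", "Estimate"]
p_value <- coefs["mat", "Pr(>|t|)"]
```

A negative `slope` with a small `p_value` would correspond to the pattern described for Quercus.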

We conducted leave-one-out cross-validation to test the robustness of the climate-phenology relationship and its usefulness for inferring flowering phenology at new locations. Specifically, we held out one city at a time from the pollen dataset, matched leafing and pollen phenology in the remaining cities, and modeled the climate-lag relationship. We then used the linear model to predict the lag between leafing and pollen phenology at the held-out city, and combined that lag with the known leafing phenology to predict flowering phenology there. We evaluated accuracy by calculating the RMSE between the predicted flowering phenology and the standardized pollen count observations at the held-out cities.

source("code/loocv_tune.R")
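
The leave-one-out procedure described above can be sketched as follows. This is a simplified illustration, not the contents of `code/loocv_tune.R`: the `cities` table is simulated, and flowering phenology is reduced to a single date per city so that the RMSE is computed on dates rather than on full standardized pollen curves.

```r
set.seed(1)

# Simulated city-level table for one taxon (all values hypothetical).
n <- 12
cities <- data.frame(
  city = paste0("city_", seq_len(n)),
  mat = seq(6, 22, length.out = n)
)
cities$leaf_doy <- 150 - 3 * cities$mat + rnorm(n, sd = 2)  # leafing date
cities$lag <- 20 - 1.2 * cities$mat + rnorm(n, sd = 2)      # positive: leafing leads
cities$pollen_doy <- cities$leaf_doy + cities$lag           # observed pollen date

# Hold out each city in turn, fit the climate-lag model on the others,
# and predict the pollen date at the held-out city.
pred <- numeric(n)
for (i in seq_len(n)) {
  train <- cities[-i, ]
  test <- cities[i, , drop = FALSE]
  fit <- lm(lag ~ mat, data = train)
  lag_hat <- predict(fit, newdata = test)
  pred[i] <- test$leaf_doy + lag_hat
}

# Out-of-sample error across all held-out cities.
rmse <- sqrt(mean((pred - cities$pollen_doy)^2))
```

Because each prediction uses a model that never saw the held-out city, `rmse` estimates how well the relationship transfers to a new location.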

4.2 Benchmarking

Predict pollen phenology with climatology (site-specific long-term mean), tested for accuracy only in-sample.

source("code/city_clim.R")
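
As a rough illustration of a climatology benchmark, the sketch below averages toy pollen curves across years at one site and scores each year against that average. The matrix `pollen` and its values are hypothetical, and `code/city_clim.R` may differ in detail.

```r
# Hypothetical standardized pollen counts for one city and taxon:
# rows are years, columns are days in a toy five-day season.
pollen <- rbind(
  y2010 = c(0.0, 0.2, 1.0, 0.4, 0.0),
  y2011 = c(0.1, 0.4, 0.8, 0.2, 0.0),
  y2012 = c(0.0, 0.6, 0.9, 0.3, 0.1)
)

# Site-specific long-term mean for each day.
climatology <- colMeans(pollen, na.rm = TRUE)

# In-sample RMSE: each year is compared with a mean that includes it.
rmse_by_year <- apply(pollen, 1, function(obs) sqrt(mean((obs - climatology)^2)))
```

Since the climatology is built from the same years it is scored on, this benchmark can only be evaluated in-sample, as noted above.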

Predict pollen phenology with Gaussian methods, tested for accuracy in-sample and out-of-sample.

source("code/city_gaus.R")
source("code/city_gaus_cv.R")
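
The text does not spell out what the Gaussian methods involve. One plausible reading, offered here only as an assumption, is fitting a Gaussian curve to the seasonal pollen curve. The sketch below does this by least squares on simulated data; the helper `gaussian()`, the objective `sse()`, and all values are hypothetical.

```r
set.seed(42)

# Simulated pollen curve: Gaussian peak at day 110 with a 12-day spread,
# plus noise (a toy example, so values may dip slightly below zero).
doy <- 60:160
obs <- exp(-(doy - 110)^2 / (2 * 12^2)) + rnorm(length(doy), sd = 0.05)

# Gaussian curve with parameters: amplitude, peak day, spread.
gaussian <- function(par, x) par[1] * exp(-(x - par[2])^2 / (2 * par[3]^2))

# Sum of squared errors between observations and the curve.
sse <- function(par) sum((obs - gaussian(par, doy))^2)

# Start from the observed maximum and a rough spread, then minimize.
start <- c(max(obs), doy[which.max(obs)], 10)
fit <- optim(start, sse)
par_hat <- fit$par
peak_day <- par_hat[2]
spread <- abs(par_hat[3])  # the sign of the spread does not affect the curve
```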

Read in validation results and visualize.

source("code/valid_fit.R")
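
The summaries below use columns named `nrmse` and `spearman`. Their exact definitions are not given here; the sketch assumes that nRMSE is the RMSE divided by the range of the observations, which is one common normalization and may not be the one used in `code/valid_fit.R`. The vectors `obs` and `pred` are made-up values.

```r
# Hypothetical observed and predicted standardized pollen curves.
obs  <- c(0.0, 0.1, 0.5, 1.0, 0.6, 0.2, 0.0)
pred <- c(0.0, 0.2, 0.6, 0.9, 0.4, 0.1, 0.0)

# Root-mean-square error, then normalized by the observed range (assumption).
rmse <- sqrt(mean((pred - obs)^2))
nrmse <- rmse / (max(obs) - min(obs))

# Spearman rank correlation: agreement in the ordering of values over time.
rho <- cor(pred, obs, method = "spearman")
```

nRMSE measures how far the predicted magnitudes are from the observations, while the Spearman correlation only reflects whether the predicted and observed curves rise and fall together.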

df_fit_all %>%
  drop_na(nrmse) %>%
  group_by(method) %>%
  summarise(
    median = median(nrmse),
    mean = mean(nrmse),
    lower = quantile(nrmse, 0.025),
    upper = quantile(nrmse, 0.975),
    n = n()
  )
## # A tibble: 5 × 6
##   method                   median          mean  lower upper     n
##   <fct>                     <dbl>         <dbl>  <dbl> <dbl> <int>
## 1 in-sample (Gaussian)      0.135 302790309.    0.0783 0.340   299
## 2 in-sample                 0.142         0.159 0.0881 0.336   282
## 3 in-sample (climatology)   0.126         0.201 0.0549 0.941   299
## 4 out-of-sample (Gaussian)  0.156         0.185 0.111  0.416    33
## 5 out-of-sample             0.145         0.173 0.0982 0.360    33
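
The mean nRMSE for the in-sample Gaussian method is many orders of magnitude larger than its median, which can only happen if at least one fit has an extremely large error. The toy vector below (hypothetical values, not the actual results) shows how a single such fit dominates the mean and leaves the median unchanged.

```r
# 298 typical fits plus one divergent fit (values are illustrative only).
nrmse_toy <- c(rep(0.135, 298), 9e10)

# The median ignores the single extreme value; the mean does not.
toy_summary <- c(median = median(nrmse_toy), mean = mean(nrmse_toy))
```

For this reason the median and quantiles are the more informative columns when comparing methods.
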
df_fit_all %>%
  drop_na(nrmse) %>%
  filter(method == "in-sample") %>%
  group_by(taxa) %>%
  summarise(
    median = median(nrmse),
    mean = mean(nrmse),
    lower = quantile(nrmse, 0.025),
    upper = quantile(nrmse, 0.975),
    n = n()
  ) %>%
  arrange(desc(median))
## # A tibble: 15 × 6
##    taxa        median  mean  lower upper     n
##    <chr>        <dbl> <dbl>  <dbl> <dbl> <int>
##  1 Alnus        0.190 0.172 0.107  0.239    13
##  2 Acer         0.189 0.214 0.100  0.408    23
##  3 Ulmus early  0.172 0.203 0.111  0.435    28
##  4 Betula       0.157 0.154 0.0971 0.207    23
##  5 Salix        0.156 0.156 0.0947 0.242    19
##  6 Morus        0.144 0.148 0.0832 0.244    22
##  7 Platanus     0.142 0.147 0.0853 0.241    18
##  8 Populus      0.141 0.142 0.0972 0.215    23
##  9 Fraxinus     0.140 0.149 0.0912 0.248    20
## 10 Juglans      0.140 0.143 0.109  0.204    15
## 11 Ulmus late   0.136 0.148 0.110  0.204    15
## 12 Quercus      0.135 0.154 0.0967 0.321    33
## 13 Carya        0.130 0.135 0.104  0.174    15
## 14 Celtis       0.124 0.130 0.0993 0.184    10
## 15 Liquidambar  0.110 0.110 0.0868 0.127     5
p_taxa_nrmse

df_fit_all %>%
  drop_na(spearman) %>%
  group_by(method) %>%
  summarise(
    median = median(spearman),
    mean = mean(spearman),
    lower = quantile(spearman, 0.025),
    upper = quantile(spearman, 0.975),
    n = n()
  )
## # A tibble: 5 × 6
##   method                   median  mean lower upper     n
##   <fct>                     <dbl> <dbl> <dbl> <dbl> <int>
## 1 in-sample (Gaussian)      0.598 0.580 0.205 0.864   299
## 2 in-sample                 0.567 0.544 0.125 0.845   282
## 3 in-sample (climatology)   0.669 0.645 0.250 0.981   299
## 4 out-of-sample (Gaussian)  0.755 0.702 0.346 0.910    33
## 5 out-of-sample             0.691 0.679 0.337 0.910    33
df_fit_all %>%
  drop_na(spearman) %>%
  filter(method == "in-sample") %>%
  group_by(taxa) %>%
  summarise(
    median = median(spearman),
    mean = mean(spearman),
    lower = quantile(spearman, 0.025),
    upper = quantile(spearman, 0.975),
    n = n()
  ) %>%
  arrange(desc(median))
## # A tibble: 15 × 6
##    taxa        median  mean   lower upper     n
##    <chr>        <dbl> <dbl>   <dbl> <dbl> <int>
##  1 Quercus      0.751 0.733  0.491  0.918    33
##  2 Populus      0.659 0.623  0.220  0.827    23
##  3 Morus        0.610 0.600  0.432  0.784    22
##  4 Celtis       0.599 0.580  0.280  0.761    10
##  5 Ulmus early  0.590 0.542  0.145  0.834    28
##  6 Carya        0.575 0.590  0.421  0.750    15
##  7 Betula       0.557 0.537  0.127  0.814    23
##  8 Fraxinus     0.544 0.534  0.142  0.825    20
##  9 Ulmus late   0.544 0.477  0.138  0.678    15
## 10 Liquidambar  0.542 0.492  0.241  0.627     5
## 11 Acer         0.514 0.479  0.131  0.736    23
## 12 Alnus        0.500 0.427 -0.0246 0.785    13
## 13 Salix        0.468 0.430  0.0784 0.708    19
## 14 Platanus     0.460 0.460  0.125  0.719    18
## 15 Juglans      0.454 0.423  0.260  0.531    15
p_taxa_spearman

Validate with NPN instead of NAB and visualize.

df_fit_all %>%
  drop_na(spearman_npn) %>%
  group_by(method) %>%
  summarise(
    median = median(spearman_npn),
    mean = mean(spearman_npn),
    lower = quantile(spearman_npn, 0.025),
    upper = quantile(spearman_npn, 0.975),
    n = n()
  )
## # A tibble: 2 × 6
##   method        median  mean   lower upper     n
##   <fct>          <dbl> <dbl>   <dbl> <dbl> <int>
## 1 in-sample      0.479 0.398 -0.297  0.795   130
## 2 out-of-sample  0.470 0.476  0.0214 0.788    27
df_fit_all %>%
  drop_na(spearman_npn) %>%
  filter(method == "in-sample") %>%
  group_by(taxa) %>%
  summarise(
    median = median(spearman_npn),
    mean = mean(spearman_npn),
    lower = quantile(spearman_npn, 0.025),
    upper = quantile(spearman_npn, 0.975),
    n = n()
  ) %>%
  arrange(desc(median))
## # A tibble: 12 × 6
##    taxa        median  mean      lower upper     n
##    <chr>        <dbl> <dbl>      <dbl> <dbl> <int>
##  1 Acer         0.583 0.515 -0.0364    0.809    20
##  2 Juglans      0.576 0.524  0.195     0.671     6
##  3 Carya        0.556 0.319 -0.291     0.714    10
##  4 Quercus      0.542 0.503  0.0000180 0.820    27
##  5 Betula       0.515 0.367 -0.305     0.759    14
##  6 Celtis       0.444 0.444  0.306     0.582     2
##  7 Populus      0.387 0.415  0.156     0.677    14
##  8 Ulmus early  0.368 0.249 -0.601     0.790    18
##  9 Alnus        0.352 0.398  0.300     0.576     4
## 10 Platanus     0.276 0.207 -0.232     0.489     8
## 11 Fraxinus     0.246 0.232 -0.0995    0.550     3
## 12 Salix        0.242 0.300  0.0371    0.664     4
p_taxa_spearman_npn