6 Empirical models
6.1 Summary stats for train and test
As the models proposed here are purely empirical, it is important to define the range over which they are applicable. The tables below present summary statistics of each retrieved optically active constituent (OAC) for the train and test datasets.
It is also worth noting that, because the modeled relationships depend on a variety of complex cumulative effects (i.e. specific IOPs), the time range of the measurements also matters: one cannot assume that OAC concentrations and distributions remain constant.
Global summary

Characteristic | N | Overall, N = 364 | EGSL, N = 203 | JB, N = 161 |
---|---|---|---|---|
matchup | 364 | 58 (16%) | 58 (29%) | 0 (0%) |
SPM | 347 | 6 (3, 10) 1 - 110 | 7 (5, 10) 1 - 38 | 4 (2, 8) 1 - 110 |
PIM | 337 | 5 (3, 8) 0 - 101 | 6 (4, 8) 0 - 35 | 4 (2, 8) 1 - 101 |
POM | 177 | 1.54 (1.16, 1.94) 0.55 - 4.07 | 1.54 (1.16, 1.94) 0.55 - 4.07 | NA (NA, NA) NA - NA |
Ag_440 | 343 | 1.58 (1.02, 2.36) 0.16 - 11.50 | 1.08 (0.52, 1.76) 0.16 - 5.02 | 1.89 (1.55, 3.37) 0.94 - 11.50 |
Ag_295 | 343 | 16 (9, 23) 2 - 100 | 10 (5, 17) 2 - 47 | 20 (17, 33) 12 - 100 |
Ag_275 | 343 | 23 (13, 31) 3 - 128 | 14 (7, 23) 3 - 66 | 28 (24, 44) 17 - 128 |
Bbp_532 | 276 | 0.03 (0.01, 0.06) 0.00 - 0.46 | 0.01 (0.01, 0.02) 0.00 - 0.10 | 0.05 (0.03, 0.10) 0.01 - 0.46 |
set | 364 | | | |
test | | 112 (31%) | 83 (41%) | 29 (18%) |
train | | 252 (69%) | 120 (59%) | 132 (82%) |

Cell values: n (%); Median (IQR) Range
6.1.1 By Region
JB summary

Characteristic | N | test, N = 29 | train, N = 132 |
---|---|---|---|
SPM | 161 | 3 (2, 7) 1 - 38 | 5 (3, 9) 1 - 110 |
PIM | 160 | 2 (2, 6) 1 - 32 | 4 (3, 8) 1 - 101 |
POM | 0 | NA (NA, NA) NA - NA | NA (NA, NA) NA - NA |
Ag_440 | 155 | 1.94 (1.42, 3.38) 1.24 - 7.63 | 1.89 (1.59, 3.35) 0.94 - 11.50 |
Ag_295 | 155 | 20 (15, 33) 14 - 70 | 21 (18, 33) 12 - 100 |
Ag_275 | 155 | 27 (21, 44) 20 - 91 | 28 (24, 44) 17 - 128 |
Bbp_532 | 144 | 0.04 (0.03, 0.10) 0.01 - 0.37 | 0.05 (0.03, 0.09) 0.01 - 0.46 |

Cell values: Median (IQR) Range
EGSL summary

Characteristic | N | test, N = 83 | train, N = 120 |
---|---|---|---|
SPM | 186 | 6.4 (4.3, 9.1) 1.2 - 38.3 | 7.6 (5.7, 10.6) 1.6 - 34.6 |
PIM | 177 | 5.3 (3.6, 7.9) 0.4 - 34.9 | 5.9 (4.3, 8.8) 0.9 - 32.3 |
POM | 177 | 1.46 (1.15, 1.76) 0.75 - 4.07 | 1.65 (1.23, 2.09) 0.55 - 3.75 |
Ag_440 | 188 | 1.29 (0.64, 2.01) 0.20 - 5.02 | 1.03 (0.48, 1.63) 0.16 - 4.30 |
Ag_295 | 188 | 12 (6, 21) 3 - 45 | 10 (5, 15) 2 - 47 |
Ag_275 | 188 | 16 (8, 29) 4 - 61 | 13 (7, 20) 3 - 66 |
Bbp_532 | 132 | 0.017 (0.012, 0.024) 0.004 - 0.099 | 0.012 (0.007, 0.022) 0.003 - 0.061 |

Cell values: Median (IQR) Range
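Since the models are only meant to be used inside the range they were trained on, a simple guard can flag values that fall outside it. The following is a minimal sketch in Python, not part of the original analysis: the function name `in_training_range` and the dictionary layout are hypothetical, and the bounds are the train-column ranges as printed (rounded) in the regional summary tables.

```python
# Training ranges (min, max) copied from the "train" columns of the
# regional summary tables; values are rounded as printed there.
TRAIN_RANGE = {
    ("EGSL", "SPM"): (1.6, 34.6),
    ("EGSL", "Ag_440"): (0.16, 4.30),
    ("JB", "SPM"): (1.0, 110.0),
    ("JB", "Ag_440"): (0.94, 11.50),
}


def in_training_range(region, variable, value):
    """Return True when `value` lies inside the range seen during training."""
    low, high = TRAIN_RANGE[(region, variable)]
    return low <= value <= high


# The EGSL test maximum for SPM (38.3) exceeds the EGSL train maximum (34.6).
print(in_training_range("EGSL", "SPM", 38.3))
```

The example shows that part of the EGSL test set lies slightly outside the training range for SPM, which is worth keeping in mind when reading the test metrics.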
6.2 Ag
Regionality test

Characteristic | Beta | 95% CI | p-value |
---|---|---|---|
log10(`559.8`/`664.6`) | -1.3 | -1.4, -1.2 | <0.001 |
Region | | | |
EGSL (reference) | | | |
JB | -0.01 | -0.06, 0.04 | 0.83 |
log10(`559.8`/`664.6`) * Region | | | |
log10(`559.8`/`664.6`) * JB | 0.03 | -0.16, 0.23 | 0.72 |

CI = Confidence Interval
The Region parameter is not significant here; in fact, no real difference in trend is observed between the two regions.
Regionality test

Characteristic | Beta | 95% CI | p-value |
---|---|---|---|
I(`664.6`/`559.8`) | 2.3 | 1.8, 2.8 | <0.001 |
Region | | | |
EGSL (reference) | | | |
JB | -0.78 | -1.3, -0.29 | 0.002 |
I(`664.6`/`559.8`) * Region | | | |
I(`664.6`/`559.8`) * JB | 1.0 | 0.40, 1.6 | 0.001 |

CI = Confidence Interval
In linear space, however, with the red/green ratio, a small difference leads to a significant p-value …
6.2.1 Ag(440)
Several band-ratio models are presented hereafter.
The first one is fitted with a blue / red ratio. The blue part of the visible spectrum is the most influenced by CDOM absorption, so in theory it should best capture the variability. In fact, the distribution of \(a_g(440)\) against the \(B(blue)/B(red)\) ratio is very sharp. It shows a steep slope, likely induced by the saturation of light absorption by CDOM, as briefly noted in 4.3 (see Figure 4.7). There is a small constant offset between the measured and fitted values, so adding a constant offset could improve the model.
Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
a | 0.88 | 0.78, 1.0 | <0.001 | 0.90 | 0.80, 1.0 | <0.001 | 1.2 | 1.1, 1.3 | <0.001 |
b | -0.58 | -0.63, -0.53 | <0.001 | -0.57 | -0.61, -0.52 | <0.001 | -0.81 | -0.88, -0.74 | <0.001 |

CI = Confidence Interval
Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | 1.1 | 0.91, 1.2 | <0.001 | 1.1 | 1.0, 1.3 | <0.001 | 0.39 | 0.21, 0.57 | <0.001 |
I(`655`/`443`) | 0.17 | 0.15, 0.20 | <0.001 | | | | | | |
I(`664.6`/`442.7`) | | | | 0.17 | 0.15, 0.19 | <0.001 | | | |
I(`665`/`442.5`) | | | | | | | 0.82 | 0.74, 0.89 | <0.001 |

CI = Confidence Interval
The addition of a constant offset does slightly improve the model, as shown by the performance metrics. However, it is not clear to me how this is justified …
Sensor | NRMSE | Rsq |
---|---|---|
OLI | 1.23 | 0.76 |
MSI | 1.23 | 0.76 |
OLCI | 1.25 | 0.76 |
The last one takes advantage of a green(edge) / red ratio, which is less affected by atmospheric correction errors in the blue and gives fairly good results. The slope of this distribution is less steep, confirming the saturation of light absorption by CDOM in the blue.
As this model has the fewest independent parameters and the best performance metrics, it is chosen to be applied to sensor images.
Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
a | 2.2 | 2.1, 2.4 | <0.001 | 2.3 | 2.2, 2.4 | <0.001 | 2.4 | 2.2, 2.5 | <0.001 |
b | -1.4 | -1.5, -1.3 | <0.001 | -1.3 | -1.4, -1.2 | <0.001 | -1.3 | -1.4, -1.2 | <0.001 |

CI = Confidence Interval
Sensor | NRMSE | Rsq |
---|---|---|
OLI | 1.21 | 0.79 |
MSI | 1.21 | 0.79 |
OLCI | 1.22 | 0.8 |
Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | -0.62 | -0.86, -0.38 | <0.001 | -0.48 | -0.70, -0.26 | <0.001 | -0.44 | -0.66, -0.22 | <0.001 |
I(`655`/`561`) | 3.0 | 2.8, 3.3 | <0.001 | | | | | | |
I(`664.6`/`559.8`) | | | | 2.9 | 2.7, 3.1 | <0.001 | | | |
I(`665`/`560`) | | | | | | | 2.9 | 2.7, 3.2 | <0.001 |

CI = Confidence Interval
Sensor | NRMSE | Rsq |
---|---|---|
OLI | 1.15 | 0.8 |
MSI | 1.17 | 0.8 |
OLCI | 1.18 | 0.8 |
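To make the linear red/green model concrete, the sketch below applies the tabulated intercepts and slopes to a pair of reflectances. It is a hedged illustration in Python, not the original implementation: it assumes the response variable is \(a_g(440)\) (as the section title suggests) and that the inputs are \(R_{rs}\) at the red and green bands of each sensor. The name `ag440_linear` is hypothetical.

```python
# (intercept, slope) per sensor, copied from the linear red/green table above.
LINEAR_RED_GREEN = {
    "OLI": (-0.62, 3.0),   # ratio Rrs(655) / Rrs(561)
    "MSI": (-0.48, 2.9),   # ratio Rrs(664.6) / Rrs(559.8)
    "OLCI": (-0.44, 2.9),  # ratio Rrs(665) / Rrs(560)
}


def ag440_linear(rrs_red, rrs_green, sensor="MSI"):
    """Assumed form: a_g(440) = intercept + slope * (Rrs_red / Rrs_green)."""
    if rrs_green <= 0:
        raise ValueError("green reflectance must be positive")
    intercept, slope = LINEAR_RED_GREEN[sensor]
    return intercept + slope * (rrs_red / rrs_green)


# Hypothetical reflectances, chosen only to give a red/green ratio of 0.5.
print(ag440_linear(0.004, 0.008, sensor="MSI"))
```

Because the intercept is negative, small red/green ratios can return negative values, so the output should be checked against the training range of Ag_440 before use.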
6.2.2 Ag(295, 275)
Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
I(`655`/`561`) | 29 | 27, 32 | <0.001 | | | | | | |
I(`664.6`/`559.8`) | | | | 28 | 26, 30 | <0.001 | | | |
I(`665`/`560`) | | | | | | | 28 | 26, 30 | <0.001 |

CI = Confidence Interval
Sensor | NRMSE | Rsq |
---|---|---|
OLI | 15.7 | 0.8 |
MSI | 16 | 0.8 |
OLCI | 16.1 | 0.8 |
Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
I(`561`/`655`) | -14 | -16, -13 | <0.001 | | | | | | |
I(`559.8`/`664.6`) | | | | -14 | -16, -12 | <0.001 | | | |
I(`560`/`665`) | | | | | | | -13 | -15, -12 | <0.001 |

CI = Confidence Interval
Sensor | NRMSE | Rsq |
---|---|---|
OLI | 21.1 | 0.54 |
MSI | 21.2 | 0.56 |
OLCI | 21.3 | 0.56 |
6.3 SPM
We test whether the ‘Region’ variable created earlier matters in the determination of \(C_{spm}\) from \(R_{rs}\). To do so, we use a generalized linear model (the glm function in R). The formula takes the form \(SPM \sim R_{rs}(\lambda) + Region + R_{rs}(\lambda) * Region\), where the last term expresses the interaction effect.
Characteristic | Beta | 95% CI | p-value |
---|---|---|---|
log10(`664.6`) | 0.32 | 0.13, 0.51 | 0.001 |
Region | | | |
EGSL (reference) | | | |
JB | 1.2 | 0.55, 1.8 | <0.001 |
log10(`664.6`) * Region | | | |
log10(`664.6`) * JB | 0.61 | 0.38, 0.84 | <0.001 |

CI = Confidence Interval
As we see, \(R_{rs}\) and \(Region\) are both significant predictors of \(C_{spm}\): \(C_{spm}\) varies with both. The interaction term is also significant, meaning that the relation between \(C_{spm}\) and \(R_{rs}\) itself varies with \(Region\).
What could be misleading here is that the variation of \(C_{spm}\) across \(Region\) is also a matter of \(C_{spm}\) range. This range is wider for JB than for EGSL, which may affect the significance of \(Region\).
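The interaction table can be read as two straight lines in log space, one per region: the interaction coefficient is the change in slope for JB relative to the EGSL reference, and the Region coefficient is the corresponding shift in intercept. The following Python sketch (the helper names are hypothetical, and the original fit was done in R) recombines the printed coefficients and compares them with the per-region MSI fits reported in 6.3.1.

```python
def region_slope(reference_slope, interaction):
    """Slope for the non-reference region in a model with an interaction term."""
    return reference_slope + interaction


def region_intercept(reference_intercept, region_offset):
    """Intercept for the non-reference region."""
    return reference_intercept + region_offset


# Coefficients as printed in the interaction table (reference level: EGSL).
slope_egsl = 0.32      # log10(Rrs 664.6)
offset_jb = 1.2        # Region = JB
interaction_jb = 0.61  # log10(Rrs 664.6) * JB

# EGSL intercept taken from the per-region MSI fit in 6.3.1, because the
# interaction table does not list the model intercept.
intercept_egsl = 1.8

slope_jb = region_slope(slope_egsl, interaction_jb)
intercept_jb = region_intercept(intercept_egsl, offset_jb)
print(round(slope_jb, 2), round(intercept_jb, 1))
```

The recombined JB slope and intercept agree, to the printed precision, with the values fitted on JB alone (0.93 and 3.0), which is what one expects when a full interaction model and separate per-region fits describe the same data.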
Considering the results presented above, models have been fitted specifically for each region.
Grouped under the Estuary and Gulf of Saint-Lawrence (EGSL) and James Bay (JB) regions, the distributions of \(C_{spm}\) vs \(R_{rs}\) show two clear and distinct patterns.
6.3.1 SPM from Rrs
EGSL

Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | 1.8 | 1.2, 2.3 | <0.001 | 1.8 | 1.3, 2.4 | <0.001 | 1.8 | 1.3, 2.4 | <0.001 |
log10(`655`) | 0.29 | 0.11, 0.47 | 0.002 | | | | | | |
log10(`664.6`) | | | | 0.32 | 0.13, 0.50 | <0.001 | | | |
log10(`665`) | | | | | | | 0.31 | 0.13, 0.49 | 0.001 |

CI = Confidence Interval
JB

Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | 3.0 | 2.7, 3.3 | <0.001 | 3.0 | 2.7, 3.4 | <0.001 | 3.0 | 2.7, 3.4 | <0.001 |
log10(`655`) | 0.92 | 0.79, 1.1 | <0.001 | | | | | | |
log10(`664.6`) | | | | 0.93 | 0.80, 1.1 | <0.001 | | | |
log10(`665`) | | | | | | | 0.93 | 0.80, 1.1 | <0.001 |

CI = Confidence Interval
Region | Sensor | NRMSE | Rsq |
---|---|---|---|
EGSL | MSI | 1.43 | 0.55 |
JB | MSI | 1.16 | 0.65 |
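As an illustration of how the regional coefficients would be applied, the sketch below evaluates the MSI models for a given red-band reflectance. It is a minimal Python example under a stated assumption: the tables only list `log10(Rrs)` as the predictor, and the response is assumed here to be `log10(SPM)`, which the text does not state explicitly. The function name `spm_from_rrs` is hypothetical.

```python
import math

# (intercept, slope) for MSI, copied from the regional tables above.
SPM_FROM_RRS_MSI = {
    "EGSL": (1.8, 0.32),
    "JB": (3.0, 0.93),
}


def spm_from_rrs(rrs_red, region):
    """Assumed form: log10(SPM) = intercept + slope * log10(Rrs(664.6))."""
    if rrs_red <= 0:
        raise ValueError("reflectance must be positive to take its logarithm")
    intercept, slope = SPM_FROM_RRS_MSI[region]
    return 10 ** (intercept + slope * math.log10(rrs_red))


# Hypothetical reflectance value, used only to compare the two regional models.
for region in ("EGSL", "JB"):
    print(region, spm_from_rrs(0.01, region))
```

The much steeper JB slope means the two regional models diverge quickly as reflectance changes, so applying the wrong regional model would bias the retrieval.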
6.3.2 SPM from Bbp_532
Characteristic | Beta | 95% CI | p-value |
---|---|---|---|
log10(Bbp_532) | 0.17 | 0.01, 0.33 | 0.035 |
Region | | | |
EGSL (reference) | | | |
JB | 0.50 | 0.16, 0.83 | 0.004 |
log10(Bbp_532) * Region | | | |
log10(Bbp_532) * JB | 0.70 | 0.51, 0.90 | <0.001 |

CI = Confidence Interval
Regionality is strongly confirmed in the relation SPM ~ Bbp(532).
EGSL

Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | 1.2 | 0.91, 1.6 | <0.001 | 1.2 | 0.91, 1.6 | <0.001 | 1.2 | 0.91, 1.6 | <0.001 |
log10(Bbp_532) | 0.17 | 0.00, 0.34 | 0.056 | 0.17 | 0.00, 0.34 | 0.056 | 0.17 | 0.00, 0.34 | 0.056 |

CI = Confidence Interval
JB

Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | 1.7 | 1.6, 1.9 | <0.001 | 1.7 | 1.6, 1.9 | <0.001 | 1.7 | 1.6, 1.9 | <0.001 |
log10(Bbp_532) | 0.87 | 0.76, 1.0 | <0.001 | 0.87 | 0.76, 1.0 | <0.001 | 0.87 | 0.76, 1.0 | <0.001 |

CI = Confidence Interval
Region | Sensor | NRMSE | Rsq |
---|---|---|---|
EGSL | MSI | 1.42 | 0.52 |
JB | MSI | 1.17 | 0.64 |
6.3.3 Bbp_532 from Rrs
Theoretically, IOPs are not region-specific and could be retrieved from AOPs with a single model. If this holds true, retrieving Bbp from Rrs and then linking Bbp to Cspm could improve Cspm retrieval from space.
Characteristic | Beta | 95% CI | p-value |
---|---|---|---|
log10(`664.6`) | 1.1 | 0.90, 1.2 | <0.001 |
Region | | | |
EGSL (reference) | | | |
JB | -0.25 | -0.79, 0.29 | 0.36 |
log10(`664.6`) * Region | | | |
log10(`664.6`) * JB | -0.19 | -0.38, 0.00 | 0.052 |

CI = Confidence Interval
In fact, regionality is not conclusive for Bbp in our dataset.
However, a closer look at the distribution seems to indicate an offset, notably at lower intensities.
EGSL

Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | 1.1 | 0.74, 1.5 | <0.001 | 1.2 | 0.82, 1.6 | <0.001 | 1.2 | 0.80, 1.6 | <0.001 |
log10(`655`) | 1.0 | 0.90, 1.2 | <0.001 | | | | | | |
log10(`664.6`) | | | | 1.1 | 0.93, 1.2 | <0.001 | | | |
log10(`665`) | | | | | | | 1.0 | 0.92, 1.2 | <0.001 |

CI = Confidence Interval
JB

Characteristic | OLI Beta | OLI 95% CI | OLI p-value | MSI Beta | MSI 95% CI | MSI p-value | OLCI Beta | OLCI 95% CI | OLCI p-value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | 0.91 | 0.60, 1.2 | <0.001 | 1.0 | 0.66, 1.3 | <0.001 | 1.0 | 0.68, 1.3 | <0.001 |
log10(`655`) | 0.86 | 0.74, 1.0 | <0.001 | | | | | | |
log10(`664.6`) | | | | 0.87 | 0.75, 1.0 | <0.001 | | | |
log10(`665`) | | | | | | | 0.87 | 0.76, 1.0 | <0.001 |

CI = Confidence Interval
Region | Sensor | NRMSE | Rsq |
---|---|---|---|
EGSL | MSI | 134 | 0.82 |
JB | MSI | 18.4 | 0.7 |
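The two-step idea raised at the start of this subsection (Rrs to Bbp, then Bbp to SPM) can be sketched by chaining the tabulated MSI coefficients. This is a hedged Python illustration, not the original workflow: it assumes both responses are modeled in log10 space, which the tables suggest but do not state, and the names `log_linear` and `spm_via_bbp` are hypothetical.

```python
import math

# (intercept, slope) pairs copied from the regional tables, MSI columns.
BBP_FROM_RRS_MSI = {"EGSL": (1.2, 1.1), "JB": (1.0, 0.87)}   # section 6.3.3
SPM_FROM_BBP = {"EGSL": (1.2, 0.17), "JB": (1.7, 0.87)}      # section 6.3.2


def log_linear(x, intercept, slope):
    """Assumed form: log10(y) = intercept + slope * log10(x)."""
    if x <= 0:
        raise ValueError("input must be positive to take its logarithm")
    return 10 ** (intercept + slope * math.log10(x))


def spm_via_bbp(rrs_red, region):
    """Two-step retrieval: Rrs(664.6) -> Bbp(532) -> SPM."""
    bbp = log_linear(rrs_red, *BBP_FROM_RRS_MSI[region])
    return log_linear(bbp, *SPM_FROM_BBP[region])


# Hypothetical reflectance value, for illustration only.
print(spm_via_bbp(0.01, "JB"))
```

Chaining two fitted models also chains their errors, so whether this route actually improves on the direct Rrs-to-SPM models would need to be checked on the test set.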