6 Empirical models

6.1 Summary stats for train and test

As the models proposed here are purely empirical, it is important to define the range over which they are applicable. The tables below present summary statistics for each retrieved optically active constituent (OAC) in the train and test datasets.

It is also worth noting that, as the modeled relationships depend on a variety of complex cumulative effects (i.e. specific IOPs), the time range of the measurements also matters: one cannot assume that OAC concentrations and distributions remain constant.
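One way to make the applicability range explicit is to check a retrieved value against the range seen in training. The snippet below is a minimal Python sketch of that idea, not part of the original analysis. The ranges are copied from the train columns of the regional tables in 6.1.1; the dictionary layout and the function name `in_training_range` are assumptions made for illustration.

```python
# Training ranges (min, max) copied from the "train" columns of the regional
# summary tables in 6.1.1; units are those of the tables.
TRAIN_RANGE = {
    "EGSL": {"SPM": (1.6, 34.6), "Ag_440": (0.16, 4.30), "Bbp_532": (0.003, 0.061)},
    "JB": {"SPM": (1.0, 110.0), "Ag_440": (0.94, 11.50), "Bbp_532": (0.01, 0.46)},
}


def in_training_range(region, constituent, value):
    """Return True if value lies inside the range seen during training."""
    low, high = TRAIN_RANGE[region][constituent]
    return low <= value <= high


# A retrieved SPM of 50 is inside the JB training range but outside the EGSL one.
print(in_training_range("JB", "SPM", 50.0))    # True
print(in_training_range("EGSL", "SPM", 50.0))  # False
```

Values outside the range are extrapolations and would deserve a flag in any product derived from these models.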

Global summary
| Characteristic | N | Overall, N = 364¹ | EGSL, N = 203¹ | JB, N = 161¹ |
| --- | --- | --- | --- | --- |
| matchup | 364 | 58 (16%) | 58 (29%) | 0 (0%) |
| SPM | 347 | 6 (3, 10) 1 - 110 | 7 (5, 10) 1 - 38 | 4 (2, 8) 1 - 110 |
| PIM | 337 | 5 (3, 8) 0 - 101 | 6 (4, 8) 0 - 35 | 4 (2, 8) 1 - 101 |
| POM | 177 | 1.54 (1.16, 1.94) 0.55 - 4.07 | 1.54 (1.16, 1.94) 0.55 - 4.07 | NA (NA, NA) NA - NA |
| Ag_440 | 343 | 1.58 (1.02, 2.36) 0.16 - 11.50 | 1.08 (0.52, 1.76) 0.16 - 5.02 | 1.89 (1.55, 3.37) 0.94 - 11.50 |
| Ag_295 | 343 | 16 (9, 23) 2 - 100 | 10 (5, 17) 2 - 47 | 20 (17, 33) 12 - 100 |
| Ag_275 | 343 | 23 (13, 31) 3 - 128 | 14 (7, 23) 3 - 66 | 28 (24, 44) 17 - 128 |
| Bbp_532 | 276 | 0.03 (0.01, 0.06) 0.00 - 0.46 | 0.01 (0.01, 0.02) 0.00 - 0.10 | 0.05 (0.03, 0.10) 0.01 - 0.46 |
| set | 364 | | | |
| test | | 112 (31%) | 83 (41%) | 29 (18%) |
| train | | 252 (69%) | 120 (59%) | 132 (82%) |

¹ n (%); Median (IQR) Range

6.1.1 By Region

JB summary
| Characteristic | N | test, N = 29¹ | train, N = 132¹ |
| --- | --- | --- | --- |
| SPM | 161 | 3 (2, 7) 1 - 38 | 5 (3, 9) 1 - 110 |
| PIM | 160 | 2 (2, 6) 1 - 32 | 4 (3, 8) 1 - 101 |
| POM | 0 | NA (NA, NA) NA - NA | NA (NA, NA) NA - NA |
| Ag_440 | 155 | 1.94 (1.42, 3.38) 1.24 - 7.63 | 1.89 (1.59, 3.35) 0.94 - 11.50 |
| Ag_295 | 155 | 20 (15, 33) 14 - 70 | 21 (18, 33) 12 - 100 |
| Ag_275 | 155 | 27 (21, 44) 20 - 91 | 28 (24, 44) 17 - 128 |
| Bbp_532 | 144 | 0.04 (0.03, 0.10) 0.01 - 0.37 | 0.05 (0.03, 0.09) 0.01 - 0.46 |

¹ Median (IQR) Range

EGSL summary
| Characteristic | N | test, N = 83¹ | train, N = 120¹ |
| --- | --- | --- | --- |
| SPM | 186 | 6.4 (4.3, 9.1) 1.2 - 38.3 | 7.6 (5.7, 10.6) 1.6 - 34.6 |
| PIM | 177 | 5.3 (3.6, 7.9) 0.4 - 34.9 | 5.9 (4.3, 8.8) 0.9 - 32.3 |
| POM | 177 | 1.46 (1.15, 1.76) 0.75 - 4.07 | 1.65 (1.23, 2.09) 0.55 - 3.75 |
| Ag_440 | 188 | 1.29 (0.64, 2.01) 0.20 - 5.02 | 1.03 (0.48, 1.63) 0.16 - 4.30 |
| Ag_295 | 188 | 12 (6, 21) 3 - 45 | 10 (5, 15) 2 - 47 |
| Ag_275 | 188 | 16 (8, 29) 4 - 61 | 13 (7, 20) 3 - 66 |
| Bbp_532 | 132 | 0.017 (0.012, 0.024) 0.004 - 0.099 | 0.012 (0.007, 0.022) 0.003 - 0.061 |

¹ Median (IQR) Range

6.2 Ag

Regionality test
Characteristic Beta 95% CI1 p-value
log10(`559.8`/`664.6`) -1.3 -1.4, -1.2 <0.001
Region
EGSL
JB -0.01 -0.06, 0.04 0.83
log10(`559.8`/`664.6`) * Region
log10(`559.8`/`664.6`) * JB 0.03 -0.16, 0.23 0.72

1 CI = Confidence Interval

The Region parameter is not significant here (p = 0.83), and neither is the interaction term (p = 0.72); no real difference in trend is observed between the two regions.

Figure 6.1: Ag regionality

Regionality test
Characteristic Beta 95% CI1 p-value
I(`664.6`/`559.8`) 2.3 1.8, 2.8 <0.001
Region
EGSL
JB -0.78 -1.3, -0.29 0.002
I(`664.6`/`559.8`) * Region
I(`664.6`/`559.8`) * JB 1.0 0.40, 1.6 0.001

1 CI = Confidence Interval

In linear space, however, with the red/green ratio, a small difference between the regions leads to a significant p-value …

Figure 6.2: Ag regionality

6.2.1 Ag(440)

Several band-ratio models are presented hereafter.

The first one is fitted with a blue / red ratio. The blue part of the visible spectrum is the most influenced by CDOM absorption, so in theory it should best capture the variability. In fact, the distribution of \(a_g(440)\) against the \(B(blue)/B(red)\) ratio is very sharp: it shows a steep slope, likely induced by the saturation of light absorption by CDOM, as briefly observed in section 4.3 (see Figure 4.7). There is a small constant offset between the measured and fitted values, so adding a constant offset term could improve the model.

Figure 6.3: non linear model for Ag, Blue over Red

Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
a 0.88 0.78, 1.0 <0.001 0.90 0.80, 1.0 <0.001 1.2 1.1, 1.3 <0.001
b -0.58 -0.63, -0.53 <0.001 -0.57 -0.61, -0.52 <0.001 -0.81 -0.88, -0.74 <0.001

1 CI = Confidence Interval
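The functional form behind the coefficients a and b is not stated here. Assuming a power law, \(a_g(440) = a \cdot ratio^{b}\), the fit can be illustrated with a straight-line regression in log-log space. The following is a Python sketch on synthetic, noise-free data generated from the OLI coefficients in the table above; the power-law form, the function name `power_law` and the fitting route are assumptions, and the original fit may have been obtained differently.

```python
import numpy as np


# Assumed model form (not stated in the text): a_g(440) = a * ratio**b,
# with ratio = Rrs(blue) / Rrs(red).
def power_law(ratio, a, b):
    return a * np.power(ratio, b)


# Synthetic, noise-free data generated from the OLI coefficients in the table
# above (a = 0.88, b = -0.58); these are NOT the field measurements.
ratio = np.linspace(0.2, 3.0, 50)
ag440 = power_law(ratio, 0.88, -0.58)

# Taking log10 of both sides gives log10(ag) = log10(a) + b * log10(ratio),
# so a straight-line fit recovers b (slope) and log10(a) (intercept).
b_hat, log_a_hat = np.polyfit(np.log10(ratio), np.log10(ag440), deg=1)
a_hat = 10.0 ** log_a_hat

print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
```

On real, noisy data a log-log fit and a direct non-linear least-squares fit weight the residuals differently, so the two routes would not give identical coefficients.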

Figure 6.4: linear model for Ag, Red over Blue

Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 1.1 0.91, 1.2 <0.001 1.1 1.0, 1.3 <0.001 0.39 0.21, 0.57 <0.001
I(`655`/`443`) 0.17 0.15, 0.20 <0.001
I(`664.6`/`442.7`) 0.17 0.15, 0.19 <0.001
I(`665`/`442.5`) 0.82 0.74, 0.89 <0.001

1 CI = Confidence Interval

Adding a constant offset does slightly improve the model, as shown by the performance metrics. However, it is not clear to me how this is justified …

Figure 6.5: non linear model for Ag, Blue over Red plus c offset

Sensor NRMSE Rsq
OLI 1.23 0.76
MSI 1.23 0.76
OLCI 1.25 0.76

The last one takes advantage of a green(edge) / red ratio, which is less affected by atmospheric correction errors in the blue and gives fairly good results. The slope of this distribution is less steep, which is consistent with the saturation of light absorption by CDOM in the blue.

As this model has the fewest independent parameters and the best performance metrics, it is the one chosen for application to sensor images.

Figure: non linear model for Ag, Green over Red

Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
a 2.2 2.1, 2.4 <0.001 2.3 2.2, 2.4 <0.001 2.4 2.2, 2.5 <0.001
b -1.4 -1.5, -1.3 <0.001 -1.3 -1.4, -1.2 <0.001 -1.3 -1.4, -1.2 <0.001

1 CI = Confidence Interval

Sensor NRMSE Rsq
OLI 1.21 0.79
MSI 1.21 0.79
OLCI 1.22 0.8

Figure 6.6: linear model for Ag, Red over Green

Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) -0.62 -0.86, -0.38 <0.001 -0.48 -0.70, -0.26 <0.001 -0.44 -0.66, -0.22 <0.001
I(`655`/`561`) 3.0 2.8, 3.3 <0.001
I(`664.6`/`559.8`) 2.9 2.7, 3.1 <0.001
I(`665`/`560`) 2.9 2.7, 3.2 <0.001

1 CI = Confidence Interval

Sensor NRMSE Rsq
OLI 1.15 0.8
MSI 1.17 0.8
OLCI 1.18 0.8
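To show how the linear red-over-green coefficients would be applied to a pixel, here is a short Python sketch. It assumes the response is \(a_g(440)\) in the units of the summary tables and that the predictor is the plain ratio of the two reflectances; the function name `ag440_red_over_green` and the example reflectances are hypothetical.

```python
import numpy as np

# (intercept, slope) from the "linear model for Ag, Red over Green" table.
COEFS = {
    "OLI": (-0.62, 3.0),   # I(655/561)
    "MSI": (-0.48, 2.9),   # I(664.6/559.8)
    "OLCI": (-0.44, 2.9),  # I(665/560)
}


def ag440_red_over_green(rrs_red, rrs_green, sensor="MSI"):
    """Assumed form: a_g(440) = intercept + slope * Rrs(red) / Rrs(green)."""
    intercept, slope = COEFS[sensor]
    ratio = np.asarray(rrs_red, dtype=float) / np.asarray(rrs_green, dtype=float)
    return intercept + slope * ratio


# Hypothetical reflectances, chosen only to exercise the function.
estimate = ag440_red_over_green(0.004, 0.006, sensor="MSI")
print(float(estimate))
```

Because the intercepts are negative, the prediction turns negative when the red/green ratio drops below about 0.17 (MSI), which is one more reason to stay within the range covered by the training data.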

6.2.2 Ag(295, 275)

Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
I(`655`/`561`) 29 27, 32 <0.001
I(`664.6`/`559.8`) 28 26, 30 <0.001
I(`665`/`560`) 28 26, 30 <0.001

1 CI = Confidence Interval

Sensor NRMSE Rsq
OLI 15.7 0.8
MSI 16 0.8
OLCI 16.1 0.8

Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
I(`561`/`655`) -14 -16, -13 <0.001
I(`559.8`/`664.6`) -14 -16, -12 <0.001
I(`560`/`665`) -13 -15, -12 <0.001

1 CI = Confidence Interval

Sensor NRMSE Rsq
OLI 21.1 0.54
MSI 21.2 0.56
OLCI 21.3 0.56

6.3 SPM

We test whether the ‘Region’ variable created earlier matters when determining \(C_{spm}\) from \(R_{rs}\). To do so, we use a generalized linear model (the glm function in R). The formula takes the form \(SPM \sim Rrs(\lambda) + Region + Rrs(\lambda) * Region\), where the last term expresses the interaction effect.

Characteristic Beta 95% CI1 p-value
log10(`664.6`) 0.32 0.13, 0.51 0.001
Region
EGSL
JB 1.2 0.55, 1.8 <0.001
log10(`664.6`) * Region
log10(`664.6`) * JB 0.61 0.38, 0.84 <0.001

1 CI = Confidence Interval

As we see, \(R_{rs}\) and \(Region\) are both significant variables for determining \(C_{spm}\): \(C_{spm}\) varies with \(R_{rs}\) and across \(Region\). The interaction term is also significant, meaning that the relation between \(C_{spm}\) and \(R_{rs}\) itself varies with \(Region\).
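The regionality test amounts to a regression with a dummy variable for the region and an interaction column. The Python sketch below reproduces that structure with ordinary least squares on synthetic data (a Gaussian GLM with identity link reduces to ordinary least squares; whether that family was used here is an assumption). The slope, Region and interaction values come from the table above; the intercept of 1.8 is borrowed from the EGSL fit in 6.3.1, and the response is assumed to be \(log_{10}(SPM)\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data only: x stands for log10(Rrs(664.6)), is_jb is a 0/1 dummy
# (EGSL is the reference level), y stands for the assumed response log10(SPM).
n = 200
x = rng.uniform(-3.0, -1.5, size=n)
is_jb = rng.integers(0, 2, size=n).astype(float)

# Coefficients used to generate the data: intercept, slope, Region(JB),
# interaction. The last three are taken from the table above.
beta_true = np.array([1.8, 0.32, 1.2, 0.61])

# Design matrix columns: intercept, x, Region(JB), x:Region(JB)
X = np.column_stack([np.ones(n), x, is_jb, x * is_jb])
y = X @ beta_true + rng.normal(0.0, 0.01, size=n)

# Ordinary least squares fit of the interaction model.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The interaction coefficient is the difference in slope between regions.
slope_egsl = beta_hat[1]
slope_jb = beta_hat[1] + beta_hat[3]
print(slope_egsl, slope_jb)
```

Read this way, the table implies a JB slope of 0.32 + 0.61 = 0.93, which matches the slope of the separate JB fit in 6.3.1.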

What could be misleading here is that the variation of \(C_{spm}\) across \(Region\) is also a matter of \(C_{spm}\) range. This range is wider for JB than for EGSL, which may affect the significance of \(Region\).

Considering the results presented above, models have been fitted specifically for each region.

Grouped under the Estuary and Gulf of Saint-Lawrence (EGSL) and James Bay (JB) regions, the distributions of \(C_{spm}\) vs \(R_{rs}\) show two clear and different patterns.

6.3.1 SPM from Rrs

EGSL
Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 1.8 1.2, 2.3 <0.001 1.8 1.3, 2.4 <0.001 1.8 1.3, 2.4 <0.001
log10(`655`) 0.29 0.11, 0.47 0.002
log10(`664.6`) 0.32 0.13, 0.50 <0.001
log10(`665`) 0.31 0.13, 0.49 0.001

1 CI = Confidence Interval

JB
Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 3.0 2.7, 3.3 <0.001 3.0 2.7, 3.4 <0.001 3.0 2.7, 3.4 <0.001
log10(`655`) 0.92 0.79, 1.1 <0.001
log10(`664.6`) 0.93 0.80, 1.1 <0.001
log10(`665`) 0.93 0.80, 1.1 <0.001

1 CI = Confidence Interval

Region Sensor NRMSE Rsq
EGSL MSI 1.43 0.55
JB MSI 1.16 0.65
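Applied to a pixel, the region-specific fits could look like the Python sketch below. It assumes the response is \(log_{10}(SPM)\) (consistent with the log10 predictor, but not stated explicitly) and uses the MSI coefficients from the EGSL and JB tables above; the function name `spm_from_rrs` and the example reflectance are hypothetical.

```python
import numpy as np

# (intercept, slope) from the MSI columns of the EGSL and JB tables above.
SPM_COEFS_MSI = {"EGSL": (1.8, 0.32), "JB": (3.0, 0.93)}


def spm_from_rrs(rrs_red, region):
    """Assumed form: log10(SPM) = intercept + slope * log10(Rrs(664.6))."""
    intercept, slope = SPM_COEFS_MSI[region]
    return 10.0 ** (intercept + slope * np.log10(rrs_red))


# The same hypothetical reflectance gives different SPM in the two regions.
rrs = 0.005
print(float(spm_from_rrs(rrs, "EGSL")), float(spm_from_rrs(rrs, "JB")))
```

The gap between the two outputs for an identical reflectance is what the regionality test quantifies.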

6.3.2 SPM from Bbp_532

Characteristic Beta 95% CI1 p-value
log10(Bbp_532) 0.17 0.01, 0.33 0.035
Region
EGSL
JB 0.50 0.16, 0.83 0.004
log10(Bbp_532) * Region
log10(Bbp_532) * JB 0.70 0.51, 0.90 <0.001

1 CI = Confidence Interval

Regionality is strongly confirmed in the relation SPM ~ Bbp(532): both the Region term (p = 0.004) and the interaction term (p < 0.001) are significant.

EGSL
Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 1.2 0.91, 1.6 <0.001 1.2 0.91, 1.6 <0.001 1.2 0.91, 1.6 <0.001
log10(Bbp_532) 0.17 0.00, 0.34 0.056 0.17 0.00, 0.34 0.056 0.17 0.00, 0.34 0.056

1 CI = Confidence Interval

JB
Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 1.7 1.6, 1.9 <0.001 1.7 1.6, 1.9 <0.001 1.7 1.6, 1.9 <0.001
log10(Bbp_532) 0.87 0.76, 1.0 <0.001 0.87 0.76, 1.0 <0.001 0.87 0.76, 1.0 <0.001

1 CI = Confidence Interval

Region Sensor NRMSE Rsq
EGSL MSI 1.42 0.52
JB MSI 1.17 0.64

6.3.3 Bbp_532 from Rrs

Theoretically, IOPs are not region-specific and could be retrieved from AOPs with a single model. If this holds true, retrieving Bbp from Rrs and then linking Bbp to Cspm could improve Cspm retrieval from space.
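The two-step idea can be sketched in Python by chaining the fits. The coefficients are the MSI values from the per-region tables in 6.3.2 and in this subsection; both responses are assumed to be log10-transformed, and the function names and the example reflectance are hypothetical.

```python
import numpy as np

# (intercept, slope) pairs from the MSI columns of the per-region tables.
BBP_FROM_RRS = {"EGSL": (1.2, 1.1), "JB": (1.0, 0.87)}   # section 6.3.3
SPM_FROM_BBP = {"EGSL": (1.2, 0.17), "JB": (1.7, 0.87)}  # section 6.3.2


def bbp_from_rrs(rrs_red, region):
    """Assumed form: log10(Bbp_532) = intercept + slope * log10(Rrs(664.6))."""
    intercept, slope = BBP_FROM_RRS[region]
    return 10.0 ** (intercept + slope * np.log10(rrs_red))


def spm_from_bbp(bbp_532, region):
    """Assumed form: log10(SPM) = intercept + slope * log10(Bbp_532)."""
    intercept, slope = SPM_FROM_BBP[region]
    return 10.0 ** (intercept + slope * np.log10(bbp_532))


def spm_two_step(rrs_red, region):
    """Step 1: Rrs -> Bbp. Step 2: Bbp -> SPM."""
    return spm_from_bbp(bbp_from_rrs(rrs_red, region), region)


# Hypothetical reflectance, chosen only to exercise the chain.
print(float(spm_two_step(0.01, "EGSL")))
```

Errors from the first step propagate into the second, so the chained retrieval is only worthwhile if the Bbp model is markedly more general than the direct SPM model.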

Characteristic Beta 95% CI1 p-value
log10(`664.6`) 1.1 0.90, 1.2 <0.001
Region
EGSL
JB -0.25 -0.79, 0.29 0.36
log10(`664.6`) * Region
log10(`664.6`) * JB -0.19 -0.38, 0.00 0.052

1 CI = Confidence Interval

In fact, regionality is not conclusive for Bbp in our dataset: the Region term is not significant (p = 0.36) and the interaction term is borderline (p = 0.052).

Figure 6.7: Bbp(532) from MSI Rrs(664.6)

However, a closer look at the distribution seems to indicate an offset between the two regions, notably at lower intensities.

EGSL
Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 1.1 0.74, 1.5 <0.001 1.2 0.82, 1.6 <0.001 1.2 0.80, 1.6 <0.001
log10(`655`) 1.0 0.90, 1.2 <0.001
log10(`664.6`) 1.1 0.93, 1.2 <0.001
log10(`665`) 1.0 0.92, 1.2 <0.001

1 CI = Confidence Interval

JB
Characteristic OLI MSI OLCI
Beta 95% CI1 p-value Beta 95% CI1 p-value Beta 95% CI1 p-value
(Intercept) 0.91 0.60, 1.2 <0.001 1.0 0.66, 1.3 <0.001 1.0 0.68, 1.3 <0.001
log10(`655`) 0.86 0.74, 1.0 <0.001
log10(`664.6`) 0.87 0.75, 1.0 <0.001
log10(`665`) 0.87 0.76, 1.0 <0.001

1 CI = Confidence Interval

Region Sensor NRMSE Rsq
EGSL MSI 134 0.82
JB MSI 18.4 0.7