6 Empirical models | Remote sensing algorithms to retrieve SPM and CDOM in Québec coastal waters

6.1 Summary stats for train and test

As the models proposed here are purely empirical, it is important to define the range over which they are applicable. The tables below present summary statistics of each retrieved optically active constituent (OAC) for the train and test datasets.

It is also worth noting that, because the modeled relationships depend on a variety of intricate cumulative effects (i.e. specific IOPs), the time range of the measurements also matters: one cannot assume that OAC concentrations and distributions remain constant.

Global summary

Characteristic	N	Overall, N = 364¹	EGSL, N = 203¹	JB, N = 161¹
matchup	364	58 (16%)	58 (29%)	0 (0%)
SPM	347	6 (3, 10) 1 - 110	7 (5, 10) 1 - 38	4 (2, 8) 1 - 110
PIM	337	5 (3, 8) 0 - 101	6 (4, 8) 0 - 35	4 (2, 8) 1 - 101
POM	177	1.54 (1.16, 1.94) 0.55 - 4.07	1.54 (1.16, 1.94) 0.55 - 4.07	NA (NA, NA) NA - NA
Ag_440	343	1.58 (1.02, 2.36) 0.16 - 11.50	1.08 (0.52, 1.76) 0.16 - 5.02	1.89 (1.55, 3.37) 0.94 - 11.50
Ag_295	343	16 (9, 23) 2 - 100	10 (5, 17) 2 - 47	20 (17, 33) 12 - 100
Ag_275	343	23 (13, 31) 3 - 128	14 (7, 23) 3 - 66	28 (24, 44) 17 - 128
Bbp_532	276	0.03 (0.01, 0.06) 0.00 - 0.46	0.01 (0.01, 0.02) 0.00 - 0.10	0.05 (0.03, 0.10) 0.01 - 0.46
set	364
test		112 (31%)	83 (41%)	29 (18%)
train		252 (69%)	120 (59%)	132 (82%)
¹ n (%); Median (IQR) Range
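
The "Median (IQR) Range" entries in these tables can be reproduced with a few lines of code. The sketch below is a minimal Python illustration, not the code used to build the tables: the function name `summarize` and the sample values are hypothetical, and the quartile method (linear interpolation between order statistics) is an assumption about how the quartiles were computed.

```python
from statistics import median, quantiles


def summarize(values):
    """Return 'median (Q1, Q3) min - max', ignoring missing values (None)."""
    data = sorted(v for v in values if v is not None)
    if len(data) < 2:
        return "NA (NA, NA) NA - NA"
    # method="inclusive" interpolates linearly between order statistics,
    # treating the sample minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return f"{median(data):g} ({q1:g}, {q3:g}) {data[0]:g} - {data[-1]:g}"


# Hypothetical SPM values (not taken from the dataset), with one missing sample.
samples = {
    "train": [1.0, 2.0, None, 3.0, 4.0, 5.0],
    "test": [2.0, 4.0, 6.0],
}
for subset, spm in samples.items():
    print(subset, summarize(spm))
```

Comparing the train and test strings side by side is a quick way to check that a new scene falls inside the range on which a model was fitted.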

6.1.1 By Region

JB summary

Characteristic	N	test, N = 29¹	train, N = 132¹
SPM	161	3 (2, 7) 1 - 38	5 (3, 9) 1 - 110
PIM	160	2 (2, 6) 1 - 32	4 (3, 8) 1 - 101
POM	0	NA (NA, NA) NA - NA	NA (NA, NA) NA - NA
Ag_440	155	1.94 (1.42, 3.38) 1.24 - 7.63	1.89 (1.59, 3.35) 0.94 - 11.50
Ag_295	155	20 (15, 33) 14 - 70	21 (18, 33) 12 - 100
Ag_275	155	27 (21, 44) 20 - 91	28 (24, 44) 17 - 128
Bbp_532	144	0.04 (0.03, 0.10) 0.01 - 0.37	0.05 (0.03, 0.09) 0.01 - 0.46
¹ Median (IQR) Range

EGSL summary

Characteristic	N	test, N = 83¹	train, N = 120¹
SPM	186	6.4 (4.3, 9.1) 1.2 - 38.3	7.6 (5.7, 10.6) 1.6 - 34.6
PIM	177	5.3 (3.6, 7.9) 0.4 - 34.9	5.9 (4.3, 8.8) 0.9 - 32.3
POM	177	1.46 (1.15, 1.76) 0.75 - 4.07	1.65 (1.23, 2.09) 0.55 - 3.75
Ag_440	188	1.29 (0.64, 2.01) 0.20 - 5.02	1.03 (0.48, 1.63) 0.16 - 4.30
Ag_295	188	12 (6, 21) 3 - 45	10 (5, 15) 2 - 47
Ag_275	188	16 (8, 29) 4 - 61	13 (7, 20) 3 - 66
Bbp_532	132	0.017 (0.012, 0.024) 0.004 - 0.099	0.012 (0.007, 0.022) 0.003 - 0.061
¹ Median (IQR) Range

6.2 Ag

Regionality test

Characteristic	Beta	95% CI¹	p-value
log10(`559.8`/`664.6`)	-1.3	-1.4, -1.2	<0.001
Region
EGSL	—	—
JB	-0.01	-0.06, 0.04	0.83
log10(`559.8`/`664.6`) * Region
log10(`559.8`/`664.6`) * JB	0.03	-0.16, 0.23	0.72
¹ CI = Confidence Interval

The Region parameter is not significant here (p = 0.83), nor is its interaction with the band ratio (p = 0.72): no real difference in trend is observed between the two regions.

Figure 6.1: Ag regionality

Regionality test

Characteristic	Beta	95% CI¹	p-value
I(`664.6`/`559.8`)	2.3	1.8, 2.8	<0.001
Region
EGSL	—	—
JB	-0.78	-1.3, -0.29	0.002
I(`664.6`/`559.8`) * Region
I(`664.6`/`559.8`) * JB	1.0	0.40, 1.6	0.001
¹ CI = Confidence Interval

In linear space, however, with the red/green ratio, a small difference between regions leads to significant p-values for both Region (p = 0.002) and the interaction term (p = 0.001) …

Figure 6.2: Ag regionality

6.2.1 Ag(440)

Several band-ratio models are presented hereafter.

The first one is fitted with a blue/red ratio. The blue part of the visible spectrum is the most influenced by CDOM absorption, so in theory it should best capture the variability. Indeed, the distribution of \(a_g(440)\) against the \(B(blue)/B(red)\) ratio is very sharp. It shows a steep slope, likely induced by the saturation of light absorption by CDOM, as briefly observed in 4.3 (see Figure 4.7). There is a small constant offset between the measured and fitted values, so adding a constant offset to the model could improve it.

Figure 6.3: non linear model for Ag, Blue over Red

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
a	0.88	0.78, 1.0	<0.001	0.90	0.80, 1.0	<0.001	1.2	1.1, 1.3	<0.001
b	-0.58	-0.63, -0.53	<0.001	-0.57	-0.61, -0.52	<0.001	-0.81	-0.88, -0.74	<0.001
¹ CI = Confidence Interval

Figure 6.4: linear model for Ag, Red over Blue

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
(Intercept)	1.1	0.91, 1.2	<0.001	1.1	1.0, 1.3	<0.001	0.39	0.21, 0.57	<0.001
I(`655`/`443`)	0.17	0.15, 0.20	<0.001
I(`664.6`/`442.7`)				0.17	0.15, 0.19	<0.001
I(`665`/`442.5`)							0.82	0.74, 0.89	<0.001
¹ CI = Confidence Interval

The addition of a constant offset does slightly improve the model, as shown by the performance metrics. However, it is not clear to me how this is justified …

Figure 6.5: non linear model for Ag, Blue over Red plus c offset

Sensor	NRMSE	Rsq
OLI	1.23	0.76
MSI	1.23	0.76
OLCI	1.25	0.76
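
For readers who want to recompute comparable metrics on their own matchups, the sketch below shows one common way to obtain an NRMSE and an R². The source does not state how NRMSE is normalised, so dividing the RMSE by the mean observation is an assumption made here for illustration only; the function names are hypothetical and the values returned need not be on the same scale as the table above.

```python
from math import sqrt
from statistics import fmean


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    return sqrt(fmean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def nrmse(observed, predicted):
    """RMSE divided by the mean observation (ASSUMED normalisation)."""
    return rmse(observed, predicted) / fmean(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and fitted a_g(440) values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
print(f"NRMSE = {nrmse(measured, fitted):.3f}, Rsq = {r_squared(measured, fitted):.2f}")
```

Other normalisations (by the range, or an RMSE computed in log space) are equally plausible, so the definition should be checked before comparing numbers across studies.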

The last one takes advantage of a green(edge)/red ratio, which is less affected by atmospheric correction errors in the blue and gives fairly good results. The slope of this distribution is less steep, which supports the saturation of light absorption by CDOM in the blue.

As this model has the fewest independent parameters and the best performance metrics, it is chosen to be applied to sensor images.

Figure: non linear model for Ag, Green over Red

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
a	2.2	2.1, 2.4	<0.001	2.3	2.2, 2.4	<0.001	2.4	2.2, 2.5	<0.001
b	-1.4	-1.5, -1.3	<0.001	-1.3	-1.4, -1.2	<0.001	-1.3	-1.4, -1.2	<0.001
¹ CI = Confidence Interval

Sensor	NRMSE	Rsq
OLI	1.21	0.79
MSI	1.21	0.79
OLCI	1.22	0.8

Figure 6.6: linear model for Ag, Red over Green

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
(Intercept)	-0.62	-0.86, -0.38	<0.001	-0.48	-0.70, -0.26	<0.001	-0.44	-0.66, -0.22	<0.001
I(`655`/`561`)	3.0	2.8, 3.3	<0.001
I(`664.6`/`559.8`)				2.9	2.7, 3.1	<0.001
I(`665`/`560`)							2.9	2.7, 3.2	<0.001
¹ CI = Confidence Interval

Sensor	NRMSE	Rsq
OLI	1.15	0.8
MSI	1.17	0.8
OLCI	1.18	0.8
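
To show how the green/red coefficients above would be applied to a pixel, here is a minimal Python sketch using the MSI columns. The linear form follows directly from the table terms ((Intercept) and the red/green ratio). The functional form of the non linear model is not stated in this section, so the power law \(a_g(440) = a \cdot (green/red)^b\) is an assumption, suggested by the fact that b is close to the log-log slope of the regionality test. Function names and reflectance values are hypothetical.

```python
# MSI coefficients copied from the tables above.
LINEAR_MSI = {"intercept": -0.48, "slope": 2.9}  # predictor: Rrs(664.6) / Rrs(559.8)
POWER_MSI = {"a": 2.3, "b": -1.3}                # predictor: Rrs(559.8) / Rrs(664.6)


def ag440_linear(rrs_green, rrs_red, coef=LINEAR_MSI):
    """Linear red/green model: a_g(440) = intercept + slope * (red / green)."""
    return coef["intercept"] + coef["slope"] * (rrs_red / rrs_green)


def ag440_power(rrs_green, rrs_red, coef=POWER_MSI):
    """ASSUMED power-law form: a_g(440) = a * (green / red) ** b."""
    return coef["a"] * (rrs_green / rrs_red) ** coef["b"]


# Hypothetical reflectances, chosen only for illustration.
green, red = 0.005, 0.004
print(f"linear: {ag440_linear(green, red):.2f}")
print(f"power : {ag440_power(green, red):.2f}")
```

Both functions should only be evaluated within the \(a_g(440)\) range of the training data reported in section 6.1, since neither form is constrained outside it.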

6.2.2 Ag(295, 275)

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
I(`655`/`561`)	29	27, 32	<0.001
I(`664.6`/`559.8`)				28	26, 30	<0.001
I(`665`/`560`)							28	26, 30	<0.001
¹ CI = Confidence Interval

Sensor	NRMSE	Rsq
OLI	15.7	0.8
MSI	16	0.8
OLCI	16.1	0.8

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
I(`561`/`655`)	-14	-16, -13	<0.001
I(`559.8`/`664.6`)				-14	-16, -12	<0.001
I(`560`/`665`)							-13	-15, -12	<0.001
¹ CI = Confidence Interval

Sensor	NRMSE	Rsq
OLI	21.1	0.54
MSI	21.2	0.56
OLCI	21.3	0.56

6.3 SPM

We test whether the ‘Region’ variable created earlier matters when determining \(C_{spm}\) from \(R_{rs}\). To do so, we use a generalized linear model (the glm function in R). The formula takes the form \(SPM \sim Rrs(\lambda) + Region + Rrs(\lambda) * Region\), where the last term expresses the interaction effect.

Characteristic	Beta	95% CI¹	p-value
log10(`664.6`)	0.32	0.13, 0.51	0.001
Region
EGSL	—	—
JB	1.2	0.55, 1.8	<0.001
log10(`664.6`) * Region
log10(`664.6`) * JB	0.61	0.38, 0.84	<0.001
¹ CI = Confidence Interval

As we see, \(R_{rs}\) and \(Region\) are both significant predictors of \(C_{spm}\): \(C_{spm}\) varies with both \(R_{rs}\) and \(Region\). The interaction term is also significant, meaning that the relation between \(C_{spm}\) and \(R_{rs}\) itself varies with \(Region\).

What could be misleading here is that the variation of \(C_{spm}\) across \(Region\) is also a matter of \(C_{spm}\) range. The range is wider for JB (1 - 110) than for EGSL (1 - 38), which may affect the significance of \(Region\).

Considering the results presented above, models have been fitted separately for each region.

Grouped by region, Estuary and Gulf of Saint-Lawrence (EGSL) and James Bay (JB), the distributions of \(C_{spm}\) vs \(R_{rs}\) show two clear and distinct patterns.

6.3.1 SPM from Rrs

EGSL

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
(Intercept)	1.8	1.2, 2.3	<0.001	1.8	1.3, 2.4	<0.001	1.8	1.3, 2.4	<0.001
log10(`655`)	0.29	0.11, 0.47	0.002
log10(`664.6`)				0.32	0.13, 0.50	<0.001
log10(`665`)							0.31	0.13, 0.49	0.001
¹ CI = Confidence Interval

JB

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
(Intercept)	3.0	2.7, 3.3	<0.001	3.0	2.7, 3.4	<0.001	3.0	2.7, 3.4	<0.001
log10(`655`)	0.92	0.79, 1.1	<0.001
log10(`664.6`)				0.93	0.80, 1.1	<0.001
log10(`665`)							0.93	0.80, 1.1	<0.001
¹ CI = Confidence Interval

Region	Sensor	NRMSE	Rsq
EGSL	MSI	1.43	0.55
JB	MSI	1.16	0.65
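
As an illustration of how the region-specific fits above might be applied, the following Python sketch evaluates the MSI models for a single reflectance value. The tables give only the predictor, log10(Rrs), so treating the response as log10(SPM) is an assumption; it is suggested by the magnitude of the intercepts but is not stated in the text. The function name and the input value are hypothetical.

```python
from math import log10

# MSI coefficients from the EGSL and JB tables above: (intercept, slope)
# for the predictor log10(Rrs(664.6)).
SPM_MSI = {"EGSL": (1.8, 0.32), "JB": (3.0, 0.93)}


def spm_from_rrs(rrs_red, region):
    """Estimate SPM from red Rrs, ASSUMING the fitted response is log10(SPM)."""
    if rrs_red <= 0:
        raise ValueError("Rrs must be positive to take its logarithm")
    intercept, slope = SPM_MSI[region]  # raises KeyError for an unknown region
    return 10 ** (intercept + slope * log10(rrs_red))


rrs = 0.01  # hypothetical Rrs(664.6) value, for illustration only
for region in SPM_MSI:
    print(region, round(spm_from_rrs(rrs, region), 1))
```

Keeping the region as an explicit argument makes it harder to apply the JB coefficients to EGSL waters by accident, which matters given how different the two slopes are.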

6.3.2 SPM from Bbp_532

Characteristic	Beta	95% CI¹	p-value
log10(Bbp_532)	0.17	0.01, 0.33	0.035
Region
EGSL	—	—
JB	0.50	0.16, 0.83	0.004
log10(Bbp_532) * Region
log10(Bbp_532) * JB	0.70	0.51, 0.90	<0.001
¹ CI = Confidence Interval

Regionality is strongly confirmed in the relation SPM ~ Bbp(532): both Region (p = 0.004) and the interaction term (p < 0.001) are significant.

EGSL

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
(Intercept)	1.2	0.91, 1.6	<0.001	1.2	0.91, 1.6	<0.001	1.2	0.91, 1.6	<0.001
log10(Bbp_532)	0.17	0.00, 0.34	0.056	0.17	0.00, 0.34	0.056	0.17	0.00, 0.34	0.056
¹ CI = Confidence Interval

JB

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
(Intercept)	1.7	1.6, 1.9	<0.001	1.7	1.6, 1.9	<0.001	1.7	1.6, 1.9	<0.001
log10(Bbp_532)	0.87	0.76, 1.0	<0.001	0.87	0.76, 1.0	<0.001	0.87	0.76, 1.0	<0.001
¹ CI = Confidence Interval

Region	Sensor	NRMSE	Rsq
EGSL	MSI	1.42	0.52
JB	MSI	1.17	0.64

6.3.3 Bbp_532 from Rrs

Theoretically, IOPs are not region-specific and could be retrieved from AOPs with a single model. If this holds true, retrieving Bbp from Rrs and then linking Bbp to Cspm could improve Cspm retrieval from space.

Characteristic	Beta	95% CI¹	p-value
log10(`664.6`)	1.1	0.90, 1.2	<0.001
Region
EGSL	—	—
JB	-0.25	-0.79, 0.29	0.36
log10(`664.6`) * Region
log10(`664.6`) * JB	-0.19	-0.38, 0.00	0.052
¹ CI = Confidence Interval

Indeed, regionality is not conclusive for Bbp in our dataset: Region is not significant (p = 0.36) and the interaction term is borderline (p = 0.052).

Figure 6.7: Bbp(532) from MSI Rrs(664.6)

However, a closer look at the distribution seems to indicate an offset, notably at lower intensities.

EGSL

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
(Intercept)	1.1	0.74, 1.5	<0.001	1.2	0.82, 1.6	<0.001	1.2	0.80, 1.6	<0.001
log10(`655`)	1.0	0.90, 1.2	<0.001
log10(`664.6`)				1.1	0.93, 1.2	<0.001
log10(`665`)							1.0	0.92, 1.2	<0.001
¹ CI = Confidence Interval

JB

Characteristic	OLI			MSI			OLCI
Characteristic	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value	Beta	95% CI¹	p-value
(Intercept)	0.91	0.60, 1.2	<0.001	1.0	0.66, 1.3	<0.001	1.0	0.68, 1.3	<0.001
log10(`655`)	0.86	0.74, 1.0	<0.001
log10(`664.6`)				0.87	0.75, 1.0	<0.001
log10(`665`)							0.87	0.76, 1.0	<0.001
¹ CI = Confidence Interval

Region	Sensor	NRMSE	Rsq
EGSL	MSI	134	0.82
JB	MSI	18.4	0.7
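
The two-step idea introduced at the start of this section (Rrs to Bbp, then Bbp to SPM) can be sketched by chaining the region-specific MSI fits from sections 6.3.2 and 6.3.3. This is a minimal Python illustration under the assumption that both responses are log10-transformed, which the tables do not state explicitly; the function names and the input value are hypothetical.

```python
from math import log10

# MSI coefficients from the tables in sections 6.3.2 and 6.3.3, as
# (intercept, slope) pairs. Both responses are ASSUMED to be log10-transformed.
BBP_FROM_RRS = {"EGSL": (1.2, 1.1), "JB": (1.0, 0.87)}   # x = log10(Rrs(664.6))
SPM_FROM_BBP = {"EGSL": (1.2, 0.17), "JB": (1.7, 0.87)}  # x = log10(Bbp_532)


def log_linear(x, intercept, slope):
    """Evaluate y = 10 ** (intercept + slope * log10(x))."""
    return 10 ** (intercept + slope * log10(x))


def spm_two_step(rrs_red, region):
    """Chain Rrs -> Bbp(532) -> SPM using the region-specific fits."""
    bbp = log_linear(rrs_red, *BBP_FROM_RRS[region])
    return bbp, log_linear(bbp, *SPM_FROM_BBP[region])


rrs = 0.01  # hypothetical Rrs(664.6) value, for illustration only
for region in BBP_FROM_RRS:
    bbp, spm = spm_two_step(rrs, region)
    print(f"{region}: Bbp(532) = {bbp:.3f}, SPM = {spm:.1f}")
```

Because the second fit takes the output of the first as input, errors in the Bbp retrieval propagate into the SPM estimate, so the chained result is worth comparing against the direct SPM from Rrs fit before preferring one over the other.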