3 Prediction Using GAM
3.1 Bayesian Network
The accuracy of the prediction using Bayesian Network:
## ME RMSE MAE MPE MAPE
## Test set 9.386289 22.41716 13.33112 -26.19927 88.27519
The in-sample error (RMSE divided by the root mean square of the observed counts) is:
a <- sqrt(mean((trainCount-pred_train)^2))/sqrt(mean((trainCount)^2))
The out-of-sample error for the 2013-2015 validation data is:
#for validation data 2013-2015
c <- sqrt(mean((testCount-pred_test)^2))/sqrt(mean((testCount)^2))
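Both error expressions above share one form, so they can be wrapped in a single helper. The following is a minimal sketch on toy vectors; the function name `srmse` and the numbers are illustrative assumptions, not objects from the analysis.

```r
# Scaled RMSE: RMSE divided by the root mean square of the observed values.
# `srmse` is a hypothetical helper name; y = observed, yhat = predicted.
srmse <- function(y, yhat) {
  sqrt(mean((y - yhat)^2)) / sqrt(mean(y^2))
}

# Toy data (not the dengue counts)
y_toy    <- c(10, 20, 40)
yhat_toy <- c(12, 18, 30)

toy_error <- srmse(y_toy, yhat_toy)
print(toy_error)
```

Applied to the objects used above, `srmse(trainCount, pred_train)` and `srmse(testCount, pred_test)` would give the same values as `a` and `c`.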
The mean absolute percentage error (MAPE) and the mean percentage error (MPE):
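# Mean absolute percentage error (%); y = observed, yhat = predicted
mape <- function(y, yhat){
  mean(abs((y - yhat)/y), na.rm=TRUE) * 100
}
# Mean percentage error (%); negative values indicate over-prediction on average
mpe <- function(y, yhat){
  mean((y - yhat)/y, na.rm=TRUE) * 100
}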
mape(testCount, pred_test)
## [1] 88.27519
mpe(testCount, pred_test)
## [1] -26.19927
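The five measures in the accuracy output at the start of this section can be reproduced in base R. The sketch below uses toy vectors and assumes the usual convention that the error is observed minus predicted, which matches the `mape` and `mpe` definitions above; the helper name `accuracy_measures` is hypothetical.

```r
# Hypothetical helper: computes ME, RMSE, MAE, MPE and MAPE
# with error defined as observed minus predicted.
accuracy_measures <- function(y, yhat) {
  e <- y - yhat
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MPE  = mean(e / y) * 100,
    MAPE = mean(abs(e / y)) * 100)
}

# Toy data (not the dengue counts)
y_toy    <- c(10, 20, 40)
yhat_toy <- c(12, 18, 30)

acc <- accuracy_measures(y_toy, yhat_toy)
print(round(acc, 3))
```

Note that an observed value of zero makes the percentage errors infinite, and `na.rm = TRUE` does not drop infinite values, so months with zero cases need separate handling.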
3.2 Generalized Additive (Mixed) Models
3.2.1 Meteorological Data
This model considers the association between dengue incidence and the meteorological variables, i.e. DTR and average monthly rainfall. I call this the Dengue-Meteorological model.
The summary in Appendix B.2 suggests that all the temperature and rainfall lag variables are important factors. Let’s visualize the additive model in Figure 3.1.
Model Name | RMSE | SRMSE | R-sq.(adj) | Deviance Explained |
---|---|---|---|---|
Meteorology Model | 8.462372 | 0.5200771 | 0.2831722 | 0.3188542 |
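To show where the four columns of the table come from, here is a minimal sketch of an additive model fitted with `mgcv` on simulated data. The package choice, the Poisson family, the variable names (`dtr_lag1`, `rain_lag2`) and the data are all assumptions made for illustration; the actual model specification is the one in Appendix B.2.

```r
library(mgcv)

set.seed(1)
n <- 120  # hypothetical: ten years of monthly observations

# Simulated predictors standing in for lagged DTR and lagged rainfall
d <- data.frame(dtr_lag1  = runif(n, 5, 12),
                rain_lag2 = runif(n, 0, 300))
mu <- exp(1.5 + 0.08 * d$dtr_lag1 + sin(d$rain_lag2 / 60))
d$count <- rpois(n, mu)

# Additive model with one smooth term per predictor (Poisson family assumed)
fit <- gam(count ~ s(dtr_lag1) + s(rain_lag2),
           family = poisson(), data = d, method = "REML")

pred <- predict(fit, type = "response")
sm   <- summary(fit)

perf <- c(RMSE     = sqrt(mean((d$count - pred)^2)),
          SRMSE    = sqrt(mean((d$count - pred)^2)) / sqrt(mean(d$count^2)),
          r_sq_adj = sm$r.sq,      # adjusted R-squared
          dev_expl = sm$dev.expl)  # proportion of deviance explained
print(round(perf, 4))
```

Calling `plot(fit, pages = 1)` on such a fit draws one panel per smooth term with its confidence band.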
3.2.2 Dengue Surveillance Data
This model considers the association with past dengue incidence.
3.2.2.1 Short-term Lag Model
The summary of the model is shown in Appendix B.3. Let’s visualize the additive model in Figure 3.3.
Model Name | RMSE | SRMSE | R-sq.(adj) | Deviance Explained |
---|---|---|---|---|
Short-term Lag Model | 7.895124 | 0.4852154 | 0.3988267 | 0.4104035 |
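A lag model needs the past counts as predictor columns. The following base R sketch builds such columns on a toy series; the helper name `lag_k`, the column names and the choice of three lags are illustrative assumptions, not the lags used in Appendix B.3.

```r
# Hypothetical helper: shift a series back by k time steps,
# padding the first k positions with NA.
lag_k <- function(x, k) {
  stopifnot(k >= 0, k < length(x))
  if (k == 0) return(x)
  c(rep(NA, k), x[seq_len(length(x) - k)])
}

# Toy monthly counts (not the dengue data)
count <- c(3, 5, 8, 13, 21, 34)

# One column per lag, 1 to 3 months
lags <- sapply(1:3, function(k) lag_k(count, k))
colnames(lags) <- paste0("count_lag", 1:3)

d <- data.frame(count, lags)
print(d)
```

Rows with `NA` in any lag column are dropped by most model-fitting functions, so the usable sample shrinks by the largest lag.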
3.2.2.2 Long-term Lag Model
## This is dlnm 2.2.6. For details: help(dlnm) and vignette('dlnmOverview').
## Important changes: see file.show(system.file('Changesince220',package='dlnm'))
I show the simulated lag–response surfaces as relative risk in Figure 3.5.
The summary of the model is shown in Appendix B.5. Let’s visualize the additive model in Figure 3.6.
Model Name | RMSE | SRMSE | R-sq.(adj) | Deviance Explained |
---|---|---|---|---|
Optimal-term Lag Model | 7.317273 | 0.4497021 | 0.4901485 | 0.488694 |
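The relative risks over lags mentioned above can be illustrated with a small `dlnm` example. This is a sketch under stated assumptions: the data are simulated, the exposure is linear, the lag range is 0-6, and the model is a plain Poisson `glm` rather than the additive model of Appendix B.5. The guard only keeps the script runnable where `dlnm` is not installed.

```r
has_dlnm <- requireNamespace("dlnm", quietly = TRUE)

if (has_dlnm) {
  library(dlnm)

  set.seed(7)
  n <- 200
  x <- rpois(n, 20)                          # simulated exposure series
  x_lag2 <- c(rep(20, 2), x[seq_len(n - 2)]) # true effect sits at lag 2
  y <- rpois(n, exp(1 + 0.03 * x_lag2))      # simulated outcome counts

  # Cross-basis: linear in the exposure, natural spline over lags 0-6
  cb <- crossbasis(x, lag = 6,
                   argvar = list(fun = "lin"),
                   arglag = list(fun = "ns", df = 3))

  fit <- glm(y ~ cb, family = poisson())

  # Relative risk at exposure 30 versus the reference value 20, by lag
  pred <- crosspred(cb, fit, at = 30, cen = 20)
  rr_by_lag <- pred$matRRfit
  print(round(rr_by_lag, 3))
}
```

With a nonlinear `argvar`, `plot(pred)` draws the full exposure-lag surface instead of a single slice.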
3.2.3 Meteorology and Optimal-term Lag Model
The summary of the model is shown in Appendix B.6. Let’s visualize the additive model in Figure 3.8.
Model Name | RMSE | SRMSE | R-sq.(adj) | Deviance Explained |
---|---|---|---|---|
Meteorology and Optimal-term Lag Model | 6.121665 | 0.3762228 | 0.6384466 | 0.6635623 |
3.2.4 Surrounding Dengue Data
I show the simulated lag–response surfaces for surrounding districts as relative risk in Figure 3.10.
3.2.5 Meteorology, Optimal-term and Short-term Surrounding Lag Model
The summary of the model is shown in Appendix @ref(appDMDS_Short). Let’s visualize the additive model in Figure @ref(fig:DMDS_Short).
Model Name | RMSE | SRMSE | R-sq.(adj) | Deviance Explained |
---|---|---|---|---|
Meteorology, Optimal(D) Short(D-S) Lag Model | 6.000072 | 0.36875 | 0.6521614 | 0.6726038 |
3.2.6 Meteorology, Optimal-term and Optimal-term Surrounding Lag Model
The summary of the model is shown in Appendix @ref(appDMDS_Optimal). Let’s visualize the additive model in Figure @ref(fig:DMDS_Optimal).
Model Name | RMSE | SRMSE | R-sq.(adj) | Deviance Explained |
---|---|---|---|---|
Meteorology, Optimal (D, D-S) Lag Model | 5.940368 | 0.3650807 | 0.6581915 | 0.6809764 |
3.4 Predictive Performance Statistics
Predictive performance of each model on the training dataset:
Model Name | RMSE | SRMSE | R-sq.(adj) | Deviance Explained |
---|---|---|---|---|
Meteorology Model | 8.462372 | 0.5200771 | 0.2831722 | 0.3188542 |
Short-term Lag Model | 7.895124 | 0.4852154 | 0.3988267 | 0.4104035 |
Optimal-term Lag Model | 7.317273 | 0.4497021 | 0.4901485 | 0.4886940 |
Meteorology and Optimal-term Lag Model | 6.121665 | 0.3762228 | 0.6384466 | 0.6635623 |
Meteorology, Optimal(D) Short(D-S) Lag Model | 6.000072 | 0.3687500 | 0.6521614 | 0.6726038 |
Meteorology, Optimal (D, D-S) Lag Model | 5.940368 | 0.3650807 | 0.6581915 | 0.6809764 |
Social-economic data Included | 5.924147 | 0.3640839 | 0.6594216 | 0.7457561 |
3.5 Evaluation
Training Dataset | In-sample Error | Out-of-sample (2013-2015) | Out-of-sample (2014-2015) | Out-of-sample (2013) | Out-of-sample (2014) | Out-of-sample (2015) |
---|---|---|---|---|---|---|
2008-2012 | 0.3650807 | 303.282 | 410.7401247 | 21.66415 | 1.4334466 | 437.4985502 |
2008-2013 | 0.3843044 | NA | 0.5627701 | NA | 0.3202835 | 0.5354210 |
2008-2014 | 0.3940100 | NA | NA | NA | NA | 0.4888606 |
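The table follows an expanding-window design: the training period grows by one year and the remaining years are held out. The sketch below shows that loop on simulated monthly data, with a simple Poisson `glm` swapped in for the GAM; the data, the predictor `temp` and the helper names are all assumptions made for illustration.

```r
set.seed(42)

# Simulated monthly data for 2008-2015 (not the dengue data)
d <- data.frame(year  = rep(2008:2015, each = 12),
                month = rep(1:12, times = 8))
d$temp  <- 25 + 5 * sin(2 * pi * d$month / 12) + rnorm(nrow(d))
d$count <- rpois(nrow(d), exp(0.5 + 0.08 * d$temp))

# RMSE scaled by the root mean square of the observed counts
srmse <- function(y, yhat) sqrt(mean((y - yhat)^2)) / sqrt(mean(y^2))

# Fit on years up to last_train_year, evaluate on all later years
evaluate_window <- function(last_train_year) {
  train <- d[d$year <= last_train_year, ]
  test  <- d[d$year >  last_train_year, ]
  fit   <- glm(count ~ temp, family = poisson(), data = train)
  c(train_end  = last_train_year,
    in_sample  = srmse(train$count, fitted(fit)),
    out_sample = srmse(test$count,
                       predict(fit, newdata = test, type = "response")))
}

results <- as.data.frame(t(sapply(c(2012, 2013, 2014), evaluate_window)))
print(results)
```

Per-year columns such as those in the table can be obtained by subsetting `test` to a single year before calling `srmse`.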
3.3 Social-Economic Data
The summary of the model is shown in Appendix B.9. Let’s visualize the additive model in Figure 3.12.
Figure 3.12: Association between the meteorological variables, past dengue count over optimal lags within 1-30 months, surrounding district count over 0-30 months, garbage data and the dengue outbreak. Solid lines represent relative risks (RR) of dengue cases and dotted lines depict the upper and lower limits of the 95% confidence intervals.
Figure 3.13: Monthly observed and predicted dengue cases (2008-2012).