7 Data processing

7.1 Data validation

Table-based validation.
Functions to update & apply data validation tables, generate data with consistent nomenclature.
1. Labs & activity types
2. Activity media
3. Parameter names & fractions
4. Detection conditions
5. Detection limit types
6. Unit conversion factors
7. Apply screening tables and subset to accepted data


7.1.1 Update detection condition / limit name tables

updateDetCondLimTables(results=irdata$merged_results, detquantlim=irdata$detquantlim, translation_wb=translation_wb,
                        detConditionTable_startRow=2, detLimitTypeTable_startRow=2)
## [1] "No new ResultDetectionConditionText value(s) identified"
## [1] "No new DetectionQuantitationLimitTypeName value(s) identified."
## [1] "Translation workbook updated & saved."

7.1.2 Determine detection conditions and fill NDs

merged_results_filled=fillMaskedValues(results=irdata$merged_results, detquantlim=irdata$detquantlim, translation_wb=translation_wb,
                                       detLimitTypeTable_sheetname="detLimitTypeTable", detLimitTypeTable_startRow=2,
                                       unitConvTable_sheetname="unitConvTable", unitConvTable_startRow=1, unitConvTable_startCol=1,
                                       lql_fac=0.5, uql_fac=1)
## [1] "Checking for disparities in result units and limit units..."
## [1] "Unit conversion(s) needed between detection limit unit(s) and result unit(s). Checking for new unit conversions..."
## [1] "No new unit combinations detected. Proceeding to unit conversion..."
## Warning in fillMaskedValues(results = irdata$merged_results, detquantlim = irdata$detquantlim, : FYI: There are 33 records with both
## upper and lower quantitation limits and is.na(result values). These records have been assigned as 'ND's
## [1] "Detection condition counts:"
## 
##    DET     ND    NRV     OD 
## 717579 229705  14567    808

7.1.3 Update lab/activity & media tables

updateLabActMediaTables(merged_results_filled, translation_wb=translation_wb, labNameActivityTable_startRow = 2)
## [1] "labNameActivityTable updated. 0 new lab/activity combinations identified."
## [1] "activityMediaNameTable updated. 0 new media/submedia identified."
## [1] "Translation workbook updated & saved."

7.1.4 Apply screening tables

mrf_screened=applyScreenTable(merged_results_filled,wb=translation_wb,
                                sheetname="sites",startRow=2, flag_col_name="IR_Site_FLAG", com_col_name="IR_Site_COMMENT")

mrf_screened=applyScreenTable(mrf_screened,wb=translation_wb,
                                sheetname="detConditionTable",startRow=2, flag_col_name="IR_DetCond_FLAG", com_col_name="IR_DetCond_COMMENT")

mrf_screened=applyScreenTable(mrf_screened,wb=translation_wb,
                                sheetname="labNameActivityTable",startRow=2,flag_col_name="IR_LabAct_FLAG", com_col_name="IR_LabAct_COMMENT")

mrf_screened=applyScreenTable(mrf_screened,wb=translation_wb,
                                sheetname="activityMediaNameTable",startRow=1, flag_col_name="IR_Media_FLAG", com_col_name="IR_Media_COMMENT")

7.1.5 Subset data to desired flag types

mrf_sub=subset(mrf_screened,
    IR_DetCond_FLAG=="ACCEPT" &
    IR_LabAct_FLAG=="ACCEPT" &
    IR_Media_FLAG=="ACCEPT" &
    IR_Site_FLAG =="ACCEPT")

7.2 Data prep

Generating assessable dataset

  1. Assigning criteria (by parameter & use via criteria table)
  2. Convert to consistent units (target units defined by criteria)
  3. Check that data fractions & activities match assessment targets
  4. Aggregate to daily values
  5. Generate correction factors & calculate criteria
  6. Split data into assessment groups


7.2.1 Update parameter translation tables

updateParamTrans(data=mrf_sub, detquantlim=detquantlim,  translation_wb="00-lookup-tables\\ir_translation_workbook_live.xlsx")

7.2.2 Apply parameter translation table

mrf_sub_translated=applyScreenTable(mrf_sub,wb=translation_wb,
                                    sheetname="paramTransTable",startRow=4,flag_col_name="IR_Parameter_FLAG",com_col_name="IR_Parameter_COMMENT",
                                    na_dup_err=F)
## Warning in applyScreenTable(mrf_sub, wb = translation_wb, sheetname = "paramTransTable", : NA's generated in IR_Parameter_FLAG. This can occur for two reasons: (1) Combinations exist
## in merged_results that do not exist in paramTransTable, or (2) if na_dup_err=FALSE, NA's are likely present in domain table IR_FLAG. After making changes in domain table, re-run all
## applyScreenTables on a post-fillMaskedValues object to ensure changes are correctly merged.

7.2.3 Subset data to ACCEPT parameters

accept_data=subset(mrf_sub_translated, IR_Parameter_FLAG=="ACCEPT")
Table 7.1: Accepted parameter sample counts.
Parameter Count
Aluminum 7832
Ammonia 36
Ammonia-nitrogen 14063
Arsenic 8150
Cadmium 7824
Calcium 13204
Depth, data-logger (ported) 20838
Dissolved oxygen (DO) 52158
Escherichia coli 11613
Magnesium 13204
pH 43516
Temperature, water 48422
Total dissolved solids 16348

7.2.4 Criteria & unit assigments

7.2.4.1 Assign criteria

data_crit=assignCriteria(accept_data, crit_wb="00-lookup-tables\\IR_uses_standards.xlsx",
                                  crit_sheetname="criteria", ss_sheetname="ss_criteria", crit_startRow=1, ss_startRow=1, rm_nocrit=FALSE, print=FALSE)




Figure 7.1: Example data with criteria assigned. Note the site specific standard assignment example of Blue Creek (MLID UTAHDWQ_WQX-4960740).



7.2.4.2 Update unit conversion table

updateUnitConvTable(data_crit, translation_wb, sheetname = "unitConvTable")
## [1] "Translation workbook updated & saved."
## [1] "No new result-criterion unit combinations identified."



7.2.5 Final data prep step

prepped_data=dataPrep(data=data_crit, translation_wb=translation_wb, split_agg_tds=TRUE, crit_wb="00-lookup-tables\\IR_uses_standards.xlsx",
                        unit_sheetname = "unitConvTable", startRow_unit = 1, cf_formulas_sheetname="cf_formulas", startRow_formulas=1)
## Warning in dataPrep(data = data_crit, translation_wb = translation_wb, split_agg_tds = TRUE, : 579885 records being removed due to lack of criteria & units in standards table. These may
## have been purposely passed through in assign criteria.
## 
## Non-assessed activity type for parameter 
##                                    22078 
## 
##       No result value or detection limit Non-assessed activity type for parameter 
##                                      834                                    22078 
## 
##                                               No result value or detection limit                                         Non-assessed activity type for parameter 
##                                                                              834                                                                            22078 
## Non-assessed fraction or fraction not defined, & fraction specified by criterion 
##                                                                             4296 
## 
##                    Dissolved fraction result (-5%) > total fraction result (+5%)                                               No result value or detection limit 
##                                                                              126                                                                              834 
##                                         Non-assessed activity type for parameter Non-assessed fraction or fraction not defined, & fraction specified by criterion 
##                                                                            22078                                                                             4296 
## 
##                    Dissolved fraction result (-5%) > total fraction result (+5%)                                 Inappropriate units for conversion or assessment 
##                                                                              126                                                                            21213 
##                                               No result value or detection limit                                         Non-assessed activity type for parameter 
##                                                                              834                                                                            22078 
## Non-assessed fraction or fraction not defined, & fraction specified by criterion 
##                                                                             4296 
## 
##                    Dissolved fraction result (-5%) > total fraction result (+5%)                                 Inappropriate units for conversion or assessment 
##                                                                              126                                                                            21213 
##                                               No result value or detection limit                                         Non-assessed activity type for parameter 
##                                                                              834                                                                            22078 
## Non-assessed fraction or fraction not defined, & fraction specified by criterion                              Non-detect result with detection limit > criterion. 
##                                                                             4296                                                                             9906 
## [1] "Data prep record ACCEPT/REJECT counts:"
## 
## ACCEPT REJECT 
## 660622  73165
attach(prepped_data)
## The following objects are masked from prepped_data (pos = 4):
## 
##     accepted_data, agg_tds, conventionals, data_flags, ecoli, flag_reasons, lake_profiles, lakes_tds, lakes_trophic, toxics

Figure 7.2: Example final prepped data (toxics). Note that units have been converted match criteria, hardness has been calculated, and correction factors have been applied to calculate formula-based criteria.