Low-level function to validate input `data`. Returns a list of dataframes containing fails/warnings/replacements and passing observed values. The function detects the model (gis or physical) and the area (GB or NI) and applies validation rules as required. Here is a summary of checks:
Input 'data' exists.
Input is dataframe.
Required columns are present.
Columns have correct class.
Conditional columns, if present, are the correct class.
Where multiple classes are allowed, convert columns to a standardised class, for example integer to numeric.
Assess if model 'gis' (model 44) or 'physical' (model 1) based on input columns.
Assess if input is 'gb' or 'ni' based on grid reference.
Additional columns/variables calculated, for example mean temperature.
Logs failures and warnings applied using the `validation_rules` table.
Replace values if an input value is zero (or close to zero) to avoid log10(0) and related errors.
Returns a dataframe of passing values, notes on warnings, fails and replacements, and the model and area parameters.
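
The checks summarised above can be sketched in base R. This is an illustration only, not the package's implementation: the function name `validate_sketch`, the `SLOPE` column, the `floor_value` of 0.1, the use of the number of grid letters in `NGR` to pick the area, and the error on mixed areas are all assumptions made for the example.

```r
# Illustrative sketch only; names and thresholds are assumptions,
# not the package's actual rules.
validate_sketch <- function(data = NULL,
                            required = c("SITE", "YEAR", "NGR", "SLOPE"),
                            floor_value = 0.1) {
  # Input exists and is a dataframe
  if (is.null(data)) {
    stop("No 'data' supplied", call. = FALSE)
  }
  if (!is.data.frame(data)) {
    stop("'data' must be a dataframe", call. = FALSE)
  }

  # Required columns are present
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }

  # Column has an allowed class; integer is standardised to numeric (double)
  if (!is.numeric(data$SLOPE)) {
    stop("SLOPE must be numeric or integer", call. = FALSE)
  }
  data$SLOPE <- as.numeric(data$SLOPE)

  # Area from the grid letters (assumption): Irish grid references use
  # one letter, British National Grid references use two
  area <- unique(ifelse(nchar(as.character(data$NGR)) == 1, "ni", "gb"))
  if (length(area) != 1) {
    stop("Input mixes 'gb' and 'ni' grid references", call. = FALSE)
  }

  # Replace zero (or near-zero) values so log10() stays finite,
  # and record each replacement
  replace_rows <- which(data$SLOPE < floor_value)
  checks <- data.frame(
    SITE = data$SITE[replace_rows],
    YEAR = data$YEAR[replace_rows],
    REPLACEMENT = rep(paste("SLOPE replaced with", floor_value),
                      length(replace_rows)),
    stringsAsFactors = FALSE
  )
  data$SLOPE[replace_rows] <- floor_value

  list(data = data, checks = checks, area = area)
}

# Toy input: integer SLOPE with one zero value
observed <- data.frame(
  SITE = c("A", "B"),
  YEAR = c(2019L, 2019L),
  NGR = c("SE", "SE"),
  SLOPE = c(0L, 12L),
  stringsAsFactors = FALSE
)
result <- validate_sketch(observed)
```

In this sketch `result$checks` holds one replacement note for site "A", and `log10(result$data$SLOPE)` no longer returns `-Inf`.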
Arguments
- data
Dataframe of observed environmental variables
- SITE
Site identifier
- Waterbody
Water body identifier
...
- row
Boolean - if set to `TRUE`, returns the row number from the input file (data) for each check. This makes it easier to link checks (fails/warnings etc.) to the associated row in the input data. This is most relevant when multiple samples from the same site and year are input as separate rows to `rict_validate`. In this case, SITE and YEAR are not enough to link validation checks to specific rows in the input data.
- stop_if_all_fail
Boolean - if set to `FALSE`, the validation function returns an empty dataframe for the valid `data` when all rows fail, rather than stopping. This is useful if you want to run validation checks without stopping the process.
- area
Area is detected by default from the NGR, but you can provide the area parameter ('iom', 'gb' or 'ni') for testing purposes.
- crs
Optionally set `crs` to `29903` for the Irish projection system.
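
The effect of the `row` and `stop_if_all_fail` arguments can be illustrated with a small base R sketch. It is not the package's code: the function `checks_sketch`, the toy rule (a row fails when `SLOPE` is missing) and the `ROW` and `FAIL` column names are assumptions made for the example.

```r
# Hypothetical sketch of the `row` and `stop_if_all_fail` behaviour.
checks_sketch <- function(data, row = FALSE, stop_if_all_fail = TRUE) {
  # Toy rule for illustration: a row fails when SLOPE is missing
  failed <- is.na(data$SLOPE)
  checks <- data.frame(
    SITE = data$SITE[failed],
    YEAR = data$YEAR[failed],
    FAIL = rep("SLOPE is missing", sum(failed)),
    stringsAsFactors = FALSE
  )
  if (row) {
    # The input row number distinguishes repeat samples that share
    # the same SITE and YEAR
    checks <- data.frame(ROW = which(failed), checks,
                         stringsAsFactors = FALSE)
  }
  if (all(failed) && stop_if_all_fail) {
    stop("All rows failed validation", call. = FALSE)
  }
  # Passing rows only; zero rows if everything failed
  list(data = data[!failed, , drop = FALSE], checks = checks)
}

# Three samples from the same site and year: SITE and YEAR alone
# cannot identify which sample failed
samples <- data.frame(
  SITE = c("A", "A", "A"),
  YEAR = c(2019, 2019, 2019),
  SLOPE = c(5, NA, 7),
  stringsAsFactors = FALSE
)
with_rows <- checks_sketch(samples, row = TRUE)

# Every row fails: with stop_if_all_fail = FALSE the checks are still
# returned alongside an empty dataframe of passing values
all_bad <- samples
all_bad$SLOPE <- NA_real_
no_stop <- checks_sketch(all_bad, stop_if_all_fail = FALSE)
```

Here `with_rows$checks$ROW` points at the second input row, while `no_stop$data` has zero rows and `no_stop$checks` lists all three failures. Calling `checks_sketch(all_bad)` with the default raises an error instead.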