Automated Validation of Aquatic Time Series Using a Probabilistic Parity Space Method

Peter Hudson, Touraj Farahmand, and Ed Quilty - Aquatic Informatics, 22 August 2008

In recent decades, automated monitoring of water quality and quantity has become increasingly common in the study and assessment of aquatic systems. However, sensor and maintenance problems, together with radio telemetry transmission faults, can produce erroneous measurements, and retrospective manual identification of errors in voluminous high-frequency data can be both challenging and highly inefficient. The problem of sensor validation has been studied extensively in chemical plant, aviation, and nuclear power engineering. We adapt the parity space method to the problem of aquatic data validation in four ways: by adjusting the phase between distant sensors to account for water travel time; by using regression to remove offset and system response magnification or attenuation; by using historical data folded year over year where no suitable surrogate data for physical or analytic redundancy are available; and by using a gamma distribution of the parity vector magnitudes to assign point-by-point data validation flags.
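
The flagging step can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the text above: the redundant signals have already been phase-aligned and regressed onto a common scale, so each measures the same quantity directly; the gamma distribution is fitted by the method of moments on a calibration window presumed clean; and a point is flagged when its parity magnitude exceeds the fitted 0.999 quantile. All function names, the synthetic data, and the threshold are hypothetical. Under the direct-redundancy assumption, the parity vector magnitude equals the root of the summed squared deviations of the readings from their mean, so the projection matrix need not be built explicitly.

```python
import math
import random

def parity_magnitude(readings):
    """Parity vector magnitude for n direct, equally scaled readings.

    With a measurement matrix of ones, |p|^2 equals the sum of squared
    deviations of the readings from their mean (assumed setup).
    """
    mean = sum(readings) / len(readings)
    return math.sqrt(sum((r - mean) ** 2 for r in readings))

def fit_gamma_moments(values):
    """Method-of-moments gamma fit; returns (shape, scale)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean * mean / var, var / mean

def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    # Heuristic cutoff (an assumption): this far into the tail the CDF is
    # indistinguishable from 1 in double precision.
    if z > shape + 10.0 * math.sqrt(shape) + 50.0:
        return 1.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    log_scale = -z + shape * math.log(z) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_scale))

# Synthetic example: three noisy sensors tracking one signal, with a
# spike injected into the first sensor at index 150.
random.seed(0)
truth = [8.0 + math.sin(2 * math.pi * t / 96) for t in range(300)]
sensors = [[x + random.gauss(0.0, 0.1) for x in truth] for _ in range(3)]
sensors[0][150] += 3.0

magnitudes = [parity_magnitude([s[t] for s in sensors]) for t in range(300)]
shape, scale = fit_gamma_moments(magnitudes[:100])  # calibration window
flags = [gamma_cdf(m, shape, scale) > 0.999 for m in magnitudes]
print([t for t, flagged in enumerate(flags) if flagged])
```

Note that the magnitude alone indicates that the signals disagree at a given time step, not which signal is at fault; attributing the inconsistency to the target series relies on the redundant series being trustworthy.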

An application of the method is presented: three dissolved oxygen signals are examined, and sensor drift and several measurement spikes are identified. In the case of the drifting dissolved oxygen sensor, point-by-point data flagging before the drift became obvious gives information about the potential quality of the data leading up to and during the drift. Additionally, data spikes lying within the range of physically plausible dissolved oxygen values were identified with data flags. The probabilistic parity space method offers the data manager the ability to validate a high-frequency signal more finely than in bulk sections between site visits. It further allows the data manager to generate point-by-point data validation flags efficiently for very large datasets. Our emphasis here lies on validating a target data series on the basis of its consistency, or lack thereof, with physically or analytically redundant datasets. This specific application of the parity space technique requires the assumption that the redundant time series are of high quality (although the method is robust to deviations in a minority of redundant signals) and that they capture enough of the processes controlling the dynamics of the target series. If that assumption is not met, the parity space method becomes a more general method of congruency analysis. It might thus be extended to assess a target time series for a broader set of anomaly classes, such as unauthorized effluent discharges, an extension we leave for future work.