## Abstract

We develop an approach for water quality time series monitoring and contamination event detection. The approach combines affine projection algorithms and an autoregressive (AR) model to predict water quality time series. Then, we apply online change-point detection methods to the estimated residuals to determine the presence, or not, of contamination events. Particularly, we compare the performance of four change-point detection methods, namely, sequential probability ratio test (SPRT), cumulative sum (CUSUM), binomial event discriminator (BED), and online Bayesian change-point detection (OBCPD), by using residuals obtained from four water quality time series, chlorine, conductivity, total organic carbon, and turbidity. Our fundamental criterion for the performance evaluation of the four change-point detection methods is given by the receiver operating characteristic (ROC) curve which is characterized by the true positive rate as a function of the false positive rate. We highlight with detailed experiments that OBCPD provides the best performance for large contamination events, and we also provide insight on the choice of change-point detection algorithms to consider for designing efficient contamination detection schemes.

- change-point detection
- prediction
- ROC
- time series
- water quality

## INTRODUCTION

The expansion of the world's population leads to the scarcity of drinking water and drives increasing exploitation of untapped water resources. Among them, the desalination of seawater or saline ground water and the purification of water originating from rivers or lakes are of growing interest. Chemical, biological, and physical parameters defining water quality criteria and quality thresholds imposed by government agencies and health institutions must be satisfied. This requires the implementation of monitoring procedures in order to assess the conformity of potable water against specified objectives. Facilitated by the advent of information communication and technology, combined with the development of state-of-the-art signal processing algorithms, new approaches for efficient online water quality monitoring are investigated. The benefit of such approaches is to allow for online monitoring of the measurements of large scale water systems and early detection of intentional or unintentional contamination events.

### Complexity

The design of an efficient online water quality monitoring approach is complicated by several factors. First, changes within the system including variation of source water or the treatment process, the fluctuation of water demand, and changes in the water quality sensors can create anomalous water quality signatures, different from contamination events. Distinguishing normal behavior from abnormal behavior is difficult in these situations. Second, due to the complex characteristics of water quality, a physical model describing water quality behavior is difficult to derive and implement in real time. This complexity has motivated the extensive use of data-driven models for predicting water quality behavior. Although data-driven approaches have proven to be an efficient alternative, several challenges persist including: accurate detection of contamination events with low false positive rates (FPRs) and low detection latency; definition of an adequate trade-off between sensitivity to contamination events and remaining insensitive to non-event changes; characterization of faults due to water quality sensor failures, numerical algorithmic instabilities and water process operation. In an attempt to alleviate some of these challenges, research in both forecasting and monitoring has been conducted.

### Water quality forecasting and monitoring

Time series forecasting algorithms have seen widespread application in hydrology. As examples, Yu *et al*. (2004) combine a support vector machine with chaos theory to analyze chaotic time series, and Phoon *et al*. (2002) explore an inverse method to forecast nonlinear time series. In a similar vein, artificial neural network methods have been applied in Palani *et al*. (2008) to forecast water quality data. To detect contamination events, classification methods have also been widely examined. As an example, Yang *et al*. (2009) develop a real-time method for contaminant detection and classification, while Branisavljevic *et al*. (2011) examine a context classification method for real-time anomaly detection. A real-time, threshold-based approach for detecting intentional contamination events is proposed by Byer (2005). Studies concerning attributes of water quality sensors in the framework of contamination events have also been investigated; for more details see McKenna *et al*. (2006), Hatzikos *et al*. (2007), Yang *et al*. (2009) and Koch & McKenna (2010). Work on the subject is presented in various reports and reviews, for example Grayman *et al*. (1988), Hasan *et al*. (2004, 2005), Chau (2006), Murray *et al*. (2010) and Perelman *et al*. (2012). A general framework for event detection is to combine a forecasting method with an algorithm that can analyze the residuals of the forecast to determine the onset of an event. Change-point detection methods provide a means of event determination and are being applied for water quality monitoring.

### Change-point detection

Change-point detection methods have been the subject of intensive research investigations. Due to their sequential nature and their adequacy to address online problems, sequential probability ratio test (SPRT; Wald 1945), cumulative sum (CUSUM; Page 1954), and generalized likelihood ratio (GLR; Willsky & Jones 1976) are among the most popular frequentist change-point detection methods. The commonality of these widely used algorithms is that the logarithm of the likelihood ratio between two consecutive intervals, for the same time series, is monitored and a change-point is declared whenever these intervals possess different statistical properties. In spite of their widespread success for practical applications, their major limitation comes from their dependence on the knowledge of probability distribution functions (pdf) of the data. Furthermore, other approaches, such as spectral based methods (Adak 1998), maximum likelihood estimation (Guralnik & Srivastava 1999), and subspace identification (Katayama 2005) have also been explored. The limitation of all these change-point detection methods is that they rely on prespecified thresholds, which are difficult to establish a priori. However, most of the Bayesian change-point detection (BCPD) approaches have been used in an offline setting for retrospective studies, which means that the entire data set is required before computation of the probability of change-point. To overcome these limitations, and those of the frequentist methods, online BCPD (OBCPD) approaches have been introduced by Adams & MacKay (2007).

The purpose of this paper is to provide an efficient methodology for water quality monitoring and contamination event detection. We review a recent development in time series forecasting and use it to forecast different water quality time series. The performance of four change-point detection algorithms is compared using residuals from the forecast. Note that the change-point detection algorithms examined here can be applied to residuals derived from any predictive modeling algorithm. We place emphasis on the smallest detectable contaminant, and the rate of false and true positives, which are characterized by the receiver operating characteristic (ROC) curve. Particularly, we show that our methodology is able to distinguish normal operation changes from contamination events. For each change-point detection method, and a particular contamination event, we provide the ROC curve. The remainder of the paper is organized as follows. The following section, ‘Water quality time series prediction’, presents the time series prediction method used to learn the parameters of the autoregressive (AR) model describing the water quality data. Next is ‘Water quality time series monitoring’, which introduces the frequentist approaches used in this paper, namely SPRT and CUSUM as well as the binomial event discriminator (BED) and the OBCPD. This is followed by ‘Application of change-point detection methods to water quality data’ in which the experiments with water quality time series are examined. The final section provides conclusions and future work.

## WATER QUALITY TIME SERIES PREDICTION

We model our water quality time series with an AR model, then we apply affine projection algorithms (APA) to learn the parameters of the model. An AR model with *l* AR terms (Box *et al*. 2013) can be written aswhere are fixed parameters and is the noise with mean zero and variance . Here, we use APA to estimate, for each water quality time series, the parameters and the associated residuals. APA consider that the output is linearly related to the input vector by the model (1)1where *M* is the length of the filter, is the parameter vector to learn, and is an independent and identically distributed additive noise. Upon arrival of each new input and output , the objective is to determine , the estimate of the parameter vector that minimizes predictive error. To this end, we use the block matrix , the desired output , and the estimate of the noise vector . The APA allows recursive estimation of the parameter vector and the associated noise as follows:23The parameter is applied as a regularization term for the solution and is the iteration step size. For more details on APA and its extension to adaptive regularization, see Ba & McKenna (2013).

## WATER QUALITY TIME SERIES MONITORING

We present in this section the frequentist algorithms, SPRT and CUSUM, the BED and the OBCPD.

### Frequentist approaches for change-point detection

Frequentist approaches are based on hypothesis testing, and they are designed to signal an alert when a change-point occurs. We present two frequentist approaches, derived from Wald's hypothesis testing, used here for water quality time series monitoring, namely SPRT and CUSUM.

#### Sequential probability ratio test

SPRT is a sequential testing hypothesis that tests and allows determination between a hypothesis versus an alternative hypothesis (Wald 1945). We consider that is the hypothesis corresponding to the absence of contamination, and the hypothesis in the presence of contamination. If we assume that the hypothesis is described by the statistics , and the hypothesis , then the test statistic is the logarithm of the likelihood ratio between the two hypotheses, and is expressed as4where and is the logarithm likelihood ratio for the residual at time *t*, with the probability density function for the hypothesis , and the probability density function for the hypothesis . Here, we consider that *f* is a Gaussian distribution. The SPRT stops in favor of if the lower bound *A* is crossed with an error rate *a*, and if the upper bound *B* is crossed with an error rate *b*. The lower and upper bounds *A* and *B* are defined by5The time *l* at which a change-point occurs is characterized by the change in the sign. In summary, if we consider that our decision rule is and our stopping time is , then the SPRT is defined as6If the decision rule there is no anomaly, conversely if there is an anomaly7

#### Cumulative sum test

CUSUM considers as input the sequence of residuals , which has a probability density function . In this study *f* is considered as a Gaussian distribution, where are independent and identically distributed residuals with a density , and are independent and identically distributed variables having a density , where *l* is the time at which the change-point occurs and is provided by the following relation:8and is a pre-defined threshold, and is given by:9

### Binomial event discriminator

The BED uses a binomial failure model to aggregate residuals over consecutive time steps into a declaration of an event or not (McKenna *et al*. 2007). For each time step, the absolute value of the residual is compared with a threshold, , and classified as an outlier or not. The probability of a water quality anomaly (event) occurring at time *t* is calculated as the cumulative probability of the binomial distribution10where each outlier, *r*, is considered a failure within a window containing *W* time steps, and representing the number of trials in the binomial model, *i* is the time instant and is the number of trials. The resulting is compared with a probability threshold, *h*, to determine existence of an event. The probability of an individual failure occurring in any time step is fixed at . In application here, the residuals within a moving window of length, , representing the number of time steps over which the data are standardized, are standardized prior to use. The standardization is achieved by dividing the residual by its standard deviation within .

### Bayesian approach to change-point detection

In Adams & MacKay (2007), the authors develop an online version of OBCPD. Here, OBCPD allows us to detect the time at which the statistical properties of water quality time series change because of the presence of contamination events. With OBCPD, the objective is, given a sequence of residuals from the time instant 1 up to *t*, , to predict the next water quality residual11with12where represents the ‘run length’, a run length is the number of data in a ‘run’, and a ‘run’ is the current data back to the most recent change-point, is a -dimensional residual at time *t*. The notation represents the residuals associated to the run length . To compute the probability of change-point let us examine in more detail Equations (11) and (12). In Equation (12) we consider that the predictive probability is a Student's -distribution with a probability function given by13where is the center, the standard deviation and the degree of freedom. If we consider that the residuals follow a normal gamma distribution, therefore , , where the updating rule for the center , the precision , the shape , and the scale is given by14151617We showed here how we evaluate at each time step the next residual knowing the statistics of the previous one. Next, we provide an insight into the determination of . We consider that at each time step either there is a change point and the run length drops to zero, , or there is no change and the run length continues to grow, . To determine the probability of the run length at time *t* conditional on the run length at time , , we use a ‘hazard function’ *H*. A hazard function *H* is defined as an event rate at time *t* conditional on survival until time *t*18where is the hazard function at time , in our case the hazard function is constant at , where is the time scale.

In the absence of a change-point, we use all the quantities described above to arrive at19In the presence of change-points Equation (19) becomes20

Using Equations (19) and (20) it is possible to determine the probability distribution of the run length conditional on the sequence of the residual from the initial sample up to time *t*, Equation (12). Finally, knowing all the aforementioned equations it is possible to evaluate Equation (21) with the maximum a posteriori and obtain the evolution of the run length21

## APPLICATION OF CHANGE-POINT DETECTION METHODS TO WATER QUALITY DATA

We analyze the performance of the four change-point detection methods introduced previously. First, we incorporate contamination events into the measured background data. Then, we apply the APA to learn the coefficients of the AR model describing the water quality time series. The obtained residuals are subsequently used with the change-point detection methods to monitor water quality.

### Data set

We experiment using four water quality time series, which have been selected due to these parameters having extensive use in online water quality monitoring. These data are taken from the ‘Station C’ data set supplied as part of the Event Detection System Challenge organized by the US Environmental Protection Agency (EPA 2013). All signals are collected at the same location in an operating water distribution network with a 2-minute sampling rate. The measured signals are as follows.

Chlorine: Chlorine is used in the majority of drinking water networks as a disinfectant to eliminate microbial and other contaminants. Typical values are maintained between 0.5 and 1.0 mg/L.

Conductivity: Measures the ability of water to transmit an electrical current and is a measure of the amount of dissolved solids in the water. Typical values in drinking water networks range from 50 to 500 μS/cm.

Total organic carbon (TOC): Gives the amount of organic carbon dissolved in the water. Organic carbon in water is generally from decaying naturally occurring organic matter in the source waters. TOC levels are generally below 10 mg/L.

Turbidity: Measures the water clarity and is often subject to sharp increases due to changes in flow that suspend sediment within the water. Typical values are near 0.0 with short-lived increases to 5.0 NTU or more being common.

The background values of these water quality parameters can be altered by the presence of contaminants added to the water (see Hall *et al*. 2007). Therefore, the objective is to monitor these variables in real-time and rapidly and accurately detect any abnormal situations.

### Contamination events

We simulate contamination events by adding a series of anomalous water quality values to the observed background data. These are added with a function that allows the sharpness of the anomaly start and end along with the length and amplitude of the anomaly to be controlled by the user (see Murray *et al*. 2010)
where and are the water quality signal in the event and background forms, respectively. The parameters and determine respectively the location and the deviation from the background of the contamination. The parameter is set to −1.0 or 1.0 to define the direction of the signal deviation away from the background. Values of are taken from the Gaussian cumulative distribution function (cdf) with a width specified in the number of time steps to allow for a sharper or more gradual change from and back to the background levels. The maximum deviation from the background is given by , where is the standard deviation of the background signal. In the experiments reported here, events are added to the background at every 110 time steps (220 minutes) with a maximum strength of 1, a total length of 16 time steps (32 minutes) and a transition from background to full strength over six time steps (12 minutes).

### Tuning parameters

We present the tuning parameters that have been used, and which allow the figures that will be presented in the next subsection to be obtained. Recall that both SPRT and CUSUM consider the knowledge of the pdf of the data. Here, we use Gaussian pdf, where in the absence of contamination the mean and variance of our data are respectively given by and , which are determined by computing respectively the mean and standard deviation of the signal of interest, and in the absence of contamination. They change to and in the presence of contamination, and in this study we consider that and for all the measurements except the TOC, where and . A and B were computed by using Equation (5), and determine the thresholds for SPRT, the threshold for the CUSUM test is given by . We summarize the values of the tuning parameters for the SPRT and CUSUM in Table 1. As mentioned previously, the BED considers as tuning parameters the threshold , the window *W*, the probability of failure , and the threshold *h*; the probability of failure has been fixed at 0.5. This choice is driven by the trade-off that one wishes to establish between the number of false alarms and the number of missed detections. For OBCPD (Table 2), the initial tuning parameters are the center , the precision , the shape , the scale , and the hazard rate *H*; for these tuning parameters we fix , , for all the water quality parameters and we attribute them the same hazard rate , except for the conductivity , which is due to the fact that the conductivity measurements present more noise than the rest of the measurements. For the scale , the initial parameter has been determined empirically. In all cases, these parameters are set without any measurements of the contamination events, as would be the case in an actual application. The parameters for the BED are given as follows: , , , . These tuning parameters are maintained intact for all the water quality parameters.

### Results

We present the results of our experiments using the aforementioned tuning parameters for each of the water quality time series. Through Figures 1⇓⇓⇓⇓–6: (a) shows the raw data; (b) presents the raw data and the incorporated contaminants; (c) provides the residuals obtained from the prediction results and in the presence of contaminations; (d), (e), (f), (g) depict respectively the results obtained with the SPRT, the CUSUM, the BED, and the OBCPD. The *y*-axes, for (d), (e), (f), (g) show respectively the computed sequential probability ratio, the computed CUSUM, the computed probability with the binomial distribution, and the computed run length.

#### Chlorine [mg/L]

The first result of this experiment is given by the residuals. As can be seen, the values of the residuals are low, and in the order of 0 for the mean, and this is valid for the presence or absence of contamination. Also, this result is valid when the chlorine measurement is in a steady state (constant mean) or transient stage (mean deviating from a constant value). The variance of the residual is also relatively low. This is an indication of the good performance of the forecast, where we analyzed approximately 9 days of data, but Figure 1 places emphasis on the interval (1.2 × 10^{5}) in order to illustrate the different scenarios. The presence of contamination is declared whenever the sequential probability ratio and the CUSUM cross their thresholds. The values of the tuning parameters that have been used are given in Tables 1 and 2. Throughout the figures, we observe that the four change-point detection algorithms react with different performances to the presence of contamination events. However, they also present a small delay to detections, due to the data buffering process necessary to construct the sequence of samples, in the context of SPRT and CUSUM. As this delay is mainly constant, it can be subtracted in real operations, if one is interested in determining the exact time at which the change-point occurred. As will be evidenced later, the OBCPD provides the best performance in terms of fewer missed detections and false positives for a certain category of contamination events. Moreover, the developed algorithms are able to detect contamination events, even when they occur simultaneously to the change in the background chlorine levels due to changes in network operation, except in some cases for the CUSUM test.

#### Conductivity [μS/cm]

During this experiment, we used the same tuning parameters as those used for chlorine, except for the the OBCPD, where the scale has been changed to and the hazard rate *H* to . Experiments with conductivity show that the four change-point detection algorithms provide different performances in terms of contamination event detection. Particularly, the results obtained with the SPRT and the CUSUM do not outperform those given by the BED and the OBCPD, where OBCPD provides the best performances in terms of detection. However, these results are sensitive to the amount of noise in the data (Figure 2). In Figure 3, we observe that the residuals exhibit a high level of noise, which is a consequence of the high level of noise in the conductivity measurement and in this interval of time, which impacts the change-point. In this high noise situation, all the change-point detection methods fail to provide good results, due to their difficulty in distinguishing normal changes from contamination events. However, after this interval of time, BED and OBCPD are able to detect the contaminations.

#### Total organic carbon [mg/L]

The experiments with TOC have been conducted with the tuning parameters indicated in Table 1. For OBCPD, the tuning parameters have been maintained intact as those used for chlorine except for the scale . The results presented here show a low level of noise in the computed residuals, which is also translated into the results of the change-point algorithms, where we observe that the four change-point algorithms react with good performances to the presence of contaminations (Figure 4). The large spikes in the background TOC (Figure 5, after 3.8 × 10^{5}s) create difficulties for all four change-point algorithms.

#### Turbidity [NTU]

The results obtained with the turbidity data (Figure 6) show the sensitivity of the change-point detection methods to the contaminations. Note that during this experiment we used the same tuning parameters as those used for the experiment with the chlorine. Turbidity, similar to TOC, has very low variance in residuals. This causes SPRT and CUSUM to typically identify two change-points, one at the start of event and one at the end. BED results continue to be relatively wide due to the longer time period chosen for the length of window *W*. OBCPD is the only method that only identifies a single change-point for each event.

### Performance evaluation

Here, we evaluate the performance of each of the presented change-point detection methods, by using the ROC curve. The ROC curve provides an efficient and comprehensive method to evaluate the accuracy of a change-point detection algorithm. Our metric for the performance evaluation is given by the FPR and the true positive rate (TPR); these two quantities allow us to determine the ROC curve. The FPR and FNR are respectively given bywhere,

True positive TP: occurs when the algorithm detects a change, and there is a change.

False positive FP: corresponds to situations where the algorithm detects a change, whereas there is no change.

False negative FN: occurs when the algorithm does not declare a change and there is a change, also known as missed detection.

True negative TN: the algorithm does not declare a change and there is no change.

The quantities above are calculated at every time step. Next, we maintain the previously presented tuning parameters except which is varied by a step of for high strength events and by a step of 0.25 for low strength events. For OBCPD, since the run length increases as a slope in the absence of contamination, and drops to zero in the opposite case, it cannot be used as such for the ROC computation. Therefore, we compute the derivative of the run length and use its absolute value for the ROC computation. Also, we use the absolute value of the SPRT to compute the associated ROC. Next, the notation ROC* _{x}* represents an ROC for a contamination strength equal to

*x*.

### High strength events

In this section we present ROC curves obtained for high strength events, and for both chlorine and turbidity.

#### Performance evaluation for chlorine

Table 3 presents the ROC for the chlorine measurement and shows that at a 5% FPR, OBCPD provides the highest TPR value compared to the other change-point detection algorithms. Also, the result obtained with OBCPD shows a TPR equal to 97% for a contamination strength higher than or equal to 7. We observe that the performances of the four algorithms improve as we increase the strength of the contamination. Figure 7 presents the ROC curves corresponding to the cases where the contamination strength is equal to 5 (left) and 8 (right).

#### Performance evaluation for turbidity

Similarly to Table 3, for the turbidity measurement we observe that at a 5% FPR, the OBCPD provides the highest TPR value compared with the other change-point detection algorithms. Figure 8 presents the ROC curves corresponding to the cases where the contamination strength is equal to 5 (left) and 8 (right). ROC results for turbidity are generally lower than for chlorine, especially for SPRT and CUSUM. This is due to two change-points being identified for each event in the lower noise data as discussed in the ‘Results’ section. There, the second change-point (end event) is counted as a false positive.

### Low strength events

Here, we focus on the cases where the contamination strength is lower than or equal to 1, and for both chlorine and turbidity.

#### Performance evaluation for chlorine

Figure 9 shows the results of the experiments where the previous tuning parameters are maintained. These ROC curves indicate that for a contamination strength lower than or equal to 0.75, the TPR becomes small, except for the results obtained with BED, for a FPR equal to 5% (Table 4). BED introduces a delay into the change-point detection which causes an increase in the number of false positive time steps and a slower rise in the ROC curve. We summarize the results of our experiments in Table 5. Figure 9 presents the ROC curves corresponding to the cases where the contamination strength is equal to 0.5 (left) and 1 (right).

#### Performance evaluation for turbidity

Here, we present the performance evaluation for the turbidity at 5% of FPR for low strength events, lower than or equal to 1. SPRT and CUSUM are again penalized for two change-points at every event. Figure 10 presents the ROC curves corresponding to the cases where the contamination strength is equal to 0.5 (left) and 1 (right). We summarize the results of our experiments in Table 6.

## CONCLUSIONS AND FUTURE WORK

This paper demonstrates the use of APA with an AR model for accurate predictions of water quality time series. We also introduce the use of OBCPD for water quality event detection. The OBCPD is compared with three other online algorithms using ROC calculations and demonstrates superior performance for events with large contamination strengths, whereas for smaller contamination strengths the best performances of the change-point detection methods are given by the frequentist methods. We observed, for example, for contamination strength equal to 0.75, the frequentist approaches, due to the fact that their tuning parameters calculated in the presence and in the absence of contamination are supposed to be known a priori, yield the best performance. The ROC curves associated with OBCPD, for the chlorine for instance, showed that the TPR is higher than 97% at the expense of a 5% FPR, and from a contamination strength equal to 7. This result is valid for the rest of the water quality time series examined here.

A reason for the good performances provided by OBCPD, for contamination strength overtaking 4, comes from its ability to combine Gaussian and Gamma distributions to efficiently predict the residuals of water quality time series data irrespective of their prior knowledge. A significant advantage of the OBCPD is that it is not necessary to set any threshold to detect changes. These two factors represent a crucial asset to water utility companies when designing online water quality monitoring approaches. Furthermore, our experiments demonstrated that the smallest detectable fault corresponds to the contamination strength being equal to 1 times the standard deviation of the recent data. This study highlights that water utility companies can consider each of the experimented algorithms, depending on the expected performances and the possibility to have access to the data a priori. For example, if the data are unavailable it is preferable to have recourse to OBCPD, whereas if the availability of the data allows determination of the descriptive statistics of the water quality time series a priori in the presence and in the absence of contamination, then the frequentist approaches represent the best alternatives. However, in practice, it is difficult to establish the characteristics of the contamination a priori. This suggests the use of OBCPD as the adequate solution.

In this paper, we considered that the structure of the contamination follows a Gaussian distribution with a relatively sharp change; however, situations where the change has a very slow variation represent a viable alternative. Therefore, improving the characteristics of the change-point detection algorithms to make them rapidly reactive to such changes is our future work.

- First received 13 November 2013.
- Accepted in revised form 9 June 2014.

- © IWA Publishing 2015

Sign-up for alerts