## Abstract

A pipe burst is a major water distribution system failure. Water escapes the network through the break increasing the total flow entering the network. These higher flows, in turn, increase the head losses in pipes and result in lower water pressures at customer taps. This study focuses on burst detection by seeking to identify anomalies in net system demand, pipe flow rates, and nodal pressure heads. Three univariate statistical process control (SPC) methods (the Western Electric Company rules, the cumulative sum (CUSUM) method, and the exponentially weighted moving average [EWMA]) and three multivariate SPC methods (Hotelling *T ^{2}* method and multivariate versions of CUSUM and EWMA) are compared with respect to their detection effectiveness and efficiency. First, the three univariate methods are tested using real system burst detection and then the six SPC methods are compared using synthetic data. The real application using net system demand shows that burst flows are proportionally too small to be detected. Synthetic data analyses suggest that the univariate EWMA method using nodal pressure provides the highest detectability. The method's long record length helps detect small bursts and avoid false detection. SPC methods require consistent system operations for measurements beyond total area flow.

- burst detection
- detectability
- resilience
- statistical process control (SPC)
- water distribution system (WDS)

## INTRODUCTION

Lansey (2012) defined resilience as a system's ability to ‘gracefully degrade and subsequently recover from’ a failure event. Resilience differs from other system performance measures, such as reliability and robustness, as it includes the pre-failure ability to reduce the failure severity and the post-failure ability to promptly return to a normal condition. Pipe bursts are the most common failure event in water distribution systems (WDS) and occur when a pipe ruptures from, e.g., pipe deterioration, excessive pressure, and ground shifts caused by temperature changes or earthquakes. The American Water Works Association reported that the annual average number of water main bursts is 270 in a WDS serving about 0.26M people (Folkman 2012; Philadelphia Water Department 2013).

A pipe burst results in water loss out of the system to the surrounding soil through the break in the pipe. Except in unusual cases, this flow results in higher pipe flow rates and head losses and lower pressures at withdrawal points. This condition may persist even after the break is isolated for repair. Therefore, pipe bursts result in the degradation of system functionality, interruptions of customer service, and possible collateral damage (e.g., sink holes). Resilience is the ability to overcome this failure. Thus, promptly detecting and locating bursts will decrease the failure duration and increase system resilience.

As noted above, pipe bursts cause a change in the pressure and flow rate signals that may be observable by supervisory control and data acquisition (SCADA) monitoring systems. Outstanding research issues are to identify: (1) under what burst conditions are signal changes significant enough to be detected; (2) what is the best method to detect bursts; and (3) where to locate monitoring devices to maximize the probability of detection. From an operator standpoint, burst detection is not the only indicator that is important and the objectives can be more precisely stated as maximizing burst detection in the shortest time while avoiding identifying false alarms. Here, we focus on comparing detection methodologies considering those three criteria.

### Previous studies

Among the SCADA-based burst detection methods that have been applied in the literature, including artificial neural networks (Mounce *et al*. 2002, 2003, 2010; Mounce & Machell 2006), state estimation (Andersen & Powell 2000; Ye & Fenner 2011, 2013), Bayesian approach (Poulakis *et al*. 2003), and time series modeling (Quevedo *et al*. 2010), statistical process control (SPC) methods are the most widely used (Misiunas *et al*. 2006; Romano *et al*. 2010, 2014; Palau *et al*. 2012; Jung *et al*. 2013a). SPC methods apply statistical theory to the system output parameters to identify non-random patterns that may be caused by bursts.

Since the early twentieth century, SPC methods have been used in quality control of manufacturing processes to detect defects in products (Montgomery 2010). SPC methods can be categorized based on two criteria: (1) the number of variables (locations) that are simultaneously considered to calculate an anomaly indicator (univariate or multivariate), and (2) the time window during which measurements are considered.

Romano *et al*. (2010) used the Western Electronic Company rules (WEC 1958) in a real-time leakage detection model using the measured pressure data. The WEC method plots the measured pressure values on a Shewart control chart to determine whether a single value or a series of values represents an anomaly based on several identification rules. To test the method, Romano *et al*. used pressure data from 13 sensors for five burst events induced by opening hydrants at different locations and times. Romano *et al*. (2014) developed a new event detection methodology that combines a range of different artificial intelligence (AI) and statistical techniques (i.e., WEC and Bayesian approach). The method has been applied to pressure and flow data recorded by the sensors deployed in a district metered area (DMA) under simulated and real events. In the analysis of real events during 11 months, 29 out of 30 events (estimated from main repair records and customer contacts, etc.) were detected by the method with 8% false alarm rate.

Misiunas *et al*. (2006) used the cumulative sum (CUSUM) method to detect and locate WDS bursts. They applied CUSUM to a six-month hourly flow record at the entry point of a relatively small network. Five different bursts were generated to test the method. Analyzing DMA inflow rates, Palau *et al*. (2012) applied the Hotelling *T ^{2}* method for burst detection. Here, the Mahalanobis distance (

*T*) was calculated after performing principal component analysis (PCA) to decrease data dimensionality.

^{2}While many different types of SPC method have been examined for burst detection, no cross comparisons of these methods have been completed using a common data set. Furthermore, although detection probability (*DP*) (i.e., percentage of actual bursts that were identified) is an important metric, other indicators such as time to detect a burst and the false alarm rate should also be examined as focusing on calibrating a method for *DP* may have negative consequences on other indicators.

An outlier is a measurement that is statistically different from the rest of the data and should be removed to obtain sound data during system monitoring. Some work has been conducted on these issues in the water domain. Robinson *et al*. (2005) applied two univariate and two multivariate methods to detect outliers in water quality data from a wastewater treatment plant. An outlier is often caused by measurement errors while an anomaly can be an outlier or non-random pattern (i.e., an out of control system). The univariate methods applied by Robinson *et al*. were the z-score approach and Walsh's method (1950, 1958). The z-score approach is quite similar to the WEC rules applied by Romano *et al*. (2010, 2014) in that it uses the three sigma control limit (CL) to identify an outlier. Walsh's method looks for unexpected gaps in the data.

The Hotelling *T ^{2}* method and its modified version suggested in Hadi (1992, 1994) and Hadi & Son (1998) are other multivariate methods applied for outlier detection in water quality data. The modified Hotelling

*T*differs from the standard method by excluding outliers in the test statistic when examining if they are outliers. Robinson

^{2}*et al*. (2005) came to the general conclusions that multivariate methods are more effective in identifying outliers while univariate methods have more chance of erroneous deletion of valid data. However, it is not clear that water quality and hydraulic conditions behave similarly.

This study compares three univariate and three multivariate SPC methods with respect to three performance metrics: *DP*, time to detect (efficiency), and false alarm rate. The univariate SPC methods are the WEC rules, the CUSUM and the exponentially weighted moving average (EWMA) method. The three multivariate methods are the Hotelling *T ^{2}* method and the multivariate versions of CUSUM and EWMA. First, the performance metrics of the three univariate methods are computed for net system demands measured in a real network during winter months. Then, to test the six methods, nodal pressures and pipe flow rates were randomly generated using two real networks' hydraulic model and the performance metrics are computed for alternative flow and pressure meters' configurations for conditions with and without simulated bursts.

## DETECTABILITY

Detectability of a burst detection method is a combination of its detection effectiveness and its detection efficiency. Detection effectiveness is related to how well burst events are detected and false alarms in natural random patterns are avoided. Therefore, detection effectiveness is measured by the *DP* and the false alarm rate. *DP* is the proportion of burst events that were detected (*N _{d}*) among the total number of burst events (

*N*) 1The rate of false alarm (

_{tb}*RF*) is the average number of false alarms issued per day in a natural random sequence of events without bursts. Therefore,

*DP*is related to false negative (type II error) rate while

*RF*indicates false positive (type I error) rate.

Detection efficiency, on the other hand, evaluates how fast a burst event is detected. The average detection time (*ADT*, hr), the averaged value of the time to be taken for detection, is used here as the detection efficiency indicator
2where *t _{di}* is the time (hr) of detection for the

*i*th detected burst event and

*t*is the time (hr) of occurrence for the

_{bi}*i*th detected burst event. Note that

*t*of burst is generally not known so this statistic is tested using synthetic data only. Therefore, it is desirable to maximize

_{bi}*DP*and minimize

*RF*and

*ADT*in terms of improving system resilience.

## METHODOLOGY

In quality engineering terminology (Montgomery 2009), the measurable quality parameters (pipe flow and pressure in WDS) are known as quality characteristics (variables). The set of quality characteristics considered simultaneously in WDS burst detection can include either system output or measurements of the same output at different locations. The goal of each SPC method is to identify a suspected change in the process output's mean value (i.e., a change in the quality characteristic) or the so-called mean shift.

The basis of SPC methods is the Shewart control chart that is a plot of the mean values (centerline) of the process variable, thresholds on the expected upper and lower ranges of the variable, and warning limits (WLs) that are multiples of the standard deviation (*σ*) on each side of the mean value. The chart implicitly assumes that the process variable is normally distributed (Surendran *et al*. 2005; Jung *et al*. 2013b).

SPC methods are classified by the number of variables simultaneously considered: univariate versus multivariate methods. A univariate method utilizes the measured value of each quality characteristic independently. If multiple quality characteristics (e.g., the flow measurements from multiple meter locations) are provided to a univariate method, the control chart for each quality characteristic is considered separately. A non-random pattern in one meter is sufficient to issue an alarm.

A multivariate method, on the other hand, converts the measured values of multiple quality characteristics into a single anomaly indicator generally by considering correlation between multiple variables under normal conditions. Simultaneous monitoring in multivariate methods can improve anomaly detection while monitoring multiple quality characteristics independently may be misleading. For example, the measurements that seem normal at individual meters may be abnormal based on the correlation of the quality characteristics. Therefore, generally, multivariate methods are more effective for anomaly detection than univariate methods (Montgomery 2009). The following sections describe normalization, the six SPC methods' algorithms, the anomaly indicators, and CLs as applied in this research.

### Normalization of quality characteristics

Quality characteristic values in most manufacturing processes have steady mean and standard deviation. However, the mean and standard deviation of the WDS hydraulic quality characteristic values (pipe flows and nodal pressure heads) normally vary over diurnal pattern (Figure 1). Therefore, we normalize the measured value of each quality characteristic
3where *z _{i}* is the normalized quality characteristic value (also known as the standard score) at the

*i*th time period;

*x*is the measured quality characteristic value at the

_{i}*i*th time period; and and are the mean and standard deviation of the quality characteristic's value at the

*i*th time period, respectively. Note that the Shewart control chart of

*z*now consists of the constant centerline (with mean equal to zero) and thresholds, instead of time varying values.

_{i}### Univariate methods

#### Western Electric Company (WEC) rules

The WEC rules (1958) are a set of decision rules for detecting non-random patterns in measured data (Montgomery 2009). The WEC rules are based on the Shewart control chart and identify a non-random pattern if the standard score (*z _{i}*) satisfies any of the following criteria:

Any single standard score is beyond the

*σ*CLs.Two of three consecutive standard scores are beyond

*σ*WLs.Four of five consecutive standard scores are beyond

*σ*WLs.Eight consecutive standard scores are beyond the

*σ*WLs.

The rules are applied to the one side of the centerline (*z _{i}* = 0) at a time. Therefore, a score immediately followed by the score on the other side of the centerline outside the WLs will not be taken into account as a non-random pattern. WEC rules consider, at most, the eight past measurements. This history is the shortest record among the three univariate SPC methods.

#### Cumulative sum (CUSUM) control chart

CUSUM directly incorporates past data. The difference between the quality characteristics and the user-defined reference values, *K _{u}*, is accumulated over time in two terms
4
5where and are one-side upper and lower CUSUM, respectively, and ; is a user-defined reference value. If either or exceed the user-defined decision interval, , the process is considered to be out of control. The two parameters, and , must be estimated for the specific network to apply this method.

#### Exponentially weighted moving average (EWMA) control chart

The EWMA method also utilizes past and present measured values by calculating the exponentially weighted moving average as
6where is the exponentially weighted moving average at the *i*th time period with ; is a weighting factor and 0 ≤ ≤ 1. The EWMA control chart gives less weight to measurements in the more distant past and more weight to recent data.

The EWMA CLs change over time with the variance of measurement's moving average, , that is estimated by
7The derivation of the above equation is based on the sum of a geometric series (Montgomery 2009). Because the term approaches 1 as *i* becomes large, the EWMA upper and lower CLs converge to
8and
9respectively. and *L* are the user-set parameters defining the weight on past data and the failure threshold, respectively. Although these parameters must be tailored to the specific application, CUSUM and EWMA have been shown to be effective when small but persistent mean shifts, which are expected for small pipe bursts, are of interest (Hawkins & Olwell 1998). Note that EWMA has the longest memory of past data of the three univariate methods because it incorporates all past measured data.

### Multivariate methods

#### Hotelling *T*^{2} control chart

The Hotelling *T ^{2}* control chart uses the Mahalanobis distance for multiple quality characteristics to identify an out of control system, that is, a burst. The Mahalanobis distance is the squared standardized distance from the center of multivariate normal distribution. The Hotelling

*T*control chart issues an alarm if the distance is larger than a CL determined by a user-defined acceptable level. The Mahalanobis distance at

^{2}*i*th time period is calculated as 10where is the vector of the normalized quality characteristics of the

*p*measurements at time

*i*or ; and is the normalized measurement covariance matrix.

If *z _{1}*, z

*, …, z*

_{2}*are normally and independently distributed, the random variable*

_{p}*T*

_{i}^{2}follows a chi-square distribution with

*p*degrees of freedom. If the historical record is long for each measurement (i.e., >100), the chi-square limit can be applied 11where is the user-set type I error probability, which results in 100(1 − )% confidence level. The upper CL is a control ellipse (Figure 2) of points equidistant from the center, i.e., the mean of multivariate normal distribution. If the Mahalanobis distance exceeds the

*UCL*at time step

*i*, the system is said to be out of control. Conventional Hotelling

*T*considers one control ellipse at each independent time step. Here, to extend the memory of past data and flexibility in identifying out of control systems, we applied WEC type rules to the Hotelling

^{2}*T*statistics in Hotelling

^{2}*T*/WEC.

^{2}#### Multivariate CUSUM (MCUSUM) control chart

The MCUSUM control chart (Crosier 1988) incorporates the Mahalanobis distances of the past measurements in a CUSUM. Thus, it has the advantage of considering multiple quality characteristics and past information when judging if a system is out of control. The MCUSUM accumulated deviation is calculated as
12where is the multivariate CUSUM to period *i*, and *K _{m}* is a user-defined reference value. If exceeds the defined decision interval,

*H*, the process is considered to be non-random pattern. Note that unlike CUSUM, a is considered because is non-negative.

_{m}*K*and

_{m}*H*must be estimated for a given process.

_{m}#### Multivariate EWMA (MEWMA) control chart

MEWMA (Lowry *et al*. 1992) is a multivariate extension of the univariate EWMA. MEWMA first calculates the exponentially weighted moving average of each quality characteristic; applying a scalar weighting factor, , to all quality characteristics. The matrix of exponentially weighted moving average of multiple variables is then calculated as
13where and .

The Mahalanobis distance of the moving averages of multiple variables is then determined by
14where the covariance matrix of the moving average, , is computed from the covariance matrix of normalized measurements (**Σ**) as
15

MEWMA then applies the same anomaly identification approach as Hotelling *T ^{2}* method with

*UCL*defined by Equation (11). must be estimated for the specific system when applying MEWMA.

## APPLICATION

### Rhine network

First, the three univariate SPC methods are tested to detect bursts from a service area in the western part of the Netherlands (Rhine network). The area's net system demands during three months of winter in the last 5 years (i.e., January, February, and December in 2008 to 2012) are used as test data. The demands are calculated from water balance given the flows into and out of the area that are measured at every 5 minutes by the area's water provider, the Dunea Company. Demands follow a typical diurnal pattern with total system mean demand of 636 lps (Figure 3(a)). The Dunea Company has recorded the time and location of 93 reported bursts during the 452-day study period. More details about the system are found in Bakker *et al*. (2013).

To develop the control charts, net system demands of 359 (=452 − 93) normal days were used as historical data and time-varying means and standard deviations of the net demand were computed (Figure 3(a)). To calculate the *RF*, the 359 normal days' net system demands were again provided to the methods. Net system demands of 93 burst days were provided to estimate the *DP*. Since we do not know the occurrence time of the bursts, the *ADT* was not calculated for the first case study. Instead, the reference time that is defined as the detection minus report time of bursts is given as reference.

The following assumptions and simplification are made in this case study: (1) leakages and minor bursts that were not recorded by Dunea Company are negligible; (2) meter measurement error is negligible; (3) a burst that is not detected within the reported day is considered as a non-detected event; and (4) *C ^{+}* and

*C*of CUSUM and

^{−}*MA*of EWMA are set to zero at the end of February in each study year (i.e., February 29 2008 and 2012, February 28 2009, 2010, and 2011) and the midnight of each day; (5) when applying methods to the control sample, the statistics listed in (4) are not reset after a false alarm; (6) information on known extreme events such as big sports' events and extreme weather condition is not available; and (7) no other information is available to detect the bursts, that is, visible damage (e.g., sink hole and water emerging from the ground) or customer complaints that normally supplement measurements and burst recognition.

### Austin and Apulian network: Synthetic burst detection

Here, all SPC methods are tested to detect synthetic bursts generated from two real networks' hydraulic models. The Apulian network (Figure 4) from Giustolisi *et al*. (2009) and the Austin network (Figure 5) with several modifications from Brion & Mays (1991) are used as test systems.

While the Apulian network (Figure 4) is a relatively small looped system with 11 loops, the Austin network is a branch dominated system with looped subareas (Figure 5). The mean total Apulian system demand is 282 lps while the Austin system has a similar total mean demand (726 lps) as the Rhine network. For the Apulian network, we performed the least cost pipe design as the deterministic phase optimization described in Giustolisi *et al*. (2009) but with the reservoir elevation of 51.4 m. The modified Austin system consists of 126 nodes, one source, one pumping station, and 90 pipes (Figure 5). The tank in the original system is removed and the pump station flows from a constant head source.

To investigate the impact of the number of meters on detectability, data from one to five meters were generated and provided to the burst detection methods (Tables 1 and 2). In the Apulian network, the meters are located to cover different loops in the system (Figure 4). When fewer meters are available, they are located closer to the source. For the Austin network, more complex rules are applied to locate the meters in a systematic way. The meter locations (Figure 5) were selected by the authors with the rules: (1) the first meter is located in the main transmission line; (2) the second and third meters are placed in branches off the main transmission line (northwest or southeast); and (3) the last two meters are located to provide information on the remainder of the system (i.e., the area just downstream of the pump station) or local areas in the northwest and southeast. In both networks, flow and pressure meters with the same identifier should be located in close proximity for a consistent comparison of detectability between meter types. Although we have attempted to systematically account for meter locations, optimally locating meters may improve the detectability one method over another in this comparison.

To compare the SPC methods, long time sequences of nodal pressures and pipe flow rates were generated using the networks' hydraulic models. The random input is the nodal demands following a normal distribution. Nodal demands were assumed to be spatially uncorrelated and each had a coefficient of variation of 0.1. The demands have strong temporal correlation due to the diurnal pattern.

Three time data sets were generated using EPANET (Rossman 2000). The first set, a normal 2,000-day series of nodal pressure and pipe flow rates at a 5-minute time step, was generated as historical data. These data were used to compute the covariance matrix for the multivariate SPC methods and the mean and standard deviation of the quality characteristics. Therefore, this study is described as a phase-II control chart application because the statistics for SPC method were obtained from a clean set of process data gathered under stable conditions.

The other data sets were: (1) a control sample, and (2) the failure set. The control sample, like the historical data set, only considers demand randomness. The failure set included demand variability and pipe bursts that were modeled as emitters in EPANET. The random burst characteristics were its location, initiation time, and magnitude that were defined by the emitter coefficient, *C.* For the Austin network, *C* was assumed uniformly distributed over the range of 1 to 50 resulting in burst magnitudes equal to 0.1 to 3.3% of the total system mean demand. The bursts with magnitude equal to 0.3 to 10.6% of total system mean demand (*C* = 0.1 to 5) were generated in the Apulian network. Events detected in the control sample are false alarms and counted to calculate *RF*. *DP* and *ADT* for detected events were derived from the failure set.

A number of assumptions and simplification are made in this hypothetical case study: (1) the hydraulic model perfectly represents the real system; (2) the pipe roughness coefficients and other system parameters are known with certainty; (3) a burst that is not detected within 48 hours after its occurrence is considered as a non-detected event; (4) only one burst occurs in each 48 hours; (5) *C ^{+}* and

*C*of CUSUM,

^{−}*MA*of EWMA,

*MC*of MCUSUM, and

^{+}*MMA*of MEWMA are set to zero at the time of burst to eliminate bias among methods and be most conservative in detecting a burst; and (6) when applying methods to the control sample, the statistics listed in (5) are not reset after a false alarm. Assumptions (2) and (7) identified in the real-life burst case study are also applied here.

## APPLICATION RESULTS

### Real-life burst detection

To develop the control charts for three univarate SPC methods, time-varying means and standard deviation of net system demand were calculated from the 359 normal days' data set and normalized using Equation (3). Among the 93 bursts during the study periods, the detection information of nine bursts with various burst times, magnitudes, and locations is summarized in Table 3. *RF* was calculated by applying the 359 normal days' net system demand to the methods. WEC with two different memory sizes and CUSUM and EWMA with three parameter sets are applied to detect the bursts. Because the exact occurrence time of burst is not known, the reference hour was calculated by subtracting the report time of burst with the detection time.

WEC using only the 4*σ* rule detected five out of the selected nine bursts and had 1.39 false alarms per day. Adding more sub-rules and having short memory of past data resulted in detecting two more events but increased *RF* significantly (17.9 false alarms per day). During bursts 4, 6, and 8, anomalies in net system demand were detected earlier in WEC using four sub-rules than using one sub-rule.

CUSUM and EWMA using three parameter sets were tested while the same decision interval and *CL*, respectively, were applied. If *K _{u}* is small, the CUSUM becomes sensitive to small deviations from the mean value of net system demand. As shown in Table 3, as

*K*decreases, CUSUM detected more bursts but

_{u}*RF*also increased. While having the same

*DP*, CUSUM using

*K*of 2.4 and

_{u}*H*of 1.0 resulted in less than half of false alarms (7.18 false alarms per day) than WEC using four sub-rules. EWMA had the highest detectability when lower weight was given to past data.

_{u}Although *DP* is above 55% for the selected 9 bursts, *DP* is less than 10% for all three methods for all 93 bursts. Most bursts' flow rates are proportionally too small to detect and remain a short-term signal on net system demand (Figure 3(a)). For example, neither the CUSUM nor the moving average increased gradually before detection; rather, each rapidly spiked (Figure 3(b)). Therefore, the benefit of using a long memory of past data was not exploited. On the other hand, the benefit was evident in avoiding false alarms.

An alternative to enhance detectability is to use measurements of nodal pressures and pipe flows that are more sensitive to bursts than net system demand (i.e., bulk flow measurement). However, in the Rhine network, the measurements of the two quality characteristics cannot be used for SPC methods because the network does not operate consistently. Flow control valves and variable speed pumps change their setting frequently thus the boundary of the two quality characteristics changes. Standard SPC methods do not have a mechanism to consider the boundary changes in burst detection thus will likely issue false alarms. Therefore, in such a system, standard SPC methods are limited to evaluating bulk flow measurements and likely large bursts.

### Synthetic burst detection

#### Verification of statistical assumptions

In case studies with synthetic data, both the Austin and Apulian networks are assumed to have consistent operations so in-network flow and pressure measurements can be used for burst detection. The long series of flow and pressure data was generated as historical data at multiple meter locations. To develop the Shewart and other control charts, time-varying means and standard deviations of the flow and pressure quality characteristics were calculated from the historical data set for all meter locations and normalized using Equation (3). As seen in Figure 1, in both networks pressure has wide CLs when demands and head losses are high and tighter limits during low-flow periods at night while the width of the CLs for pipe flow rates are relatively constant throughout the day. Correlation coefficients and the covariance matrix of the normalized flow and pressure values for the two networks were computed for multivariate methods (Tables 4 and 5).

To estimate the upper CL for the multivariate methods, flow and pressure data were examined to determine whether they follow multivariate normality. If the random variables do follow normality, the Mahalanobis distances follow the chi-square distribution. As such, histograms of the Mahalanobis distances were plotted with theoretical chi-square distributions with the same degrees of freedom (Figure 6). For all degrees of freedom, the chi-square distribution closely fits the Mahalanobis distance histograms. The control ellipse for the multivariate methods (Hotelling *T ^{2}* method and MEWMA) was then drawn using the chi-square statistics corresponding to the four-sigma confidence interval (0.9999366) in a univariate normal distribution. The Hotelling

*T*statistic was also examined with respect to the applied WEC rules with discrete WLs (from one to four sigma limits) in the multi-dimensional space to extend the memory of past data (Hotelling

^{2}*T*/WEC).

^{2}Using judgment and trial and error, the parameters for univariate and multivariate CUSUM and EWMA were estimated to have the lowest false alarm rate while maximizing *DP* (Tables 6 and 7). To calibrate the models, two time series of flow and pressure data were randomly generated with and without introducing bursts. Model parameters applicable to each method were discretized at increments appropriate for each parameter based on its sensitivity (Tables 6 and 7). Detection and false alarm rates were calculated for parameter combinations and the best set was used in the method comparisons described below. Univariate method parameters were estimated independently for each meter and similar parameter values were identified for all locations (Tables 6 and 7). Multivariate methods' parameters were estimated for each set of meters. For consistency in this comparison, methods to decrease the dimension of data, such as PCA that was applied in Palau *et al*. (2012), were not used here.

### Comparison detection effectiveness of six SPC methods

Figures 7 and 8 show the detection effectiveness of the six SPC methods using flow meters and pressure meters in Austin and Apulian networks, respectively.

#### Austin versus Apulian network

The Austin network has a larger total system mean demand (726 lps) compared to the Apulian network. The former has a branched system with looped subarea while the latter is a fully looped system. Therefore, a burst occurring in the Apulian network affects the nodal pressure and pipe flow in all of the network, making burst detection easier. The bursts considered in the Apulian network were equal to 0.3 to 10.6% of total system mean demand (282 lps) that is larger than the proportion of the bursts generated in the Austin network (0.1 to 3.3%). As a result, overall detection effectiveness (especially, *DP*) was higher for the Apulian network compared to the Austin system.

The detection effectiveness of the six SPC methods shows consistent results in both networks. Those will be described in more detail in the following sections. In both networks, as more meters were included in the detection analysis, more bursts were detected. However, in some methods the false alarm rate also increased. For example, meters are examined independently in WEC rules so more false alarms were identified with more meters.

#### Univariate versus multivariate methods

Generally, the univariate methods outperformed the comparable multivariate schemes in terms of overall detection effectiveness (Figures 7 and 8). CUSUM and EWMA have the highest *DP* and the lowest false alarm rate among the six SPC methods. Their multivariate counterpart *DP*s were more comparable to WEC but reacted differently as additional meters were installed.

Considering the correlation between multiple pressure meters increased false alarms. As shown in Table 4, the correlation coefficient between pressure meters in the Austin network is close to 1 (e.g., the correlation between *z _{1}* and

*z*is 0.999), indicating one meter's value can represent the other four meters' value almost perfectly. This high correlation results in thin control ellipse that will be exceeded with relatively small Mahalanobis distances. The result is a high detection efficiency and many false alarms. The pressure meters in the Apulian network have correlation coefficients ranging between 0.284 and 0.698 (Table 5). This level of correlation also causes many false alarms.

_{2}However, considering the correlation between flow measurements from multiple locations decreased false detections in both networks (e.g., WEC versus Hotelling *T ^{2}*/WEC methods). In this case, the correlation between flow measurements varies from 0.000 to 0.644 (Table 4) in the Austin network and from −0.042 to 0.466 (Table 5) in the Apulian network resulting in a wide control ellipse and fewer false alarms and burst detections.

#### Flow versus pressure meter

Overall, pressure meters performed better than flow meters in terms of burst detection (Figures 7(b) and 8(b)). For CUSUM, EWMA, and MCUSUM, the same number of pressure meters detected more events with fewer false alarms compared to flow meters. For the synthetic data, pressure measurements are more consistent and less sensitive to demand variability (Figure 9(a)) compared to flow meters (Figure 9(b)). A non-random pattern is more apparent for pressure meters after a burst (Figure 9(a)). Flow measurements are less likely to raise an alarm for smaller bursts that are detected from information gathered over time. On the other hand, if the burst is large, a flow meter may be able to identify it from a single or a few out of control measurements.

### Comparison of detection efficiency of six SPC methods

Detection efficiencies for flow and pressure meters are presented in Figures 10 and 11, respectively. The bursts were detected earlier in the Apulian network than the Austin network. As more meters were used, the detection time sharply decreased in Apulian because of the small network size.

On average, and for both networks, univariate methods needed less time than multivariate techniques to identify bursts. EWMA had the shortest *ADT* of the univariate methods regardless of the meter type used. Univariate methods' *ADT*s decrease as the model memory (the record length in the analysis) increases (EWMA < CUSUM < WEC).

As might be expected, large bursts were detected more rapidly than the small bursts. For example, in the Austin network, the largest 25 bursts (2.38 (17.3 lps) to 3.27% (23.8 lps) of total system demand) resulted in *ADT* of 1.7 to 3.2 hours when measurements from two pressure meters were provided to the univariate methods. On the other hand, the smallest 25 bursts (0.13 (0.9 lps) to 0.71% (5.2 lps) of total system demand) had *ADT* of 16.6 to 29.4 hours for those conditions.

### Impact of measurement memory on detectability

As seen in Figures 7 and 8, methods with longer memories outperformed those techniques that judged events from a short data series (i.e., WEC and the two Hotelling *T ^{2}* methods). In many cases, the magnitude of a burst's disturbance is inadequate to sufficiently change the system state in the short term but its persistence over time does provide an ample signal. Longer records and the ability to consider persistent changes in measurements help to detect small bursts and avoid false alarms caused by short-term outliers.

With their longer memories, the univariate methods CUSUM and EWMA have high *DP*s (54–89% and 91–98% using flow and pressure meters, respectively) and very low *RFs*. The inherent randomness in the quality characteristics resulted in high false alarm rates when WEC and the two Hotelling *T ^{2}* methods were applied since a single outlier can trigger an alarm.

CUSUM and EWMA outperformed WEC and the multivariate methods because they can detect small mean shifts. The advantage of CUSUM and EWMA is confirmed in Table 8, which summarizes detection results for six SPC methods for the smallest 26 Austin burst events when three pressure meters are used. The mean burst rates are between 0.13 and 0.72% of total system demand. CUSUM and EWMA detected 65.4% (17 out of 26) and 69.2% (18 out of 26) of these bursts, respectively, compared to 11.5% (3 out of 26) to 34.6% (9 out of 26) for the other four methods. CUSUM is the univariate method with long memory of past data, thus can detect small bursts by gathering persistent changes in measurements. Hotelling *T ^{2}* method has no memory and is less sensitive to small bursts than CUSUM.

#### How measurement memory affects burst detection

Closer examination of the methods provides insights into how the record length considered by alternative methods influences *DP* and *RF*. Figure 12 shows normalized flows in pipes 16 and 61 in the Austin network (Figure 12(a)) when a burst occurs at time step 0. The pipe flow rate increased with the total system flow when the break occurred. The altered flow condition can be seen by visual inspection of the time series and confirmed by counting the number of measurements above the centerline. The mean shift in flow measurements causes a steady rise in the upper CUSUMs at both locations (Figure 12(b)) and an alarm was issued by CUSUM after 11.7 hours (140 time steps × 5-min time step) based on pipe 16's meter. However, the Hotelling *T ^{2}* method is not sensitive to small quality characteristic changes and does not detect the burst. Many measurement pairs were located on the upper right side of the center of multivariate normal distribution (Figure 13), showing a non-random pattern but persistence is not part of the detection rules.

#### How measurement memory helps to avoid false alarms

A longer record length also provides information to avoid false alarms as methods with short memories are vulnerable to natural outliers. Figure 14 presents the normalized flow measurement at pipe 16 and 61 in the Austin network for a period of no bursts. In this case, the normalized measurements were symmetrically distributed around the centerline (Figure 14(a)). The resulting upper or lower CUSUMs rise for short periods due to these flow variations but return to zero (Figure 14(b)) and no alarms are sounded.

In Hotelling *T ^{2}* (Figure 15) at time step 182, a pair of measurements significantly differ from the typical correlation for these locations, resulting in a large Mahalanobis distance and a false alarm. Here, the flow measurement at pipe 61 was about 2 standard deviations above its mean value while the value at pipe 16 was 4 below its mean (Figure 14(a)) that is atypical compared to the small positive correlation ordinarily observed.

#### Impact of *T*^{2} on MCUSUM and MEWMA

Although MCUSUM and MEWMA have longer memories of past data than the Hotelling *T ^{2}* method, neither outperform the two Hotelling

*T*methods. As noted,

^{2}*T*statistics are insensitive to small bursts and vulnerable to outliers. To avoid false alarms due to outliers, our calibration approach set the decision interval,

^{2}*H*, for MCUSUM relatively high, resulting in lower

_{m}*DP*and

*RF*in the calibration set and no false alarms in the control sample. Similarly, the optimal is between 0.7 and 0.95 (Tables 6 and 7) that weighs more heavily on the most recent measurements (Equation (13)). Therefore, MEWMA's detectability is similar to the Hotelling

*T*methods.

^{2}Figure 16 shows anomaly indicator values of the multivariate methods for a portion of the control sample used for the Austin network. A pair of pressure measurements near time step 251 were significantly different from the typical correlation and resulted in the Mahalanobis distance (*T ^{2}*) of 22.3, which is beyond the

*T*CL (=19.31). This large distance, in turn, increased the multivariate CUSUM and the exponentially weighted moving average. The optimal MCUSUM decision interval (

^{2}*H*34.5) was sufficiently large that no alarm was issued. However, Hotelling

_{m}=*T*and MEWMA method sounded false alarms.

^{2}## SUMMARY AND CONCLUSIONS

Early and accurate WDS burst detection improves system resilience. To that end, this study compares the ability of six SPC methods to identify bursts. First, the three univariate methods were applied to detect bursts using the real net system demand measurements for the Rhine network in the Netherlands. The burst flows were proportionally too small compared to the net system demand to be detected and have a short-term signal. However, it is confirmed that the long record lengths applied in CUSUM and EWMA help in avoiding false detections.

To demonstrate the benefit of using multiple meters that are more sensitive than the bulk flow measurement and test all six SPC methods, synthetic flow and pressure sequences from two networks, Austin and Apulian, were used to develop and calibrate the methods (i.e., WEC, CUSUM, EWMA, Hotelling *T ^{2}*, MCUSUM, and MEWMA).

*DP*and

*ADT*were then calculated from a random set of burst events and the false alarm rates were estimated from a sequence that only includes natural randomness.

Results indicate that system pressures are more sensitive to disturbances and provide higher detectability compared to flow meters because of more consistent non-random pressure patterns that occur after a modeled burst. However, in multivariate schemes pressure measurements caused relatively high false alarm rates due to the thin control ellipse resulting from high measurement correlation.

Univariate methods that incorporate past and present data performed better than univariate techniques that only consider short measurement histories and multivariate methods. In particular, the latter methods did not recognize small bursts that were detected by CUSUM and EWMA. While a long record length helps in detecting small bursts and, as in the real application, avoiding false detections in univariate methods, their value is not exploited in multivariate methods (Hotelling *T ^{2}* method versus MCUSUM and MEWMA) because the methods are susceptible to natural outliers in the

*T*statistic. Adjustments in the MCUSUM and MEWMA model parameters to be less sensitive to the outliers reduces their sensitivity to identifying bursts and satisfactory balances could not be determined.

^{2}Overall, the univariate exponentially weighted moving average (EWMA) had the best efficiency and shortest *ADT* among the six SPC methods. Therefore, based on the networks considered in this study, EWMA is the preferred burst detection method and pressure measurements are more valuable for identifying bursts compared to net system demand and flow meter data.

This study has several limitations that future research must address. This study utilizes flow and pressure meters independently. Since pressure values from multiple locations may be correlated, the combination of flow and pressure data should be assessed using synthetically generated and real system measurements. In addition, a multiple objective optimal meter location problem should be formally posed to maximize *DP* and minimize the false alarm rate and *ADT*. Finally, and most importantly, SPC methods require consistent system operations for measurements beyond total area flow. This limitation is significant as pump, tank, and valve operations often vary day to day. These changes result in different distributions of measured values depending upon current and possibly recent past operations. Alternative methods that account for the system state should be pursued.

## ACKNOWLEDGEMENTS

This material is based in part upon work supported by the National Science Foundation under grant no. 0835930. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

- First received 11 September 2013.
- Accepted in revised form 7 September 2014.

- © IWA Publishing 2015

Sign-up for alerts