## Abstract

Because pipeline systems represent more than 80% of the total asset value of water-distribution systems, their management is an important issue for water utilities. A pipeline deteriorates over time after installation. As it deteriorates, pipe bursts of various types can occur, and the choice of a maintenance and repair strategy depends on the burst type. It is therefore important to forecast the occurrence probability of each burst type. This paper presents a competing deterioration-hazard model that represents deterioration through multiple types of failure, focusing on bursts that occur in the pipe body or in pipe connections. A Weibull hazard model describes the lifetime of each pipeline, measured from the time it was buried, and a competing hazard model accounts for the competition among the different types of failure. The resulting competing deterioration-hazard model gives the probability of deterioration in the pipe body and in the connections. The model is estimated by Bayesian inference using a Markov chain Monte Carlo method, and its applicability is examined with data from an existing pipeline system.

- asset management
- Bayesian inference
- competing deterioration-hazard model
- pipe burst
- pipeline systems

## INTRODUCTION

Water-supply pipelines, which form important components of the infrastructure of cities, require huge annual maintenance budgets. Consequently, the establishment of optimal regimes for maintaining water pipeline systems has become a major issue for water-utility managers throughout the world. In the management of infrastructure assets, optimal maintenance strategies are frequently based on lifecycle-cost analysis, which is dependent on the deterioration model (Kobayashi *et al.* 2010).

In the field of water-supply systems, many studies have been conducted to assess the condition of pipeline systems and to predict their deterioration process. Shamir & Howard (1979) and Marvin (1996) assumed that breaks in pipelines increase exponentially with pipe age and obtained break-prediction models by regression analysis. Clark *et al.* (1982) reported a method for estimating the expected failure time of pipelines, whereas Shinstine (1999) examined the relationship between pipeline breaks and pipe diameter. Because water pipeline systems are usually buried underground, monitoring and inspecting them is difficult, and it is hard to accumulate adequate observational data as a basis for deterioration forecasting. Because the deterioration of pipelines is difficult to observe directly, we predict it by examining the failures caused by the deterioration process.

Marks (1985), Constantine & Darroch (1995), and Park (2004) used proportional hazards models, based on the failure-prediction model proposed by Cox (1972), to predict the risk of a pipeline break. Many probabilistic models using various probability functions have been developed, with the average annual number of pipe breaks in the pipeline system as an indicator of the structural state and the times between pipe breaks treated as random variables (Le Gat & Eisenbeis 2000; Mailhot *et al.* 2000, 2003; Pelletier *et al.* 2003). These models address properties that observation data typically exhibit: right-censored observations (Eisenbeis *et al.* 1999; Mailhot *et al.* 2000), left truncation (Mailhot *et al.* 2000) and selective survival bias (Scheidegger *et al.* 2013). By setting the deterioration state as a binary condition, ‘failure’ or ‘normal operation’, it is possible to predict the service life of a pipeline by using a conventional hazard model. Numerous studies have also applied this type of deterioration prediction method to other types of system. Aoki *et al.* (2005) proposed a method in which a Weibull hazard model is used to predict the lifetime of tunnel lighting equipment. Tanaka *et al.* (2010) similarly used Weibull hazard models to predict the deterioration of pipelines.

In general, the major cause of interruption of water pipeline systems is deterioration of the pipes. Conventional models for predicting pipeline deterioration do not classify the type of pipe failure; all failures are treated as a single type. In a real pipeline system, however, pipe failures caused by deterioration appear in various forms. We therefore classified pipeline failures as ‘B-bursts’, which occur in the pipe body, and ‘C-bursts’, which occur in pipe-connection parts. The lifetime of a given part is defined as the period from its installation to a burst, and this study assumes that a burst constitutes major damage and that the damaged pipeline is replaced immediately. A Weibull deterioration-hazard model describes the lifetime of each pipeline, and a competing deterioration-hazard model accounts for the competition between C-bursts and B-bursts. The proposed competing deterioration-hazard model allows us to determine the probability density of bursts in the pipe body and in the connections.

The competing hazard model assumes that competing causes of failure are independent of one another and that the incidence of each cause of failure can be analyzed from lifetime data. Such methods have been used in many fields, including medicine, economics, and engineering. The competing hazard model is widely used in accelerated lifetime testing (ALT) to estimate the lifetime distribution of components. Nelson (1990) discussed an analysis of typical competing hazard models for constant stress ALT data. Kim & Bai (2002) reported a competing hazard model that considered only two competing causes of failure by using ALT data.

Because pipeline systems are underground, system administrators face difficulties due to insufficient observation data, and this lack of data hinders the practical application of statistical models. To overcome this problem, the competing deterioration-hazard model in this study is estimated by a Bayesian technique based on the Metropolis–Hastings method (M-H method), a Markov chain Monte Carlo method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult.

## METHOD

### Competing deterioration-hazard model

Competing hazards appear in cases in which two or more types of event can occur. The main idea of a competing hazard model is that the occurrence of an event of interest must be analyzed while taking into account the occurrence of competing events. In pipeline systems, consider the case in which a pipeline is replaced because of a B-burst. Other events, such as C-bursts, can also lead to replacement of the pipeline. If a B-burst is the event of interest, a C-burst can therefore be regarded as a competing event, because once the pipeline has been replaced after a B-burst, a C-burst can no longer occur in it. To introduce competing hazards among the types of pipe failure, only major damage that requires pipe replacement is considered; in this study we focus on B-bursts and C-bursts.

Herein, we classify the state of a pipeline into one of two condition levels: a ‘healthy condition’ and a ‘burst’ resulting from a B-burst or a C-burst. A burst denotes a state in which major damage is found and immediate replacement is required. The healthy condition covers not only normal operation but also any condition in which no major damage has been found. In addition, recorded past repairs, leakages and breaks are not regarded as bursts, because these incidents are not considered major damage.

The pipeline system must be discretized into pipe segments (Mailhot *et al.* 2000). In this study, a pipe segment, referred to as a pipeline, is defined as a series of pipes with relatively homogeneous characteristics such as pipe diameter, type of material and installation period.

Each pipeline is represented by $i$ ($i = 1, \dots, n$), and the elapsed time from laying pipeline $i$ to the present is expressed as $\bar{t}_i$. In addition, we assume that more than one type of pipe burst is possible for pipeline $i$. The life span of pipeline $i$ until a burst of type $j$ is expressed by the random variable $\zeta_i^j$, which follows the probability-density function $f_i^j(\zeta_i^j)$ and the distribution function $F_i^j(\zeta_i^j)$. Here, the domain of the life span is $\zeta_i^j \in [0, \infty)$. In addition, the probability that a burst of type $j$ does not occur until time $t_i$ is denoted by $\tilde{F}_i^j(t_i)$ and is known as the survival probability. It can be expressed as follows:

$$\tilde{F}_i^j(t_i) = \Pr\{\zeta_i^j \ge t_i\} = 1 - F_i^j(t_i) = 1 - \int_0^{t_i} f_i^j(\zeta_i^j)\,d\zeta_i^j \tag{1}$$

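When competing hazards exist, the conditional probability that no pipe burst of any type occurs in pipe $i$ until an arbitrary time $t_i$ and that a burst of type $j$ occurs during the time span $[t_i, t_i + \Delta t)$ can be represented by the following equation:

$$\lambda_i^j(t_i)\,\Delta t = \frac{\Pr\{t_i \le \zeta_i^j < t_i + \Delta t,\ \zeta_i^k \ge t_i \ (k \in TF,\ k \ne j)\}}{\Pr\{\zeta_i^k \ge t_i \ (k \in TF)\}} = \frac{\hat{f}_i^j(t_i)\,\Delta t}{\tilde{F}_i(t_i)} \tag{2}$$

where $\lambda_i^j(t_i)$ is the hazard function for burst type $j$, $TF$ denotes the set of burst types ($TF = \{B, C\}$ in this study), $\hat{f}_i^j(t_i)$ is the partial density of a type-$j$ burst, and $\tilde{F}_i(t_i)$ is the overall survival function defined below. Note that in this competing hazard model, to obtain the hazard function, the density should be divided by the overall survival function $\tilde{F}_i(t_i)$ rather than by the partial survival function $\tilde{F}_i^j(t_i)$. For example, dividing $f_i^C(t_i)$ by $\tilde{F}_i^C(t_i)$ gives the conditional probability that a C-burst occurs at $t_i$ given only that no C-burst has occurred before $t_i$. In this case, however, the probability that a pipe burst occurs through a C-burst would be overestimated, because the occurrence of a B-burst, which is a competing risk of a C-burst, is not considered. It is therefore reasonable to define the hazard function for a C-burst as the probability of a C-burst occurring given that no pipe burst of any type has occurred until the arbitrary time $t_i$.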
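The overestimation argument above can be checked numerically. The sketch below assumes independent Weibull latent lifetimes for the two burst types, with hazard $\lambda^j(t) = \rho_j m_j t^{m_j-1}$ and partial survival $\tilde{F}^j(t) = \exp(-\rho_j t^{m_j})$, and uses arbitrary, hypothetical values of $\rho_j$ and $m_j$ (not estimates from this study). It compares the probability of a C-burst by time $t$ when B-bursts are ignored, $1 - \tilde{F}^C(t)$, with the probability obtained by integrating $\lambda^C(u)\tilde{F}^C(u)\tilde{F}^B(u)$ over $[0, t]$, which accounts for pipelines already lost to B-bursts.

```python
import math

# Hypothetical Weibull parameters (rho = arrival rate, m = acceleration parameter);
# these values are illustrative assumptions, not estimates from the paper.
RHO = {"C": 2.0e-3, "B": 1.0e-3}
M = {"C": 1.5, "B": 1.4}


def hazard(j, t):
    """Weibull hazard lambda^j(t) = rho_j * m_j * t**(m_j - 1)."""
    return RHO[j] * M[j] * t ** (M[j] - 1.0)


def partial_survival(j, t):
    """Partial survival F~^j(t) = exp(-rho_j * t**m_j)."""
    return math.exp(-RHO[j] * t ** M[j])


def overall_survival(t):
    """Overall survival: product of the partial survival functions."""
    return partial_survival("C", t) * partial_survival("B", t)


def naive_probability(j, t):
    """Probability of a type-j burst by t when competing bursts are ignored."""
    return 1.0 - partial_survival(j, t)


def competing_probability(j, t, steps=10_000):
    """Integrate the partial density lambda^j(u) * F~(u) over [0, t] (midpoint rule)."""
    h = t / steps
    return sum(hazard(j, (k + 0.5) * h) * overall_survival((k + 0.5) * h) * h
               for k in range(steps))


if __name__ == "__main__":
    for t in (10.0, 30.0, 50.0):
        print(t, round(naive_probability("C", t), 4), round(competing_probability("C", t), 4))
```

Because $\tilde{F}^B(u) \le 1$, the competing-risk probability can never exceed the naive one, and the two type-specific probabilities add up to $1 - \tilde{F}^C(t)\tilde{F}^B(t)$, the probability that some burst has occurred.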

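The overall survival function can be defined as follows:

$$\tilde{F}_i(t_i) = \exp\left\{-\int_0^{t_i} \lambda_i(u)\,du\right\} \tag{3}$$

where $\lambda_i(t_i)$ is the overall hazard function, defined by $\lambda_i(t_i) = \sum_{j \in TF} \lambda_i^j(t_i)$. The overall survival function is the probability that no burst of any type occurs; it can therefore be represented by the joint probability of the partial survival functions $\tilde{F}_i^j(t_i)$ for each burst type $j \in TF$, as follows:

$$\tilde{F}_i(t_i) = \prod_{j \in TF} \tilde{F}_i^j(t_i) \tag{4}$$

The partial survival function can also be defined as follows:

$$\tilde{F}_i^j(t_i) = \exp\left\{-\int_0^{t_i} \lambda_i^j(u)\,du\right\} \tag{5}$$

Accordingly, from Equation (2), the partial density function for each type of burst can be expressed as follows:

$$\hat{f}_i^j(t_i) = \lambda_i^j(t_i)\,\tilde{F}_i(t_i) = \lambda_i^j(t_i) \prod_{k \in TF} \tilde{F}_i^k(t_i) \tag{6}$$
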
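A short derivation, added here as a consistency check rather than taken from the original text, shows that the partial densities account for all of the probability mass. Write $\hat{f}_i^j(u) = \lambda_i^j(u)\tilde{F}_i(u)$ for the partial density of Equation (6) and $\lambda_i(u) = \sum_{j \in TF} \lambda_i^j(u)$ for the overall hazard; differentiating Equation (3) gives $d\tilde{F}_i(u)/du = -\lambda_i(u)\tilde{F}_i(u)$, and $\tilde{F}_i(0) = 1$.

```latex
\sum_{j \in TF} \int_0^{t_i} \hat{f}_i^j(u)\,du
  = \int_0^{t_i} \lambda_i(u)\,\tilde{F}_i(u)\,du
  = -\int_0^{t_i} \frac{d\tilde{F}_i(u)}{du}\,du
  = 1 - \tilde{F}_i(t_i)
```

In words, the probability that a burst of some type has occurred by $t_i$ is split exactly into type-specific parts.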
Pipe bursts depend largely on how long the pipeline has been in use, so the hazard function should account for the elapsed time. In this study, the Weibull hazard model, which is suitable for describing this process, is applied under the assumption that the probability of a pipe burst increases with time:

$$\lambda_i^j(t_i) = \rho_i^j\, m_j\, t_i^{m_j - 1} \tag{7}$$

where $m_j$ is the acceleration parameter that represents the time dependency of the hazard function and $\rho_i^j$ is the parameter expressing the arrival rate of pipe bursts. It is assumed that $\rho_i^j$ depends on the characteristics of the pipeline and can be expressed as follows:

$$\rho_i^j = \exp\left(\boldsymbol{x}_i \boldsymbol{\beta}_j'\right) \tag{8}$$

where $\boldsymbol{x}_i = (x_{i,1}, \dots, x_{i,k})$ is the characteristic vector that represents the observed values for pipeline $i$ and $\boldsymbol{\beta}_j = (\beta_{j,1}, \dots, \beta_{j,k})$ is the unknown parameter vector. In addition, $k$ is the total number of covariates and the prime ($'$) denotes transposition. By using the Weibull hazard model, the probability-density function and the survival function can be expressed as follows:

$$f_i^j(t_i) = \rho_i^j m_j t_i^{m_j - 1} \exp\left(-\rho_i^j t_i^{m_j}\right) \tag{9}$$

and

$$\tilde{F}_i^j(t_i) = \exp\left(-\rho_i^j t_i^{m_j}\right) \tag{10}$$
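The Weibull forms of the hazard, arrival rate, density and survival function, together with the partial density (hazard of one type times the survival of all types), translate directly into a few functions. The sketch below is a minimal standard-library illustration, not the authors' code; the covariate vector (a constant 1, pipe diameter and pipe length) and all coefficient values in the usage lines are hypothetical.

```python
import math
from typing import Dict, Sequence


def arrival_rate(x: Sequence[float], beta: Sequence[float]) -> float:
    """Arrival rate rho = exp(x beta'): exponential of the covariate/coefficient dot product."""
    return math.exp(sum(xk * bk for xk, bk in zip(x, beta)))


def hazard(t: float, rho: float, m: float) -> float:
    """Weibull hazard rho * m * t**(m - 1); m > 1 gives a hazard that rises with age."""
    return rho * m * t ** (m - 1.0)


def survival(t: float, rho: float, m: float) -> float:
    """Partial survival exp(-rho * t**m) for one burst type."""
    return math.exp(-rho * t ** m)


def density(t: float, rho: float, m: float) -> float:
    """Weibull probability density: hazard times survival."""
    return hazard(t, rho, m) * survival(t, rho, m)


def partial_density(t: float, j: str, rho: Dict[str, float], m: Dict[str, float]) -> float:
    """Hazard of type j times the overall survival (product over all burst types)."""
    overall = math.prod(survival(t, rho[k], m[k]) for k in rho)
    return hazard(t, rho[j], m[j]) * overall


# Hypothetical pipeline: constant term, diameter and length in arbitrary units.
x_i = (1.0, 0.2, 0.5)
beta = {"C": (-6.0, 0.4, 0.8), "B": (-7.0, 0.3, 0.9)}
m = {"C": 1.6, "B": 1.3}
rho = {j: arrival_rate(x_i, beta[j]) for j in beta}

if __name__ == "__main__":
    for t in (10.0, 20.0, 40.0):
        print(t, {j: round(survival(t, rho[j], m[j]), 4) for j in rho})
```

The identity `partial_density(t, "C", rho, m) == density(t, rho["C"], m["C"]) * survival(t, rho["B"], m["B"])` (up to rounding) is what allows the likelihood to be written with the Weibull density of the observed burst type and the survival function of the competing type.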

### Estimation method

Let us discuss the estimation method for the competing deterioration-hazard model based on inspection data. The time at which the pipe was buried is set as $t = 0$, and $t_i$ denotes the observed duration of use of pipeline $i$ ($i = 1, \dots, n$). If a pipe burst occurs and the life span of the pipeline ends, its duration of use equals its life span, $\zeta_i = t_i$, where $\zeta_i = \min_{j \in TF} \zeta_i^j$. On the other hand, a pipeline for which no burst has been reported by the inspection time is assumed to still be in service; in other words, its life span has not ended and exceeds its duration of use, $\zeta_i > t_i$. We therefore introduce the dummy variable $\delta_i$, which denotes whether a pipe burst has occurred:

$$\delta_i = \begin{cases} 1 & \text{if a pipe burst has occurred } (\zeta_i = t_i) \\ 0 & \text{otherwise } (\zeta_i > t_i) \end{cases} \tag{11}$$

In addition, the reported burst type, which in this study is classified as either a C-burst or a B-burst, can be represented by the dummy variable $d_i$:

$$d_i = \begin{cases} 1 & \text{if the burst is a C-burst} \\ 0 & \text{if the burst is a B-burst} \end{cases} \tag{12}$$

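The censoring dummy $\delta_i$ and the burst-type dummy $d_i$ can be derived from burst records as sketched below. The record fields (installation year, end of the observation window, burst year and type), the use of 2009 as the end of the 2001–2009 window, and the coding $d_i = 1$ for a C-burst and $d_i = 0$ for a B-burst are assumptions of this sketch; the names `Observation` and `encode` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    t: float    # observed duration of use t_i in years since burial
    delta: int  # 1 if a burst ended the life span, 0 if still in service (right-censored)
    d: int      # 1 for a C-burst, 0 for a B-burst; set to 0 and ignored when delta == 0


def encode(install_year: float, end_year: float,
           burst_year: Optional[float] = None,
           burst_type: Optional[str] = None) -> Observation:
    """Turn one hypothetical pipeline record into (t_i, delta_i, d_i)."""
    if burst_year is None:
        # No burst reported by the end of the window: life span exceeds t_i.
        return Observation(t=end_year - install_year, delta=0, d=0)
    if burst_type not in ("B", "C"):
        raise ValueError("burst_type must be 'B' or 'C' when a burst is recorded")
    # Burst observed: the duration of use equals the life span.
    return Observation(t=burst_year - install_year, delta=1,
                       d=1 if burst_type == "C" else 0)


# Hypothetical records, observation window ending in 2009
records = [encode(1990, 2009), encode(1995, 2009, 2004, "C"), encode(1988, 2009, 2007, "B")]
```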
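The observation information for pipeline $i$ can be represented as $\xi_i = (t_i, \delta_i, d_i, \boldsymbol{x}_i)$. Here, we define the unknown parameter vector of the competing deterioration-hazard model as $\boldsymbol{\theta} = (\boldsymbol{\theta}^C, \boldsymbol{\theta}^B)$, where $\boldsymbol{\theta}^C$ and $\boldsymbol{\theta}^B$ denote $(m_C, \boldsymbol{\beta}_C)$ and $(m_B, \boldsymbol{\beta}_B)$, respectively. Given the observed information $\xi_i$ for pipeline $i$, the conditional probability that this information occurs in pipeline $i$ can be represented by the following equation:

$$\ell(\xi_i \mid \boldsymbol{\theta}) = \left\{f_i^C(t_i)\,\tilde{F}_i^B(t_i)\right\}^{\delta_i d_i} \left\{f_i^B(t_i)\,\tilde{F}_i^C(t_i)\right\}^{\delta_i (1 - d_i)} \left\{\tilde{F}_i^C(t_i)\,\tilde{F}_i^B(t_i)\right\}^{1 - \delta_i} \tag{13}$$

This assumes that the pipe burst of each of the $n$ pipelines is independent of bursts in other parts of the pipeline system. The joint probability density of the pipe deterioration can therefore be expressed by the following likelihood function:

$$\mathcal{L}(\boldsymbol{\theta} \mid \boldsymbol{\Xi}) = \prod_{i=1}^{n} \ell(\xi_i \mid \boldsymbol{\theta}) \tag{14}$$

where $\boldsymbol{\Xi}$ represents $(\xi_1, \dots, \xi_n)$. The unknown parameter $\boldsymbol{\theta}$ can be estimated by the maximum likelihood estimation (MLE) method, which gives the value of $\boldsymbol{\theta}$ that maximizes the likelihood function.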
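Taking logarithms of the likelihood with the Weibull forms substituted gives a sum that is cheap to evaluate and numerically safer than the product over thousands of pipelines. In the sketch below (hypothetical function name `log_likelihood`), each record is a tuple `(t, delta, d, x)` with $t > 0$, `d = 1` marking a C-burst and `d = 0` a B-burst, and the arrival rates are $\exp(\boldsymbol{x}_i\boldsymbol{\beta}_j')$; these conventions are assumptions of the sketch. Every pipeline contributes the log overall survival, $-(\rho_i^C t_i^{m_C} + \rho_i^B t_i^{m_B})$, and a pipeline with an observed burst adds the log hazard of the burst type that occurred.

```python
import math
from typing import Iterable, Sequence, Tuple

Record = Tuple[float, int, int, Sequence[float]]  # (t_i, delta_i, d_i, x_i)


def _dot(x: Sequence[float], beta: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, beta))


def log_hazard(t: float, log_rho: float, m: float) -> float:
    """log of rho * m * t**(m - 1), using log rho = x beta' directly."""
    return log_rho + math.log(m) + (m - 1.0) * math.log(t)


def log_likelihood(m_c: float, beta_c: Sequence[float],
                   m_b: float, beta_b: Sequence[float],
                   data: Iterable[Record]) -> float:
    """Log-likelihood of the two-type competing Weibull model (C- and B-bursts)."""
    total = 0.0
    for t, delta, d, x in data:
        eta_c, eta_b = _dot(x, beta_c), _dot(x, beta_b)
        # log overall survival: both burst types have been survived up to t
        total -= math.exp(eta_c) * t ** m_c + math.exp(eta_b) * t ** m_b
        if delta == 1:
            # add the log hazard of the burst type that actually occurred
            total += log_hazard(t, eta_c, m_c) if d == 1 else log_hazard(t, eta_b, m_b)
    return total
```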

### Bayesian estimation method for competing deterioration-hazard model

The MLE method requires large amounts of data to achieve precise estimates, but it is not always possible to accumulate such data, especially for pipeline systems: because pipelines are underground, their administrators face a shortage of observation data. The Bayesian estimation method can provide estimation results by combining prior information, such as human experience and expert knowledge, with a limited amount of observation data (Kobayashi *et al.* 2012). In addition, the Bayesian estimation method is easier to apply than the MLE method because it does not require the Jacobian and Hessian matrices to be derived. Furthermore, a high-dimensional nonlinear optimization problem can have multiple local optima. In the MLE method, a poor choice of starting point can then cause convergence to a local optimum that is not the global optimum, or a failure to converge at all. The likelihood of the competing deterioration-hazard model is a high-dimensional nonlinear function of the parameters, and the corresponding optimization problem may have many solutions, including complex-valued ones. In this case, therefore, the Bayesian estimation method is used instead of the maximum likelihood method, because it avoids solving this high-dimensional nonlinear system. In this section, we present a methodology for estimating the unknown parameter vector of the competing deterioration-hazard model by means of a Bayesian estimation method using observed data.

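The Bayesian approach permits the estimation of $\boldsymbol{\theta}$ on the basis of the inspection data and prior information regarding $\boldsymbol{\theta}$. By using the M-H method, the estimation is carried out by sampling a large number of values of $\boldsymbol{\theta}$ from its posterior distribution, which can be expressed as follows:

$$\pi(\boldsymbol{\theta} \mid \boldsymbol{\Xi}) = \frac{\mathcal{L}(\boldsymbol{\theta} \mid \boldsymbol{\Xi})\,\pi(\boldsymbol{\theta})}{\int_{\Theta} \mathcal{L}(\boldsymbol{\theta} \mid \boldsymbol{\Xi})\,\pi(\boldsymbol{\theta})\,d\boldsymbol{\theta}} \propto \mathcal{L}(\boldsymbol{\theta} \mid \boldsymbol{\Xi})\,\pi(\boldsymbol{\theta}) \tag{15}$$

where $\pi(\boldsymbol{\theta} \mid \boldsymbol{\Xi})$ is the posterior probability density function of $\boldsymbol{\theta}$, $\mathcal{L}(\boldsymbol{\theta} \mid \boldsymbol{\Xi})$ is the likelihood function, $\pi(\boldsymbol{\theta})$ is the prior probability density function of $\boldsymbol{\theta}$, and $\Theta$ is the parameter space. The newly obtained data are denoted by $\boldsymbol{\Xi}$. By substituting the Weibull hazard model, Equations (9) and (10), into Equation (14), the likelihood function can be expressed as follows:

$$\mathcal{L}(\boldsymbol{\theta} \mid \boldsymbol{\Xi}) = \prod_{i=1}^{n} \left(\rho_i^C m_C t_i^{m_C - 1}\right)^{\delta_i d_i} \left(\rho_i^B m_B t_i^{m_B - 1}\right)^{\delta_i (1 - d_i)} \exp\left\{-\left(\rho_i^C t_i^{m_C} + \rho_i^B t_i^{m_B}\right)\right\} \tag{16}$$

where $\rho_i^C = \exp(\boldsymbol{x}_i \boldsymbol{\beta}_C')$ and $\rho_i^B = \exp(\boldsymbol{x}_i \boldsymbol{\beta}_B')$.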

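In this study we assume that the prior probability density functions of the parameters $m_j$ and $\boldsymbol{\beta}_j$ ($j \in TF$) follow a gamma distribution and a multidimensional normal distribution, respectively: $m_j \sim \mathcal{G}(m_0, k_0)$ and $\boldsymbol{\beta}_j \sim \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$. With this assumption, the probability density functions of the gamma distribution and the $k$-dimensional normal distribution can be expressed as follows:

$$g(m_j \mid m_0, k_0) = \frac{1}{\Gamma(m_0)\,k_0^{m_0}}\, m_j^{m_0 - 1} \exp\left(-\frac{m_j}{k_0}\right) \tag{17}$$

and

$$\mathcal{N}(\boldsymbol{\beta}_j \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0) = \frac{1}{(2\pi)^{k/2} \lvert \boldsymbol{\Sigma}_0 \rvert^{1/2}} \exp\left\{-\frac{1}{2} (\boldsymbol{\beta}_j - \boldsymbol{\mu}_0)\, \boldsymbol{\Sigma}_0^{-1} (\boldsymbol{\beta}_j - \boldsymbol{\mu}_0)'\right\} \tag{18}$$

where $\Gamma(\cdot)$ denotes the gamma function, and $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$ represent the prior expectation vector and the prior variance-covariance matrix of $\boldsymbol{\beta}_j$, respectively. On the basis of Equation (15), the posterior probability density function is defined as follows:

$$\pi(\boldsymbol{\theta} \mid \boldsymbol{\Xi}) \propto \mathcal{L}(\boldsymbol{\theta} \mid \boldsymbol{\Xi}) \prod_{j \in TF} g(m_j \mid m_0, k_0)\, \mathcal{N}(\boldsymbol{\beta}_j \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0) \tag{19}$$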
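The unnormalized log posterior (log-likelihood plus log priors) is what the M-H sampler evaluates. The sketch below assumes a shape-scale parameterization of the gamma prior and, to stay within the Python standard library, restricts the normal prior to a diagonal covariance matrix, which is the form of the non-informative prior (a multiple of the unit matrix) used in the empirical study. The log-likelihood is passed in as a callable so that any implementation of the Weibull likelihood can be plugged in; all names are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# theta maps burst type ("C" or "B") to (m_j, beta_j)
Theta = Dict[str, Tuple[float, Sequence[float]]]


def log_gamma_pdf(m: float, shape: float, scale: float) -> float:
    """log of m**(shape-1) * exp(-m/scale) / (Gamma(shape) * scale**shape); -inf for m <= 0."""
    if m <= 0.0:
        return -math.inf
    return ((shape - 1.0) * math.log(m) - m / scale
            - math.lgamma(shape) - shape * math.log(scale))


def log_normal_diag_pdf(beta: Sequence[float], mu: Sequence[float],
                        var: Sequence[float]) -> float:
    """Multivariate normal log density with a diagonal covariance matrix diag(var)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (b - u) ** 2 / v)
               for b, u, v in zip(beta, mu, var))


def log_posterior(theta: Theta, log_lik: Callable[[Theta], float],
                  shape: float, scale: float,
                  mu: Sequence[float], var: Sequence[float]) -> float:
    """Unnormalized log posterior: log-likelihood plus independent priors for each type."""
    lp = 0.0
    for m_j, beta_j in theta.values():
        lp += log_gamma_pdf(m_j, shape, scale) + log_normal_diag_pdf(beta_j, mu, var)
    if lp == -math.inf:
        return lp  # prior excludes theta (e.g. m_j <= 0); skip the likelihood
    return lp + log_lik(theta)
```

With a non-informative prior, `var` holds large values and `scale` is large, so the prior terms are nearly flat and the likelihood dominates.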

The M-H method draws candidate values from a proposal distribution and accepts or rejects them so that the retained samples follow the posterior distribution $\pi(\boldsymbol{\theta} \mid \boldsymbol{\Xi})$ (Kobayashi & Kaito 2012). A random-walk proposal is used to improve the efficiency of sampling. The M-H method proceeds as follows.

*Step 1. Initial establishment*

The initial value of the parameters $\boldsymbol{\theta}^{(0)}$, the total number of iterations $N$, and the burn-in period $\underline{n}$ are set. In addition, the stride of the random walk is set.

*Step 2. Sample extraction for estimation of the parameter*

For iteration $n$ ($n = 1, \dots, N$), a parameter sample $\boldsymbol{\theta}^{(n)}$ is generated as described in Steps 2-1 to 2-3.

*Step 2-1*

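The stride of the random walk, $\boldsymbol{\varepsilon}^{(n)}$, is assumed to follow a normal distribution with a mean of 0 and a variance of $\sigma^2$. The new candidate value $\boldsymbol{\theta}^{*}$ is then calculated as follows:

$$\boldsymbol{\theta}^{*} = \boldsymbol{\theta}^{(n-1)} + \boldsymbol{\varepsilon}^{(n)}, \qquad \boldsymbol{\varepsilon}^{(n)} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2 \boldsymbol{I}) \tag{20}$$
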
*Step 2-2*

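The acceptance probability is calculated as follows:

$$\alpha(\boldsymbol{\theta}^{(n-1)}, \boldsymbol{\theta}^{*}) = \min\left\{1, \frac{\pi(\boldsymbol{\theta}^{*} \mid \boldsymbol{\Xi})}{\pi(\boldsymbol{\theta}^{(n-1)} \mid \boldsymbol{\Xi})}\right\} \tag{21}$$

Because the random-walk proposal is symmetric, the proposal densities cancel in this ratio, and the normalizing constant of the posterior cancels as well.
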
*Step 2-3*

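A uniform random number $u_n \sim U(0,1)$ is generated, and the sample is then determined by the following condition:

$$\boldsymbol{\theta}^{(n)} = \begin{cases} \boldsymbol{\theta}^{*} & \text{if } u_n \le \alpha(\boldsymbol{\theta}^{(n-1)}, \boldsymbol{\theta}^{*}) \\ \boldsymbol{\theta}^{(n-1)} & \text{otherwise} \end{cases} \tag{22}$$

If $u_n$ does not exceed the acceptance probability, the candidate value is accepted; otherwise, the previous value is retained.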

*Step 3. Final judgment of the algorithm*

Step 2 is repeated until the number of iterations reaches $N$.

The samples generated during the burn-in period ($n \le \underline{n}$) are discarded, and the remaining samples are accumulated. If $N$ is sufficiently large, the empirical distribution of the retained samples converges to the posterior distribution, so that posterior summaries such as the mean and credible intervals can be computed from them. Geweke test statistics (Geweke 1992) are used to test whether the sampling process of the M-H method has reached a steady state and whether the number of samplings $N$ is appropriate.
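Steps 1 to 3 can be sketched as a generic random-walk Metropolis–Hastings sampler. The function below is a minimal illustration, not the authors' implementation: it works on a flat list of parameters, uses an isotropic normal step whose standard deviation `step_sd` is the square root of the stride variance in Step 2-1, and takes the log posterior as a callable (the name `random_walk_mh` is hypothetical). Working with logarithms avoids underflow when the likelihood is a product over many pipelines; a log posterior of minus infinity (for example, a negative acceleration parameter under a gamma prior) makes the acceptance probability of that candidate zero.

```python
import math
import random
from typing import Callable, List, Sequence


def random_walk_mh(log_post: Callable[[List[float]], float],
                   theta0: Sequence[float], n_iter: int, burn_in: int,
                   step_sd: float, seed: int = 0) -> List[List[float]]:
    """Return the samples drawn after the burn-in period (iterations burn_in+1 .. n_iter)."""
    rng = random.Random(seed)
    current = list(theta0)                     # Step 1: initial value theta^(0)
    current_lp = log_post(current)
    if not math.isfinite(current_lp):
        raise ValueError("initial value must have a finite log posterior")
    kept: List[List[float]] = []
    for n in range(1, n_iter + 1):             # Step 2: iterations n = 1, ..., N
        # Step 2-1: candidate = previous value + normal random-walk step
        candidate = [c + rng.gauss(0.0, step_sd) for c in current]
        cand_lp = log_post(candidate)
        # Step 2-2: log acceptance probability; the symmetric proposal cancels
        log_alpha = min(0.0, cand_lp - current_lp)
        # Step 2-3: accept if a uniform draw u_n falls below the acceptance probability
        if rng.random() < math.exp(log_alpha):
            current, current_lp = candidate, cand_lp
        if n > burn_in:                        # Step 3: keep only post-burn-in samples
            kept.append(list(current))
    return kept


if __name__ == "__main__":
    # Toy target: standard normal in one dimension (not the pipeline posterior)
    draws = random_walk_mh(lambda th: -0.5 * th[0] ** 2, [3.0], 20_000, 10_000, 2.4, seed=1)
    print(len(draws), sum(d[0] for d in draws) / len(draws))
```

In practice `step_sd` is tuned so that a moderate fraction of candidates is accepted; a step that is too small makes the chain creep, while one that is too large causes most candidates to be rejected.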

## EMPIRICAL STUDY

### Overview of the empirical study

To analyze the deterioration of a real pipeline system, we focused on the water distribution system of S city in South Korea. Ductile cast iron pipe (DCIP) is the target material of this study. The DCIP data comprise approximately 26,500 pipelines with a total length of 850 km and an average age of around 13 years. Inspection data were obtained from historical records of pipe bursts in S city during the 9-year period 2001–2009, during which 1,405 pipe replacements caused by B- and C-bursts were recorded. In this study, it is assumed that the replaced pipelines had major damage, and their condition state is classified as burst. The historical records of past repairs, however, are not considered as bursts because a repair is not associated with major damage. Table 1 shows the basic information of the data used in this study.

The inspection data contain information on whether or not a pipe burst occurred and on the type of burst for each damaged pipeline. In this study, the type of burst is classified as either a B-burst in the pipe body or a C-burst in a pipe connection. Accidents that occurred in other subcomponents, such as valves and rubber packings, are neglected. The pipe diameter and length are used as the characteristic information that affects pipe bursts. On the basis of this information, the duration of survival before a pipe burst is expressed by the Weibull hazard model, and the competing deterioration-hazard model is used to account for the competition between C-bursts and B-bursts in the pipeline. The model is then estimated by the Bayesian estimation method.

### Estimation results

The Weibull hazard model used for the Bayesian estimation is specified as follows:
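$$\lambda_i^j(t_i) = \exp\left(\beta_{j,1} + \beta_{j,2} x_{i,2} + \beta_{j,3} x_{i,3}\right) m_j t_i^{m_j - 1} \qquad (j \in TF) \tag{23}$$

The unknown parameter $\beta_{j,1}$ is a constant term, and $x_{i,2}$ and $x_{i,3}$ represent the pipe diameter and pipe length, respectively. In this study, other characteristic variables reflecting the influence of outer and inner rust, soil unit weight, traffic volume above the pipe, and so on were neglected, either because their impacts were small or because data were unavailable. The unknown parameters can be expressed as follows:

$$\boldsymbol{\theta} = (m_C, \beta_{C,1}, \beta_{C,2}, \beta_{C,3}, m_B, \beta_{B,1}, \beta_{B,2}, \beta_{B,3}) \tag{24}$$
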
We assume that the prior probability density functions of the unknown parameters $m_j$ and $\boldsymbol{\beta}_j$ follow $m_j \sim \mathcal{G}(m_0, k_0)$ and $\boldsymbol{\beta}_j \sim \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$. Unfortunately, in this empirical study, because of the absence of detailed substantive knowledge, it is difficult to obtain information about the expectations and variances of the unknown parameters. However, if the amount of observed data is large enough, the influence of the prior distribution can be ignored. A non-informative prior distribution is therefore applied for the Bayesian estimation. It can be obtained by setting the variance of the prior distribution to be sufficiently large, as follows:

$$m_j \sim \mathcal{G}(m_0, \kappa) \tag{25}$$

$$\boldsymbol{\beta}_j \sim \mathcal{N}(\boldsymbol{0}, \eta \boldsymbol{I}) \tag{26}$$

where $\kappa$ and $\eta$ are sufficiently large integers, $\boldsymbol{0}$ and $\boldsymbol{I}$ are a zero vector and a unit matrix, respectively, and $\mathcal{N}$ denotes the normal distribution.

To improve the precision of estimation, the Bayesian updating rule (Kobayashi *et al.* 2012) is used. We created three data subsets of different sizes, each extracted from the original data set and labelled by the number of records it contains. The estimation is performed in order of increasing subset size, and the results of each estimation (the mean, variance and covariance) are used as prior information for the next estimation under the Bayesian updating rule.
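The text states that the mean, variance and covariance from one estimation serve as prior information for the next. One way to do this with M-H output, sketched below under stated assumptions, is to set the normal prior of $\boldsymbol{\beta}_j$ to the sample mean vector and sample covariance matrix of the retained draws, and to choose the gamma prior of $m_j$ by matching its mean and variance (shape = mean²/variance, scale = variance/mean, in a shape-scale parameterization). The moment matching and the function name `updated_prior` are assumptions of this sketch, not details given in the paper.

```python
import statistics
from typing import Dict, List, Sequence


def _sample_cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Unbiased sample covariance (divisor n - 1), matching statistics.variance."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def updated_prior(m_draws: Sequence[float],
                  beta_draws: Sequence[Sequence[float]]) -> Dict[str, object]:
    """Turn post-burn-in draws of (m_j, beta_j) into prior hyperparameters for the next data set."""
    m_mean = statistics.fmean(m_draws)
    m_var = statistics.variance(m_draws)
    scale = m_var / m_mean               # gamma: mean = shape*scale, var = shape*scale**2
    shape = m_mean / scale
    cols: List[Sequence[float]] = list(zip(*beta_draws))   # one column per coefficient
    mu0 = [statistics.fmean(c) for c in cols]
    sigma0 = [[_sample_cov(ci, cj) for cj in cols] for ci in cols]
    return {"shape": shape, "scale": scale, "mu0": mu0, "sigma0": sigma0}
```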

To conduct the M-H method, the burn-in period, i.e. the number of iterations required to reach a steady state, was set to 10,000, and a further 10,000 iterations were used for parameter sampling. The burn-in samples were discarded, and the remaining 10,000 parameter samples were used to carry out the estimation.

Table 2 shows the results of the Bayesian estimation of the competing deterioration-hazard model for each of the three subsets and the original data set. The M-H method yields the posterior probability distribution of each parameter. In Table 2, the values estimated by the Bayesian estimation method are the sample means of the parameters, and the values in parentheses are 95% credible intervals. None of the 95% credible intervals contains zero, so the estimated parameters can be regarded as significant at the 5% level (Wu & Hamada 2009). As shown in Table 2, the credible intervals become narrower as the amount of observation data increases. The absolute values of the Geweke test statistics, shown in italic type, are all less than 1.96, so the hypothesis of convergence cannot be rejected at the 5% significance level.
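The Geweke diagnostic compares the mean of an early segment of a chain with the mean of a late segment. The sketch below uses the conventional segments (first 10% and last 50%) and, as a simplification that is an assumption of this sketch, estimates the variance of each segment mean by batch means instead of the spectral density estimate at frequency zero used by Geweke (1992), so its values can differ somewhat from those of dedicated MCMC diagnostic libraries. An absolute value below 1.96 corresponds to the criterion applied in Table 2; the function names are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence


def _var_of_mean(x: Sequence[float], n_batches: int) -> float:
    """Batch-means estimate of Var(mean(x)) that allows for autocorrelation in the chain."""
    size = len(x) // n_batches
    if size < 2:
        raise ValueError("segment too short for the requested number of batches")
    means = [statistics.fmean(x[b * size:(b + 1) * size]) for b in range(n_batches)]
    return statistics.variance(means) / n_batches


def geweke_z(chain: Sequence[float], first: float = 0.1, last: float = 0.5,
             n_batches: int = 20) -> float:
    """z-score comparing the mean of the first 10% with the mean of the last 50% of a chain."""
    n = len(chain)
    a = list(chain[: int(first * n)])
    b = list(chain[n - int(last * n):])
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(_var_of_mean(a, n_batches) + _var_of_mean(b, n_batches))


if __name__ == "__main__":
    rng = random.Random(0)
    iid = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    print(round(geweke_z(iid), 3))  # stationary chain: |z| is usually below 1.96
```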

Figure 1 shows the posterior densities of the model parameters for the database. As Figure 1 suggests, the posterior distributions of most parameters are close to normal in shape, which indicates that the estimation was stable.

With the estimation results of the competing deterioration-hazard model, the survival probability for each type of pipe burst (C-burst or B-burst) can be formulated. Figures 2 and 3 show the survival probabilities of DCIP against B-bursts and C-bursts, respectively, for each data set; the curves are computed from the Bayesian mean estimates. Figures 2 and 3 also show that, as the amount of observation data increases and Bayesian updating is conducted, the survival probability curves approach those obtained from the original data set. As shown in Figures 2 and 3, the survival probability curve obtained from the database follows almost the same path as that obtained from the original data set. This suggests that the Bayesian updating rule improves the efficiency of model estimation and data acquisition.

Figures 2 and 3 also show that the survival probabilities for both C-burst and B-burst decrease over time and that the survival probability for C-burst decreases more rapidly than that for B-burst. In other words, in a ductile cast-iron pipe, bursts in pipe connections (C-bursts) occur at a higher rate than bursts in the pipe body (B-bursts).

Figure 4 compares the survival probabilities for C-bursts and B-bursts obtained from the Bayesian mean estimates of the competing deterioration-hazard model and of the conventional Weibull deterioration-hazard model. The competing deterioration-hazard model predicts a higher survival probability than the conventional Weibull deterioration-hazard model. It predicts slower deterioration because it accounts for the occurrence of a competing event when the probability of the event of interest is evaluated.

## CONCLUSIONS

A pipe deterioration model is important for the asset management of pipeline systems. Pipe failures caused by deterioration appear in various forms, so a deterioration forecasting model that considers failure types enables the establishment of an efficient rehabilitation strategy. We have developed a competing deterioration-hazard model that considers competition among several types of burst in pipeline systems; the proposed model allows us to determine the probability of each type of burst. The competing deterioration-hazard model is estimated using a Bayesian estimation method.

The empirical study was carried out using an inspection data set from a real pipeline system. Because detailed substantive knowledge was lacking, the competing deterioration-hazard model was estimated with a non-informative prior distribution, and the Bayesian updating rule was used to improve the precision of estimation. The results show that the estimates become more precise as Bayesian updating proceeds. The estimation results obtained from the database are almost the same as those obtained from the original data set, which indicates that the Bayesian updating rule improves the efficiency of model estimation and data acquisition. Although a non-informative prior distribution was used in this study, the proposed method could exploit informative prior information once such information has been accumulated.

According to the occurrence probabilities of C- and B-bursts predicted by the competing deterioration-hazard model, more care is needed for pipe connections, because the probability of a burst in a pipe connection (C-burst) is higher than that in the pipe body (B-burst). In addition, the results show that the conventional Weibull deterioration-hazard model, which does not consider competing properties, overestimates pipe burst rates. The difference between the two models arises because the competing hazard model accounts for the occurrence of a competing event when the probability of the event of interest is evaluated. Although the competing deterioration-hazard model gives slightly higher prediction accuracy than the conventional Weibull deterioration-hazard model, there is still room for improvement. The proposed model could be improved further by addressing problems such as left-truncated data and a high percentage of right-censored data, which were not considered in this study. More observed data and further empirical studies are also needed.

In this study, we classified pipe bursts into B-bursts, which occur in the pipe body, and C-bursts, which occur in pipe connections. The results show that C-bursts and B-bursts have different deterioration rates. Because the choice of a maintenance and repair method depends on the type of burst, the competing deterioration-hazard model supports the establishment of an optimum maintenance strategy for the pipeline system. In addition, we believe that the model can be extended to other items of infrastructure and will contribute to advancing asset management.

The following points were not addressed by the proposed model and are left for future extensions of this study:

(1) In this paper, most of the failures that typically affect real pipeline systems (e.g. pipe breaks and leakages) are disregarded. To establish an optimal maintenance strategy, it is important to consider repairs due to breaks or leakages.

(2) Limited and missing information, such as left truncation or survival selection, which is often embedded in observed data, has not been addressed.

(3) Considering competing hazards is likely to be more relevant when more than two competing hazards exist. This could be explored by using synthetic data with different numbers of competing failure types.

- First received 26 January 2015.
- Accepted in revised form 29 June 2015.

- © IWA Publishing 2016
