## Abstract

Water demand is the driving force behind hydraulic dynamics in water distribution systems. Consequently, it is crucial to accurately estimate the actual water use to develop reliable simulation models. In this study, copula-based multivariate analysis was proposed and used for demand prediction for a given return period. The analysis was applied to water consumption data collected in the water distribution network of Palermo (Italy). The approach produced consistent demand patterns and could be a powerful tool when coupled with water distribution network models for design or analysis problems. The results were compared with those obtained using a classical water demand model, the Poisson rectangular pulse (PRP) model. The multivariate consumption data statistical analysis results were always higher than those of the PRP model but the copula-based method maintained the daily water volume of actual consumptions and provided maximum daily consumption that increased with the return period.

- multivariate analysis
- Poisson rectangular pulse model
- vine copula
- water demand modelling

## INTRODUCTION

Urban development has created new water distribution system management problems. Water supply networks are required to respond to growing drinking water demands, which are highly variable in terms of space and time. The general goal for any water utility is to constantly supply good quality water under sufficient pressure to all customers (Zhou *et al.* 2002; Herrera *et al.* 2010). The reliability of a utility's water distribution system depends on a combination of the following factors that play important roles in the system design and management: the water demand variability, size and maintenance of pipes, and volumes of urban reservoirs. The development of powerful computers has enabled hydraulic engineers to simulate the behaviour of water supply systems for most scenarios. However, the accurate prediction of pressures, flows and water quality parameters depends strongly on the quality of the input data. The data required to simulate the behaviour of a water distribution network, such as pipe friction coefficients and nodal demands, along with their temporal variations, contain uncertainty, which consequently affects our confidence in the simulation outcome. There is agreement in the literature that uncertainties in nodal demands and their variation with time are among the primary sources of error responsible for discrepancies between measured and model-simulated flows and pressures.

Residential water demand is one of the most difficult parameters to determine when modelling drinking water distribution networks. The simulation of a water supply system is often performed by assuming averaged values (in space and in time) of the water demands. The average spatial values are obtained by lumping the water consumption of users at each node of the network, and the time-averaged values are obtained as the mean of the instantaneous values of the nodal demands. The simulation results obtained using these simplifications may not be reliable for the hydraulically disadvantaged zones of the network. Therefore, water demand modelling has been an active field of study. Researchers have been primarily interested in household domestic water consumption, which accounts for the largest share of the total volume supplied by the water distribution system in urban areas, often about 75%. Water demand can be predicted on different time scales. Short- and long-term forecasting of the municipal water demand is essential to water utilities for system planning, design and asset management. Short-term forecasting is useful for operating and managing existing water supply systems within a specific period, whereas long-term forecasting is particularly important for system planning and design.

Analysing and forecasting urban water demand is a complex but imperative task. Unfortunately, accurately predicting demands and simulating the short- and long-term pattern of demands are difficult propositions for two reasons. First, it is impossible to predict the extent to which factors that influence the water demand (e.g., household income, urban population, number of service connections, water price and land uses) will change in the future. Peak daily use is estimated by applying peak factors to the average daily level, and design and expansion decisions are based on this value. The second consideration that complicates water demand simulation is its variability in time. By studying a hydrograph of the daily water demand in a system, it can be recognised nearly immediately that changes in water demand occur at many different time scales. For example, domestic water use typically changes rapidly, nearly in a matter of seconds and minutes, as people perform their domestic tasks of washing, cooking, cleaning, bathing, etc.

A detailed model of the hydraulic behaviour of drinking water distribution systems could be obtained by implementing a domestic water demand model in one of several software programs that have been recently developed. To date, stochastic models for instantaneous residential water demand (e.g., impulse models, see the subsequent sections for references) have been used to obtain realistic demand patterns for the hydraulic distribution network solvers. Several basic parameters related to the residential water usage are required to apply these models, such as the frequency, duration and intensity statistics of a single consumption event. Although this methodology remains valid, in our opinion it has some limitations. Impulse models (e.g., the Poisson rectangular pulse (PRP) model) can reliably simulate continuous patterns of water demand for a specific user only when continuous series of their recorded consumption are available to calibrate them. The reliability of the simulated demand is thus strictly dependent on the amount of available data. These models, which originated in hydrology for the generation of rainfall series, perform well, although the generated demand patterns have the same temporal and spatial aggregation as the recorded patterns. If the temporal aggregation of available data is too coarse, model parameters (strictly related to single consumption pulses) lose their physical sense.

In actual water distribution network management, it is unlikely that continuous series of recorded consumption data are available for more than a few users. Furthermore, performing this kind of monitoring campaign at large scale is rarely viable with common automatic meter reading systems (mainly due to the high costs of data transmission and management). The large amount of data that must be stored and transferred to a central database requires considerable memory storage and energy consumption, which greatly reduces the battery life of the monitoring equipment.

As reported in Alvisi *et al.* (2007), the water demand process is not entirely stochastic: a share of it can be linked to deterministic relationships between the variables characterising the process, and seasonal and weekly periodicities in daily consumption and daily periodicities in hourly water demand patterns can usually be identified. Impulse models neglect the statistical dependence of the parameters characterising the consumption process, considering pulse intensity, duration and frequency as independent variables. Nevertheless, the statistical properties of water demand (e.g., the mean, total daily volume, maximum daily value) are few, easily monitored and easily processed. Therefore, a statistical methodology able to assess and analyse the multivariate relationships between these statistical properties could reduce the amount of data needed to describe the water consumption process. As an advantage, the management of the monitoring equipment at large scale could become feasible and cost effective.

In the present study, a multivariate statistical method based on the copula functions recently introduced in hydrology and proposed by the authors in Fontanazza *et al.* (2014) has been further developed. The main peculiarity of the copula function is that it permits separate investigation of the marginal properties and interdependence structures of the variables. It synthesises the dependence structure of the variables in the purest and most essential form (Bárdossy & Pegram 2009) without assuming that the variables are normal or have the same marginal distribution.

Starting from the above-mentioned considerations, this study has the following two objectives: (1) to propose a procedure based on a multivariate statistical analysis of the primary features of the water consumption process at the domestic level and (2) to obtain a practical use water consumption model that is not characterised by high resolution data requirements. The proposed method was validated by comparing its results with those obtained using the PRP model. This paper is organised as follows. The multivariate and PRP analyses of the consumption process are described next. Then, the case study to which the procedure is applied is presented. This is followed by presentation and comparison of the resulting demand patterns, according to the multivariate analysis and the PRP model. Finally, the study conclusions are presented.

## MATERIALS AND METHODS

### Multivariate consumption data statistical analysis based on copula function

The copula function is a relatively new multivariate statistical analysis method that is frequently used in hydrology to perform multivariate process simulations, extreme value analysis and dependence structure modelling (De Michele & Salvadori 2003; Favre *et al.* 2004; Salvadori & De Michele 2004a, b; Grimaldi & Serinaldi 2006; Bárdossy & Pegram 2009). While there is a multitude of bivariate copulas, building higher dimensional copulas is generally recognised as a difficult problem.

The idea of constructing a multivariate dependence model from bivariate copulas as building blocks (i.e., pair-copulas) dates back to Joe (1996). The author detailed the construction of the first pair-copula in terms of distribution functions. Bedford & Cooke (2001, 2002) realised that there were a significant number of possible pair-copula constructions (PCC); thus, they organised them in a graphical manner by sequentially designing trees that identify the bivariate copula densities needed to make up a d-dimensional density. This density involves only products of bivariate copulas. As the trees are intrinsically related, the authors called these distributions regular vines (R-vines). Vine copulas are flexible functions for multivariate dependencies, which specify a factorisation of the copula density into a product of conditional bivariate copulas. The class of regular vines is still general, and it embraces a large number of possible pair-copula decompositions (Aas *et al.* 2009), including two simple tree structures, namely line trees and star trees. The former correspond to D-vines, while the latter correspond to C-vines. Recently, vine copula functions have demonstrated great potential in several hydrological applications (Gräler *et al.* 2013).

Several procedures for analysing and modelling user water consumption, such as the Poisson, Neyman-Scott and Bartlett-Lewis models, have been widely adopted in the literature. These procedures have been primarily used to study the stochastic process inherent in rainfall events (Rodriguez-Iturbe 1986; Buchberger & Wu 1995; Cowpertwait *et al.* 1996; Alvisi *et al.* 2003; Alcocer-Yamanaka *et al.* 2012). Similarly, the present paper proposes to model the daily pattern for user water consumption through a multivariate statistical approach first used to generate synthetic rainfalls by employing copula functions that simulated the multivariate relationships between the main variables characterising the temporal rainfall pattern, such as the rainfall total depth, duration and maximum intensity (Fontanazza *et al.* 2011).

According to this procedure, the multivariate consumption data statistical analysis (MCDSA) uses 3-d vine copulas for the main variables characterising the daily pattern of domestic water consumption: $V_d$, the total daily volume consumed by the user, which is analogous to the total rainfall volume; $K_p$, the daily peak coefficient, which is expressed as the ratio between the maximum consumption in a given time step, $V_{max}$, and the total daily volume, $V_d$, and which is analogous to the maximum rainfall intensity; and the time to peak coefficient, $T_p/24$, which is expressed as the ratio between the time to peak and the total duration of the demand pattern, 24 hours. Figure 1 shows a flowchart of the MCDSA used in the present study.

First, prior to performing the procedure, a triplet $(V_d, K_p, T_p)$ is assessed for each daily pattern recorded for a monitored dwelling. Namely, MCDSA requires a sample of triplets at least 1-year long to ensure reliability. Thus, the related dimensionless transformed variables $(u_V, u_K, u_T)$, which are approximately uniformly distributed in [0, 1], are evaluated together with the marginal distribution functions $F_{V_d}$, $F_{K_p}$ and $F_{T_p}$ (step 1).
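Step 1 can be illustrated with a minimal Python sketch. It assumes an hourly time step, measures the time to peak at the end of the peak hour, and uses the empirical rank transform rank/(n + 1) in place of fitted marginal distributions; these choices and the function names `daily_triplet` and `pseudo_observations` are illustrative assumptions, not part of the original procedure.

```python
from typing import List, Sequence, Tuple


def daily_triplet(hourly_volumes: Sequence[float]) -> Tuple[float, float, float]:
    """Return (V_d, K_p, T_p/24) for one day of 24 hourly volumes (litres).

    Assumption: the time to peak is taken at the end of the peak hour.
    """
    if len(hourly_volumes) != 24:
        raise ValueError("expected 24 hourly values")
    v_d = sum(hourly_volumes)                    # total daily volume
    if v_d <= 0:
        raise ValueError("day has no consumption")
    v_max = max(hourly_volumes)                  # maximum consumption in one step
    peak_hour = list(hourly_volumes).index(v_max) + 1
    return v_d, v_max / v_d, peak_hour / 24.0


def pseudo_observations(sample: Sequence[float]) -> List[float]:
    """Rank-transform a sample to (0, 1) with rank / (n + 1)."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u
```

Applying `pseudo_observations` separately to the samples of each variable gives values that are approximately uniform in [0, 1], which is the input expected by the copula fitting step.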

Second, the Kendall's $\tau_k$ rank correlation of each pair of variables (i.e., *V*-*K*, *V*-*T* and *K*-*T*) is evaluated to estimate the statistical dependence between the variables. Hence, several 3-d vine copulas are built to fit the sample of the variables $(u_V, u_K, u_T)$. In the three-dimensional case, there are no differences between a C- or a D-vine, only the ordering of variables can be changed. Figure 2 shows the possible schemes for composing a 3-d vine copula. In the second tree, the two conditional cumulative distribution function (CDF) values are calculated for all triplets $(u_V, u_K, u_T)$. These ‘conditioned observations’, which are again approximately uniformly distributed in [0, 1], are then used to fit another bivariate copula (e.g., $C_{KT|V}$, $C_{VK|T}$ or $C_{VT|K}$). Considering the 3-d vine structure shown in Figure 2(a), the full density function of the three-dimensional copula is given as follows:

$$c_{VKT}(u_V, u_K, u_T) = c_{KV}(u_K, u_V)\, c_{TV}(u_T, u_V)\, c_{KT|V}\left(F_{K|V}(u_K \mid u_V),\, F_{T|V}(u_T \mid u_V)\right) \qquad (1)$$

where $F_{K|V}$ and $F_{T|V}$ are the conditional CDFs evaluated in the second tree. The three-dimensional distribution function of the original variables sample $(V_d, K_p, T_p)$ is obtained by combining the bivariate copulas, as shown in Equation (1), and substituting the marginal distribution functions $F_{V_d}$, $F_{K_p}$ and $F_{T_p}$, which are defined as $u_V = F_{V_d}(v_d)$, $u_K = F_{K_p}(k_p)$ and $u_T = F_{T_p}(t_p)$. Therefore, the full density function $f_{V_d K_p T_p}$ is given as follows:

$$f_{V_d K_p T_p}(v_d, k_p, t_p) = f_{V_d}(v_d)\, f_{K_p}(k_p)\, f_{T_p}(t_p)\, c_{VKT}(u_V, u_K, u_T) \qquad (2)$$

where $f_{V_d}$, $f_{K_p}$ and $f_{T_p}$ represent the marginal density functions of $(V_d, K_p, T_p)$.
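The pair-copula factorisation for a structure with *V* as the conditioning variable can be sketched numerically as follows. The sketch assumes Clayton pair-copulas for all three building blocks purely for illustration (the study investigates many families, and a plain Clayton copula cannot represent negative dependence); the function names and parameter values are hypothetical.

```python
def clayton_density(u: float, v: float, theta: float) -> float:
    """Density of the bivariate Clayton copula (theta > 0)."""
    s = u ** -theta + v ** -theta - 1.0
    return (1.0 + theta) * (u * v) ** (-theta - 1.0) * s ** (-2.0 - 1.0 / theta)


def clayton_h(u: float, v: float, theta: float) -> float:
    """Conditional CDF F(u | v) = dC(u, v)/dv of the Clayton copula."""
    s = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * s ** (-1.0 - 1.0 / theta)


def vine_density(u_v: float, u_k: float, u_t: float,
                 th_kv: float, th_tv: float, th_kt_v: float) -> float:
    """3-d vine copula density with V as the conditioning variable."""
    # First tree: unconditional pair-copulas K-V and T-V.
    c_kv = clayton_density(u_k, u_v, th_kv)
    c_tv = clayton_density(u_t, u_v, th_tv)
    # Second tree: 'conditioned observations' feed the copula of K, T given V.
    k_given_v = clayton_h(u_k, u_v, th_kv)
    t_given_v = clayton_h(u_t, u_v, th_tv)
    c_kt_v = clayton_density(k_given_v, t_given_v, th_kt_v)
    return c_kv * c_tv * c_kt_v
```

With parameters close to zero the Clayton copula approaches independence, so the vine density approaches 1 everywhere in the unit cube, which is a convenient sanity check for the factorisation.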

According to Equations (1) and (2), three bivariate copulas must be fitted to derive the building blocks of a 3-d vine copula (e.g., in Figure 2(a), the $C_{KV}$, $C_{TV}$ and $C_{KT|V}$ bivariate copulas). The maximum likelihood estimation (MLE) method can be adopted to fit a copula from each family investigated for each pair of variables; the copula showing the highest log-likelihood value or the minimum Akaike's information criterion (AIC) value is selected as the best fitting vine copula model for the analysed data set. The methodology considers all possible schemes of the 3-d vine copula shown in Figure 2.
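The AIC-based selection can be sketched in a few lines of Python, using the standard definition AIC = 2k − 2 ln L, where k is the number of fitted parameters. The candidate names and log-likelihood values in the usage example are hypothetical and serve only to show that the two criteria mentioned above need not pick the same model.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_by_aic(fits: Dict[str, Tuple[float, int]]) -> str:
    """Return the candidate with the minimum AIC.

    `fits` maps a candidate name to (log-likelihood, number of parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fitting results for one pair of variables.
candidates = {"frank": (12.3, 1), "student": (12.9, 2), "clayton": (8.0, 1)}
best = select_by_aic(candidates)
```

In this made-up example the two-parameter Student copula has the highest log-likelihood, but the one-parameter Frank copula has the lower AIC because of the penalty on the number of parameters.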

Using the best fitting 3-d vine copula model, which is identified in step 3, a sample of synthetic triplets related to a given return period was generated (step 4). The multivariate return period of the triplets was assessed following the procedure proposed by Salvadori *et al.* (2011) and based on the copula's Kendall distribution function $K_C(t)$. According to this procedure, the copula-based return period $T_{KEN3}$ (expressed in days) is provided as follows:

$$T_{KEN3} = \frac{\mu}{1 - K_C(t)} \qquad (3)$$

where $\mu$ is the mean inter-arrival time expressed in days (in the case of a daily event, $\mu = 1$), and $K_C(t)$ is the copula's Kendall distribution function. Namely, $K_C(t): I \rightarrow I$, with $I = [0, 1]$, is defined as:

$$K_C(t) = P\left[C(u_V, u_K, u_T) \le t\right] \qquad (4)$$

with $t \in I$ defined as the probability level.
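The relation between the design return period and the value of the Kendall distribution function can be checked with a short sketch, assuming the Salvadori *et al.* (2011) form in which the return period equals μ divided by 1 − K_C(t). The function names are illustrative.

```python
def kendall_return_period(k_c: float, mu: float = 1.0) -> float:
    """Return period (same unit as mu) for a Kendall probability K_C(t)."""
    if not 0.0 <= k_c < 1.0:
        raise ValueError("K_C(t) must lie in [0, 1)")
    return mu / (1.0 - k_c)


def target_kendall_probability(return_period: float, mu: float = 1.0) -> float:
    """Value of K_C(t) that corresponds to a design return period."""
    if return_period <= mu:
        raise ValueError("return period must exceed mu")
    return 1.0 - mu / return_period
```

For daily events (μ = 1), return periods of 100 and 365 days correspond to Kendall probabilities of 0.99 and about 0.9973, respectively.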

Thanks to Equation (4), after fixing the design return period $T_{KEN3}$, the corresponding probability level $t_{KEN3}$ was assessed using the inverse of the copula's Kendall distribution function $K_C(t)$. In 3-d, this level corresponds to an iso-surface, i.e., all triplets on this surface have the same copula value equal to $t_{KEN3}$. $K_C(t)$ allows the calculation of the probability that a random point in the unit cubic space has a smaller or larger copula value than a given critical probability level ($t = t_{KEN3}$). The Kendall distribution function is a univariate representation of multivariate information because it is the CDF of the copula's iso-surface. Therefore, $K_C(t)$ is an essential tool for calculating a copula-based return period for multivariate events (Gräler *et al.* 2013).

In step 5 of the analysis, a pattern, derived from historical time series of consumption, was assigned to each synthetic triplet using a similarity criterion. Namely, for each recorded daily consumption pattern, at first, the related mass curve (Huff curve), defined as the representation of the normalised time vs. the normalised cumulative water consumption from the beginning of the day, was evaluated. Then, each statistical triplet was linked to the Huff curve of the historical event that minimised the following objective function (Fontanazza *et al.* 2011):

(5)

where $w_V$, $w_K$ and $w_T$ are weights with a sum equal to 1. They were evaluated using a Monte Carlo analysis aimed at maximising the modelling efficiency (evaluated by means of the Nash-Sutcliffe efficiency) between the 50th percentile of the synthetic and recorded patterns.

Finally, (as step 6) the synthetic patterns obtained for a given return period were statistically processed to estimate the related percentiles.
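The mass (Huff) curve used in step 5 can be computed as in the following sketch, which normalises both the time axis and the cumulative consumption to [0, 1]. The function name is illustrative, and the weighted similarity criterion itself is not reproduced here.

```python
from typing import List, Sequence, Tuple


def huff_curve(pattern: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (normalised time, normalised cumulative volume) pairs.

    `pattern` holds the volume consumed in each time step of one day.
    """
    total = sum(pattern)
    if total <= 0:
        raise ValueError("pattern has no consumption")
    n = len(pattern)
    curve = [(0.0, 0.0)]
    cumulative = 0.0
    for step, volume in enumerate(pattern, start=1):
        cumulative += volume
        curve.append((step / n, cumulative / total))
    return curve
```

Because the curve is dimensionless, it can be rescaled by the total daily volume of a synthetic triplet to obtain a dimensional daily pattern.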

### PRP model for consumption data analysis

Qi & Chang (2011), House-Peters & Chang (2011) and Donkor *et al.* (2012) have presented an overview of water demand prediction models on various time scales. The time scale is dictated by the purpose for which the prediction model is to be used (Bakker *et al.* 2013). Most past water consumption studies have started based on the requirements of quantifying global demand using long-term forecasting (Maidment *et al.* 1986; Dziegielewski & Boland 1989; Miaou 1990; Zhou *et al.* 2000) and establishing a suitable rate structure (Rothstein 1992). New reasons to better characterise the domestic water consumption have recently emerged; among these reasons, the requirement to ensure water volumes demanded by customers and to supply them with sufficient pressure and good quality has been prominent (Clark *et al.* 1993; Buchberger & Wu 1995; Buchberger & Wells 1996; Guercio *et al.* 2001).

At a domestic service level, water demand is considered sporadic and is characterised by sudden demand pulses, and it tends to have a stochastic character (Buchberger & Wu 1995; Buchberger & Wells 1996), particularly when considering time scales on the order of seconds. The sporadic water demand can be characterised as a series of rectangular pulses with a set intensity, duration and frequency. Therefore, several stochastic models for domestic demand determination have been developed. These models include the PRP model (for reference, see the following) and the Neyman-Scott rectangular pulse model (Alvisi *et al.* 2003; Alcocer-Yamanaka *et al.* 2012), among others (Blokker *et al.* 2010).

The PRP model was initially proposed and applied to describe rainfall temporal patterns and to generate a synthetic series representing rainfall or storm events, according to the projected interval and duration (Rodriguez-Iturbe *et al.* 1984, 1987; Rodriguez-Iturbe 1986; Cowpertwait *et al.* 1996). Subsequently, the PRP model has been transferred to water demand modelling and has provided good results (Buchberger & Wu 1995; Buchberger & Wells 1996; Guercio *et al.* 2001; Buchberger *et al.* 2003; Freni *et al.* 2004; García *et al.* 2004), thereby confirming the adaptability of these models to describe and well-reproduce processes that are strongly time dependent and that arise from aggregation and the superposition of single events. The superposition is caused by each dwelling that takes water as a single pulse from the distribution network, independently of the other dwellings. Furthermore, Magini *et al.* (2008) and Vertommen *et al.* (2014) showed that the spatial and temporal aggregation strongly influences the water demand statistics and represents a key point for a correct stochastic model of water consumption.

The demand events are generally of short duration, followed by relatively long periods of no demand. When a user draws water from the service, one or more appliances may be in use. Each water demand event may be composed of a random number of rectangular demand pulses with a height that represents the intensity and a width that represents the duration. The superposition of single pulses causes a complex water demand event. The Poisson process characteristics make it improbable that two demand pulses start at the same moment. However, two pulses that start at different moments may be partially superimposed due to the finite duration of each pulse. The intensity of the water consumption composed of more than one pulse is the sum of the intensities of the single pulses.

The basic parameters of the PRP model are the frequency of occurrence of the individual pulses (i.e., the number of pulses that occur in the selected time step) and the intensity and duration of the pulse. The pulse origins are independently displaced from the event origin. The duration and intensity of the pulses are assumed to be independent random variables. This assumption is expedient for the sake of operational efficiency and simplicity. The frequency, intensity and duration are formally described by their statistical properties of mean, variance and probability distribution. Since the water demand presents certain natural variations during the day, the arrival rate changes during the day (i.e., the Poisson process is non-homogeneous).
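A minimal simulation sketch of a non-homogeneous PRP process is given below. It assumes an arrival rate that is constant within each hour, a log-normal pulse duration and a normal intensity truncated to positive values; all parameter values and function names are hypothetical and are not calibrated on the monitored dwellings.

```python
import math
import random
from typing import List, Sequence

SECONDS_PER_DAY = 86400


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_prp_day(hourly_rates: Sequence[float], rng: random.Random,
                     mean_intensity: float = 0.1, sd_intensity: float = 0.03,
                     mu_log_dur: float = 3.0, sigma_log_dur: float = 0.8) -> List[float]:
    """Return the flow (l/s) for each second of one simulated day."""
    flow = [0.0] * SECONDS_PER_DAY
    for hour, lam in enumerate(hourly_rates):
        # The number of pulses starting in this hour is Poisson(lam).
        for _ in range(poisson_draw(rng, lam)):
            start = hour * 3600 + rng.randrange(3600)
            duration = max(1, round(rng.lognormvariate(mu_log_dur, sigma_log_dur)))
            intensity = -1.0
            while intensity <= 0.0:  # keep only positive intensities
                intensity = rng.gauss(mean_intensity, sd_intensity)
            # Overlapping pulses add up: the flow is the sum of intensities.
            for second in range(start, min(start + duration, SECONDS_PER_DAY)):
                flow[second] += intensity
    return flow
```

Duration and intensity are drawn independently here, which mirrors the independence assumption stated above.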

In this paper, the parameters of the PRP model are obtained by registering the instantaneous water demand every second using special equipment, separating the individual demand pulses, and statistically processing the resulting series of pulses. This approach, based on the direct measurement of the demand pulses, becomes costly because of the special equipment required to register the water demand, the required personnel for related field work, and the enormous amount of data to be processed.

A single elementary use is rectangular-shaped, and it is described by a random duration and a random steady intensity. More complex water demands that are caused by the superposition of multiple single pulses must be converted into more regular events, which are defined as single equivalent rectangular pulses (SERPs) by Buchberger *et al.* (2003). Rectangular-shaped ideal water consumption is described by the moment at which it begins and ends, constant and positive intensity and duration.

The recorded single-user water consumptions always show a wide variety of patterns that typically differ from the rectangular-shaped patterns, with an intensity fluctuation caused by network pressure variability and unsteady events due to the opening and closing of appliances. Each event was converted into one or more SERPs to analyse the characteristics of the consumption process and to estimate the parameters of the PRP model. This process involved two steps: signal smoothing and pulse separation. Each complex event was converted into constant rectangular-shaped signals by applying a signal smoothing based on the analysis of the difference between the progressive average (from the event beginning) and a five-point moving average. Each intensity variation for which the difference between the progressive average and the five-point moving average was higher than a set threshold value (changing with the duration and the total volume of each event) was considered to be significant and, thus, not negligible. Previous intensities were averaged to yield a constant intensity. This procedure was performed many times inside each event and for all events. The result was rectangular-shaped and more complex signals, defined as strings of random blocks (SORBs) by Buchberger *et al.* (2003), with more than one constant intensity. SORBs show that many appliances are in use simultaneously.

In the pulse separation phase, SORBs were analysed to identify the actual superposition of single uses. The identification and separation of single pulses are based on two empirical rules: two pulses do not begin or end simultaneously, and the superposition of pulses causes an increase in intensity. A system of ‘n’ linear equations, with unknown values that are the mean intensities of the SERPs, was written for each SORB (where ‘n’ is the system dimension, equal to the number of positive changes in intensity). The system solution provided the intensity, duration, beginning and end of each single pulse. When the solution was not reached, the SORB was replaced by a single equivalent pulse with an intensity equal to an average value weighted by the duration of the different intensity levels.

The evaluation of the PRP model parameters depends on the aggregation time step for which the stochastic process may be considered to be homogeneous. In this study, the consumption process was assumed to be homogeneous at the daily scale for the intensity and duration and homogeneous at the hourly scale for the pulse occurrence (frequency). The daily consumption pattern appears to be primarily due to the occurrence of single events rather than their duration and intensity. This hypothesis reduced the number of parameters to evaluate for stochastic model identification.

## THE CASE STUDY

The above-described MCDSA and PRP models were applied to water consumption data obtained from monitoring eight dwellings located in Palermo (Italy) throughout 2007 (Figure 3).

The customers that participated in the consumption monitoring program were selected according to the following characteristics: families with at least two members; family members ranging in age from 4 to 70 years; a minimum of one electric household appliance (dishwasher or washing machine); negligible outdoor consumptions; and cooperation. The selected eight families were the only families that agreed to participate in the consumption monitoring program. Instrument packs to monitor domestic water use were installed on the service line of each of the eight dwellings, downstream of the revenue water meter (Figure 4).

The instrument package included a multi-jet water meter and a data logger. The two devices were coupled by means of an impulse sensor. The water meter had a minimum flowrate, $Q_1$, equal to 15 l/h, a transitional flowrate, $Q_2$, equal to 22.5 l/h, a permanent flowrate, $Q_3$, equal to 1,500 l/h, and an overload flowrate, $Q_4$, equal to 5,000 l/h (ISO 4064-1 2005). When the cumulative volume consumed by the user was equal to or higher than 0.5 l in a given time period, the sensor transmitted a signal to the data logger for each 0.5 l; when the volume was lower than 0.5 l, the sensor did not transmit a signal but the volume was aggregated to the following consumption pulse until the cumulative volume was equal to 0.5 l. Considering that a common faucet is characterised by flows in the range 6–12 l/min, the sensor was able to detect pulses lasting at least 5 seconds (in the worst case) or at least 2.5 seconds (in the best case). Cumulative volumes of more than 0.5 l were recorded in a text file containing six fields (i.e., day, month, year, hour, minute and second). The water demands were downloaded by connecting the data logger to a portable computer.
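The time resolution of the sensor follows directly from the 0.5 l signal volume and the flow rate, as the short sketch below shows. The function name is illustrative.

```python
def seconds_per_signal(flow_l_per_min: float, signal_volume_l: float = 0.5) -> float:
    """Time (s) needed to accumulate one signal volume at a constant flow."""
    if flow_l_per_min <= 0:
        raise ValueError("flow must be positive")
    return signal_volume_l / (flow_l_per_min / 60.0)


slow_faucet_s = seconds_per_signal(6.0)    # lower end of the faucet range
fast_faucet_s = seconds_per_signal(12.0)   # upper end of the faucet range
min_flow_s = seconds_per_signal(15.0 / 60.0)  # meter minimum flowrate, 15 l/h
```

At 6 l/min one signal is produced every 5 seconds and at 12 l/min every 2.5 seconds, whereas at the minimum flowrate of the meter (15 l/h) a signal is produced only every 120 seconds.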

The monitoring period was approximately 1 year for five dwellings, shorter for dwellings 4 and 5, and longer (by two times) for dwelling 6 (Table 1).

A four-step process was used to transform the raw input signals into archived residential water consumptions (Figure 5). Step 1 involved data retrieval; step 2 involved data correction (removal of repetitive signals and placing the water demand readings in chronological order) and water use separation; in step 3, leaks and ultra-low demands were censored; and in step 4, the volume of each pulse was uniformly distributed over the duration of the pulse. A sparse matrix collected the flow values (l/sec) using the seconds in a day as the number of columns and the days during which the consumption data were recorded (changing for each user) as the number of rows.

## RESULTS AND DISCUSSION

### MCDSA application to the case study

For each monitored dwelling, a triplet $(V_d, K_p, T_p)$ was assigned to each recorded daily water demand pattern, thus obtaining three samples of variables, with more than 365 data points for each one. Then, the related dimensionless transformed variables, approximately uniformly distributed in [0, 1], were evaluated together with the marginal distribution functions. These functions were identified by fitting several distribution functions to the empirical CDF of each variable and by performing a Kolmogorov–Smirnov (K-S) goodness-of-fit test to choose the best distribution. Table 2 shows the parameters of the obtained marginal distributions for all eight monitored dwellings. For dwelling 1, all three variables (*V*, *K* and *T*) show a good fit with the generalised extreme value (GEV) distribution. The GEV marginal distributions of the variables *K* and *T* were truncated to [0, 1].

To estimate the statistical dependence between the three variables $(V_d, K_p, T_p)$, Kendall's $\tau_k$ rank correlation of each pair *V*-*K*, *V*-*T* and *K*-*T* was evaluated (Table 3). For dwelling 1, the pairs *V*-*K* and *V*-*T* showed a negative correlation, with Kendall's $\tau_k$ values equal to −0.42 and −0.12, respectively; only the *K*-*T* pair had a positive correlation, with $\tau_k$ equal to 0.07. For the other seven dwellings, the correlation was always negative for *V*-*K* and positive for *V*-*T*. Furthermore, the correlation was higher for *V*-*K* and *V*-*T* and lower for *K*-*T*.
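Kendall's rank correlation for a pair of samples can be computed as in the sketch below, which implements the tau-a variant (concordant minus discordant pairs, divided by the total number of pairs) and therefore assumes that ties are rare. The function name is illustrative.

```python
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a rank correlation between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    concordant = discordant = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A negative value, such as the one reported for the *V*-*K* pair, means that days with a larger total volume tend to have a smaller peak coefficient.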

Then, the MLE method was adopted to fit several 3-d vine copulas to the variables related to each monitored dwelling. All possible schemes of the 3-d vine copula shown in Figure 2 were built. The investigated copula families include Normal, Student, Gumbel, Frank, Clayton, BB1, BB6, BB7, BB8 and their rotated versions. The best fitting 3-d vine copula model for the analysed data set was the 3-d vine copula showing the highest log-likelihood value or the minimum AIC value. Table 4 shows the bivariate copula families, parameters and Kendall's $\tau_k$ of the building blocks of the 3-d vine copula built for dwelling 1. The 3-d vine copula structure best fitting the analysed data is the C-vine (Figure 2(a)). In the Appendix (available in the online version of this paper), the corresponding tables related to the other seven monitored dwellings are shown.

After identifying the best fitting 3-d vine copula model, the analysis focused on the generation of synthetic triplets related to a given return period.

Two return periods were set, *T_{KEN3}* = 100 days and *T_{KEN3}* = 365 days, the latter being the typical value for water distribution system management.

A numerical evaluation based on a sample of 2 × 10^{7} points simulated by the best fitting 3-d vine copula was performed to calculate the inverse of *Kc(t)*, as no closed form exists for the CDF of the 3-d vine copula identified in this analysis (for more details, see Salvadori *et al.* 2011). According to Equation (4) and the numerical evaluation of *Kc(t)*, the related *t_{KEN3}* values were calculated, and the results equalled 0.99 and 0.997, respectively. Thus, 1,000 triplets *(V, K, R)_{copula}* were sampled from the iso-surface related to each *t_{KEN3}* value, and the corresponding 1,000 triplets with iso-probability were obtained through the inverse marginal distributions. As the final step of the analysis, a pattern was statistically assigned to each triplet after considering the historical time series of consumption. Namely, the Huff curve of the recorded daily consumption pattern that minimised the objective function of Equation (5) (Fontanazza *et al.* 2011) was assigned to each statistical triplet.
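The numerical inversion of *Kc(t)* can be illustrated with a minimal Python sketch. It assumes that Equation (4), which is not reproduced in this section, has the usual Kendall return period form T = μ/(1 − Kc(t)) with a mean inter-arrival time μ of 1 day, so that the target probability levels 1 − μ/T are 0.99 and about 0.997 for the two return periods. It also replaces the fitted 3-d vine copula with the independence copula, so that the copula value of each simulated point is available in closed form. The function name `kendall_level`, the sample size and the seed are illustrative and not taken from the study.

```python
import numpy as np

def kendall_level(copula_values, return_period_days, mu_days=1.0):
    """Critical level t such that Kc(t) = 1 - mu/T, estimated as an empirical quantile.

    copula_values holds C(u) evaluated at points u simulated from the copula;
    their empirical CDF is the Monte Carlo estimate of Kendall's function Kc.
    """
    target_probability = 1.0 - mu_days / return_period_days
    return float(np.quantile(copula_values, target_probability))

rng = np.random.default_rng(seed=0)
# Stand-in copula (assumption): independence, so C(u1, u2, u3) = u1 * u2 * u3.
# For a fitted vine copula, C(u) would itself have to be estimated from the sample.
u = rng.random((200_000, 3))
copula_values = u.prod(axis=1)

t_100 = kendall_level(copula_values, 100)
t_365 = kendall_level(copula_values, 365)
print(f"t for T = 100 d: {t_100:.4f}; t for T = 365 d: {t_365:.4f}")
```

The quantile-based inversion avoids any closed form for *Kc*; its accuracy in the upper tail depends on the sample size, which is consistent with the large sample used in the study.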

The 1,000 patterns were finally processed, and the percentiles for a given return period were estimated. Figure 6 shows the 50th and the 99th percentiles of the recorded and generated daily patterns for dwelling 1.

The curves for the 50th percentile of the recorded data and simulated consumption for the two return periods were similarly shaped: the peaks were preserved in the beginning of the morning, thus confirming that this user is typical of working families who are not often at home during the afternoon. The MCDSA provided good results, considering both the total daily volume, *V_{d}*, consumed by the user and the maximum daily consumption, *V_{max}*, that increased with the return period. The median of the total daily recorded volume was equal to 200 l, and the total daily volume simulated for a return period of 100 and 365 days was equal to 230 and 291 l, respectively. The maximum recorded water consumption was 44.67 l/h, and the maximum water consumption simulated for 100 and 365 days was 37.78 l/h and 56.75 l/h, respectively. The simulated pattern for *T_{KEN3}* = 365 days was higher than that for *T_{KEN3}* = 100 days.

The percentiles were consistent, demonstrating that the performed analysis can be efficiently used for water distribution network simulation. Even the curve for the 99th percentile of the recorded daily pattern, which is usually chosen for designing water supply loads to urban buildings, showed an acceptable agreement with the corresponding percentile of the simulated patterns.
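The percentile curves discussed in this section can be sketched in Python, assuming the percentiles are taken hour by hour across the generated daily patterns. The array `patterns` is filled with placeholder random values, not with the patterns of the study.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Placeholder data (assumption): 1,000 synthetic daily patterns of 24 hourly values (l/h).
patterns = rng.gamma(shape=2.0, scale=5.0, size=(1000, 24))

# Percentiles are taken across patterns, separately for each hour of the day.
p50, p99 = np.percentile(patterns, [50, 99], axis=0)
print(p50.round(1))
print(p99.round(1))
```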

### PRP model for the water demand simulation

For each consumption time series, the empirical CDFs of the three PRP model parameters (intensity, duration and pulse frequency) and their best fitting probability distribution functions were evaluated. The intensity shows a good fit with a normal distribution truncated to [0, 1], as demonstrated by Guercio *et al.* (2001); the duration shows a good fit with a log-normal distribution (Table 5) and the frequency with a Poisson distribution, as stated by the hypotheses of the PRP model.

The mean and the standard deviation of the duration shown in Table 5 are comparable with those reported by Buchberger & Wells (1996) and Buchberger *et al.* (2003), who also found that the duration of residential water demand showed a good fit with a log-normal distribution (Table 6). The intensity in Table 6, which like the duration showed a good fit with a log-normal distribution, had a mean twice those in Table 5 and a standard deviation exceeding those in Table 5 by a factor of 2–4.

As previously mentioned, the Poisson process is non-homogeneous and results in 24 distribution functions of the consumption frequency, one for each hour (Table 7), because the frequency with which customers log on to the water service is time dependent.

The synthetic series of water consumption was generated from the distribution functions of intensity (normal distribution), duration (log-normal distribution) and hourly frequency (Poisson distribution). A random probability was generated for each hour in a day, and the corresponding value of events/hour was defined using the distribution function of the frequency related to that hour. Two additional random probabilities were generated for each event along with the corresponding values of intensity and duration that arose from the distribution functions. The total water volume consumed by the user in that hour is the sum of the volumes of the events (intensity multiplied by duration) that occurred in that hour. Coupling this procedure with the Monte Carlo analysis, 100,000 synthetic water consumption series (with a length equal to those recorded) were generated, and the average water consumption was calculated. Figure 7 shows that the average hourly synthetic consumption preserves the shape and volume of the recorded consumption. The comparison of the two curves shows that the PRP model preserves the average demand value well and is promising for real-time management applications. To evaluate the forecasting ability of the PRP model, the simulated patterns for different return periods are analysed in the next section.
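The generation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: all parameter values are hypothetical and are not those of Tables 5–7, intensity is taken in l/s and duration in s, each pulse is assigned entirely to the hour in which it starts, and random variates are drawn directly from the distributions, which is equivalent in distribution to inverting the distribution functions at random probabilities. The helper names `truncated_normal` and `simulate_day` are illustrative.

```python
import numpy as np

def truncated_normal(rng, mean, std, size, low=0.0, high=1.0):
    """Draw `size` normal variates restricted to [low, high] by rejection sampling."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        draw = rng.normal(mean, std, size - filled)
        keep = draw[(draw >= low) & (draw <= high)]
        out[filled:filled + keep.size] = keep
        filled += keep.size
    return out

def simulate_day(rng, hourly_rate, int_mean, int_std, dur_mu, dur_sigma):
    """Return the 24 hourly volumes (l) of one synthetic day."""
    volumes = np.zeros(24)
    for hour in range(24):
        n_events = rng.poisson(hourly_rate[hour])  # pulses starting in this hour
        intensity = truncated_normal(rng, int_mean, int_std, n_events)
        duration = rng.lognormal(dur_mu, dur_sigma, n_events)
        volumes[hour] = np.sum(intensity * duration)  # each pulse: intensity x duration
    return volumes

# Hypothetical hourly pulse rates (events/hour), not the values of Table 7.
hourly_rate = np.array([0, 0, 0, 0, 0, 0, 1, 3, 2, 1, 0.5, 0.5,
                        1, 1, 0.5, 0.5, 0.5, 1, 1.5, 2, 1.5, 1, 0.5, 0])
rng = np.random.default_rng(seed=42)
n_days = 1000  # far fewer than the 100,000 series of the study, to keep the run short
days = np.array([simulate_day(rng, hourly_rate, 0.15, 0.05, 3.5, 0.8)
                 for _ in range(n_days)])
mean_profile = days.mean(axis=0)  # Monte Carlo average hourly consumption (l/h)
print(mean_profile.round(1))
```

Rejection sampling is used for the intensity because simply clipping normal draws to [0, 1] would pile probability mass on the bounds rather than truncate the distribution.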

### MCDSA vs. PRP model for water demand simulation

Figures 8 and 9 show the 50th percentiles (all monitored dwellings) of the water demand simulated by the PRP model and MCDSA for the two return periods chosen (100 and 365 days, respectively). In order to analyse the reliability of the generated patterns, the 50th percentiles of the recorded consumptions are plotted in Figures 8 and 9 too; it must be highlighted that the median of the recorded data is strictly dependent on the length of the monitored consumption time series and cannot be linked to a specific return period. As shown in Table 1, this length is different for the eight monitored dwellings.

Regarding MCDSA model results, Figures 8 and 9 show that the time to peak (expressed by the coefficient *K_{p}*) was preserved for all dwellings and for the two return periods chosen.

For a return period equal to 100 days (Figure 8), making appropriate distinctions between the dwellings, the 50th percentiles of the MCDSA daily patterns reproduced quite accurately the median of the recorded daily patterns for six dwellings (1, 2, 3, 4, 5 and 8) and less accurately for dwellings 6 and 7. Namely, for dwellings 1, 4, 5, 6 and 8, the total daily volume, *V_{d}*, showed quite acceptable values, while for dwellings 2, 3 and 7, the related *V_{d}* values were much higher, with a relative error with respect to the measured percentile, Δ*V_{d}/V_{d}*, equal to −68%, −147% and −246%, respectively. The *V_{max}* characterising the generated pattern was acceptable for dwellings 1, 2, 3, 4 and 8; for the other dwellings (5, 6 and 7), the *V_{max}* values were always higher than the recorded ones, with a relative error, Δ*V_{max}/V_{max}*, equal to −54%, −65% and −152%, respectively (Table 8).

For a return period equal to 365 days, the *V_{d}* and *V_{max}* were higher than those for the shorter return period, as predicted. However, the daily patterns were well reproduced, except for dwellings 3, 6 and 7.

Summarising for the two return periods chosen, the 50th percentiles of the daily patterns generated by the MCDSA model usually tended to overestimate the consumptions with respect to the median of the recorded daily patterns (Table 8). The main reason can be found in the structure of the MCDSA model: the *t_{KEN3}* values corresponding to the return periods chosen were high (0.99 and 0.997 for 100 and 365 days, respectively), and the triplets statistically sampled were characterised by infrequent values of the variables. As a result, the MCDSA model tended to overestimate water consumptions, generating patterns characterised by high values of *V_{d}* and *V_{max}*.

In order to explain the different behaviour of dwellings 3, 6 and 7, some factors can be considered.

With regard to dwelling 6, the MCDSA model was not able to preserve *V_{max}*, probably due to the particular shape of the monitored daily patterns where more than one main peak of consumption occurred; dwelling 6 was the only dwelling that demonstrated secondary peaks comparable to the main peak. Moreover, the recorded data time series for dwelling 6 was longer than those of the other dwellings, lasting approximately 2 years (Table 1). As a result, *V_{max}* was higher than the recorded one.

With regard to dwelling 3, the MCDSA model overestimated the consumptions between 9 a.m. and 5 p.m., and for dwelling 7, the MCDSA model was not able to accurately forecast the consumption both in terms of *V_{d}* and *V_{max}*. The main reasons are that the median daily pattern of dwellings 3 and 7 showed no demand for several hours in the day (e.g., for dwelling 7 between 12 a.m. and 5 a.m., between 11 a.m. and 5 p.m., and after 9 p.m.), and all recorded patterns showed the same shape (and Huff curve), without any variation (e.g., weekday and weekend). Furthermore, as previously stated, the triplets statistically sampled for the two return periods chosen were characterised by infrequent values of the variables (with *t_{KEN3}* values equal to 0.99 and 0.997 for 100 and 365 days, respectively). As a result, the *V_{d}* and *V_{max}* were higher than the median of the recorded consumptions, and the generated pattern tended to overestimate water consumptions.

The procedure to define the hourly demand patterns and the percentiles for the given return period with the PRP model is similar to that previously described. Monte Carlo analysis was used to generate 100,000 series of 1,000 days of synthetic water consumption, from which the daily pattern water demand was derived for each set return period.

In the PRP model, the demand patterns showed the same shape (i.e., that of the historical consumption time series) for the two return periods: the 50th percentile of the simulated pattern was always lower than that of recorded consumption for the 100-day return period and was close to the median of the historical series for the 365-day return period (Table 8). The only exception was dwelling 6, for which the simulated pattern reproduced the recorded pattern well for the 100-day return period and was higher than the same curve for the 365-day return period (Table 8).

The length of the recorded consumption sample appears to influence the results of the two models; the daily MCDSA pattern preserved quite accurately all features of the recorded consumption for return periods that were shorter than the length of the historical time series, and the statistical analysis was less confident for the return period that was comparable with the length of the sample. The PRP model underestimated the water demand for return periods that were shorter than the historical series; the demand pattern was close to that recorded for the return periods that were comparable to the length of the consumption data.
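The relative errors quoted in this section are negative whenever the simulated value exceeds the recorded one, which suggests the sign convention (recorded − simulated)/recorded. The short Python sketch below applies that assumed convention to the total daily volumes reported for dwelling 1 (200 l recorded median, 230 l and 291 l simulated for 100 and 365 days). The function name `relative_error` is illustrative, and the results are a worked example, not values taken from Table 8.

```python
def relative_error(recorded, simulated):
    """Relative error in percent; negative when the model overestimates (assumed convention)."""
    return 100.0 * (recorded - simulated) / recorded

recorded_vd = 200.0                       # median recorded daily volume, dwelling 1 (l)
simulated_vd = {100: 230.0, 365: 291.0}   # simulated daily volume by return period (l)

errors = {T: relative_error(recorded_vd, v) for T, v in simulated_vd.items()}
print(errors)  # {100: -15.0, 365: -45.5}
```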

## CONCLUSIONS

Interest in domestic water demand modelling stems from two primary objectives: to analyse the domestic consumption process to aid systems management and to define demand patterns at a given return period to aid systems design. From this viewpoint, the present study proposed a statistical methodology for the definition of water consumption patterns based on the return period and a multivariate probabilistic approach. The method is based on a multivariate statistical analysis in which a 3-d vine copula was constructed for the primary features of the consumption process at the domestic level. The water demand was predicted for the given return periods using the patterns that were statistically generated, considering the historical series of consumption to which the methodology was applied. The analysis of the percentiles of the water demand for the given return periods shows that the proposed approach produced consistent demand patterns and could be a powerful tool to couple with water distribution network models for design or analysis problems. The MCDSA forecasts water demand patterns by reproducing quite accurately the main process properties: the time to peak, which is always preserved, the total daily volume, and the maximum daily consumption, which increases with the return period, subject to the remarks made for individual dwellings. The total daily volume and the maximum daily consumption are in good agreement with those of the recorded data only for some of the residences chosen for the case study; higher relative errors occurred for others. The reason may lie in the shape of the monitored daily patterns: more than one main peak in the water demand pattern, or time series with the same shape without any variation among days, results in less accurate outcomes.

The proposed methodology was also validated by comparing the results with those obtained using the PRP model. The PRP model, calibrated on the consumption data of a single user, was used to simulate the water demand for the same return periods.

The MCDSA results were always higher than those of the PRP model for both return periods; the MCDSA water demand pattern preserved all recorded consumption features for return periods shorter than the length of the historical series and overestimated them for return periods longer than it. For the two chosen return periods, the triplet values were characterised by a low exceedance probability and, as a result, the total daily volume and the maximum daily consumption were higher and the generated pattern tended to overestimate water consumption.

However, the MCDSA methodology, even if less accurate than the PRP model, has wider applicability because it requires less data than the PRP model, with the advantage that it can be easily applied at large scale by means of a common automatic meter reading system with limited data transmission and management costs.

As a future development of the study, the pressure at the dwelling should also be considered in the multivariate analysis in order to take into account the effect of pressure on water consumption.

- First received 31 December 2014.
- Accepted in revised form 16 September 2015.

- © IWA Publishing 2016
