## Abstract

The reported study was undertaken in Kapgari, a small agricultural watershed in Eastern India with a drainage area of 973 ha. The watershed was subdivided into three sub-watersheds on the basis of drainage network and land topography. An attempt was made to relate the continuously monitored runoff data from the sub-watersheds and the whole watershed to the rainfall and temperature data using the artificial neural network (ANN) technique. The study also evaluated the bias in the prediction of daily runoff with a shorter training data set using different resampling techniques with ANN modeling. A 10-fold cross-validation (CV) technique was used to find the optimum number of hidden neurons in the hidden layer and to avoid neural network over-fitting during training with a shorter data set. The results illustrated that ANN models developed with a shorter training data set avoid over-fitting during the training process when a 10-fold CV method is used. Moreover, the bias was investigated using the bootstrap resampling technique based ANN (BANN) for a short training data set. In comparison with the 10-fold CV technique, the BANN is more efficient in solving the problems of over-fitting and under-fitting during training of models with a shorter data set.

- 10-fold cross-validation
- ANN
- BANN
- daily runoff
- small agricultural watershed

## INTRODUCTION

Accurate simulation of the important hydrological processes such as runoff is essential for planning, design of structures, and management of water resources. Many physical factors such as infiltration, initial soil moisture, evaporation, land use/land cover, watershed geomorphology, and duration of the rainfall make hydrological processes extremely complex, non-linear, dynamic, and fragmented. The rainfall–runoff process is not only complex but also extremely difficult to simulate due to spatial–temporal variability and inter-relationships of the underlying climatic and physiographic variables (Zhang & Govindaraju 2003). Investigations in the past revealed that hydrological modeling is the best technique for satisfactory estimation of the runoff. Based on the degree of complexity, the hydrological models are categorized into empirical black-box models, lumped conceptual models, and distributed physically based models (Dooge 1977). The lumped conceptual models are sometimes adopted due to limited data requirement, but these models have the limitation of lengthy calibration and parameterization requirement using rigorous optimization techniques (Duan *et al*. 1992). The physically based models consider the controlling physical processes; therefore these models are considered to be a better choice in a rigorous theoretical sense, even though their data requirements are higher. Due to limited availability of data, such models do not perform satisfactorily under field conditions. Generally, in intensively monitored watersheds, all the required data are not available. Thus, there is a need to look for alternative methods for estimation of the runoff using readily available data and information.

Recently, a data-driven modeling approach has received an important boost, due to its ability to overcome some of the limitations associated with the conceptual and physically based models. The data-driven modeling has been developed with contributions from artificial intelligence, data mining, computational intelligence, machine learning, intelligent data analysis, soft computing, pattern recognition, and knowledge discovery in databases. The last two decades have seen increasing popularity of the artificial neural network (ANN) based model for simulation of the hydrological processes. The ANN is one of the intelligence techniques which is flexible and robust and requires less variety of data. Ability of the ANN has been demonstrated while applying to complex systems that may be poorly described or understood using mathematical equations (Tokar & Johnson 1999). Despite the black-box nature of the ANN, it has the flexibility in inclusion of parameters and in capturing the non-linearity of rainfall–runoff processes, making it more attractive for modeling the hydrological processes (Hsu *et al*. 1995).

A preliminary study on modeling the rainfall–runoff process using ANN was initiated by Halff *et al.* (1993), who used a three-layered feed forward ANN for the prediction of hydrographs. Since then, many studies in the area of rainfall–runoff modeling using ANN have been carried out. Extensive review on the application of ANN models in hydrologic simulation and forecasting have been presented in the literature (Maier & Dandy 2000; ASCE 2000a, 2000b; Dawson & Wilby 2001). Hsu *et al.* (1995) found that the ANN model provides a better representation of the rainfall–runoff relationship than the ARMAX time series model or the conceptual SAC-SMA (Sacramento soil moisture accounting) model. Shamseldin (1997) investigated the capability of the ANN model for rainfall–runoff modeling by comparing the simulated discharge with that of the simple linear model, the seasonally based linear perturbation model, and the nearest neighbor linear perturbation model and concluded that the ANN model can simulate discharge more accurately than some of the traditional models. The capability of the ANN model in rainfall–runoff simulation has been demonstrated by many researchers (French *et al*. 1992; Hall & Minns 1993; Navone & Ceccatto 1994; Smith & Eli 1995; Mason *et al.* 1996; Dawson & Wilby 1998; Campolo *et al.* 1999; Sajikumar & Thandaveswara 1999; Gautam *et al.* 2000; Chang & Chen 2001; Wu *et al.* 2005). The applicability of ANN was also investigated for modeling rainfall–runoff due to typhoon (Chen *et al.* 2013), who reported that a feed forward back-propagation based ANN model performs better than the conventional regression analysis method.

There are some other statistical techniques presented in the literature which have similar or better performance than ANN for rainfall–runoff modeling. Approaches such as the MARS (multivariate adaptive regression splines) algorithm (Friedman 1991) and the machine learning method of M5 model trees (MTs) given by Quinlan (1992) use piece-wise linear approximations that are much easier to interpret and provide accuracy comparable to that of ANN. Solomatine & Dulal (2003) used ANN and MT approaches in rainfall–runoff modeling and found that both techniques have almost similar performance. They concluded that even though the MT was slightly more accurate for 1-h ahead prediction of runoff, the performance of ANN is slightly better than that of the MT for higher lead times. Toth & Brath (2007) used the ANN model and the ADM models for flood forecasting and concluded that the ADM may allow a significant improvement in prediction when focusing on flood events, especially when limited training data are available. Fuzzy partitioning mechanisms are used in many hydrologic applications for creating a rule base to generate the output. An extensive review of the application of the fuzzy inference system (FIS) for river flow forecasting has been presented by Jacquin & Shamseldin (2009). The FIS is based on the back-propagation algorithm which optimizes the fuzzy membership parameters to develop the best input–output relationship. The adaptive neuro-fuzzy inference system (ANFIS), based on the integration of FIS with the back-propagation algorithm, has also been applied by a good number of researchers for rainfall–runoff modeling and river flow forecasting (Nayak *et al.* 2005; Aqil *et al.* 2007; Mukerji *et al*. 2009; Pramanik & Panda 2009). Some of the investigators found that the ANFIS, ANN, and FIS models perform similarly in some cases, but that the ANFIS predicts better than the ANN and FIS models in most cases. 
In spite of that, presently more and more researchers are utilizing the ANN because this model possesses desirable attributes of universal approximation, and the ability to learn from the examples.

Some previous studies concluded that the lack of physical concepts and relations has been a major limitation of ANN and the reason for the sceptical attitude towards this approach (ASCE 2000a, 2000b). Some hydrologists criticize the ANN model saying it does not reveal anything about the structure of the function that it represents. It is believed that the physics is locked up in the ANN model within the connection weights and threshold values, but these are not easily interpretable. The other limitation of the ANN model is its sensitivity towards the studied data, which means that the structure of ANN becomes totally different with the change of the training data set. A major drawback of the ANN model has been indicated as its incapability of predicting extreme values in river flow (Minns & Hall 1996; Campolo *et al.* 1999). There may be a number of reasons due to which the ANN models are incapable of predicting extreme values (Imrie *et al.* 2000). If the ANN models are trained using a data set that does not contain the maximum possible output value, trained networks may perform poorly for the encountering events containing previously unseen values. The requirement of a lengthy training data set is also a major issue for ANN modeling, because ANN learns from the examples without incorporating physical concepts. Toth & Brath (2007) investigated the impact of the amount of training data on the performance of the ANN model. They concluded that ANN provides an excellent performance for rainfall–runoff simulation of continuous periods, provided that an extensive set of hydro-meteorological data were available for calibration purposes. However, its performance was not satisfactory while focusing on the prediction of flood events, especially in the case of limited availability of training data. 
Some researchers have tried to address these limitations by applying various approaches while developing the ANN models (See & Openshaw 1999; Cigizoglu 2003; Hettiarachchi *et al.* 2005).

Data pre-processing is an important statistical approach in ANN modeling, which leads to a reduction in the prediction error. Data pre-processing is very helpful for good generalization ability when the data set is inadequate. Different resampling techniques may be used for data pre-processing to explore the hidden properties in the data sets that help in efficient input–output mapping during model training. Generally, hydrological modeling using ANN has adopted a simple train-and-test (hold-out) validation procedure to find the best ANN structure. Sometimes, due to high dimensionality, noise, and an inadequate data set, the ANN structure decided by the hold-out method can suffer from poor generalization ability and provide biased testing. A cross-validation (CV) procedure has been used for estimating the generalization performance for smaller data sets. Stone (1974) and Geisser (1975) used CV for selecting proper model parameters, as opposed to employing CV purely for estimating model performance. Nowadays, CV is broadly adopted in data mining and considered a standard procedure for performance estimation and model selection. The basic form of CV is *K*-fold CV. In data mining and machine learning where the data set is relatively small, a 10-fold CV (*K* = 10) procedure is recommended to check the generalization ability of the model (Weiss & Kulikowski 1991). Molinaro *et al.* (2005) presented a comparison of the resampling methods to estimate the error in prediction. One of the goals of that study was to build models using relatively small data sets to predict the outcome of future observations. Different resampling methods such as split-sample, *K*-fold (2-fold, 5-fold, and 10-fold) CV, leave-one-out CV (LOOCV), Monte Carlo cross-validation, and bootstrap were used to estimate error. The results showed that simple split-sample estimates are seriously biased, whereas LOOCV, 10-fold CV, and the bootstrap have the smallest bias. 
Additionally, the LOOCV, 10-fold CV, and bootstrap have the lowest mean square error. Also it was found that the bootstrap is quite biased in the case of small sample sizes with strong signal-to-noise ratios. More recently, Sharda *et al.* (2006) simulated runoff from middle Himalayan watersheds using the artificial intelligence technique with only 2 years' daily rainfall, runoff, base flow, and total flow data. The authors used a 10-fold CV procedure to check the generalization ability of the models. Akhtar *et al.* (2009) applied a 10-fold CV procedure to build the best ANN model for river flow forecasting. The use of bootstrap (Efron 1979) approach to build a neural network solution for application in hydrological modeling is on the rise. It is a computational procedure that uses intensive resampling with replacement, in order to reduce uncertainty (Efron & Tibshirani 1993). The neural bootstrap has been used to perform bootstrap aggregation, also known as multi-model ensembles, to produce averaged outputs and a more stable solution (Hsieh & Tang 1998). Abrahart (2003) employed the bootstrap technique for rainfall–runoff modeling and reported that it offered marginal improvement in terms of greater accuracies and better global generalizations. Jeong & Kim (2005) used the bootstrap technique to simulate monthly rainfall–runoff. The authors concluded that the bootstrap based ANN model (BANN) is less sensitive to the input variable selection and the number of hidden nodes than the simple neural network. Shu & Ouarda (2007) applied ensemble ANN models for regional flood frequency analysis in the canonical physiographic space and achieved good results. Sharma & Tiwari (2009) developed BANN for better simulation of monthly rainfall–runoff relationships. Ensemble flood forecasting was made by averaging the output of member bootstrapped neural networks (Tiwari & Chatterjee 2010). 
The study showed that the ensemble prediction is more consistent, reproducible, and more stable as compared to the traditional ANN approach.

### Background and scope of the present study

The rainfall information alone is insufficient to compute the runoff from a river basin or catchment as the catchment characteristics (related to soil moisture) play an important role in determining the runoff rate variation (Minns & Hall 1996; Campolo *et al.* 1999). In order to represent the soil moisture status, a common approach adopted by most of the above-mentioned researchers, is that runoff or water level at previous time steps were used as inputs to develop the models. An ANN model has the limitation of being highly sensitive towards the inputs, which means that the prediction accuracy of a developed ANN model with specific number of inputs may reduce if a lesser number of inputs are used for prediction. The use of runoff at previous time steps as input may increase the accuracy of the simulation but these developed ANN models may be insufficient to provide good prediction accuracy if runoff at previous time steps is not available. However, it is very difficult to get continuously monitored runoff from small agricultural watersheds in developing countries like India. Therefore, prediction of the daily runoff from small agricultural watersheds requires development of ANN models, without using previous time step runoff as an input.

Towards this investigation, Zhang & Govindaraju (2000) used a feed forward ANN for the prediction of monthly runoff with average monthly rainfall of current and previous months and the average monthly temperature as inputs and utilized Bayesian concept in deriving the training algorithm. The performance of modular networks in predicting runoff over three medium-sized watersheds in Kansas, USA was examined. More recently, a geomorphology based ANN (GANN) model was used for prediction of the watershed runoff as demonstrated by Zhang & Govindaraju (2003). The study shows that GANNs have the potential scope for estimating the direct runoff. Sarangi *et al.* (2005) used geomorphological parameters as inputs with rainfall for prediction of surface runoff and concluded that selected geomorphological parameters with rainfall depth enhanced the accuracy of runoff rate predictions. Sharma & Tiwari (2009) incorporated geomorphologic parameters such as soil, topography, and vegetation information as inputs to the ANN for estimation of monthly runoff at the catchment scale. Raghuwanshi *et al.* (2006) developed ANN models for prediction of daily and weekly watershed runoff and sediment yield using the rainfall and temperature as inputs with 7 years' monsoon season data. As discussed earlier, long length of training data sets is also a limitation of ANN based rainfall–runoff modeling for good accuracy of prediction (Toth & Brath 2007). Some researchers found that a single ANN model is rigid in nature and not suitable for capturing a fragmented input–output mapping during short length of training data sets. Tiwari & Chatterjee (2010) found that short length of training data sets with appropriate representation can perform similarly to the ANN models with long length training data sets for flood forecasting. 
From the reported review it was found that different resampling techniques may be used for the data pre-processing to explore the hidden properties in the smaller length data set that help in efficient input–output mapping during model development under data scarcity conditions.

In view of the above-mentioned facts, the present study aimed at developing unbiased ANN models to predict daily runoff with limited quantum of continuously monitored runoff data from a small agricultural watershed, namely, Kapgari in Eastern India. The ANN models were developed using easily available climatic variables such as rainfall and temperature for predicting daily runoff from a small agricultural watershed. The 10-fold CV and bootstrapped resampling techniques were used to reduce the prediction error under data scarcity condition. Effective inputs were selected based on correlation coefficients (*r*) and the ANN models were trained with resampling techniques using the time series of the selected inputs. Finally, the results obtained from the 10-fold CV and bootstrapped resampling technique based ANN models were compared.

## STUDY AREA AND DATA USED

A small agricultural watershed, namely, Kapgari watershed (KGW), situated in the Midnapore district of West Bengal state in Eastern India was selected for this study. This is mainly an agricultural watershed with unmanaged natural resources. The geographical boundary of the watershed lies between 86°50′ and 86°55′ E longitude and 22°30′ and 22°35′N latitude. An unlined main drain and three sub-drains exist in the whole watershed area of 973 ha. Further, on the basis of drainage network and land topography, the watershed was subdivided into three sub-watersheds (Figure 1). The areas of the sub-watersheds KGSW1, KGSW2, and KGSW3 are 280, 330, and 363 ha, respectively. The topography of the watershed is undulating. The slope ranges from 0.07 to 8.29% with an average slope of 3.03%. The major crop of the study area is paddy, which is usually cultivated during the rainy season. The major soil textural classes present are sandy loam, silt loam, clay loam, and loam. However, the predominant soil of the watershed is sandy loam soil. The climate is sub-humid subtropical with an average annual rainfall of 1,320 mm, of which about 80% occurs during the rainy months of June to October. Most of the rainwater drains from the watershed in the form of surface flow. The daily mean temperature ranges from a minimum of 24 °C to a maximum of 40 °C. The daily mean relative humidity varies from a minimum of 59.4% to a maximum of 94.3%. Average wind speed varies from 1.1 m/s to 2.1 m/s. The mean solar radiation varies from 5.0 kWh/m^{2} to 7.4 kWh/m^{2} and average evaporation varies from 2.1 mm/day to 5.7 mm/day.

For effective monitoring of surface runoff, culverts were constructed at the main outlet of the watershed and at the outlets of the sub-watersheds, where gauging stations were installed. Each gauging station consists of a stilling well of 1 m diameter, in which an automatic stage level recorder was installed for continuous monitoring of discharge from the sub-watersheds and the whole watershed during the monsoon seasons of 2003 to 2005. A current meter was used to measure the velocity of water passing through the culverts. The fluctuation of water level in the culverts was recorded on the chart papers of the automatic stage level recorders. Other meteorological data, such as daily rainfall and daily maximum and minimum air temperature, were also collected for the same period from the meteorological observatory established in the study area.

## THEORETICAL CONSIDERATIONS

### Artificial neural network

An ANN is a massively parallel distributed processor with a natural propensity for storing knowledge and making it available for use (Haykin 1994). An ANN generally consists of three types of layers: an input layer, an output layer, and one or more hidden layers. The information passes from the input layer to the output layer through the neurons present in each layer. The network function is determined by the connections between the nodes. The connections between the processing neurons are weighted by scalar weights and biases, which are adapted during model training. The number of input neurons, output neurons, and neurons in the hidden layer depends upon the problem being studied.

A neural network can be trained to perform a particular function by adjusting the value of connection weights between the nodes (Rumelhart & McClelland 1986; Muller & Reinhardt 1991). At the beginning of the training, the initial values of the weights can be assigned randomly or based on experience. In this process, the learning algorithm systematically changes the weights to correctly perform a desired input–output relationship. The process of training is said to be finished when the mean squared error between the ANN output $y_j(t)$ and the corresponding actual output $d_j(t)$ becomes closer to the preset error goal. The mean squared error $E(t)$ at any time $t$ is calculated over the entire data set using Equation (1).

Determination of appropriate ANN architecture is one of the most important tasks in the model-building process. From various past studies it was revealed that a multilayer feed forward network outperforms all the other networks. Although multilayer feed forward networks are one of the most fundamental models, they are also the most popular type of ANN structure suited for practical applications. The multilayer feed forward network has a layered arrangement of the nonlinear processing nodes (Figure 2). The architecture consists of hidden layers of neural network and hidden neurons in the hidden layers. Determination of the optimum number of hidden layers and nodes in the neural network architecture, to produce good results, is a trial-and-error procedure which depends on the type of problem and the availability of data. Hecht-Nielsen (1990) provided a proof that even one hidden layer of neurons can be sufficient to model any solution surface of practical interest. Wang *et al.* (2011) reported that one of the important issues in the development of an ANN model is the determination of the optimal number of neurons in the hidden layer that can satisfactorily capture the nonlinear relationship existing between the input and the output variables. If the number of neurons in the hidden layer is small, the network may not have sufficient degrees of freedom to learn the process correctly. The training takes a long time and the network may sometimes over-fit the data due to the number of neurons in the hidden layer being too large (Karunanithi *et al.* 1994). The learning algorithm to train the network for responding correctly to a given set of inputs also plays an important role in the process of model development. A properly trained network with Levenberg–Marquardt (LM) back propagation algorithm gives reasonable results when presented with new inputs during the validation (Hagan & Menhaj 1994).

In the present study, a multilayered feed forward network having one hidden layer and a sigmoid activation function in the neurons was trained with LM back propagation algorithm. A 10-fold CV method was adopted for selection of the optimal number of hidden neurons to avoid the neural network over-fitting during the training process. Finally, the trained architecture (having the optimized values of the connection weights) was tested using the unseen data sets to evaluate the model performance.
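The network form described above, one hidden layer with sigmoid neurons, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the layer sizes, the random weights, the sample values, the sigmoid on the output neuron, and the plain average used in `mean_squared_error` are all hypothetical choices made here, not the study's MATLAB configuration.

```python
import math
import random

def sigmoid(z):
    """Logistic activation, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a single-hidden-layer feed forward network."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

def mean_squared_error(targets, outputs):
    """Average squared difference between desired and simulated outputs."""
    return sum((d - y) ** 2 for d, y in zip(targets, outputs)) / len(targets)

rng = random.Random(1)
n_inputs, n_hidden = 4, 3  # hypothetical sizes, not the study's architecture
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

# Made-up normalized inputs and targets, for illustration only.
samples = [[0.2, 0.0, 0.6, 0.5], [0.9, 0.2, 0.4, 0.3]]
targets = [0.1, 0.7]
outputs = [forward(s, w_hidden, b_hidden, w_out, b_out) for s in samples]
error = mean_squared_error(targets, outputs)
```

Training then amounts to adjusting `w_hidden`, `b_hidden`, `w_out`, and `b_out` until `error` approaches the preset goal.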

### Neural network training algorithm

A wide range of algorithms has been developed for training the ANN network to achieve the optimum model performance, while ensuring generalization and computational efficiency. The back-propagation algorithm is the most common neural network training algorithm (Fausett 1994; Patterson 1996). The LM algorithm, a standard second-order nonlinear least-squares technique based on the back-propagation algorithm, was used for training the ANN models because it increases the speed (Masters 1995) and efficiency (Hagan & Menhaj 1994) of the training. Demuth & Beale (1998) reported that the LM algorithm uses a second-order training mode without having to compute the Hessian matrix. When the performance function has the form of a sum of squares (as is typical in training feed-forward networks), the Hessian matrix can be approximated as $H = J^{T}J$ and the gradient can be computed as $g = J^{T}e$, where $J$ is the Jacobian matrix that contains first derivatives of the network errors with respect to the weights and biases, and $e$ is a vector of network errors. The LM algorithm uses the above approximation to the Hessian matrix in the Newton-like weights update as Equation (2):

$$w_{k+1} = w_{k} - \left[J^{T}J + \mu I\right]^{-1} J^{T}e \tag{2}$$

where $w$ indicates the weight of the neural network, $k$ is the iteration index, $I$ is the identity matrix, and $\mu$ is a non-negative scalar that controls the learning process. When the parameter $\mu$ is large, the above expression approximates gradient descent with a small step size, while for a small $\mu$ the algorithm approximates the Newton method. The LM algorithm can thus move between its two extremes, gradient descent and Newton's algorithm. The LM method is a standard method for minimization of the mean square error criterion, due to its rapid convergence properties and robustness (Demuth & Beale 1998).
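To make the role of the damping parameter concrete, the following minimal sketch applies the LM update to a toy problem with a single weight, so that $J^{T}J$ and $J^{T}e$ reduce to scalars. The exponential model, the synthetic data, the starting values, and the rule of dividing or multiplying $\mu$ by 10 after each accepted or rejected step are assumptions made for illustration; they are not taken from the study.

```python
import math

def lm_fit(x, y, w=0.0, mu=0.01, n_iter=50):
    """Fit y ~ exp(w * x) for one weight w using Levenberg-Marquardt steps."""
    def errors(w_val):
        # Network-style error vector: model output minus target.
        return [math.exp(w_val * xi) - yi for xi, yi in zip(x, y)]

    sse = sum(e * e for e in errors(w))
    for _ in range(n_iter):
        e = errors(w)
        jac = [xi * math.exp(w * xi) for xi in x]    # d e_i / d w
        jtj = sum(j * j for j in jac)                # scalar J^T J
        jte = sum(j * ei for j, ei in zip(jac, e))   # scalar J^T e
        w_new = w - jte / (jtj + mu)                 # LM weight update
        sse_new = sum(v * v for v in errors(w_new))
        if sse_new < sse:
            # Step accepted: move toward Newton-like behaviour.
            w, sse, mu = w_new, sse_new, mu / 10.0
        else:
            # Step rejected: move toward small gradient-descent steps.
            mu *= 10.0
    return w

# Synthetic, noise-free data generated with a known weight of 0.7.
x_data = [0.0, 0.5, 1.0, 1.5, 2.0]
y_data = [math.exp(0.7 * xi) for xi in x_data]
w_fit = lm_fit(x_data, y_data)
```

In this toy run the first few trial steps overshoot and are rejected, which raises $\mu$ and shortens the step; once the error starts to fall, $\mu$ shrinks again and the update behaves like a Newton step.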

### Ten-fold CV technique

In data mining and machine learning, where the length of data for model development is inadequate, 10-fold CV is the most common procedure recommended to check the generalization ability of the model (Weiss & Kulikowski 1991). This is performed to avoid neural network over-fitting during the training process. To perform the 10-fold CV procedure, data for the model development are first partitioned into 10 equally (or nearly equally) sized segments, or folds. Then 10 iterations of training and validation are performed such that, in each iteration, a different fold of the data is held out for validation while the remaining nine folds are used for training. Subsequently, the trained models are used for predictions of the data in the validation fold. Thus, each time a model is constructed and tested with an ‘unseen’ data set. Predetermined performance functions can be used to track the performance of each learning algorithm on each fold. The resulting per-fold performance samples can be used in a statistical hypothesis test or combined into an aggregate measure, for example by averaging. The appeal of this procedure is that, during each iteration, 90% of the data is available for training, yet the final performance is based on 100% of the data. The advantage of this method is that it performs reliable unbiased testing on a smaller data set, although it requires much more computational effort than the simple train-and-test (hold-out) approach.
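The partitioning step can be sketched as follows. The function names and the sample count of 105 are hypothetical, and contiguous folds are used here only for simplicity; the sketch shows how each fold is held out exactly once while the other nine are used for training.

```python
def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def train_validation_splits(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs; every fold serves once as validation."""
    folds = k_fold_indices(n_samples, k)
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx

n_days = 105  # hypothetical number of daily records
splits = list(train_validation_splits(n_days, k=10))
```

Each element of `splits` can then be passed to a training routine, with the validation error recorded per fold and averaged afterwards.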

### Bootstrapping

Bootstrapping is a data-driven simulation method that uses intensive resampling with replacement. Efron & Tibshirani (1993) developed the bootstrapping method which manipulates the training data sets in order to generate different models and uses them to obtain an aggregated predictor. The bootstrap procedure involves resampling with replacement, to reduce the uncertainties. The aim of resampling is to mimic the random component of a process and to reduce variance through averaging over numerous different partitions of the data. The neural network simulations generally converge at the local minima. This results in slightly different predictions every time neural networks are trained, due to the random initializations of the weight matrix. The neural bootstraps have been used to perform the bootstrap aggregation (bagging) of multi-model ensembles which produce averaged outputs and a more stable solution. This is done by repeated sampling with replacement of the original data set of size N, to obtain B bootstrap data sets, each with a size of N. Each bootstrap data set contains different data, resulting in B neural network models, all of which may differ slightly. Efron & Tibshirani (1993) suggested B to be between 50 and 200.

By calculating averages and standard deviations of the B testing results, one obtains robust values of the predicted variable and associated uncertainty estimates for independent data. A model is fitted to each of the generated bootstrap data sets, and the bootstrap estimate is calculated as the mean of the individual model outputs (Equation (3)).
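The resampling and averaging steps described above can be sketched as below. The data pairs are made up, and `fit_stand_in` is a hypothetical placeholder that returns a constant predictor instead of training a neural network; only the resampling with replacement and the aggregation of B member outputs are meant to be illustrated.

```python
import random
import statistics

def bootstrap_samples(data, n_boot, seed=0):
    """Draw n_boot resamples, each of len(data) items, with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(n_boot)]

def fit_stand_in(sample):
    """Placeholder for training one network: predicts the mean target of its sample."""
    mean_target = statistics.fmean([target for _, target in sample])
    return lambda _inputs: mean_target

# Made-up (input, target) pairs, for illustration only.
data = [(0.10, 0.02), (0.45, 0.20), (0.80, 0.55), (0.30, 0.10),
        (0.95, 0.70), (0.05, 0.00), (0.60, 0.35), (0.70, 0.40)]
B = 100  # within the 50 to 200 range suggested by Efron & Tibshirani (1993)

samples = bootstrap_samples(data, B)
models = [fit_stand_in(s) for s in samples]
member_outputs = [model(0.5) for model in models]

ensemble_mean = statistics.fmean(member_outputs)  # aggregated prediction
ensemble_sd = statistics.stdev(member_outputs)    # spread across members
```

Replacing `fit_stand_in` with a real training routine would leave the rest of the sketch unchanged, since the aggregation only needs the B member outputs.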

## METHODOLOGY

### Selection of input variables

Determination of significant input variables is one of the most important steps in the ANN hydrologic model development process (Bowden *et al.* 2005a, b). The reported study used the correlation coefficient (significant at the 0.01 level, two-tailed) of the hydro-climatic data to select the effective inputs. The correlation coefficients between variables are presented in Table 1, which shows that there is a very good correlation of the present day runoff with present day rainfall and a significant correlation with the 1-day lag rainfall. The correlation coefficient between the present day runoff and the present day maximum and minimum temperatures was also significant. It was also observed that the correlation of daily runoff with 2-day lag rainfall and with 1-day lag maximum and minimum temperature was quite low. Therefore, these combinations were not considered for this study. The description of the different ANN models developed for simulating daily runoff is presented in Table 2. A schematic diagram of the architecture of the developed neural networks with a single hidden layer is shown in Figure 2.
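A lagged correlation screening of this kind can be sketched as follows. The rainfall and runoff series are invented for illustration and are not the values behind Table 1; the sketch computes only the Pearson coefficient and omits the significance test.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_pairs(driver, response, lag):
    """Pair response[t] with driver[t - lag]; the first `lag` days are dropped."""
    if lag == 0:
        return list(driver), list(response)
    return list(driver[:-lag]), list(response[lag:])

# Made-up daily series (mm), for illustration only.
rainfall = [0.0, 12.0, 30.0, 5.0, 0.0, 18.0, 42.0, 3.0, 0.0, 9.0]
runoff = [0.0, 2.0, 9.0, 4.0, 1.0, 4.0, 15.0, 5.0, 1.0, 2.0]

r_by_lag = {lag: pearson_r(*lagged_pairs(rainfall, runoff, lag))
            for lag in (0, 1, 2)}
```

Candidate inputs whose coefficient is low or not significant would then be dropped before model development.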

### Data preparation

ANN has the ability to handle nonlinear, noisy, and nonstationary data. However, with suitable data preparation beforehand, it is possible to improve the modeling performance (Maier & Dandy 2000; Bray & Han 2004). Data preparation involves a number of processes such as data collection, data division, and data pre-processing. In the present study, daily runoff as well as daily rainfall and maximum and minimum temperatures during the monsoon season were collected for a period of 3 years (2003–2005) from the observatory installed in the study area. The results of the statistical analysis of the available hydro-meteorological data are presented in Table 3. The statistical analysis showed that the year 2005 had the highest peak value, followed by the years 2004 and 2003. The trend of variation of all the variables is similar during all 3 years. However, due to some high intensity rainfall events, the runoff peaks were higher in 2004 and 2005. Normally, a major portion of the data set is used for training and a relatively smaller portion is used for testing the ANN models. Thus, 2 years' data were selected for the development of the models and the remaining 1 year's data were used for testing them. However, a major limitation of ANN models is that they are not usually able to extrapolate. The training patterns should therefore extend at least to the edge of the problem domain in all dimensions. To use a wider range of the available data in model training, the data sets of the years 2004 and 2005, which cover all the possible scenarios, were used for model development, and the developed models were tested using the unseen data set of the year 2003 to evaluate the model performance.

It is also essential for the ANN model to have training pattern evenly distributed within the region. If the pattern is not evenly distributed, the trained network performs poorly for the region where density of the training data set is less. Therefore, to perform a 10-fold CV procedure, data for the model development are stratified prior to being split into 10 folds. Stratification is a data rearrangement process to ensure that each fold is a good representative of the whole. The sigmoid activation function used for training the network has lower and upper limits of 0 and 1, respectively. Therefore, in order to suit the consistency of the model, the data used in input and output layers were normalized in the range of 0 to 1 and then returned to original values after the simulation. Equation (4) was adopted to normalize the data set:

$$X_{i,\mathrm{norm}} = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \qquad (4)$$

where *X*_{i} is the original value of each selected input, *X*_{i,norm} is the normalized value, and *X*_{max} and *X*_{min} are the maximum and minimum values, respectively.
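
The min-max scaling to the range 0 to 1 described above, and the return to original units after simulation, can be sketched as follows. The function names are hypothetical and the numbers are illustrative. Taking the limits from the model development data only is an assumption of this sketch.

```python
def fit_minmax(values):
    """Return (x_min, x_max) of the data used for model development."""
    return min(values), max(values)

def normalize(values, x_min, x_max):
    """Scale values linearly so that x_min maps to 0 and x_max maps to 1."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("cannot normalize a constant series")
    return [(v - x_min) / span for v in values]

def denormalize(scaled, x_min, x_max):
    """Invert normalize(): map values in [0, 1] back to original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]

# Illustrative values only (mm/day), not data from the study.
train_runoff = [2.0, 4.0, 6.0]
x_min, x_max = fit_minmax(train_runoff)
scaled = normalize(train_runoff, x_min, x_max)
print(scaled, denormalize(scaled, x_min, x_max))
```

If the limits come from the training years, a testing value outside that range is mapped outside 0 to 1, which is one way to see why the training data should span the edges of the problem domain.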

The normalization of the data, ANN model development, and analysis of the results were performed using MATLAB, an interactive software tool for engineering analysis and mathematical computation. The MATLAB command window (http://www.mathworks.in/help/nnet/index.html) was used for the ANN model development.

### Ten-fold CV based ANN models development

The goal of the reported study was to develop ANN models for predicting the daily runoff from climatic variables using a shorter length of data set. Therefore, different combinations of rainfall and temperature were considered as inputs for ANN model development. To develop ANN models using a shorter length of training data set, namely weather data for 2 years (2004–2005), a 10-fold CV procedure was used to check the generalization ability of the models. The main purpose of using the 10-fold CV technique was to find the optimum number of hidden neurons in the hidden layer and thus avoid neural network over-fitting during the training process. The procedure described earlier was followed. The performance of each learning algorithm on each fold was tracked using root mean square error (RMSE) and mean absolute error (MAE). After completing 10 iterations of training and validation, the average RMSE and MAE were used to obtain an aggregate measure from these samples. The ANN structure that showed the least CV error was selected for the daily runoff prediction. After selection of the optimum number of inputs and neurons in the hidden layer, the model was trained with the whole data set of the years 2004 and 2005 and evaluated with the observed data of the year 2003.
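
The selection loop described above (stratify, split into 10 folds, average RMSE and MAE over the folds, keep the candidate with the least CV error) can be sketched as follows. This is an illustration under stated assumptions, not the study's MATLAB implementation: stratification is done here by sorting on the target value and dealing samples round-robin into folds, and the two candidate learners are simple stand-ins rather than neural networks. In practice each candidate would be an ANN with a different number of hidden neurons.

```python
from math import sqrt

def stratified_folds(targets, k=10):
    """Sort sample indices by target value and deal them round-robin into k folds,
    so every fold spans low, medium, and high values (assumes len(targets) >= k)."""
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    return [order[f::k] for f in range(k)]

def rmse(obs, pred):
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def cross_validate(fit, inputs, targets, k=10):
    """Return (mean RMSE, mean MAE) over the k validation folds.
    `fit(train_x, train_y)` must return a callable mapping one input to a prediction."""
    rmses, maes = [], []
    for held_out in stratified_folds(targets, k):
        held = set(held_out)
        train_idx = [i for i in range(len(targets)) if i not in held]
        model = fit([inputs[i] for i in train_idx], [targets[i] for i in train_idx])
        obs = [targets[i] for i in held_out]
        pred = [model(inputs[i]) for i in held_out]
        rmses.append(rmse(obs, pred))
        maes.append(mae(obs, pred))
    return sum(rmses) / k, sum(maes) / k

def select_by_cv(candidates, inputs, targets, k=10):
    """Return the candidate name with the lowest mean CV RMSE, plus all scores."""
    scores = {name: cross_validate(fit, inputs, targets, k)
              for name, fit in candidates.items()}
    best = min(scores, key=lambda name: scores[name][0])
    return best, scores

# Stand-in learners (NOT neural networks), used only to exercise the loop.
def fit_mean(xs, ys):
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_proportional(xs, ys):
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: slope * x

# Made-up data: runoff exactly proportional to rainfall.
rain = [float(d) for d in range(1, 41)]
flow = [0.3 * r for r in rain]
best, scores = select_by_cv({"mean": fit_mean, "proportional": fit_proportional}, rain, flow)
print(best, scores[best])
```

After the structure is chosen this way, the selected model is retrained once on the whole development set, as in the text.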

### Bootstrap based ANN models development

The bootstrap resampling method was used to generate different realizations of the data set, creating a set of bootstrap samples by intensive resampling with replacement. These data sets provided a better understanding of the average and variability of the original unknown distribution or process. The BANN is less sensitive to the input variable selection and the number of hidden nodes than the simple neural network (Jeong & Kim 2005). Therefore, the optimized number of inputs and neurons in the hidden layer was used to develop the BANN. Finally, the results obtained from the BANN models were compared with those of the simple ANN models.

### Performance evaluation of models

The performance of the trained network was evaluated using the criteria proposed by previous researchers (Nash & Sutcliffe 1970; WMO 1975; ASCE Task Committee on Definition of Criteria for Evaluation of Watershed Models 1993). In the reported study, standard goodness-of-fit indices such as the coefficient of simulation efficiency (*COE*), coefficient of determination (*R*^{2}), *RMSE*, *MAE*, and percent deviation (*D*_{v}) were used for model evaluation. The *COE* provides a measure of the ability of a model to predict the values that are different from the mean. A value of *COE* close to 1 indicates good agreement between the observed and predicted values. The *R*^{2} value provides a measure of how well future outcomes are likely to be predicted by the model. A high *R*^{2} value indicates a close relationship between the observed and model predicted values. The *MAE* is a linear score in which all the individual differences are weighted equally in the average, whereas the *RMSE* is a quadratic scoring rule that gives a relatively high weight to the larger errors. This means that the *RMSE* is most useful when large errors are particularly undesirable. Both the *MAE* and *RMSE* can range from 0 to ∞, and values equal to zero indicate a perfect match between the observed and predicted values. The *COE* and *R*^{2} coefficients are independent of the scale of the data used and are useful in assessing the goodness of fit of the model (Dawson & Wilby 1998). The magnitude of the percent deviation (*D*_{v}) indicates the magnitude of the difference between the measured and simulated values. Bingner *et al.* (1989) reported that over- and underprediction within 20% of the measured values is considered an acceptable level of accuracy for simulations. The mathematical expressions of the goodness-of-fit indices (GFI) used in the study are presented in Table 4.
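
The indices discussed above can be computed as in the sketch below. Table 4 is not reproduced here, so the expressions are the commonly used definitions and should be read as assumptions: *COE* as the Nash–Sutcliffe efficiency, *R*^{2} as the squared Pearson correlation, and *D*_{v} as the percent difference between observed and predicted totals, with a sign convention chosen for this example.

```python
from math import sqrt

def goodness_of_fit(observed, predicted):
    """Return COE, R2, RMSE, MAE, and Dv for paired observed/predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return {
        # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the mean.
        "COE": 1.0 - sq_err / var_o,
        # Squared Pearson correlation between observed and predicted values.
        "R2": cov ** 2 / (var_o * var_p),
        "RMSE": sqrt(sq_err / n),
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
        # Assumed convention: positive Dv means the total is underpredicted.
        "Dv": 100.0 * (sum(observed) - sum(predicted)) / sum(observed),
    }

# Illustrative values only: every prediction is 1 unit too high.
stats = goodness_of_fit([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(stats)
```

In this example the predictions are perfectly correlated with the observations, so *R*^{2} equals 1 even though the constant bias lowers the *COE*, which is one reason to report several indices together.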

## RESULTS AND DISCUSSION

### Development of the ANN models for Kapgari watershed

The ANN model structure was determined using a 10-fold CV procedure for unbiased testing. The number of hidden neurons in a single hidden layer was varied from 1 to 5. For each number of hidden neurons, 10 iterations of training and validation were performed and the performance of each learning algorithm on each fold was tracked using the *RMSE* and *MAE* performance evaluation functions. Averaging was used to obtain an aggregate measure of all 10 iterations. The average performance for each number of hidden neurons is presented in Table 5. It is observed from Table 5 that as hidden neurons were added, the *RMSE* and *MAE* initially decreased for both training and CV, but beyond a certain number of hidden neurons the *RMSE* and *MAE* continued to decrease for training while increasing for CV. Hence, in this study, the ANN structure was decided on the basis of the least *RMSE* and *MAE* from the CV results. Using only present day rainfall as input, the Model KGM1 (1-3-1) gave minimum *RMSE* and *MAE* values of 0.726 and 0.382 mm, respectively, during the CV. Adding the 1-day lag of the daily rainfall as input improved the prediction performance, and the Model KGM2 (2-3-1) gave minimum *RMSE* and *MAE* values of 0.638 and 0.339 mm, respectively, for prediction of the daily runoff during CV. Therefore, Model KGM2 (2-3-1) was considered the best when only daily rainfall data are available. Similarly, addition of the daily maximum and minimum temperature data along with the rainfall data also improved the prediction performance, and the Model KGM3 (4-3-1) gave the least *RMSE* and *MAE* of 0.573 and 0.334 mm, respectively, for prediction of the daily runoff during CV. Hence, for this study, the Model KGM3 (4-3-1) was considered the best for prediction of daily runoff and was used for the further analysis.

The selected ANN structure KGM3 (4-3-1) was trained using the whole data set of the years 2004 and 2005. The weights for this trained structure were saved and the network was evaluated with testing data set of the year 2003. The scatter plots of the observed and KGM3 (4-3-1) model predicted daily runoff during the training and testing period are shown along with 1:1 line in Figures 3(a) and 3(b) respectively. It is observed from the figures that during the training period, the major portion of the scatter plot is well distributed about the 1:1 line. The results for the testing period showed an overestimation of the very low runoff values which may be due to the fact that the minimum value of the output training vector was higher than the corresponding value in the testing data set. It was found from the training and testing period results that the predictions of peak or higher runoff values are better than low and medium runoff values. This may be due to the fact that the model is trained well with the peak or higher values and is capable of predicting peak values with considerable accuracy. A statistical analysis was performed to compare the predicted daily runoff during both training and testing periods and the results are presented in Table 6. The value of coefficient of determination (*R*^{2}) of 0.95 and 0.90 indicates a close match between the measured and predicted daily runoff values during the training and testing periods respectively. The *COE* of 0.94 for the training and 0.88 for the testing shows a good agreement between the measured and predicted daily runoff.

### Development of the ANN models for Kapgari sub-watersheds

All the networks for the sub-watersheds were developed using the same procedure that was used for model development for the whole Kapgari watershed. Generalized ANN structures were selected using the 10-fold CV results. On the basis of minimum error in CV, the best ANN structures were selected for each input combination (M1, M2, and M3). The performance of the selected best model for each input combination, in terms of *RMSE* and *MAE* is presented in Table 7. It is observed from Table 7 that the ANN structure with input combination M3 has better generalization ability than the combination of M1 and M2 for all the sub-watersheds. Therefore, the ANN structures KGSW1M3 (4-3-1), KGSW2M3 (4-4-1), and KGSW3M3 (4-3-1) were selected for prediction of the daily runoff from Kapgari sub-watersheds 1, 2, and 3, respectively.

The selected ANN models were trained using the whole data sets of the years 2004 and 2005. The weights for all these trained models were saved and these networks were evaluated using the testing data set of the year 2003. The scatter plots of the observed and predicted daily runoff during the training and testing periods for sub-watersheds 1, 2, and 3 are shown along with the 1:1 line in Figure 4. The daily runoff values were slightly overpredicted for the medium flow and underpredicted for the peak and low flow conditions. The training period scatter plots show the data points to be very well distributed around the 1:1 line. However, during the testing periods, mostly under-prediction was observed, even though the models were capable of predicting the low and high runoff values with considerable accuracy. Statistical analysis was performed to compare the predicted daily runoff for both training and testing periods and the results are presented in Table 8. The values of the coefficient of determination (*R*^{2}) were found to be 0.944, 0.951, and 0.956 during the training and 0.902, 0.755, and 0.873 during the testing for sub-watersheds 1, 2, and 3, respectively. These high *R*^{2} values indicate a close relationship between the observed and predicted daily runoff. The *COE* values of 0.941, 0.947, and 0.954 during training and 0.875, 0.732, and 0.859 during the testing period for sub-watersheds 1, 2, and 3, respectively, show a good agreement between the daily observed runoff and predicted runoff.

These results clearly indicate that the ANN models KGM3 (4-3-1), KGSW1M3 (4-3-1), KGSW2M3 (4-4-1), and KGSW3M3 (4-3-1) can be used to predict the daily runoff using daily rainfall and temperature data as input. This also illustrates the application of ANN in modeling highly complex and nonlinear phenomenon of the rainfall–runoff process. Thus, the ANN models with easily available information can be used to predict the daily runoff.

### Development of the BANN models for Kapgari watershed and its sub-watersheds

The best input combination M3 was chosen to develop the BANN for KGW and its sub-watersheds. The BANN models were used to perform the bootstrap aggregation of multi-model ensembles to get a more stable solution. Chernick & LaBudde (2010) suggested that for a small sample size, the lower order bootstrap works better than the higher order and recommended that 20 to 100 bootstrap samples be used for small sample sizes. Therefore, Bootstrap.xla, an Excel Add-In (Barreto & Howland 2006), was used to generate 50 bootstrap resamples for this study. Each bootstrap resample was used to develop an ANN model, so that 50 ANN models were developed and then combined (using the mean) to approximate the relationship between model inputs and outputs.
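
The resample-train-average scheme described above can be sketched as follows. The study generated its resamples with an Excel Add-In and trained ANN models in MATLAB; the sketch below only illustrates the idea, using Python's standard `random` module and a simple stand-in learner (a least-squares line through the origin) in place of a neural network. All names and data are hypothetical.

```python
import random

def bootstrap_resample(pairs, rng):
    """Draw len(pairs) samples from pairs with replacement."""
    return [pairs[rng.randrange(len(pairs))] for _ in pairs]

def fit_proportional(pairs):
    """Stand-in learner (NOT an ANN): least-squares line through the origin."""
    slope = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    return lambda x: slope * x

def bagged_models(pairs, fit, n_resamples=50, seed=0):
    """Train one model per bootstrap resample (50 resamples, as in the text)."""
    rng = random.Random(seed)
    return [fit(bootstrap_resample(pairs, rng)) for _ in range(n_resamples)]

def ensemble_predict(models, x):
    """Combine the ensemble members by taking the mean of their predictions."""
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Made-up (rainfall, runoff) pairs with an exact proportional relation.
pairs = [(float(x), 0.25 * x) for x in range(1, 21)]
models = bagged_models(pairs, fit_proportional)
print(len(models), ensemble_predict(models, 8.0))
```

With noisy data the members would differ from one another, and the spread of their individual predictions would give a rough indication of the variability of the estimate.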

Ensemble prediction of the daily runoff for the testing year 2003 was done using 50 sets of weights instead of one set of weights. Comparison between the observed and the predicted daily runoff by the ANN and BANN models during the testing period is presented graphically in Figure 5 for the whole KGW and its sub-watersheds KGSW1, KGSW2, and KGSW3, respectively. The scatter plots show that the BANN models estimate peaks or higher runoff values better than the 10-fold CV based ANN models for KGW and its sub-watersheds. It is observed from these plots that the BANN models both underpredicted and overpredicted the medium runoff values, which are nevertheless better distributed about the 1:1 line than the ANN predicted values. Statistical analysis was also performed for the BANN models' predicted runoff for the testing year 2003. The results of the BANN models and the corresponding ANN models in terms of *COE*, *RMSE*, coefficient of determination (*R*^{2}), and percent deviation (% *D*_{v}) for KGW and its sub-watersheds during the testing period are presented in Table 9. The higher *COE* values of 0.902, 0.894, 0.771, and 0.907 for the BANN models as compared to the ANN models for the Kapgari watershed and its sub-watersheds 1, 2, and 3, respectively, show a very good agreement between the daily observed runoff and the predicted runoff. The percent deviation (% *D*_{v}) values of 3.51, 7.09, 2.87, and 6.15 in the case of the BANN models for the Kapgari watershed and its sub-watersheds 1, 2, and 3, respectively, indicate that the magnitude of the difference between the observed and predicted values is much smaller (less than 10%) than that of the ANN models. The higher *R*^{2} and lower *RMSE* values for the BANN models as compared to the ANN models also indicate that the use of ANN with bootstrap resampling provides more accurate prediction of daily runoff.
These results illustrate that the bootstrap resampling technique based artificial neural network (BANN) is more capable of solving the problems of overfitting and underfitting than 10-fold CV technique based ANN models during the training of the models with shorter length of data set. The BANN models were better than the 10-fold CV technique based ANN models because these models were developed with 50 resamples, out of which some models were trained with the higher peaks, some were trained with the lower peaks, and some were trained with both higher and lower peaks. The average prediction from all these 50 models increased the prediction efficiency. On the other hand, 10-fold CV technique based ANN models were trained using only one data set and simulation was made according to peaks presented in that data set.

## SUMMARY AND CONCLUSIONS

The major goal of the present study was to develop unbiased ANN models using readily available inputs and a shorter length of training data set for the prediction of daily runoff from a small agricultural watershed and its sub-watersheds in Eastern India. Only 3 years' (2003–2005) rainy season hydro-meteorological data were available for the analysis. Two years' data were used for development of the ANN models and 1 year's data were used to test the developed models. In the reported study, ANN models were developed using readily available climatic variables such as rainfall and temperature. To deal with the limitation of ANN modeling with a shorter length of training data set, a 10-fold CV method was used to avoid neural network overfitting. This was done to minimize the bias that arises when testing with a shorter length of data set using the simple train-and-test (hold-out) technique. Several ANN models were developed for the watershed and its sub-watersheds. Out of the developed models, four ANN models, KGM3 (4-3-1), KGSW1M3 (4-3-1), KGSW2M3 (4-4-1), and KGSW3M3 (4-3-1), considering both rainfall and temperature as input, were selected to predict the daily runoff for KGW and its sub-watersheds 1, 2, and 3, respectively. Bias in the daily runoff prediction associated with the shorter length of training data set was also investigated using the bootstrap resampling technique based ANN (BANN). The optimized number of inputs was used to develop 50 bootstrap resamples, and each resample was used to develop an ANN model to perform bootstrap aggregation of multi-model ensembles, which produced averaged outputs. The approaches used in this study are of an objective nature and can be easily applied to other small agricultural watersheds under data scarcity conditions. The results of this study also have practical importance and wider applicability. The following specific conclusions were drawn from the study:

- It was found that climatic variables such as daily rainfall and temperature, which are readily available, can be used for accurate prediction of the daily runoff from small agricultural watersheds using ANN modeling.
- The ANN models constructed from climatic variables only have the potential for filling missing data in a daily runoff time series and for predicting the influence of climatic change on runoff in data sparse regions.
- It was revealed that the 10-fold CV technique can help in selecting the ANN structure, which provides unbiased estimation in the case of a shorter length of training data set.
- The bootstrap resampling technique based ANN models can provide a more stable solution as compared to the 10-fold CV technique based ANN models and improve the prediction accuracy for a shorter length of training data set.
- The peaks in the time series of the daily runoff data can be estimated more accurately using the bootstrap resampling technique based ANN models than using the 10-fold CV technique based ANN models for a shorter length of training data set.

- First received 29 September 2013.
- Accepted in revised form 13 May 2014.

- © IWA Publishing 2015
