## Abstract

The application of feed-forward neural networks has been limited by slow, conventional gradient-based learning algorithms and the iterative determination of network parameters. This paper demonstrates a method that partly overcomes these problems by using an extreme learning machine (ELM), which predicts hydrological time-series very quickly. ELM, a learning algorithm for single-hidden layer feed-forward neural networks (SLFNs), generalizes well on extremely complex problems. ELM randomly chooses the hidden-node parameters of a single hidden layer and analytically determines the output weights. The ELM method was applied to predict hydrological flow series for the Tryggevælde Catchment, Denmark and for the Mississippi River at Vicksburg, USA. The results confirmed that ELM's performance, in terms of root mean square error (RMSE) and normalized root mean square error (NRMSE), was similar to or better than that of artificial neural networks (ANN) and other previously published techniques, namely evolutionary computation based support vector machine (EC-SVM), the standard chaotic approach and the inverse approach.

- extreme learning machine
- flows
- forecasting
- hydrology
- time-series

## NOTATION

- $Q_t$: Flow at time $t$
- $(Q_m)_t$: Modelled flow at time $t$
- $(Q_o)_t$: Observed flow at time $t$
- $H$: Hidden layer output matrix
- $H'$: Moore–Penrose generalized inverse of hidden layer output matrix
- $w_i$: Weight vector connecting the $i$th hidden node and the input variables
- $b_i$: Bias of the $i$th hidden node
- $L$: Number of random hidden nodes
- $\beta_i$: Weight connecting the hidden node and the output node
- $g(\cdot)$: Activation function (example, sigmoidal function)
- $t_j$: Target at time $j$
- $o_j$: Output at time $j$

## INTRODUCTION

Data-driven modelling approaches, including artificial neural networks (ANN) and support vector machines (SVM), have been widely applied in the water resource engineering field, especially for predicting hydrological time-series. This is because they can establish complex non-linear relationships between input and output variables (Tokar & Johnson 1999). The main advantage of these techniques is that they do not require information about the complex nature of the underlying hydrological process. When data-driven modelling is applied, input variables including precipitation, lagged precipitation, and lagged discharges are normally employed to forecast the discharges (Akhtar *et al.* 2009). ANN with multi-layer perceptron (MLP) networks trained with gradient-based methods has been used in many applications. Traditionally, the weight vectors in ANN models are determined using the back-propagation (BP) algorithm by minimizing the mean square error between the measured and forecasted discharges of the hydrological process. However, the performance of ANN depends on the network architecture (e.g., number of hidden layers, number of neurons, activation functions, etc.), performance criteria, division and pre-processing of data, and the determination of appropriate model inputs (Maier & Dandy 2000). According to Chen & Chang (2009), a very simple ANN architecture may not predict accurately, while an overly complex architecture may reduce generalization ability due to over-fitting. To overcome this problem, Chen & Chang (2009) proposed an evolutionary algorithm (EA)-based ANN (EANN) to, first, define the optimal network architecture and, second, generate a model that accurately predicts a hydrological system. They applied EANN to predict real-time inflows to Shihmen Reservoir in Taiwan, where it performed better than the autoregressive moving average with exogenous inputs (ARMAX) time-series approach.

Sing *et al.* (2015) applied ANN to relate rainfall and temperature data to runoff from an agricultural catchment (973 ha) in Kapgari, India. Repeated resampling of short training data sets using a bootstrap resampling-based ANN (BANN) produced solutions without over-fitting. A 10-fold cross-validation (CV) technique-based ANN was also applied to obtain unbiased, reliable testing results. Sing *et al.* (2015) demonstrated that BANN provided more stable solutions and handled over-fitting and under-fitting better than the 10-fold CV-based ANN.

Hsu *et al.* (1995) demonstrated that non-linear ANN models provided a more representative rainfall–runoff relationship. They compared ANN results with ARMAX and the conceptual Sacramento soil moisture accounting model. Fernando & Jayawardena (1998) also modelled the rainfall–runoff relationship using a radial basis function neural network (RBF-NN) and showed that it performed better than ARMAX. Tayfur & Sing (2006) applied ANN and fuzzy logic (FL) to predict event-based runoff and compared the results against the kinematic wave approximation (KWA). A three-layered feed-forward ANN was developed using the sigmoid function and the BP algorithm. The FL model was developed using triangular fuzzy membership functions for the input and output variables. Tingsanchali & Quang (2004) applied an adaptive neuro-fuzzy inference system (ANFIS) and ANN to forecast the daily flood flow for the Yom River Basin in Thailand. ANFIS performed better than ANN.

Liong *et al.* (2002a) applied genetic programming (GP) to real-time runoff forecasting. GP was used as an error updating scheme to complement a traditional hydrological model (MIKE11-NAM). The prediction accuracy was enhanced by non-dimensionalizing the variables. The functional relationship between rainfall and runoff derived by GP showed good prediction accuracy. Rodriguez-Vazquez *et al.* (2012) proposed GP and genetic algorithm (GA) for rainfall–runoff modelling of a sub-basin located near Mexico City. They developed two different models for the analysis. The first was a multi-objective optimization-based GP model for determining the structures and parameters of non-linear auto-regressive models (NARMAX). The second was a GA-based model that optimized the parameters of a non-conventional rainfall–runoff model. Their analysis concluded that the multi-objective optimization-based GP model best fitted the analyzed storms of interest. Nourani *et al.* (2013) applied GP for rainfall–runoff modelling and included watershed geomorphological features as spatial data together with temporal data. Two separate models, namely separated geomorphological GP (SGGP) and integrated geomorphological GP (IGGP) models and their application were described. These models applied to the Eel River Watershed in California, USA, could compensate for a lack of temporal data. Specifically, the SGGP model could distinguish the dominant variables of the runoff process in the sub-basins and IGGP was a reliable tool for spatial and temporal interpolation of runoff through the watershed.

Support vector machine (SVM), based on the structural risk minimization approach, has proved a promising learning machine for forecasting problems. It has been successfully applied to regression problems including flow forecasting and rainfall–runoff modelling. SVM was used to predict the stage in Dhaka, Bangladesh using daily water level data from five gauging stations (Liong & Sivapragasam 2002). Results showed that SVM performed better than ANN. Sivapragasam (2002) applied SVM to rainfall–runoff modelling using six storm events that occurred in the Upper Bukit Timah Catchment, Singapore. The results showed the robustness of SVM compared to multi-layered feed-forward ANN. Lin *et al.* (2006) reported SVM as a powerful tool that could overcome some of the drawbacks evident in ANN: (1) it finds global solutions, (2) it is unlikely to over-fit, (3) it generates non-linear solutions using the kernel function, and (4) it obtains optimized solutions using a limited training data set. However, SVM requires a long simulation time for large complex problems, and it also requires the selection of an appropriate kernel function and its specific parameters (*C* and *ɛ*). Yu *et al.* (2004) presented a combined application of chaos theory and SVM where the parameters were optimized with an EA to reduce prediction error. The Gaussian kernel function, being more suitable, has been applied in SVM to hydrological time-series (Liong & Sivapragasam 2002). An EA engine, called shuffled complex evolution (SCE), was applied to determine five parameters, i.e., time delay, embedding dimension and three SVM parameters (tradeoff between empirical error and model complexity, insensitive loss function and width of the Gaussian kernel function). EA-based SVM (EC-SVM) was used to predict runoff time-series for catchments including the Tryggevælde Catchment, Denmark and the Mississippi River, USA.
The results showed that EC-SVM improved the prediction accuracy compared to standard chaos technique, Naïve, ARIMA and inverse approach. Wang *et al.* (2013) applied an EA, called particle swarm optimization (PSO), to determine SVM parameters. They further proposed ensemble empirical mode decomposition (EEMD) for decomposing annual rainfall series in SVM to avoid model over-fitting or under-fitting. The proposed model (PSO-SVM-EEMD) improved the rainfall–runoff forecasting significantly compared to ordinary least-square regression model and ANN. Sivapragasam *et al.* (2001) enhanced the performance of SVM by pre-processing the input data using a noise-reduction algorithm, called singular spectrum analysis (SSA). SSA was coupled with SVM and used to predict the flows from the Tryggevælde Catchment. It improved the prediction accuracy compared to the non-linear prediction (NLP) method.

Data-driven modelling methods have also been applied to quantify the uncertainty associated with prediction. Kingston *et al.* (2005) highlighted ANN's failure to account for prediction uncertainty, as quantifying the uncertainty associated with the ANN parameters (e.g., weights) is complex and difficult. They proposed a Bayesian training method to assess weight uncertainty in ANN. Cui *et al.* (2014) examined the impact of topographic uncertainty in their rainfall–runoff model (TOPMODEL). The performance of TOPMODEL is influenced by the grid size of the digital elevation model (DEM) that defines the topography. The relationship between DEM resolution and TOPMODEL performance was investigated using a fuzzy analysis technique, with DEM grid sizes ranging from 30 m to 200 m. The analysis revealed that the best results were produced with the 30 m resolution. Boucher *et al.* (2009) assessed uncertainty in streamflow prediction based on ensemble forecasts using stacked neural networks. Instead of forecasting a single value (e.g., 1-day lead prediction), they predicted an ensemble of streamflows, which was then used to fit a probability density function to assess the confidence interval as well as other measures of forecast uncertainty. Alvisi & Franchini (2011) analyzed the uncertainty associated with the prediction of water levels (or discharges). They introduced fuzzy numbers to determine the weights and biases of neural networks in order to estimate the prediction uncertainty of water levels and discharges. A comparison of this fuzzy neural network method with a Bayesian neural network and the local uncertainty estimation model demonstrated the effectiveness of the proposed method, whose uncertainty bands were slightly narrower than those of the other data-driven models.
Alvisi & Franchini (2012), using a grey neural network (GNN), found better accuracy in forecasting water levels and a narrower uncertainty band than with a Bayesian neural network. In a GNN the parameters are represented by unknown grey numbers that lie within known upper and lower limits. The grey parameters are searched in such a way that the grey forecasted river stages include at least a preselected percentage of observed river stages.

Many of these data-driven modelling methods, including traditional ANN learning algorithms, are slow, requiring numerous iterations to generate optimal solutions (Zhang *et al.* 2007; Ding *et al.* 2015), and may not be suitable for real-time prediction. To overcome this, Huang *et al.* (2006) proposed a learning algorithm for SLFNs, called ‘extreme learning machine’ (ELM), which analytically determines the weights related to the output. The performance of ELM was compared with conventional NN and SVM on benchmark problems in function approximation and classification. Huang *et al.* (2006) found that ELM can approximate any continuous function and implement any classification. ELM may need more hidden nodes than SVM but learns faster. The generalization performance of ELM is stable over a wide range of numbers of hidden nodes. Ding *et al.* (2015) stated that ELM, which requires a single iteration, overcomes the slow training speed and over-fitting problems of conventional ANN learning algorithms. ELM's robustness and fast learning rate were demonstrated in different fields including real data set classification and regression (Huang *et al.* 2012). Zhang *et al.* (2007) applied ELM to multi-category classification problems in cancer diagnosis and found that ELM did not suffer from problems such as falling into local minima and over-fitting, which are commonly experienced by iteration-based learning methods.

This paper presents the application of ELM (a MATLAB program developed by Huang *et al.* 2006) for predicting hydrological flow time-series. Predictions were made for the Tryggevælde Catchment, Denmark and Mississippi River at Vicksburg, USA to demonstrate the application of the scheme.

## EXTREME LEARNING MACHINE (ELM)

ELM is an emerging learning technique that provides efficient unified solutions to generalized feed-forward networks with randomly generated hidden neurons. ELM, a biologically inspired neural network, chooses input weights randomly and analytically determines the output weights. The input weights and biases are not tuned, so the hidden layer output matrix remains unchanged during training. In other words, the hidden node parameters are generated independently of the training data.

The advantages of ELM are: (1) faster learning speed than conventional methods; (2) learning without iteration; (3) better generalization performance; (4) automatic, analytical determination of all the network parameters; (5) suitability for many non-linear activation functions and kernel functions; (6) efficiency for online and real-time applications; and (7) viability as an alternative technique for large-scale computing and machine learning.

The mathematical equation for SLFNs can be formulated as (Figure 1) (Huang *et al.* 2006):

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = o_j \tag{1}$$

where $w_i$ is the weight vector connecting the $i$th hidden node and the input variables $x_j$, $b_i$ is the bias of the $i$th hidden node, $L$ is the number of random hidden nodes, $\beta_i$ is the weight connecting the $i$th hidden node and the output node, $o_j$ is the output at time $j$, and $g(\cdot)$ is an activation function (e.g., the sigmoidal function $g(x) = 1/(1 + e^{-x})$).

When the difference between the target $t_j$ and the model output $o_j$ is zero for a time-series of $N$ samples,

$$\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0 \tag{2}$$

where $j = 1, \ldots, N$.

This means:

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N \tag{3}$$

Equation (3) can be written as:

$$H \beta = T \tag{4}$$

where $H$ is called the hidden layer output matrix of the SLFN, with entries $H_{ji} = g(w_i \cdot x_j + b_i)$.

For fixed input weights and hidden-node biases, training the SLFN amounts to finding a least-squares solution $\hat{\beta}$ of the above equation:

$$\lVert H \hat{\beta} - T \rVert = \min_{\beta} \lVert H \beta - T \rVert \tag{5}$$

where $\beta = [\beta_1^{\top}, \ldots, \beta_L^{\top}]^{\top}$ is of size $L \times k$; $T = [t_1^{\top}, \ldots, t_N^{\top}]^{\top}$ is of size $N \times k$; and $k$ is the number of targets.

Equation (4) or (5) is solved using the smallest-norm least-squares solution method:

$$\hat{\beta} = H' T \tag{6}$$

where $H'$ is the Moore–Penrose generalized inverse of the matrix $H$. If the number of hidden neurons is equal to the number of training samples, SLFNs can approximate the training samples with zero error. $H'$ may be calculated using several methods including the orthogonal projection method, orthogonalization method, iterative method, singular value decomposition (SVD), etc. In ELM, the SVD method is used to calculate $H'$. It has been shown that SLFNs with randomly generated hidden nodes and a wide range of piecewise continuous activation functions can universally approximate any continuous target function. Further details are available elsewhere (Huang *et al.* 2006).
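As a rough illustration of Equations (1)–(6), the following NumPy sketch draws the hidden-node weights and biases at random, builds the hidden layer output matrix, and obtains the output weights from the Moore–Penrose generalized inverse. It is not the MATLAB program of Huang *et al.* (2006) used in this study: the function names (`elm_train`, `elm_predict`), the uniform sampling range [−1, 1] and the toy data are assumptions made for the example. `np.linalg.pinv` computes the generalized inverse via SVD.

```python
import numpy as np


def sigmoid(z):
    """Sigmoidal activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, T, n_hidden, rng):
    """Fit an SLFN the ELM way: random hidden layer, analytic output weights.

    X: (N, n_inputs) inputs; T: (N, k) targets; n_hidden: L in the text.
    """
    n_inputs = X.shape[1]
    # Hidden-node parameters are drawn once and never tuned (assumed range).
    W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)        # hidden layer output matrix, shape (N, L)
    beta = np.linalg.pinv(H) @ T  # smallest-norm least-squares solution
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Evaluate the trained network on new inputs."""
    return sigmoid(X @ W + b) @ beta


# Toy usage on a synthetic linear target (not hydrological data).
rng = np.random.default_rng(0)
X_demo = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
T_demo = 2.0 * X_demo + 1.0
W, b, beta = elm_train(X_demo, T_demo, n_hidden=10, rng=rng)
residual = elm_predict(X_demo, W, b, beta) - T_demo
train_rmse = float(np.sqrt(np.mean(residual ** 2)))
```

Because the only fitted quantity is `beta`, training reduces to a single linear least-squares solve, with no iterative weight updates.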

## APPLICATION

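In this study ELM was applied to estimate the catchment runoff (one-step-ahead prediction) using the past and current information of hydrological flow as input data. Mathematically, the relationship can be expressed as:

$$Q_{t+\Delta t} = f(Q_t, Q_{t-\Delta t}, \ldots, Q_{t-m\Delta t}) \tag{7}$$

if the past historical flow series is considered;

$$Q_{t+\Delta t} = f(Q_t, Q_{t-\Delta t}, \ldots, Q_{t-m\Delta t}, dQ_t, dQ_{t-\Delta t}, \ldots, dQ_{t-m\Delta t}) \tag{8}$$

if the past historical flow and flow difference data series are considered; and

$$dQ_{t+\Delta t} = f(dQ_t, dQ_{t-\Delta t}, \ldots, dQ_{t-m\Delta t}) \tag{9}$$

if the past historical flow difference series is considered.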

In Equations (7)–(9), $Q$ is the flow (m^{3}/s), $dQ_{t+\Delta t}$ is the error predictor, $dQ_t = Q_t - Q_{t-\Delta t}$ is the flow difference, $m$ represents how far back the recorded data of the time-series affect the flow prediction, and $\Delta t$ is the time interval.
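To make the three input configurations concrete, the sketch below builds input–target pairs from a flow series for one-step-ahead prediction, with the time interval equal to one record. It assumes that the flow difference is the current flow minus the previous flow and that `m` counts the additional past records used. The helper names and the toy series are hypothetical and are not taken from the study, and pairing the combined inputs with the next flow as target is likewise an assumption.

```python
def lagged_pairs(series, m):
    """Return (inputs, targets) for one-step-ahead prediction.

    Each input row is [s_t, s_{t-1}, ..., s_{t-m}]; the target is s_{t+1}.
    """
    inputs, targets = [], []
    for t in range(m, len(series) - 1):
        inputs.append([series[t - lag] for lag in range(m + 1)])
        targets.append(series[t + 1])
    return inputs, targets


def differences(series):
    """Flow differences dQ_t = Q_t - Q_{t-1} (assumed definition)."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Toy flow series in m^3/s (made-up values).
Q = [1.0, 1.5, 1.2, 2.0, 2.6, 2.1]
dQ = differences(Q)

# Flow-only inputs, as in Equation (7).
X_q, y_q = lagged_pairs(Q, m=2)

# Difference-only inputs, as in Equation (9); the target is the next difference.
X_dq, y_dq = lagged_pairs(dQ, m=2)

# Combined inputs, as in Equation (8). dQ[i] belongs to the same time as
# Q[i + 1], so the first flow value is dropped before the rows are paired.
X_q_aligned, y_q_aligned = lagged_pairs(Q[1:], m=2)
X_both = [row_q + row_dq for row_q, row_dq in zip(X_q_aligned, X_dq)]
```

Each row of `X_q`, `X_dq` or `X_both` would then be one training sample for the corresponding model, with the matching entry of `y_q`, `y_dq` or `y_q_aligned` as its target.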

Once the error predictor is determined from the model in Equation (9), the predicted flow is estimated as:

$$Q_{t+\Delta t} = Q_t + dQ_{t+\Delta t} \tag{10}$$

The performance of the trained ELM was evaluated with standard goodness-of-fit measures such as root mean square error (RMSE) and normalized root mean square error (NRMSE).

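The RMSE and NRMSE are defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left[ (Q_m)_t - (Q_o)_t \right]^2} \tag{11}$$

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{t=1}^{N} \left[ (Q_m)_t - (Q_o)_t \right]^2}{\sum_{t=1}^{N} \left[ (Q_o)_t - \bar{Q}_o \right]^2}} \tag{12}$$

where $(Q_m)_t$ and $(Q_o)_t$ are the predicted and observed values at time $t$; $N$ is the number of observations; and $\bar{Q}_o$ is the mean observed flow.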

RMSE represents the forecasting error and estimates the sample standard deviation of the differences between predicted and observed values. Because the errors are squared, it penalizes large model errors heavily, which makes it a good measure when such errors are undesirable. The NRMSE normalizes the RMSE and facilitates comparison between data sets. An NRMSE of zero indicates a perfect match between the observed and predicted values, and a value greater than 1 means the predictions are inferior to using the constant mean value (Liong *et al.* 2002b).
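For reference, both measures can be computed in a few lines. The sketch below assumes the NRMSE is the square root of the summed squared errors divided by the summed squared deviations of the observations from their mean, which is consistent with the interpretation given above. The function names and sample values are illustrative only.

```python
import math


def rmse(modelled, observed):
    """Root mean square error between modelled and observed flows."""
    n = len(observed)
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(modelled, observed)) / n)


def nrmse(modelled, observed):
    """RMSE normalized by the spread of the observations about their mean."""
    mean_obs = sum(observed) / len(observed)
    sq_err = sum((m - o) ** 2 for m, o in zip(modelled, observed))
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(sq_err / sq_dev)


observed = [1.0, 2.0, 3.0, 4.0]    # made-up observations
perfect = list(observed)           # forecast identical to the observations
mean_only = [2.5, 2.5, 2.5, 2.5]   # always predict the observed mean
```

With these inputs, `perfect` gives zero for both measures, while `mean_only` gives an NRMSE of 1, the threshold above which a forecast is worse than the constant mean.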

ELM was used to estimate the 1-day lead prediction of flows for Tryggevælde Catchment and Mississippi River at Vicksburg. The ELM was trained with the same data as used in Liong *et al.* (2002b) and Yu *et al.* (2004) and the results were compared with standard chaos technique, inverse approach, ANN and EC-SVM. In Yu *et al.* (2004), data of 1975–1991 was used for training and 1992–1993 for validation in standard chaos technique. Phoon *et al.* (2002) used 1975–1989 for training, 1990–1991 for testing and 1992–1993 for validation in their application of the inverse approach.

In this study, data from 1975–1991 were used for training and data from 1992–1993 for validation, as in the standard chaos technique. The number of hidden nodes in ELM was set equal to the number of training samples (6,204), because ELM can reproduce the training data with zero error when the number of hidden neurons equals the number of distinct training observations (Huang *et al.* 2006). For the output node, the widely used sigmoid activation function was chosen.
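The zero-training-error property can be checked on a small synthetic case. The sketch below is an illustration with made-up data and an assumed weight range, not the configuration used in the study. It sets the number of hidden nodes equal to the number of training samples, so the hidden layer output matrix is square and the fitted outputs reproduce the training targets up to rounding error. It says nothing about validation accuracy.

```python
import numpy as np

rng = np.random.default_rng(1)

n_samples, n_inputs = 8, 3
X_train = rng.uniform(0.0, 1.0, size=(n_samples, n_inputs))  # made-up inputs
T_train = rng.uniform(0.0, 1.0, size=(n_samples, 1))         # made-up targets

# Number of hidden nodes equals the number of training samples.
n_hidden = n_samples
W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))  # random, never tuned
b = rng.uniform(-1.0, 1.0, size=n_hidden)

# Square hidden layer output matrix with sigmoidal activation.
H = 1.0 / (1.0 + np.exp(-(X_train @ W + b)))

# Analytic output weights from the Moore-Penrose generalized inverse.
beta = np.linalg.pinv(H) @ T_train

# Largest absolute training error; expected to be at rounding-error level.
max_abs_error = float(np.max(np.abs(H @ beta - T_train)))
```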

## TRYGGEVÆLDE CATCHMENT

The Tryggevælde Catchment (130.5 km^{2}) is located in the eastern part of Sealand, north of Karise. The daily measured flows are available for the period 1 January 1975 to 31 December 1993. The statistics of the flow series are: mean flow = 0.977 m^{3}/s; standard deviation = 1.367 m^{3}/s; maximum flow = 11.068 m^{3}/s; and minimum flow = 0.014 m^{3}/s. The statistics of flows indicate that there are distinct wet and dry periods in the time-series.

The training and validation accuracies are presented in Table 1 in terms of RMSE and NRMSE for ELMI (lagged flow, *Q*), ELMII (lagged flow difference, *dQ*) and ELMIII (*Q*, *dQ*). The validation accuracies of the three ELM models ranged from 0.491 to 0.504 (RMSE) and from 0.337 to 0.347 (NRMSE). ELMIII performed the best among the three models. The time required to train the ELM models was about 100 sec on a Windows-based machine (Intel i7 CPU at 2.67 GHz). The corresponding validation times were between 0.47 and 0.49 sec. For comparison, an ANN model, which estimates the default number of hidden nodes (81) from the number of input variables, output variables and training samples, was trained with the same data set as ELM (hidden nodes = 6,204) for approximately 100 sec (the training time of ELM). The prediction accuracies of ANN obtained from 633 iterations were 0.588 (RMSE) and 0.403 (NRMSE) (Table 2).

Table 2 compares the ELM results with other available techniques (Liong *et al.* 2002b; Yu *et al.* 2004). Yu *et al.* (2004) used two types of input time-series, namely daily flow series (*Q*) and flow difference series (*dQ*) separately in EC-SVM. The use of *dQ-*series in EC-SVM provided better results than the *Q-*series. The number of iterations required was 151,668 for EC-SVM(*Q*) and 11,800 for EC-SVM(*dQ*). All ELM techniques were faster as no additional iteration was required and produced better results (Table 2). The ELMIII model improved the prediction accuracy in terms of RMSE by 24% over the standard chaotic approach, 7% over the inverse approach and 4% over the EC-SVM(*Q*). ELMI performed similarly.

## MISSISSIPPI RIVER AT VICKSBURG FLOW

A similar approach was applied to predict flows in the Mississippi River at Vicksburg, using the same daily flows documented in Yu *et al.* (2004). The daily measured flows cover the period 1 January 1975 to 31 December 1993. The statistics of the flow series are: mean flow = 18,456.54 m^{3}/s; standard deviation = 9,727.27 m^{3}/s; maximum flow = 52,103.00 m^{3}/s; and minimum flow = 3,907.72 m^{3}/s.

Table 3 shows the results of training and validation obtained from ELMI(*Q*), ELMII(*dQ*) and ELMIII(*Q*, *dQ*). The time required to train ELM models was about 97 sec on the same computer (Windows Intel i7 CPU at 2.67 GHz). The corresponding validation times were between 0.45 and 0.47 sec.

The results (Table 3) showed that ELMIII performed best with RMSE and NRMSE of 308.66 and 0.0391, respectively. These values were slightly higher for the other two ELM models (ELMI and ELMII). ELMIII predicted better results (Table 4) than the standard chaos technique (RMSE = 1,738.95 and NRMSE = 0.2064) and inverse approach (RMSE = 356.89 and NRMSE = 0.0452). An ANN model (default hidden nodes = 81) trained with the same data set as ELM (hidden nodes = 6,204) was run for approximately 97 sec (training time of ELM) for comparison with ELM. ANN prediction accuracies obtained from 794 iterations were 549.70 (RMSE) and 0.0696 (NRMSE) (Table 4).
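The size of these gaps can be expressed as a relative reduction in RMSE. The sketch below uses the validation RMSE values quoted above and assumes the reduction is defined as the difference between the reference RMSE and the ELM RMSE, divided by the reference RMSE. The text does not state how percentage improvements were computed, so this definition and the helper name are assumptions.

```python
def rmse_reduction_percent(rmse_reference, rmse_model):
    """Relative RMSE reduction of a model against a reference, in percent."""
    return (rmse_reference - rmse_model) / rmse_reference * 100.0


elm3_rmse = 308.66  # ELMIII, Mississippi River at Vicksburg (validation)
reference_rmse = {
    "standard chaos technique": 1738.95,
    "inverse approach": 356.89,
    "ANN": 549.70,
}

for name, value in reference_rmse.items():
    reduction = rmse_reduction_percent(value, elm3_rmse)
    print(f"{name}: {reduction:.1f}% lower RMSE with ELMIII")
```

Under this definition the reductions are roughly 82%, 14% and 44%, respectively.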

Table 4 shows that the RMSE and NRMSE of ELM models were slightly higher compared to EC-SVM(*Q*) and EC-SVM(*Q, dQ*) models. However, ELM predicted these solutions quickly (single run) compared to EC-SVM(*Q*) and EC-SVM(*Q*, *dQ*), where 1,732,579 and 57,590 iterations, respectively, were required. This demonstrates that ELM can be efficient for online and real-time applications.

## DISCUSSION

This study demonstrates the application of ELM for predicting hydrological flow time-series. The results show that the prediction accuracies of ELM were similar to or better than those of ANN and other previously published techniques, including EC-SVM, the standard chaotic approach and the inverse approach. More specifically, ELM improved the flow prediction accuracy in terms of RMSE by 24% over the standard chaos technique, 7% over the inverse approach and 4% over EC-SVM for the Tryggevælde Catchment. ELM provided solutions of similar accuracy to EC-SVM when predicting Mississippi River flows.

ELM's real strength is that it reached the solutions quickly compared to other techniques including EC-SVM. This is because no additional iteration is required in ELM whereas other techniques require thousands of iterations to predict the same flow time-series although most do so with less accuracy. Such runs typically have a much longer processing time. Importantly, this processing time will significantly increase for more complex scenarios where many more iterations are required to obtain an optimal solution. This longer processing time may be a limiting factor for real-time application. ELM's fast learning capability from a training data set means that it would be more suitable for online and real-time applications where quick processing is important or vital. These include flood forecasting, the prediction of inflows for reservoir operations, supply of water to meet irrigation demand, real time control of water systems and sewer systems, etc. Furthermore, by having improved or at least comparable prediction accuracy to other available methods, ELM is no less capable for use in water resource management and decision-making.

This paper demonstrates the potential of ELM for predicting hydrological flow time-series. The data obtained from a relatively small (Tryggevælde Catchment, Denmark) and large catchment (the Mississippi River Catchment at Vicksburg) covered both dry and wet periods. In both cases, ELM's performance is comparable for 1-day lead prediction which means that the potential issue of extrapolating flows beyond the range of training data set is unlikely to be significant. The speed, predicting capability and accuracy of ELM can also depend on the number of hidden nodes, activation functions and different sampling of input data. These aspects require more investigation.

## CONCLUSION

This paper presents the application of ELM for predicting hydrological flow time-series for the Tryggevælde Catchment and the Mississippi River at Vicksburg. The results show that the prediction accuracies of ELM are similar to or better than those of ANN and other previously published techniques. The real strength of ELM is the short computational run-time needed to reach solutions comparable with those of other techniques, including EC-SVM. This is because ELM does not require iteration, whereas other techniques (e.g., EC-SVM) may require thousands of iterations to predict the same flow time-series, and yet with less accuracy. Such runs typically take a much longer processing time. ELM's fast learning capability from a training data set means that it would be more suitable for online and real-time applications where quick processing is important or vital.

- First received 2 February 2015.
- Accepted in revised form 27 August 2015.

- © IWA Publishing 2016
