## Abstract

An artificial neural network (ANN) is a powerful data-driven modeling tool. The selection of the input variable is an important task in the development of an ANN model. However, at present in ANN modeling, the input variables are usually determined by trial and error methods. Further, the ANN modeler usually selects a single ‘good’ result, and accepts it as the final result without detailed explanation of the initial weight parameter. In this way, the ANN model cannot guarantee that the model will produce the optimal result for later predictions. In this study, the ANN ensemble model with exploratory factor analysis (EFA) was developed and applied to three stations in the Nakdong River, Korea for the 1-day ahead streamflow forecasting. EFA was used to select the input variables of the ANN model, and then the ensemble modeling was applied to estimate the performance of the ANN to remove the influence of initial weight parameters on the model results. In the result, the ANN ensemble model with the input variables proposed by EFA produced more accurate and reliable forecasts than other models with several combinations of input variables. Nash–Sutcliffe efficiency (NSE) results in the validation were 0.92, 0.95, and 0.97, respectively.

- artificial neural network
- ensemble modeling
- exploratory factor analysis (EFA)
- Nakdong River
- streamflow forecasting

## INTRODUCTION

An artificial neural network (ANN) is a powerful data-driven modeling tool that can capture the complex and non-linear relationships between inputs and outputs. Thus, an ANN has become a new tool as an efficient model in hydrological forecasts due to the uncertainties of which hydrological processes inherently have. However, in spite of the increasing studies for hydrological forecasts using the ANN model, the modeler often developed the ANNs without considering the determination of appropriate input variables. The selection of input variables for the present ANN modeling is generally based on a priori knowledge of causal variables and statistical analysis of potential inputs and outputs, and is usually determined by trial and error methods by changing the causal variables affecting the output variables (e.g., Maier *et al*. 2010; May *et al*. 2011). Maier & Dandy (2000), Bowden (2003), Maier *et al*. (2010) and May *et al*. (2011) reviewed the input variable selection methods which have been applied to ANN applications. In the reviews, input variable selection methods were classified into two main classes which were model-based and model-free (filtered) methods. Model-based approaches rely on the development of a number of ANN models with different inputs to determine which of the input candidates should be included. The primary disadvantage of this approach is that it is time-consuming, as the results of ANNs with all possible combinations of the input candidates have to be compared. In contrast to model-based approaches, model-free approaches to input selection do not rely on the performance of trained ANN models. Model-free approaches are usually based on bivariate statistics such as correlation and mutual information. This approach can reduce the computational burden and the overall effort of ANN development. However, the problem of this approach is the redundancy of input variables and which makes it difficult to assess the optimality of the input variables obtained since this approach gives the relevance between the input candidates and the desired output, not how well the input candidates represent the information of the desired output. The authors previously mentioned above concluded that no consensus had been reached regarding suitable methods for input variable selection and there was a need for a more considered approach.

Another problem of ANNs is that the ANN model gives different results for the same original inputs depending on the initial weight parameter set before training the neural network. Since the backpropagation (BP) optimization method used in the training of the networks is unstable, a different weight parameter set will be obtained each time the optimization process is launched (Boucher *et al*. 2009). This led to research for the optimal initial weights of neural networks (Kolen & Pollack 1990; Yam & Chow 1995). Several methods involving genetic algorithms have been implemented to find the optimal initial weight parameters and enhance the accuracy of ANN models (Venkatesan *et al*. 2009; Chang *et al*. 2012; Mulia *et al*. 2013). These studies reached a consensus that the optimal initial weight parameters were very sensitive to training algorithms and data structures, and there were no fixed optimal initial weight parameters which are universally applicable to the variety of data structures and training algorithms. Therefore, ensemble techniques have been applied due to the basic fact that selection of the weights is an optimization problem with many local minima (Hansen & Salamon 1990). Laucelli *et al*. (2007) applied ensemble modeling and genetic programing to hydrological forecasts and showed the error due to the variance is effectively eliminated by using an average model (ensemble model) as the resulting model of many runs. Boucher *et al*. (2009) developed the 1-day ahead ensemble ANN model for streamflow forecast. This study showed that the random initialization of the weight parameters mainly accounted for the uncertainty linked to the optimization of the model's parameters and ensemble modeling can reduce the uncertainty using the proper assessment tools for the performance of ensemble models. Zamani *et al*. (2009) developed an ensemble ANN model with the Kalman filter which corrects the outputs of the ANNs to find the best estimate of the wave height, and showed the prediction results were improved as the number of ensemble members increased. These studies indicate the ANN model cannot guarantee that the model will produce an optimal result without considering appropriate methods for the initial weight parameters.

This study attempted to employ exploratory factor analysis (EFA) to propose a useful input variable selection method for the ANN model. The EFA used in this study, which is classified to a model-free approach, can select the input variables required to describe little or no predictive information of the desired output, considering the degree of redundancy. In this study, in addition to EFA application, the ensemble modeling for various initial weight parameter sets was applied in order to estimate the ANN performance without the effect of initial weight parameters on the variance of ANN model results. Figure 1 illustrates the research questions and approaches applied in this study (e.g., Maier *et al*. 2010). To estimate the performance of ANN models with the proposed approaches, the one-step ahead forecasting ANN ensemble model was developed and applied to Nakdong River in Korea.

## MODEL DEVELOPMENT

### ANN model

The ANN model is a computational model inspired by the biological neural network in the human brain. The concept of artificial neurons was first introduced in 1943 (McCulloch & Pitts 1990), and applications of ANNs in research areas began with the introduction of the BP training algorithm for feedforward ANNs in 1986 (Rumelhart *et al*. 1986). The main difference between the various types of ANN model involves the network architecture and the method to determine the weights (Caudill & Butler 1992). The way in which the neurons of the ANN are structured and the neural networks learn determines the architectures of ANN. In general, there are three fundamentally different classes of network architectures based on the way by which the neurons of ANNs are structured (Haykin 1999). The first is a single-layer feedforward network without hidden layers. The second is a multilayer feedforward network with more than one hidden layer. The third is a recurrent neural network with at least more than one feedback loop.

In this study, the multilayer feedforward neural network (MFNN) with one hidden layer was used because it is able to approximate most of the non-linear functions demanded by practice (Mulia *et al*. 2013). The most common and standard training algorithm is the BP training algorithm, the central idea of which is that the error for the neurons of the hidden layer are determined by backpropagating the error of the neurons of the output layer, as shown in Figure 2. There are a number of variations in BP training algorithms on the basic algorithm that are based on other standard optimization techniques, such as steepest descent algorithm, conjugate gradient algorithm, and Newton's method. The steepest descent algorithm is the simplest and often the slowest. The conjugate gradient algorithm and Newton's method generally provide faster convergence. Especially, the Levenberg–Marquardt (LM) algorithm has been very successfully applied to hydrological forecasts, providing significant speedup and faster convergence than the steepest descent algorithm and conjugate gradient-based algorithms (Zamani *et al*. 2009). In this study, the LM algorithm was applied for training the network.

The LM algorithm is a variation of Newton's method designed to minimize functions that are sums of squares of other non-linear functions. This is very well suited to neural networks training, where the performance function is the sum of squared error. Newton's method is based on the second-order Taylor series expansion, while the derivation of the steepest descent algorithm was based on the first-order Taylor series expansion (Hagan *et al.* 1996). The LM algorithm is given in Equation (1)
1where *w* is the weight vector and *v* is the error vector. *J* is the Jacobian matrix. *I* is the identity matrix. is the damping factor. Subscript *k* is an epoch or iteration number.

This algorithm has the very useful feature that as increases, it approaches the steepest descent algorithm with a small learning rate.
2
3where is a performance function (the sum of squared error), *N* is the number of the data set.

As decreases to zero, the algorithm becomes Gauss–Newton. The algorithm begins with set to some small value (e.g., ). If a step does not yield a smaller value for , then the step is repeated with multiplied by some factor (e.g., ). Eventually should decrease, since we would be taking a small step in the direction of steepest descent. If a step does produce a smaller value for , then is divided by for the next step, so that the algorithm will approach Gauss–Newton, which should provide faster convergence. The algorithm provides a nice compromise between the speed of Newton's method and the guaranteed convergence of steepest descent (Hagan *et al.* 1996).

### EFA

The EFA is one of the factor analysis techniques whose goal is to identify the underlying relationships between measured variables (Norris & Lecavalier 2010). It is generally used when the researcher has no a priori hypothesis about factors or patterns of measured variables (Finch & West 1997). EFA has been applied to research in the field of psychology, economics (business and marketing), medicine, and biology (Fabrigar *et al*. 1999; Peterson 2000; Kelton *et al*. 2010; Lucas *et al*. 2010). EFA in water resource management was applied to interpret the relationships between the observed data set and variables from monitoring stations at various locations in rivers. Liu *et al*. (2003) attempted to apply the EFA to groundwater samples. The results showed that two factors affecting the seawater salinization and arsenic pollution were defined, and describe the main hydrochemical processes and identify the possible cause of groundwater salinization and arsenic pollution. Boyacioglu (2006) applied EFA to analyze the surface water quality data and identified the practical pollution indicators to delineate the domestic and agricultural pollution in the basin.

EFA assumes that the variance in the observed variables is due to the presence of one or more latent variables (common factors) that exert causal influence on these observed variables, and distinguishes the contributions of individual variables to the variance of output variables from the interpretation of the latent factors in each variable. A large number of input variables (redundant input variables) might include irrelevant and noisy variables, and lead to increasing the modeling error. Thus, the input variable set will contain the fewest input variables required to describe the variance of the output variable, with a minimum degree of redundancy (May *et al*. 2011). From this point of view, the EFA is very well suited to determine input variables of the ANN model that significantly affect the output variables.

The common EFA model is defined as
4where the observations with *p* variables are and is the common factor matrix for the number of factor *m*. is the factor loading matrix. is the specific factor or residual errors. From this model, the factor loading value *L* for common factors *F* that affect the original variables *X* can be obtained with a factoring method. The procedure of EFA consists of five steps, as shown in Figure 3. The first is standardizing the row data. The second is making the covariance matrix. The third is calculating the eigenvalues and eigenvectors and selecting the number of common factors. The fourth is calculating the factor loading value of the selected common factors. The last is interpretation of the common factors. In this study, the principal component factoring method was used for calculating the factor loading value. It is the most widely used and basic factoring method in EFA. This factoring method has similarities to principal component analysis (PCA). Thus, EFA is sometimes confused with PCA. However, EFA and PCA are very different. EFA assumes that the covariance in the observed variables is due to the presence of one or more latent variables (common factors) that exert causal influence on these observed variables. Therefore, it enables the contributions of individual variables to the variance of the desired variables by the interpretation of common factors to be distinguished. In contrast, PCA makes no assumption about an underlying causal model. PCA is simply a variable reduction analysis that typically results in a relatively small number of principal components (PCs) which are the linear compound of the variables instead of the direct use of variables. Therefore, it is not possible to distinguish the contributions of individual variables to the variance of the desired variables.

The common EFA model Equation (4) is based on two assumptions. First, the error terms are independent of one another, and such that and , where E denotes the expected value (mean value) of the variable for all observed data set and is the specific variance. Second, common factors are independent of one another and of the error terms, and are such that and . From these assumptions, the covariance matrix in EFA can be defined as 5where .

A vector of observations with *p* variables is
6The covariance matrix ∑ is expressed as
7
8

The eigenvalues and corresponding eigenvectors can be obtained from Equation (8). In PCA, the covariance matrix can be expressed as the sum of *p* eigenvalues multiplied by their eigenvectors and their transposes. The idea behind the principal component factoring method is approximating this expression.
9

This result yields the following estimator for the *ij*th factor loading value, which indicates the effect of the *j*th factor on the *i*th variables.
10where is an *i*th eigenvector and is a *j*th component of the *i*th eigenvector.

Equation (4) can be rewritten as 11

In the case , the residual error terms become zero and in the case , the terms of the right-hand side for become the residual error term . From Equation (11) and the assumption of the EFA model, the variance of the original variable can be defined as 12

The first term, communality of the variables, is the part that is explained by the common factor and means how well the common factors explain the total variance of each variable, and is calculated as the square of each factor loading value. The second term, the specific variance, is the part of the variance that is not accounted for by the common factors. As communality is close to 1, it means almost all variances of the variable are explained by the common factors.

### Ensemble modeling

Ensemble techniques have been applied in hydrology and environmental science as an approach to enhance the skill of forecasts with considerable success (Maqsood *et al.* 2004; Jeong & Kim 2005; Araghinejad *et al*. 2011). The motivation for this procedure is based on the idea that by combining the outputs of several individual predictors one might improve on the performance of a single generic one (Wolpert 1992; Krogh & Vedelsby 1995). The technique of combining multiple models into a single one is referred to as ensemble modeling (Laucelli *et al*. 2007). The most common practice of ensemble modeling in ANN is to average the ensemble members to obtain a forecast result (Hansen & Salamon 1990; Zhou *et al*. 2002; Zhang 2007). This averaged output of many neural networks trained on the same data has less variability than the outcome of a single network (Boucher *et al*. 2009). The application of an ensemble technique is divided into two steps. The first step is to create individual ensemble members, and the second step is the appropriate combination of outputs of the ensemble members to produce the most optimal output (Araghinejad *et al*. 2011).

In this study, the ensemble modeling technique was applied to estimate the performance of the ANN without the influence of initial weight parameters on the model results. For the first step, neural networks with 100 randomly generated initial weight sets for the same structure were used in parallel for the same original training data set. For the second step, Nash–Sutcliffe efficiency (NSE), which is given in Equation (13), was used. NSE is a normalized statistic that determines the relative magnitude of the residual variance (‘noise’) compared to the measured data variance (‘information’) (Nash & Sutcliffe 1970). NSE indicates how well a mean result of the ANN ensemble model fitted a set of test or validation output data.
13
14
15where is the total sum of squared, and is the sum of squares of residuals. is an *i*th observed value or target value. is a mean value of for all observed data set. is a mean result of an ANN ensemble model for the *i*th data set.

In addition to NSE, the root-mean-square error (RMSE) and inter-quartile range (IQR, distance between 25th and 75th percentile) were used for a global estimate of the results of ensemble modeling. RMSE was used to measure the whole bias error between mean results of the ANN ensemble model and observed values.
16where *n* is the number of observed data set. IQR, shown in Figure 4, was used to measure variance error of the ANN ensemble model itself.
17where and are 25th and 75th percentile value of ANN ensemble model results for the *i*th data set, respectively.

The effect of initial weight parameters was removed through the aggregation of the results of individual ensemble members. Figure 5 illustrates the ANN ensemble modeling technique used in this study.

## MODEL APPLICATION

### Study site and input data

The Nakdong River, one of the four major rivers in South Korea, is located in the southeastern part of the Korean peninsula. The river is about 525 km long and has a drainage area of 23,817 km^{2}. Busan, one of the biggest cities in South Korea, is located near the river mouth. About 7 million people reside within the basin, and more than 13 million people intake drinking water from this river. Moreover, river flow and environmental characteristics have been changed drastically after the Four Major River Restoration Project, constructed from 2010 to 2012. Thus, management of water uses and the flood control in this area has been a major national project.

The daily streamflow data from 23 stations and the daily rainfall data from 12 rainfall stations in the Nakdong River basin for the years 2003–2009 were used in this study to develop the 1-day ahead streamflow forecasting ANN model. The locations of the streamflow and rainfall stations in the Nakdong River basin are shown in Figure 6. Rainfall (*P*) and streamflow (*Q*) are essential input variables for the streamflow forecasting ANN model (Minns & Hall 1996; Campolo *et al*. 1999; Anctil & Rat 2005). The time range of rainfall and streamflow inputs should ideally be adjusted according to the time of concentration or the traveling time of each predicted station in the basin (Anctil & Rat 2005). Considering the critical rainfall duration 40 hours for a 200-year design flood of the Nakdong River (MOLIT 2004) and the preceding rainfall, input variables of daily rainfall and streamflow were discretized into input variable candidates with time *t*(day), *t* − 1, *t* − 2 for the rainfall and the streamflow and the desired output time *t* + 1 for the streamflow. Thus, in this study, the desired output *Q _{t+1}* of each stream gaging station has the input variable candidates

*Q*,

_{t}*Q*,

_{t−1}*Q*,

_{t−2}*P*,

_{t}*P*, and

_{t−1}*P*. One hundred and twenty-eight variables (105 input variable candidates and 23 desired output) are shown in Table 1. Symbols instead of variable names were used for convenience in this study.

_{t−2}### Results of EFA

One hundred and twenty-eight inputs with a 2,554 data set (the daily data from 2003 to 2009) were analyzed by EFA to define what factors affect the desired output. The number of common factors was determined by the number of eigenvalues of the covariance matrix that exceeded 1. This criterion was proposed by Kaiser (1958) and is widely used. The number of common factors was selected as 15, and the cumulated proportion of 15 eigenvalues accounts for about 87% of the total variance of the inputs. The principal component factoring method was used for the factoring method. From the results of Table 2, each common factor can be categorized into three groups and interpreted as follows. The first group is Factor 1 to Factor 4, which gives information for the time range of inputs. The second group is Factor 5 and Factor 6, which gives information for whether a stream gaging station is more affected by the upstream or downstream part of the basin, such that application of rainfall inputs *P** for upstream and *P* of all rainfall stations for downstream can be set. The third group is Factor 7 to Factor 15, which gives information for variables affected by specific stations. From the third group of factors, *Q* or *P* inputs of specific stations can be added to the modeling for a desired output.

### Determination of input variables

In this study, three stream gaging stations located in the upstream, middle, and downstream part of the Nakdong River were selected to verify the streamflow forecasting ANN modeling result for the input variables selected by EFA. These are Nakdong, Koryunggyo, and Susan stream gaging stations located in the major metropolitan area. Table 3 shows the EFA result for the desired output *Q _{t+1}* of Nakdong (

*v5*), Koryunggyo (

*v13*), and Susan (

*v19*) stream gaging stations. The total communality result shows that 94.7% variance of the variable

*v5*, 96.2% variance of the variable

*v13*, and 94.8% variance of the variable

*v19*can be described by 15 common factors.

Considering the reduction of input variables and redundancy, the input variables for *v5*, *v13*, and *v19* were determined by the factors of which accumulated communality exceeds 90% in Table 3. The variance of *v5* can be explained more than 90% by Factor 5 in the second group and Factor 4, Factor 2 and Factor 1 in the first group. From the interpreted factor results, we can guess *v5* was affected strongly by the upstream part of the basin, because Factor 5 is bigger than Factor 6. The time step of variables was affected by Factor 4, Factor 2, and Factor 1. The meaningful variables for *v5* can be selected as *Q _{t}*,

*Q*,

_{t−1}*Q*,

_{t−2}*P**

*t*, and

*P**

*. The variance of*

_{t−1}*v13*can be explained more than 90% by Factor 5, Factor 6 in the second group, Factor 2, Factor 4, and Factor 1 in the first group, and Factor 8 in the third group. As Factor 5 is bigger than Factor 6, we can guess

*v13*was affected more by the upstream part of the basin than by the downstream part of the basin. The time step of variables was affected by Factor 4, Factor 2, and Factor 1. From Factor 8, a specific station upstream affected

*v13*. The meaningful variables can be selected as

*Q*,

_{t}*Q*,

_{t−1}*Q*,

_{t−2}*P**

*t*,

*P**

*, and*

_{t−1}*Q**

*of a specific station, which is selected as the Wegwan stream gaging station, since it was the closest upstream stream gaging station to Koryunggyo station among the gaging stations in*

_{t}*Q**. The variance of

*v19*can be explained more than 90% by Factor 6, Factor 5 in the second group, Factor 1, Factor 2, Factor 4, and Factor 3 in the first group. Since Factor 6 is bigger than Factor 5, we can guess that the

*v19*was located in the downstream part of the basin. The time step of variables is affected by all factors in the first group. Therefore, the meaningful variables for

*v19*would be

*Q*,

_{t}*Q*,

_{t−1}*Q*,

_{t−2}*P*,

_{t}*P*, and

_{t−1}*P*.

_{t−2}## RESULT OF STREAMFLOW FORECASTING

To verify that the input variables selected by the EFA give a good streamflow forecasting ANN modeling result, several cases consisting of various combinations of input variables selected without application of the EFA were compared to the case of proposed input variables by EFA. In the ANN modeling, SMFNNs with one hidden layer were used, and the number of hidden neurons was set by less than or several times the number of input variables (input neurons). The LM algorithm was used to train the network with 100 randomly generated weight parameter sets to reach the required training goal of 0.001 in 500 epochs. The tangent sigmoid activation function for the hidden layer and a linear activation function for the output layer were used. Among the 2,554 data sets, 1,824 data sets in 2003, 2004, 2007, 2008, and 2009 were used to train the network, and 365 data sets in 2005 were used to test what number of hidden neurons gives a good result; 365 data sets in 2006 were used to validate the neural network selected from the test.

### Simple autoregressive model

Simple autoregressive (AR) models with antecedents of each variables in Equation (18), were also developed for training data set: 18where is a desired output, , , and are the antecedents of the desired output, and is a constant or noise.

AR models for each desired output are shown in Table 4. In the result, AR(3) model using *Q _{t}*,

*Q*,

_{t−1}*Q*shows better results than AR(1) and AR(2) models. Developed AR models using training data were applied to test and validation data set to compare the result with ANN ensemble models.

_{t−2}### Q_{t}_{+1} of Nakdong stream gaging station (*v*5)

_{t}

Several cases of various combinations of input variables affected by Factor 5, Factor 4, Factor 2, and Factor 1 that affected *v5* are compared. Case 1 is the neural network model with input variables affected by Factor 5 and Factor 4. Case 2 is the model with input variables affected by Factor 5, Factor 4, and Factor 2. Case 3 is the proposed model by EFA in which the input variables are affected by Factor 5, Factor 4, Factor 2, and Factor 1. Case 4 is the model in which the input variable affected by Factor 3 is added to the input variables of Case 3. Case 5 and Case 6 are the models with/without *Q _{t−1}*,

*Q*, and

_{t−2}*P**

*of Case 3. The test and validation results of various numbers of hidden neurons for each case and AR models are shown in Table 5.*

_{t−2}Each neural network was trained into RMSE, 65.7 m^{3}/s, IQR 29.3 m^{3}/s, and over 0.96 NSE on average. According to the test result, the optimum number of hidden neurons for each model was selected as 14 hidden neurons for Case 1, 28 hidden neurons for Case 2, 10 hidden neurons for Case 3, 21 hidden neurons for Case 4, 26 hidden neurons for Case 5, and 10 hidden neurons for Case 6. Validation results of Case 1, Case 3, and Case 6 also showed good results in the model with the optimum number of hidden neurons of each case in the test. However, Case 2, Case 4, and Case 5 showed good validation results in the model with different numbers of hidden neurons with an optimum number of hidden neurons selected in the test. This means that these models may give a different result according to the input data presented, so these models can be regarded as unstable. As the variables affected by factors with high communality are added to the input variables, RMSE in the test and the validation result decrease. Case 4 is the model in which *P _{t−2}* is added to the input variables of Case 3. As shown in Table 3, the 2% of communality is added by

*P*which is the variable affected by Factor 3 and leads to the lowest RMSE in the test result and generally lower RMSE than other cases in the validation result. However, Case 4 shows higher RMSE than Case 3, which is thought to be caused by the data redundancy 2% of necessary information was given by Factor 3 but, more unnecessary information was included. As shown in Figure 7, Case 3 shows relatively higher RMSE 115.3 m

_{t−2}^{3}/s, IQR 77.6 m

^{3}/s than Case 2, Case 4, and Case 6 in the test result, but gives the lowest RMSE 114.1 m

^{3}/s and the highest NSE 0.92 in the validation result. And Case 3 shows better results both in the test and validation than AR models. Therefore, the Case 3 neural network with proposed input variables from EFA can be selected as a reliable neural network model for

*v5*. Figure 8 shows the validation result of Case 3.

### Q_{t}_{+1} of Koryunggyo stream gaging station (*v*13)

_{t}

Several cases of various combinations of input variables affected by Factor 5, Factor 2, Factor 4, Factor 8, Factor 1, and Factor 6 that affect *v13* are compared. Case 1 is the model with input variables affected by Factor 5, Factor 2, Factor 4, and Factor 6. Case 2 is the model with input variables affected by Factor 5, Factor 2, Factor 4, Factor 1, and Factor 6. Case 3 is the proposed model by EFA in which the input variables are affected by Factor 5, Factor 4, Factor 2, Factor 1, Factor 8, and Factor 6. Case 4 is the model in which the input variable affected by Factor 3 is added to the input variables of Case 3. Case 5 and Case 6 are the models with/without *Q _{t−1}*,

*Q*, and

_{t−2}*P**

*of Case 3. The test and validation results of various numbers of hidden neurons for each case and AR models are shown in Table 6.*

_{t−1}Each neural network was trained into RMSE 80.8 m^{3}/s, IQR 73.0 m^{3}/s, and over 0.98 NSE on average. According to the test result, the optimum number of hidden neurons for each case was selected as 14 hidden neurons for Case 1, 15 hidden neurons for Case 2, 10 hidden neurons for Case 3, 10 hidden neurons for Case 4, nine hidden neurons for Case 5, and 10 hidden neurons for Case 6. Case 1, Case 2, and Case 5 showed good validation results in different numbers of hidden neurons with the optimum number of hidden neurons selected in the test. So these models can be regarded as unstable. As the variables affected by factors with high communality in Table 3 are added to input variables, RMSE in the test and the validation result decrease. Case 3 is the model in which *Q _{t}* of Wegwan is added to the input variables of Case 2. In Table 3, the 9% of communality is added by

*Q*of Wegwan affected by Factor 8. This leads to lower RMSE than Case 2 in the test and validation results. Case 4 is the model in which

_{t}*P*is added to input variables of Case 2. In Table 3, the 4% of communality is added by Factor 3. As shown in Figure 9, this leads to lower RMSE than Case 2 in the test and validation results. Case 3 and Case 4 show similar and lowest RMSE results. However, Case 3 is a more efficient model than Case 4, because Case 3 has five fewer input variables than Case 4. Therefore, the Case 3 neural network with input variables proposed by EFA can be selected as an appropriate neural network model for

_{t−2}*v*13. And Case 3 shows better results both in the test and validation than AR models. Figure 10 shows the validation result of Case 3.

### Q_{t}_{+1} of Susan stream gaging station (*v*19)

_{t}

Several cases of various combinations of input variables affected by Factor 6, Factor 1, Factor 5, Factor 2, Factor 4, and Factor 3 that affect *v19* are compared. Case 1 is the model with input variables affected by Factor 6, Factor 1, and Factor 5. Case 2 is the model with input variables affected by Factor 6, Factor 1, Factor 5, Factor 2, and Factor 4. Case 3 is the model proposed by EFA in which the input variables are affected by Factor 6, Factor 1, Factor 5, Factor 2, Factor 4, and Factor 3. Case 4 is the model in which the input variable *Q _{t}* of Wegwan affected by Factor 8 is added to the input variables of Case 3. Case 5, Case 6, and Case 7 are the models without

*Q*,

_{t−1}*Q*,

_{t−2}*P**

*, and*

_{t−1}*P**

*of Case 3. The test and validation results of various numbers of hidden neurons for each case and AR models are shown in Table 7.*

_{t−2}Each neural network was trained into RMSE 145.8 m^{3}/s, IQR 402.3 m^{3}/s and over 0.95 NSE on average. According to the test result, the optimum number of hidden neurons for each case was selected as 15 hidden neurons for Case 1, 27 hidden neurons for Case 2, 10 hidden neurons for Case 3, 10 hidden neurons for Case 4, 13 hidden neurons for Case 5, 10 hidden neurons for Case 6, and 20 hidden neurons for Case 7. Validation results of Case 2, Case 6, and Case 7 showed good results in the model with the different numbers of hidden neurons of the test. As the variables affected by factors with high communality in Table 3 are added to the input variables, RMSE of Case 1–Case 4 in the test result decreases. Case 4 is the case in which *Q _{t}* of Wegwan affected by Factor 8 is added to input variables of Case 3. In Table 3, the 4% of communality is added by the variable affected by Factor 8. As shown in Figure 11, this leads to lower RMSE 102.5 m

^{3}/s than RMSE 116.9 m

^{3}/s of Case 3 in the test result. However, Case 4 shows higher RMSE 236.0 m

^{3}/s than RMSE 218.7 m

^{3}/s of Case 3 in validation results. Therefore, the Case 3 neural network with proposed input variables by the EFA can be selected as an appropriate neural network model for

*v19*. And Case 3 also shows better results both in the test and validation than AR models. Figure 12 shows the validation result of Case 3.

## DISCUSSION

The ANN model could give very different results according to the initial weight parameters. This means that the optimal model can be changed according to the initial weight parameters. Based on the ensemble modeling, the global performance of the ANN model can be estimated, and the generalization ability of ANNs was improved by combining individual ANNs (ensemble members). Figure 13 and Table 8 show the comparison of validation results between many individual ANN models and ensemble ANN models with a 100 ensemble members. ANN ensemble models show a better NSE result than averaged individual ANN models. An individual ANN model shows NSE result of Min. −29.22 and Median 0.46 for *v5*, Min. 0.74 and Median 0.91 for *v*13 and Min. 0.63 and Median 0.93 for *v*19. All ensemble models show better results (0.92 for *v*5, 0.95 for *v*13, 0.97 for *v*19) than the averaged NSE of individual ANN models.

IQR is the indicator of the variance or uncertainty of ANN ensemble model result according to the initial weight parameters. The result of ensemble modeling showed that the IQR increased as the number of hidden neurons was increasing, and all cases gave good RMSE and NSE results, when the ratio between the number of hidden neurons and input neurons (input variable) was less than or equal to 1. As the number of hidden neurons is increasing, the number of local optima of weight parameters in the network is increasing. This means that the variance of ANN model results for the initial weight parameters can be increased more by the increase of the number of hidden neurons, which led to the increase of the uncertainty of ANN model results. Figure 14 shows the NSE results of each optimal ANN ensemble model for the ratio between the number of hidden neurons and input neurons.

The optimal ANN ensemble models showed big errors in the forecast for the high discharge flow period shown in Figure 15. The RMSE in the high discharge flow period is 334.22 m^{3}/s for *v5*, 467.35 m^{3}/s for *v13*, and 693.48 m^{3}/s for *v19*, which is about three times bigger than each total RMSE 114.09 m^{3}/s for *v3*, 143.27 m^{3}/s for *v13*, and 218.67 m^{3}/s for *v19*. The error in the high discharge flow period is thought to come from the unbalance of the training data set itself. The neural network is trained by the patterns of the given historical data set to minimize the sum of squared errors. Naturally, the network is optimized by the data in the range of low discharge in which a large amount of data is distributed, since the amount of data in the range of high discharge is much less than the amount of data in low discharge.

## CONCLUSIONS

In this study, the ANN ensemble models using EFA for input variables selection and simple AR models were developed to forecast streamflow of three stream gaging stations in Nakdong River, Korea. EFA was performed on 128 variable candidates (105 input variable candidates and 23 desired output) of the 23 streamflow stations and 12 rainfall stations in the Nakdong River basin. In the EFA result, the communality result by 15 factors had an average 87% for the total variance of the 128 variables. The communality result for the three stream gaging stations of Nakdong (*v5*), Koryunggyo (*v13*), and Susan (*v19*) showed 94.7%, 96.2%, and 94.8%, respectively. From the interpretation of the common factors, input variables of the ANN model to forecast streamflow of the three stations were selected as the variables affected by factors of which cumulative communality was more than 90% of each total communality of *v5*, *v13*, and *v19*. *Q _{t}* ,

*Q*,

_{t−1}*Q*of Nakdong and

_{t−2}*P**

*,*

_{t}*P**

*for*

_{t−1}*v5*,

*Q*,

_{t}*Q*,

_{t−1}*Q*of Koryunggyo,

_{t−2}*Q*of Wegwan and

_{t}*P**

*,*

_{t}*P**

*for*

_{t−1}*v13*,

*Q*,

_{t}*Q*,

_{t−1}*Q*of Susan and

_{t−2}*P*,

_{t}*P*,

_{t−1}*P*of all rainfall stations for

_{t−2}*v19*were selected.

In the streamflow forecasting results for *v5*, *v13*, and *v19*, RMSE, IQR, and NSE results of the ANN ensemble models with the input variables from EFA for *v5*, *v13*, and *v19* in the validation were (114.09 m^{3}/s, 106.50 m^{3}/s, 0.92), (143.27 m^{3}/s, 97.15 m^{3}/s, 0.95) and (218.67 m^{3}/s, 162.33 m^{3}/s, 0.97), respectively. And the ANN ensemble models show better result than the simple AR models. When the variables affected by factors of which the cumulative communality was over 90% were added to the input variables, the test result was better than the model with the proposed input variables but the validation result was not better or similar. Considering the robustness and efficiency of the model, the model with the variables affected by factors for which the cumulative communality was more than 90% of the total communality could be selected as an appropriate neural network model.

This study presents an application of ANN with the ensemble modeling technique and the input variables selection technique by EFA. Using the ensembles modeling technique, the global performance of the ANN model considering the variance of ANN results for various initial weight parameters was estimated, and the optimal ANN forecasting model for streamflow of each station can be selected. Through the input variable selection technique by EFA, the ANN model can be more accurate, efficient, cost-effective, and interpretable. EFA enables the selection of the input variables required to describe the variance of the output variable, considering the degree of redundancy and variables that have little or no predictive information. Also, this enables the development of an efficient, cost-effective ANN model, since the process of the input variable selection by EFA is model-free and multiobjective.

However, a relatively high error occurred in the high discharge flow period due to the unbalanced distribution of the training data set. Generally, the distribution of flow data is unbalanced. Thus, future applications of the proposed ANN ensemble model with EFA for streamflow forecasting will include the data preprocessing technique, such as the clustering method, to construct the balanced training data set.

## ACKNOWLEDGEMENTS

This research was supported by a grant (11-TI-C06) from the Advanced Water Management Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government, and conducted at the Engineering Research Institute and the Integrated Research Institute of Construction and Environment in Seoul National University, Seoul, Korea.

- First received 26 February 2014.
- Accepted in revised form 8 January 2015.

- © IWA Publishing 2015

Sign-up for alerts