Modeling ensemble streamflow: Application to the Senegal River upper the Manantali Dam

This study presents an ensemble stream flow simulation with probability of occurrence accounting for errors of the GR4J turned to the Senegal River upper the Manantali dam. Through this approach, probability is associated to simulation at each time, with step and reliability of issues depending on probability scores. Results including the reliability of the results permit decision-makers to judge the reliability of their simulations. Past errors pattern of the model is used to perturb the model for issuing ensemble scenarios rather than classical single deterministic. Basic and reverse Box-Cox transformation is carried out to allow treatment of the errors through weighted multivariate Gaussian distribution. Statistic errors are then used to perturb the hydrological model to obtain ensemble issues (RAW-Ens). Furthermore, a post-processing method (affine kernel dressing) is performed to produce a dressed ensemble (AKD-Ens). Ensembles are evaluated at deterministic and probabilistic scale. Diagrams (attribute and ROC) are also used for this purpose. Evaluating methods reveal through scores that the system is reliable and that dressing method (AKD) improves quality of the raw ensemble drawn from the perturbed model.


INTRODUCTION
Modeling and forecasting ensemble of hydrological variables is used to help assessment of feasibility of some applications in term of water supply (drinking water, irrigation and hydropower), particularly in dry areas. However, randomness of natural fluids dynamics has motivated the use of probabilistic approaches in hydrology to improve quality of modeling and forecast issues, and consequently, decision making (Bauer Gottwein et al., 2014;Demeritt et al., 2013). Probabilistic modeling based on issuing ensemble scenarios rather than single deterministic are introduced for the assessment of the flow rates particularly in forecasting purpose, estimating soil moisture, groundwater or tributaries from snow layers (Pappenberger et al., 2015). It enables surpassing classical limitations of deterministic processes and has significantly enhanced flood *Corresponding author. E-mail: dindionemaria@gmail.com.
Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License prediction through the world (Demargne et al., 2013;Demeritt et al., 2013;He et al., 2016). Moreover, it has led to optimizing gains on activities depending on water resources and reducing common incidents inherent to floods or other hydro climatic extreme events by forestalling potential risks (Fan et al., 2015;Laiolo et al., 2014).
Indeed, since last decades, developing good hydrological forecasting system considering biases has become a challenge for hydrologists and water managers due to frequent failure of classical forecasting schemes (Ranzi et al., 2009;Pagano et al., 2013). Major public and private hydro meteorological services are moving towards implementing ensemble models and using probabilistic interpretations of modeling issues. This is the case of the National Weather Service (NWS) and the National Oceanic and Atmospheric Administration of the United States of America (NOAA) (Demargne et al., 2013;). Towards the same goal, the HEPEX (Hydrologic Ensemble Prediction Experiment) promotes developing and use of ensemble prediction systems through projects mainly focused on hydrological ensemble forecast (HEF) generally run by meteorological ensemble. Jeong et al. (2005) used both single neural network (SNN) and ensemble neural network (ENN) to improve streamflow forecasting system based on the TANK model at the Daecheong multipurpose dam in Korea. Results show that the ENN improves considerably ensemble forecasts. Pappenberger et al. (2011) explain that the European flood alert system (EFAS) provides National Hydrological services with long-term probabilistic flood forecasting via an ensemble prediction system. Beyond efficiency of the probabilistic forecast, they show that ensemble forecast depends on geographical location and catchment spread and that the skill of the forecasted river discharges increases over 10 years forecasting. Pagano et al. (2013) dressed ensemble forecasts of 120 catchment using distribution of the model errors; they found that dressing methods can enhance reliability of raw ensembles. The COSMO-7 deterministic model and the probabilistic COSMO-LEPS have been coupled to the PREVAH hydrological model to help managers of the Sihl River in decision making (Addor et al., 2011). This paper aims at simulating ensemble flow discharge of Senegal River upper the Manantali dam using the GR4J (Rural Engineering with four parameters issuing daily simulation) model in order to integrate commonly deterministic bias in further simulations (Makhlouf, 1994;Caseri, 2017) by issuing probabilistic results. This approach allows streamlining results from classical hydrological modeling by considering the model uncertainty (Davilio et al., 2013;Seo et al., 2010;Pagano et al., 2013;Hemri et al., 2015). The ensemble stream flow is obtained by perturbing the input model with errors pattern cyclically designed on the basis of yearly pseudo periodic behavior of flow discharge at the Senegal River.
Biases are used in designing the perturbation scheme from 14 running years of the model calibrated through two years. The members of the error scheme are then statistically enhanced using a multivariate Gaussian kernel without loss of original characteristics (Sharma et al., 2002) using forward and reverse Box-Cox. Then, the new statistical error pattern is used as perturbation for GR4J model already calibrated to produce ensemble simulation for the period ranging from 1965 to 1976. The corresponding ensemble in this paper is referred to as RAW-Ens. Indeed, multiple scenarios from models enable quantifying uncertainties in the forecasting processes through assessment of probabilities of categorized members of the ensemble (Sivillo et al., 1997;Keune et al., 2014;Krzysztofowicz, 2001). Data used for this study are from the OMVS (Organization of the Development of the Senegal River) database and are composed of daily flows gauges between 1963 and 1976 at the Bafing flow gauge, precipitation and evapotranspiration of corresponding period of study. In addition, affine kernel dressing method (Jha et al., 2015;Lucatero et al., 2017;Silverman, 1998;Broecker., 2008) is used to dress the raw ensemble leading one other ensemble referred to as AKD-Ens. Goodness of the modeling schemes and their results are assessed on one hand through commonly used statistical criterions such the mean error, mean absolute error and the root mean squared error of ensembles; and on the other hand, through probability scores such as the Brier score, the rank probability score, the continuous rank probability. In addition, diagrams such as the attribute diagram and the relative operating diagram (ROC) are used as verification tool. Attribute diagram is used to check for reliability, resolution and skill of modeling systems. The ROC curve is exploited to investigate how hits vary against false alarm rates for each ensemble modeling scheme issuing the probabilistic simulations.

Study area
The Bafing is the main tributary of Senegal River, that is, the second longer river of West Africa with catchment area of about 343 000 km 2 (Michel, 1973). The Bafing area is estimated at about 38000 km 2 and reach both Malian and Guinean areas (Sane et al., 2017) with its length estimated at about 670 km. In addition, it provides about 60% of the Senegal River inflow which slope estimated at about 5 m/km (Maïga et al., 1995). The main source of the Bafing catchment is the mountains of Fouta Djallon in Guinea ( Figure 1). The multifunctional (hydropower, low flow support for irrigation and navigation and flood mitigation) dam of Manantali was set up in the Malian part of the Bafing catchment and provides about 12% of electric energy consumption in Senegal (Bader and Albergel, 2015). Local climate is composed of two types: sub-Guinean in Southern of the Bafing and Sudanese in Northern part (Sane et al., 2017). The data used in this study are from the OMVS (

GR4J model and raw error pattern
GR4J model is a global conceptual rainfall-runoff hydrological model set up by the French Research Center namely the CEMAGREF. It is briefly described below without detailing its operating mode. The GR4J (Rural Engineering with four parameters issuing daily simulation) model involves four parameters: the maximum capacity of production store referred to as X1 (mm), the groundwater exchange coefficient noticed by X2 (mm), the maximum capacity of routing store by X3 (mm) and the time base of unit hydrograph X4 (mm). Inputs model daily are constituted of observed flows discharges, rainfall and daily potential evapotranspiration (Traore et al., 2010;Perrin et al., 2003;Valéry et al., 2014). An error pattern is drawn to perturbation of the GR4J model to issuing ensemble simulations. Thus, the GR4J has been calibrated to set parameters, and then runs over 12 in addition to the two-calibrating period at daily time step. After run, cyclical biases of which period is 365 days are settled to build the raw error pattern. Error pattern used in the ensemble generation are inferred from years daily simulations noticed by ̂ & j . Through the running years of the model already calibrated, overall obtained data will be 365 (Wilks, 2006). Let ̂ be a matrix constituted of rows representing daily simulations and columns that represents yearly simulation time series.
Furthermore, ̂ represents a row vector containing simulated flow discharge of same day through the entire period of study and ̂ represents a column vector containing 365 simulation of a given year. In this study, simulated flows are settled so that rows represent the years and the columns define the dates of the corresponding years. So, they can be written as: (1) Simulations in overall years are written in matrix form such as ̂ which rows correspond to years and columns to the calendar dates. Data are arranged to have good scheme for further statistical error generation. Finally, overall flow discharges from the GR4J through years are drawn in matrix form: . For more understanding of the process, daily flow discharges are schemed as (Wilks, 2006;Vincendon, 2011;Wu 2011;Fard and Hajiaghaei-Keshteli (2018a,b,c;Fard et al., 2018a,b): In a similar structure, simulated flows are compared with associated

Manantali dam
observations noticed by [ ], thereafter, bias are extracted at each date by calculating the differences between simulated and observed flow discharges. Bias is in same scheme used as raw error pattern in the purpose of ensemble producing and is given by: (3)

Box-Cox transformation
The Box-Cox transformation is commonly used to statistically normalize a data set. This operator is applicable either to univariate or multivariate data. Indeed, bias (errors) obtained characterized by the difference between simulations and observations in time series modeling are not often normal. Also, the Box-Cox transformation is applied to error pattern including the mean error. So, to generate more likelihood errors, multivariate Gaussian kernel density function is performed. Normalization is done to allow the use of Gaussian kernel error for augmenting members of the error pattern. Based on this, the Box-Cox inverse is applied to restore the initial structure of the error pattern. In this study, application of the Box-Cox transformation on the error matrix [ ] lead to normalized data referred to as [ ] with random residual. The transformation is given by the following formula (Bickel andDoksum, 1981 Sakia, 1992): In dressing methods such as the best member method (Fortin et al., 1888;Hemri et al., 2015), data in matrix form requires estimated kernel density of the raw biases from the GR4J run needed to generate likelihood error pattern. Indeed, through covariance matrix and mean of the raw error pattern, a kernel density function is defined to enhance members the error pattern (estimate). The covariance matrix of data from the Box-Cox transformation is given by: After applying the Box-Cox processing, statistical ensemble is generated from the Gaussian multivariate probability density function which mean μ and covariance matrix ∑ . Generated error pattern is noticed by and follows theoretically to the same low distribution as the normal data from forward Box-Cox application with statistical characteristics and ∑ . The multivariate probability density function of the smooth ensemble is given as follow (Wilks. 2002): A statistical likelihood error pattern is produced using the inverse of the Φ linear transformation of the raw error pattern . The translation vector of the variable from the linear transformation that permit leading to statistical data with same parameters is the means vector ⃗ . Let and Λ, respectively be the matrix of eigenvectors and a diagonal matrix whose diagonal entries are associated eigenvalues such that Λ = ∑ . The linear transformation satisfying our target is defined by . It is important to highlight that eigenvalue-eigenvector matrixes have same information as the ∑ symmetric matrix (Wilks, 2006). Let centralized error pattern of zero mean and covariance matrix equal to the identity matrix in accordance to the central limit theorem; thus transformed data is obtained such: Reverse transformation of Equation 7 is used to generate studied statistical errors pattern referred to as [ ̂] , in which members will be enhanced according to the following equation (Wilks, 2002;Thomas et al., 2007;Sharma et al., 2002): Larger member of errors is generated to increase the number of members in the ensemble stream flow simulation being set up. The number of the ensemble members is enhanced from to N=50 which N .

Ensemble dressing
It consists of transforming ensemble issues into a continuous distribution function. In dressing methods, independent and larger ensemble is drawn through the raw issued ensemble members (Brocker and Smith, 2008). It is done using an adequate kernel dressing (Gogonel, 2013;Wang et al., 2005). In dressing process, ensemble members for given simulation dates are dressed by a Gaussian kernel in which mean and variance are either or not fitted or scaled by appropriate fitting coefficients (Silverman, 1998;Broecker and Smith, 2008). In this paper, two dressing methods that overcome the shortcoming of the Gaussian density function (Brocker and Smith, 2008) are tested. One of them is the Silverman rule of Thumb (SVMRT). The SVMRT consists of smoothing some discrete ensemble members. For each date, original data is replaced by one Gaussian kernel with non-fitted mean and fitted variance. In this paper, ensemble from this method is noticed by SVMEns. The Gaussian kernel density of an ensemble with parameters denoted ∑ and ̅ can be written as: The other method is the Affine Kernel Dressing (AKD). In this method, five parameters are involved in order to dress a new statistical ensemble using estimate mean and variance for a Gaussian kernel (Bröcker et al., 2008;Sharma et al., 2002). The basis of the AKD dressing is to smooth the raw ensemble from the ensemble modeling scheme. The smoothing involves five free parameters that either scales or shifts the mean or the variance of the raw ensemble in addition to the Silverman factor (Silverman, 1986;Bröcker andSmith, 2008, Fathollahi Fard et al., 2018a,b;Navid et al., 2018) such as: probability scores (CRPS) is called in this paper the AKDEns fit. In this method, fitting parameters involve interposing a logarithmic barrier to ensure positive variance for the Gaussian kernel. The logarithmic barrier is defined as: The AKD processing leads to new continuous distribution function from the member of the raw ensemble noticed by p ( so that. =espectations; = estimated parameters (a, , , , ) of the AKD post-processing and Ens= raw ensemble from the model.

Brier score (BS) and its decomposition
This score was set up by Brier in 1950 and determines the mean squared errors between predictive probability at time period and the binary observations with value 1 in case when observation exceeds a probability threshold and 0, otherwise, at time period (Addor et al., 2011). The Brier score is given by following formula: Mathematically, the lower the Brier score value the best the ensemble modeling system. Thus, optimum Brier score corresponds to zero for perfect modeling. Perfect modeling means that all expected scenarios are observed. The BS can be decomposed into three components: reliability, resolution and uncertainty (Murphy, 1973;Talagrand et al., 1997;Atger, 2004).
Where O represents total frequency of the events.
Reliability measures difference between simulated ensemble and the mean observation. The resolution estimates skillfulness of an ensemble modeling system in considering lower or higher probability events. Simply, the BS characterizes skill of a model in covering more cases. Wide resolution means possibly clustering observations to categories with considerable difference. Uncertainty in the BS decomposition measures the variability of the data. Generally, uncertainty in the BS decomposition does not impact on reliability of the model.

Rank probability score (RPS)
The RPS is based on comparison between cumulative distribution functions of simulations and observations in subsequent percentiles. Thereafter, the observations cumulative distribution function is represented by a cumulative function with probability density taking the value of 1 when the probability threshold is exceeded and Ndione et al. 473 0 otherwise (Epstein, 1969;Murphy, 1971). The RPS is obtained by summing the Brier score at different percentiles (10%, ... 30%, ... 90%). This approach is useful for the purpose of categorizing river flows. For example: low flow (<10%), medium flows (10% < Q < 90%) and high flows (> 90%) (Zalachori, 2013). Thus, different probability states can be obtained for an ensemble according to chosen percentiles. Let N be the spread of an ensemble forecast in which Brier score is calculated for M categories of the ensemble determined for a percentile. The rank probability score at time is given by: denotes the forecast probability of category at time , represents binary variable which value 1 when the category is observed and 0 if associated category is not observed at time . Thus, the RPS over the simulation period is obtained through following formula: (17)

Continuous rank probability score (CRPS) and its decomposition
The continuous rank probability score is an extension to infinity of the discrete rank probability score. The CRPS determines global quality of an ensemble issue. In CRPS assessment, the numbers of category are stretched to infinity. Further, when the M numbers of category is well stretched in equation of the RPS, the CRPS is obtained (Matheson and Winkler, 1976;Hersbach, 2000;Zalachori, 2013). This score measures the distance between the ensemble and the observations. It can be defined as integral over infinite limit interval of the Brier score. Variables involved in estimating the CRPS are: distribution function of the simulated values and other one distribution function with values of 0 when the simulated values are lower than observed one and 1 otherwise (Heaviside function). Equation of the continuous Rank Probability Score (CRPS) is formulated as follows (Marty, 2013;Casati et al., 2008;Gogonel et al., 2014): (18) is d expectation distribution function at time i. is the heaviside function and N is number of time step.

Attribute diagrams
Attribute diagram is the plot of predictive probabilities against frequency of occurrences. It allows assessing if predictive probability agrees with observed frequency. Attribute diagram gives reliability, resolution and the uncertainty of an ensemble system with respect to the observation and is concretely based on analysis of underlying plot relatively to some graphical characteristics (1:1 line of perfect simulation, area of skill and resolution). Perfect simulation leads to superposition of the curve with the first bisector   1). Uncertainties are defined by the distances between point of classes for given threshold and the first bisector (Atger, 2004). The resolution estimates how much the forecasts differ from the climatological mean probabilities of the event, and how the systems get it right. Thus, attribute diagram is used in this study to assess how implemented ensemble simulation scheme performs.

ROC curve
The ROC (Relative Operating Characteristic curve) curve allows evaluation of separation of means of conditional distribution for simulated data and is a popular tool for decision making. ROC curve is obtained by plotting the false alarm rate against hit rate. The ROC is positive when the curve is beyond the 1:1 line with area under the ROC (AUC) of 0.5. Perfect modeling corresponds to AUC of 1. The AUC defines the probability of successfully discriminating an event from a nonevent (Mason, 1982;Atger, 2001). Here, expectation relevant to the modeling scheme is referred to as ʄ and is defined as a probability exceeding a threshold value noticed by : p (x ≥ ). Relatively, defines the non-expected event characterizing probability of x < . Let characterized on one hand be expected and observed events (Hit) by ∑ and by ∑ , expected and unobserved events (False alarm). On the other hand, it is represented by ∑ as the non-expected and observed events (Miss) and by ∑ as the non-expected and unobserved events (Correct rejection). The false alarm and hit rates can be defined as follow: Hit and false alarm rates are calculated using the following contingency Table 1.

Processed errors for ensemble making
Errors from past simulations of GR4J model already calibrated and runs upper the Manantali dams are settled in matrix form with rows corresponding to variables and columns to observations. Estimate parameter (λ = 0.24) of the Box-Cox is found using the least squared method at significant level of 0.5. The p-value and the statistic of the estimation are respectively 0.143 and 0.902. Normalization is done aimed at extending the number of members through multivariate Gaussian kernel. Result gives statistical error pattern with a random residual. Members of the ensemble errors are extended from 15 to 50. Reverse Box-Cox transformation is further applied to restore initial properties (statistic characteristics) of the errors pattern. The set of 50 member's errors is used to perturb flows in calibrating the GR4J model further runs.

Plots of the Raw-Ens and AKD-Ens
The raw ensemble with the post-processed one are assessed here. In this study ensemble less than the raw ensemble (RAW-Ens) and the ensemble from dressing method (AKD-Ens) are plotted and brief exploratory analysis of ensemble is made concerning the representativeness of the two ensembles. It is important to highlight the fact that width of ensemble makes representation difficult using common computers. Then, in this study it is done reducing time span of the overall time series. Indeed, direct manipulation of ensemble simulation allows discovering related shortcoming. Thus, ensembles are represented between 01/01/1965 and 31/12/1968 (Figures 3 and 4). Exploratory analysis of both RAW-Ens and AKD-Ens ensembles hydrograms show ensemble can be assumed to likely represent observations and it is important to notice that hydrogram of the observation goes up and down ensemble. Thus, good representation can be assumed.

Deterministic characteristics and scores
Both the deterministic and the probabilistic evaluation criteria were studied. The dressed ensemble gives better results even if the raw one remains good ensemble. The mean error (Me) has been significantly improved: 0.519 for RAWEns to 0.026 for the AKD-Ens. The mean absolute error (EnsMae) has been improved by 27.5% and the ensemble root mean squared error (EnsRmse) by 32.64%. These ensembles characteristics with additional probability scores are presented in Tables 1 and 2. Results show that the affine kernel dressing can be used to improve quality of simulations from an ensemble modeling scheme. Thus, for more reliable F ar = &no 0&no + &no H r = &o &o + 0&o  Tables 2 and 3. It is important to keep in mind that for probabilistic evaluation, the closer to zero the scores the better the system is. If scores are closer to one, model is not reliable and the use of such system is discouraged. Between the two ensembles, quality of simulation varies according to the evaluating method. Consequently, dressing method remains improving method for probabilistic simulation.

Attribute diagrams of the ensemble schemes
Attribute diagram show simultaneously reliability, resolution and uncertainty of one performed ensemble. In the attribute diagram, frequency of issued probabilities is plotted against binned observations. For perfect modeling, the plot of frequencies against binary observation fits the first bisector. Interpretation of the reliability diagram is based on exploratory analysis of the behavior of observed frequency against probability plot in comparison to the first bisector. In addition, it is also based on analysis of the spread of the resolution band and position of the curve relatively to the lines delineating the skill and the resolution (Hamill, 2006;Wilks, 2006) of the model. Attribute diagrams of the two models are presented in Figure 5. Good reliability is assigned to both models (Figure 4a and b). The affine kernel dressing method diminish the reliability (because distance between Probability of simulation vs Observed Relative Frequency has been increased) and resolution of the raw ensemble according to the Brier score, while overall reliability through the continuous rank probability score and discrimination have been improved. In comparison to Figure 2, both ensemble modelling systems are reliable and can be used for water planning at the Manantali dam.

Roc plots
Discrimination of an ensemble from a system can be assessed by analysis of ROC curves. Performing the ROC curve is based on determining the success rates (Hits) and failure rates cases for an implemented ensemble modeling system. For a set of probability exceedances ranging from 0.1 to 1, the sum of number of simulated and observed events and sum of number of simulated and non-observed events are calculated. For each probability threshold, strictly lower events are considered as non-expected, then are classified into two categories: non-expected and non-observed on one hand and non-expected and observed on the other. The results for the different systems are shown in Figure 6. For two systems (RAW-Ens and AKD-Ens), Hits prevail on False alarm rates except for the AKDEns. Their AUC are over the line of skill. Also, the systems have good skill in reproducing flow discharges upper the Manantali dam. Ensemble systems considered as skillful in probability simulation of flows is based on the fact that Hits prevails on False rates for most of the probability thresholds. Balance sheet analysis can be used to visually assess skill of a model with regard to bins which Hit rates are beyond or down the 1:1 line of no skill. Hence, both systems lead to more success than failures and can be used to produce probabilistic simulation of flow discharges at the Senegal River. Elsewhere, through ROC curves, decision making can be improved particularly when   faced with extreme events of which threshold value is fixed (drought or flood). Once more, the dressing method improves issues from the scheme providing the RAW-Ens. The AUC has been improved by 2.3% and is  greater than 0.5 that corresponds to the 1:1 line of the ROC diagram. Besides, for failing system, the ROC curve goes down the 1:1 line of no skill.

Conclusion
Simulating or forecasting ensemble hydro climatic variables is an ingenious way to quantify uncertainties related to forecasting and simulation processes. In order to overcome difficulties of bias inherent to hydro climatic modeling, scientists have been oriented towards probability theories in order to consider modeling uncertainties through issuing probabilistic values. Through ensembles processing, expectations are associated to a probability of occurrence, and then naturally of nonoccurrence. This paper thus aims at producing probabilistic simulation of flow discharge at Senegal River upper the Manantali dam. The main ensembles called the raw ensemble is set up by perturbing the input flow of the GR4J model. Furthermore, three other ensembles are also set up through post-processing tools used in ensemble forecast systems processing. Inputs are perturbed using past errors pattern of the GR4J runs upon 14 years. The error pattern in this study is designed on the basis of the periodic behavior of Senegal River flow over 365 days. Box-Cox transformation is used to normalize the raw error pattern in order to generate wide members through a multivariate Gaussian kernel. Reverse Box-Cox transformation is also applied to restore initial statistical characteristics of the errors pattern. Drawn perturbations are applied to recorded flow discharges used in calibrating the GR4J model. From this process, ensemble streamflow is performed in which members are 51 including the ensemble mean. A postprocessing referred to as Affine kernel dressing has been applied to the raw ensemble (RAWEns) from turned GR4J with perturbed input (AKD-Ens). Ensemble forecast verification tools are used to evaluate the reliability, resolution and skill of the tested ensemble modeling systems schemes. The results are satisfactory for two systems. Globally, the dressing method improves in this case of study the raw ensemble. Both ensembles are reliable and can be used to help in decision making at the Manantali dam though predominance of Hit on false alarm, rates.