Classical and Bayesian Markov Chain Monte Carlo (MCMC) modeling of extreme rainfall (1979-2014) in Makurdi, Nigeria

This study presents a probabilistic model for daily extreme rainfall. The Annual Maximum Series (AMS) data of daily rainfall in Makurdi was fitted to Generalized Extreme Value (GEV) distribution using Maximum Likelihood Estimation (MLE) and Bayesian Markov Chain Monte Carlo (Bayesian MCMC) simulations. MLE is a reliable principle to derive an efficient estimator for a model as sample size approaches infinity. Results in this study show that despite the asymptotic requirement of the MLE, its performance can be improved when adopting Bayesian MCMC. The comparison between the performance of MLE and Bayesian MCMC methods using Percent Bias (PBIAS), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) proved Bayesian MCMC is the better method to estimate the distribution parameters of extreme daily rainfall amount in Makurdi. Based on the 36-year record of rainfall (1979-2014) in Makurdi, return levels for the next 10, 100, 500, 1000 and 10000 years were derived.


INTRODUCTION
Recent researches and observations of the climate system have shown that the climate system is more complex than concluded by the working group 1 of the Intergovermental Panel on Climate Change report (IPCC, 2007).According to Pielke (2011) these climate systems models do not appear to be capable of providing skillful predictions of regional, local societally and environmentally important impacts in the coming decades.
Earlier studies such as Houghton et al. (1990) reported that the Earth's climate is warming and will continue to warm in the future, as a result of changes in atmospheric carbon dioxide (CO 2 ) and other trace gases.This global warming will lead to changes in annual or seasonal precipitation.Valipour (2015) studied runoff forecasting in the United States using seasonal autoregressive integrated moving average (SARIMA) and autoregressive integrated moving average (ARIMA) for data from 1901 to 2010 (mean of all stations in each state).In this study, it was reported that the accuracy of the SARIMA model is better than that of the ARIMA model.The occurrence of many extreme events in hydrology cannot be forecasted on the basis of deterministic information with sufficient skill and lead time.In such cases, a probabilistic approach is required to incorporate the effects of such phenomena into decisions.If the occurrences can be assumed to be independent in time, that is, the timing and magnitude of an event bears no relation to preceding events, extreme hydrologic frequency analysis (HFA) can be used to describe the likelihood of any one or a combination of events over the time horizon of a decision (El Adlouni and Ouarda, 2010).Valipour (2012) and Valipour et al. (2013) studying inflow into Dez dam reservoir in Iran used 47 years (first 42 years was used to train the model and past 5 years was used to forecast).In these studies they reported that the Auto Regressive Integrated Moving Average (ARIMA) had less error compared with the Auto Regressive Moving Average (ARMA).The superiority of ARIMA over ARMA was due to the effect output delay as an input to the network and increasing network training power.Olofintoye et al. (2009) in studying the best-fit probability distribution model for peak daily rainfall of selected Cities in Nigeria including Makurdi, used different statistical analyses such as Gumbel, Log-Gumbel, Normal, Log-Normal, Pearson and Log-Pearson distributions and the parameters of the distributions were estimated by the methods of moments (MOM) and probability weighted moments (PWM).Isikwue et al. (2012) established an empirical model that correlates rainfall intensity-duration-frequency for Makurdi over a period of 30 years.The study also showed a linear relationship between the rainfall amounts and their corresponding duration.Valipour (2012) worked in Iran on number of observation data for rainfall forecasting reported that in temperate climate, due to low changes of monthly rainfall in long years, 60 observation data was enough to forecasting.Ramírez et al. (2005) used artificial neural network (ANN) technique for rainfall forecasting applied to the São Paulo region.The results showed that ANN forecasts were superior to the ones obtained by the linear regression model thus revealing a great potential for an operational suite.Han et al. (2010) forecasted drought based on the remote sensing data using Auto Regressive Integrated Moving Average (ARIMA) model successfully.Chattopadhyay and Chattopadhyay (2010) compared ARIMA and Auto-Regressive Neural Network (ARNN) models using Univariate modelling of summer monsoon rainfall time series.Coles (2001) considered Maximum Likelihood Estimate (MLE) as the best method because of its all-round utility and adaptability to model-change.This means that, the underlying methodology is essentially unchanged even though the estimating equation is modified.One of the drawback of this method is its asymptotic requirements and a common problem in extreme value analysis is that of data scarcity.An alternative is the Bayesian approach which can be used in estimating the parameters of GEV.This approach is increasingly popular in many areas of application, a challenge when adopting this approach is the computational difficulties.This may be solved by the application of Markov chain Monte Carlo (MCMC) simulations.As noted by Eli et al. (2012) extreme data are limited in nature however, Bayesian inferences have the ability to incorporate other source of information via prior distribution.Bayesian analysis is not dependent on regularity assumptions required by asymptotic theory of MLE.Studies such as Coles and Tawn (1996), Bates and Campbell (2001), Coles et al. (2003), Smith (2005), Fawcett and Walshaw (2008) and Zin et al. (2012) have also shown the applicability of Bayesian inference using MCMC in extreme rainfall analysis.
Recent studies such as Abah (2013) show that Makurdi town is potentially susceptible to flooding in the event where there is a prolonged or intense downpour which could cause flash floods and raise the volume of the River Benue to tremendous levels making it over spill its banks.Frequent occurrences of droughts and floods in the past have severely affected the economy of Benue State, which depends primarily on agriculture.It is necessary to have a quantitative measure with high accuracy or likelihood on extreme rainfall events which can be utilized in the design and construction of hydraulic structures, planning of soil and water conservation practices and other agricultural activities.The main objective of extreme rainfall modeling is to estimate the values of rainfall amounts that might occur once in 10, 50 or 100 years, based on 10-year or 30-year history of rainfall.An accurate prediction of these events can significantly aid in policy making and also in designing an effective risk management system.Hence, there is a need to know the magnitudes of extreme rainfall events and their recurrence intervals over Makurdi as it continues to experience seasonal flood events induced by high rainfall magnitudes.

Study area
Makurdi the administrative headquarter of Benue State, Nigeria lies approximately between latitude 7°44 1 N and longitude 8° 54 1 E. The town is located along the coast of the Benue River (Shabu and Tyonum, 2013).The land use in and around Makurdi is agricultural and the farmers mainly produce rice since it is a low land area.The area is in the tropics and has two seasons dry (May -October) and wet (November -April).The average annual temperature is 31.5°Cwhile the relative humidity ranges between 65 to 69% with annual rainfall varying between 1000 to 2500 mm (Isikwue et al., 2011).

Data collection
Thirty-six years  daily rainfall data of Makurdi was collected from the Nigeria Meteorological Agency (NIMET), which has its base station (rain gauges) at Nigeria Air force Base (NAF) Makurdi, Nigeria.

Analysis of extreme daily rainfall in Makurdi
The Generalized Extreme Value distribution (GEV) which has three parametric subsumes (Gumbel; Type I, Frechet; Type II, and Wiebul; Type III) was fitted to the Annual Maximum Series (AMS) data utilizing Maximum Likelihood and Bayes methods of parameter estimation with the intention of determining the magnitudes and return periods of extreme daily rainfall in Makurdi.
The cumulative distribution function (cdf) of GEV is given as: (1) Where: ; = the probability of an event not exceeding x, x = input variable representing the daily maximum rainfall

Maximum likelihood estimation (MLE)
The maximum-likelihood approach is an example of classical statistical modeling in which the parameters are assumed to have fixed values that we are trying to estimate as precisely as possible.
MLE requires us to maximize the likelihood function with respect to the unknown parameters.The Generalized Extreme Value model ( ), is given by Coles (2004) as: (2) Isikwue et al. 125 Provided that

Bayesian method of estimation
In the Bayesian modeling technique parameters are treated as random variables, with probabilities assigned to particular values of a parameter to reflect the degree of evidence for that value.This include a posterior density of interest, that is proportional to a prior distribution, and the likelihood function, that gives the most likely parameters that generated the obtained data at hand and is given by Equation ( 3) with replaced by . ( This approach is based on Smith (2005) was used on Bayesian modeling of extreme rainfall data for South-West England and also by Fawcett and Walshaw (2008) on modeling environmental extremes.The results in this section were obtained following the guidelines of Stephenson and Ribatet (2015) using the evdbayes package in R statistical programming environment (R Studio).
In the Bayesian approach, the MLE estimates of the GEV parameters and were utilized as the initial values for the MCMC simulations which were then used to produce trace plots and marginal posterior densities.From the simulated samples in the MCMC run we obtained the posterior estimates of the parameter vector ( ) which is a stationary distribution that is slightly dependent and approximate from our target distribution, .

Comparison of the performances between MLE and Bayesian methods
The After the best models for the data have been determined, the next step was to derive the return levels for rainfall.Formally, the cumulative probability of non-exceedance is given by: (4) Where T = return level.
Solving for using the definition of the GEV distribution yields: Table 1.Estimates of parameters using maximum likelihood estimator.

Classical and Bayesian models for extreme daily rainfall
The time series plot of annual daily maximum rainfall for Makurdi in Figure 1 shows that the highest magnitude of 149.3 mm occurred in the year 2000.While this high magnitude appears as an outlier, there is historical evidence that it did occur (according to Ologunorisa and Tor, 2006), in a year characterized by rampant flooding in rural and urban regions of Makurdi, which led to loss of lives, disruption of economic activities, destruction of agricultural lands and other properties.The blocks = 365 days have been chosen to be reasonably large, so we fitted our Models to the = 36 annual maxima using maximum likelihood estimation.We obtained fitted parameter values for Model 1 : Weibull and Model 2 : Gumbel, standard errors in parentheses and the maximized log-likelihood values for the fitted models are given in Table 1.The Markov Chain Monte Carlo (MCMC) techniques were then applied to give the Bayesian parameter quantiles of the annual maxima rainfall data using noninformative priors.Table 2 present results for the Bayesian posterior parameter estimates of the Generalized Extreme Value (GEV) distribution (Model 1 ), the associated naive standard errors and the 95% credible intervals (CI) of the parameters μ, and .

Analysis of tail behaviour
Visualization of the goodness of fit of the Generalize Extreme Value distribution to the observed data is presented in Figure 2 (GEV: Model).This include probability, quantile, return level and density plots.

Trace plots
Convergence of the simulated distribution to the target or stationary distribution is visualized by the trace plots in Figure 3.The trace plots looks like a horizontal band, with no long upward or downward trends, the Markov chain seems to be mixing well enough and is likely to be sampling from the stationary distribution, this is evidence that the chain has converged.The Figure 3 also shows density plots for each parameter.Table 3 presents results of the Percent Bias (PBIAS), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) which are values computed by comparing the observed values of maximum daily rainfall in Makurdi to the simulated values from the formulated models using MLE and Bayesian MCMC approaches.

Extreme rainfall magnitudes for various return periods
The various return periods of the annual maximum rainfall are presented in Table 4.The inversion of Model 1 (Equation 5) yielded maximum likelihood estimates of return levels (mm) with their respective 95% confidence intervals (Equation 6) and the posterior medians defined on a 95% credible interval using non-informative prior.
The return level and return period are some of the most important quantities derived from the fitted models.The plot of is called a return level plot (Figure 4) which visualizes the return level estimates (solid line), the 95% upper and lower credible intervals (dashed lines) in the Bayesian framework.

The classical and Bayes models
The ismev package in -studio yielded maximum likelihood estimates ( ) of (74.5, 18.1, -0.1); with standard errors (3.4,2.4 and 0.1) for , and  respectively.Resulting approximate 95% confidence intervals for each parameter are 67.9 and 81.1 for , 13.5, 22.7 for , and -0.3, 0.2 for .This shows that although the estimate of the shape parameter is negative, the 95% confidence interval extends above zero, highlighting the uncertainty of estimation.The negative shape parameter and the diagnostic plots such as the probability and quantile plots (Figure 2) show each set of plotted points to be near linear, validating the use of the GEV.Model 2 shows only that a zero shape factor, is also consistent with the data though the standard errors are smaller and the deviance statistics which is twice the difference between the absolute values of the negative log-likelihood of both models is also small.Anderson and Smith (2006) argues that even if formal tests support model reduction from GEV to Gumbel, unless there are external grounds distinguishing the value of zero for , the GEV model should be preferred, as its conservatism gives increased protection.The falls within the range of which coincides with the most likely range in hydrological practice indicated by Martins and Stedinger (2000).Furthermore, as a cautionary principle since there are no other grounds for assuming , it seems safest not to.
The Bayesian MCMC estimates suggest that there is a finite upper endpoint to the distribution since the shape parameter is negative, again validating the use of Model 1 (GEV) and this supports Koutsoyiannis (2004) comment that the wide spread use of the Gumbel distribution is related to the fact that sample sizes are often small.

Performance of the MLE and Bayes MCMC parameter estimation methods
There are small differences in the parameter estimates  for both methods this is expected since a near-flat prior has been used.The MCMC method seems to reduce the RMSE and MAE which is a useful property since it reduces the potential locations of the estimate or uncertainty of the estimate.The PBIAS (Table 3) is slightly higher for Bayesian MCMC probably as a result of random error in simulation, they are both close to zero indicating model appropriateness.
The expected benefit of the Bayesian analysis is the improvement in precision of the parameter estimates over the MLEs.The incorporation of prior information in Bayesian MCMC parameter estimation in line with Naakaa (2015) proved to be a marginally better method compared to the MLE; the existence of data with relatively larger values (rare events) compared to other data such as the August 3 rd , 2000 extreme rainfall (149.3 mm) in Makurdi may have influenced the analysis results as the analysis is based on the mean value that is rather sensitive towards any outlier in data, this demonstrates the robustness of the Bayesian approach against outliers.One of the most important derivations of this study is that future modeling of extreme rainfall in Makurdi can utilize this results as expert opinion in prior elicitation or in specifying informative prior distribution and as more information or data is collected the model can be updated.

Prediction of rainfall magnitudes for various return periods
The sequence of simulated values ( , and ) were transformed leading to a sample from the corresponding distribution of return levels, .This was highlighted by De Paola et al. ( 2013), suppose that a flood defense scheme is to be built to protect against all levels of rainfall it is likely to experience over its engineered 100 year life span.We can estimate what rain levels might occur in the coming century, given the 36 years local data available for this study the maximum likelihood estimate for the 100-year return level, = 147.2mm, with a profile likelihood of 95% confidence interval [127,219] mm.This show that the 149.3 mm annual daily maximum rainfall event in Makurdi of 3 rd August, 2000 has a return period that is above 100 years which implies that it was indeed an extreme event as defined by Coles (2001).
Alternatively Bayesian analysis considers the problem, not as one about estimating the ultimate limit parameter, but estimates a prediction interval (quantification of uncertainty), which will give more precise information about what is likely to happen in a given year, considering the previously recorded observations (Smith, 2003).From Table 4, it can be seen that the Bayesian approach gives higher quantile estimates in the upper tails than the frequentist approach, for instance the 100year return level gives an estimate = 157.1 mm and credible interval [133.8, 264.7] mm.This means that Bayesian approach gives a shorter return period for extreme events compared to the MLE method.The return levels of the posterior distribution in Figure 4 shows the upper 95% interval to be further from the median than the lower.This is because of the heavier upper tail of the posterior distribution due to the non-negative prior on .
These quantities are most useful to practitioners in extreme event modeling and engineering design.As noted by Dandy (2013) culvert design, for example, involves analysis of headwater depths, along with estimations of future extreme precipitation amounts.Severe importance lies within proper estimations of stream flow and headwater depths, demonstrating high importance of a thorough methodology towards statistical analysis of climate data.This also agrees with the comment by Wendy and Noriszura (2012) that the predictions of extreme rainfall events for several return periods provide important and valuable information for the management and the planning of water resources, especially for proper drainage systems, reservoirs and surface waters and for utilizations in agricultural sectors and socio-economic activities.The information can be used to facilitate the government and other related parties in prioritizing water resources in their efforts to reduce or control the risks of large losses.

Conclusion
Statistical model of the annual maximum of daily rainfall of Makurdi has been developed through classical and Bayesian approaches.The extreme daily rainfall was shown to follow the Generalized Extreme Value distribution which has a negative shape parameter, often appealing as it has a finite upper limit.Despite the use of non-informative prior information in the Bayesian MCMC parameter estimation method it proved to be a better method compared to the MLE, this demonstrates the robustness of the Bayesian approach against outliers (rare events) and in handling data scarcity.The model was used to compute the extreme rainfall return levels in the 95% confidence and credible intervals for the return periods of 10, 100, 200, 500, 1000 and 10,000 years.Therefore, extreme rainfall and the subsequent flooding events in Makurdi can be appropriately handled by incorporating these results in the design and assessment of dam safety level and other infrastructure, such as culverts, reservoirs, flood mitigation levees and retarding basins which can prevent flooding, pollution in watersheds, and overtopping on roadways, it is therefore necessary to make inferences on their return levels crucial for designers and engineers, to the extent that they should be built into legally binding codes of practice.
; = random rainfall variable, = the exponential or naperian logarithm, = the location or relief parameter, = scale parameter, is a function of the drainage and rain gauges network, and = shape parameter which controls the tail behavior of the probability distribution.It has a value of for Type I, for Type II and for Type III.
Performances of both methods of parameter estimation were analyzed based on Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Percent Bias (PBIAS) obtained from Monte Carlo simulation.PBIAS measures the average tendency of the simulated values to be larger or smaller than their observed ones.The optimal value of PBIAS is 0.0, with low-magnitude values indicating accurate model simulation.Positive values indicate overestimation bias, whereas negative values indicate model underestimation bias.

Figure 3 .
Figure 3. Trace and posterior density plots of the parameters of Model 1 .

Figure 4 .
Figure 4.Return level plot of posterior distribution with 95% Bayesian credible intervals (dashed lines) in Makurdi.
Figure 2. Diagnostic plots of the MLE approach for Model 1 .

Table 3 .
Comparison of performance between maximum likelihood and bayesian methods.

Table 4 .
Return level estimates using MLE and Bayesian MCMC for Model 1 .