A frequentist and Bayesian regression analysis to daily peak electricity load forecasting in South Africa

A frequentist and Bayesian regression analysis to a piecewise linear regression model for daily peak electricity load forecasting in South Africa for the period 2000 to 2009 is discussed in this paper. The developed model captures a wide variety of electricity demand drivers such as temperature, seasonal, lagged demand and calendar effects. A Bayesian analysis provides a way of taking into account uncertainty in the estimation of the piecewise linear regression parameters. Uncertainty about the true values of the Bayesian parameter estimates is incorporated into the analysis through the use of a non-informative prior distribution. The results obtained are easy to explain to management. Empirical results showed that an increase in electricity peak demand, if temperature decreases by 1°C, could be any value between 140 and 200 MW during the winter months. Similarly, during the summer months the increase in electricity peak demand, if temperature increases by 1°C, ranges from -20 to 80 MW. There is a persistent increase of around 2 MW in hourly electricity peak demand with time in South Africa. Electricity demand in South Africa is more sensitive to the winter period. Demand for electricity during holidays decreases significantly compared to a day before and after a holiday. This information and the quantification of such uncertainty are important for load forecasters in the power utility company in South Africa (Eskom) as it helps them in the determination of consistent and reliable supply schedules.


INTRODUCTION
Short-term electricity load forecasting is very important for system operators who have to ensure that the amount of electricity drawn from the grid and the amount generated balance (Cottet and Smith, 2003; Taylor, 2006). In the absence of blackouts and load-shedding the electricity load is equal to electricity demand. Load forecasting is generally divided into short-term, medium-term and long-term forecasting. Most papers in the literature concentrate on short-term point forecasts (Munoz et al., 2010). Short-term load forecasting is important to ensure that there is a balance between demand and supply since electricity cannot be stored (Munoz et al., 2010).
*Corresponding author. E-mail: csigauke@gmail.com.
Demand drivers of electricity are generally split into economic factors, weather variables and calendar effects. In short-term forecasting, weather variables such as temperature and calendar effects are usually incorporated in models for electricity demand. The influence of temperature on daily electricity load forecasting has been studied extensively in the energy sector using classical (frequentist) time series and regression-based methods, including artificial neural networks (Fan and Hyndman, 2011; Hekkenberg et al., 2009; Mirasgedis et al., 2006; Saini, 2008).
Most papers in the literature concentrate on point forecasting only. One major drawback of point forecasting is that it does not take into account uncertainty in the estimation of the parameters. One way of overcoming this is the use of Bayesian analysis and density forecasting. Load forecasting using Bayesian statistics is discussed in the literature. Chaturvedi (1996) discusses the use of robust Bayesian analysis and concludes that it overcomes the major drawback of having to elicit a correct prior distribution. The interplay of Bayesian and frequentist analysis is discussed in Bayarri and Berger (2004). They identify the most common areas of important and useful connections between Bayesians and frequentists as those scenarios in which no external information, except the data and the model itself, is to be introduced into the analysis. They also indicate that there are many areas of frequentist methodology that should be replaced by existing Bayesian methodology that provides superior answers.
In this paper we develop a piecewise linear regression model. The focus of this paper is on the use of the Bayesian parameter estimates of a piecewise linear regression model in explaining the influence of temperature on daily peak electricity demand in South Africa. The paper concentrates on daily peak demand modelling, which is important for providing short-term forecasts that will assist in the optimal dispatching of electrical energy. Our modelling approach is designed in such a way that we initially use a multivariate adaptive regression splines (MARS) algorithm to determine the temperature that separates the winter period from the weather neutral period and the temperature that separates the summer period from the weather neutral period. Estimation of the parameters of the model then follows using the least squares method. The next step involves the use of Bayesian statistics in developing posterior densities for the parameter estimates of the piecewise linear regression model. A Bayesian analysis provides a way of taking into account uncertainty in the estimation of the parameters. Uncertainty about the true values of the Bayesian parameter estimates is incorporated into the analysis through the use of a non-informative prior distribution. Arguably the Bayesian approach is more informative than the classical approach. This is important for load forecasters and system operators in the electricity sector as it helps in the determination of consistent and reliable supply schedules.
The rest of the paper is organized as follows: description of the load and temperature data; discussion of the models used in this paper; presentation of empirical results; detailed discussion of the results; and conclusion.

LOAD AND TEMPERATURE DATA
Hourly data from Eskom, South Africa's power utility company, are used in this study. Daily peak demand (DPD) data for the period 2000 to 2009 are used. We define DPD as the maximum hourly demand in a 24 h period (one day). Figure 1 shows the graphical plot of daily peak demand, which exhibits strong seasonality with a steep positive linear trend.
Figure 2 shows a graphical plot of daily demand profiles for the seven days of the week, with DPD around 20:00 h. Historical data on temperature were also collected from 22 meteorological stations from all the provinces of the country. The data were aggregated to obtain peak temperature corresponding to the hour of daily peak demand, and average daily maximum and minimum temperatures for the entire country.

Figure 3 shows a non-linear relationship between load and temperature. A visual inspection of Figure 3 shows that electricity demand is more sensitive to the winter period. The non-linear relationship between load and temperature is modelled in the literature using cooling degree-days and heating degree-days. Cooling degree-days (CDD) and heating degree-days (HDD) are estimated on the basis of the following two linear functions, defined as:

$$HDD_t = \max(T_{ref} - T_t, 0)$$

and

$$CDD_t = \max(T_t - T_{ref}, 0)$$

where $T_{ref}$ represents the temperature which separates the winter and summer periods of the average daily electricity demand and temperature relationship, and $T_t$ represents average daily temperature (mean outdoor air temperature) on day $t$. If $T_t \geq T_{ref}$ there are no HDD, and similarly when $T_t \leq T_{ref}$ there are no CDD.
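As a rough illustration of the two degree-day functions, the sketch below computes HDD and CDD for a single day as the positive part of the difference between a reference temperature and the average daily temperature. The function name `degree_days`, the argument names and the 18°C reference value used in the usage lines are assumptions made for illustration; the temperatures are invented and are not taken from the Eskom data.

```python
def degree_days(avg_temp, ref_temp):
    """Return (HDD, CDD) for one day.

    avg_temp: average daily temperature in degrees Celsius
    ref_temp: reference temperature separating heating from cooling demand
    """
    hdd = max(ref_temp - avg_temp, 0.0)  # positive only on days colder than ref_temp
    cdd = max(avg_temp - ref_temp, 0.0)  # positive only on days warmer than ref_temp
    return hdd, cdd


# Illustrative temperatures only (hypothetical values).
for temp in (12.0, 18.0, 25.0):
    hdd, cdd = degree_days(temp, ref_temp=18.0)
    print(f"temp={temp:.1f}  HDD={hdd:.1f}  CDD={cdd:.1f}")
```

At most one of the two quantities is non-zero on any given day, which is what makes them convenient regressors for a demand curve that bends at the reference temperature.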

THE MODELS
Our modelling approach is in three steps. Initially a MARS algorithm is used to determine the temperatures that separate winter and summer from the weather neutral period, and to select the important knot(s), which represent the temperature(s) separating the winter and/or summer periods of daily peak electricity demand. The second step involves the development of a piecewise linear regression model and estimation of its parameters using the least squares method. In the third step Bayesian statistics is used in developing posterior densities for the parameter estimates of the piecewise linear regression model.

The multivariate adaptive regression splines (MARS) frequentist model
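The general MARS model can be written as in Friedman (1991) and is given in Equation 1:

$$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m B_m(x) \quad (1)$$

where $B_m(x)$ is a basis function written as

$$B_m(x) = \prod_{k=1}^{K_m} \left[ s_{k,m} \left( x_{v(k,m)} - t_{k,m} \right) \right]_+ \quad (2)$$

$M$ is the number of basis functions, $K_m$ is the number of factors in the $m$th basis function, $s_{k,m} = \pm 1$, $x_{v(k,m)}$ is the predictor variable entering the $k$th factor, $t_{k,m}$ is the corresponding knot and $[\cdot]_+$ denotes the positive part.

The MARS method does not make any assumptions about the functional relationship between the response variable and the predictor variables. The MARS modelling approach overcomes the major drawbacks of using artificial neural networks, which have long training processes, interpretive difficulties and the inability to determine the relative importance of potential input variables. MARS has been used to solve high dimensional problems with complex model structures such as nonlinearities, interactions, multicollinearity and missing values (Chen, 1997; Friedman, 1991; De Gooijer and Ray, 2003; Tsekouras et al., 2007).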
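To make the idea of a MARS basis function concrete, the following sketch evaluates a one-predictor additive model built from hinge functions of the form max(s(x - t), 0), where t is a knot and s is +1 or -1. It is only an illustration of the basis, not the MARS fitting algorithm (no forward or backward knot selection is performed). The function names `hinge` and `mars_predict`, the intercept and the coefficients are hypothetical; the knots at 18 and 22°C echo the temperatures discussed in this paper.

```python
def hinge(x, knot, sign=1):
    """Hinge basis function: the positive part of sign * (x - knot)."""
    return max(sign * (x - knot), 0.0)


def mars_predict(x, intercept, terms):
    """Evaluate an additive one-predictor hinge model.

    terms: iterable of (coefficient, knot, sign) tuples, one per basis function.
    """
    return intercept + sum(coef * hinge(x, knot, sign) for coef, knot, sign in terms)


# Hypothetical coefficients chosen for illustration only.
example_terms = [
    (170.0, 18.0, -1),  # active when temperature is below 18
    (25.0, 22.0, 1),    # active when temperature is above 22
]
for temp in (17.0, 20.0, 24.0):
    print(temp, mars_predict(temp, intercept=25000.0, terms=example_terms))
```

Between the two knots both hinge terms are zero, so the prediction equals the intercept; this is the flat, weather neutral segment of the curve.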

Piecewise linear frequentist regression model
The piecewise linear function used in this paper is shown in Equation 4. It includes terms that account for day to day variations in electricity peak demand, as well as terms that account for the week to week, monthly and semi-annual variations in electricity peak demand.
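A minimal sketch of the least squares step for a piecewise linear temperature effect is given below, assuming knots at 18 and 22°C as reported in the results. It keeps only an intercept and two temperature terms; the trend, lagged demand and calendar dummies of the full model are omitted. The names `piecewise_design` and `fit_least_squares`, the sign convention for the winter term (chosen so that the winter slope comes out negative, as in the reported results) and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def piecewise_design(temp, lower_knot=18.0, upper_knot=22.0):
    """Design matrix with columns: intercept, winter term, summer term."""
    temp = np.asarray(temp, dtype=float)
    winter = np.minimum(temp - lower_knot, 0.0)  # non-zero only below the lower knot
    summer = np.maximum(temp - upper_knot, 0.0)  # non-zero only above the upper knot
    return np.column_stack([np.ones_like(temp), winter, summer])


def fit_least_squares(X, y):
    """Ordinary least squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Synthetic data for illustration only (not the Eskom series).
rng = np.random.default_rng(0)
temp = rng.uniform(5.0, 35.0, size=500)
X = piecewise_design(temp)
true_coef = np.array([25000.0, -170.0, 25.0])  # hypothetical values
y = X @ true_coef + rng.normal(0.0, 200.0, size=temp.size)
print(fit_least_squares(X, y))
```

With this parameterization the intercept is the expected demand in the weather neutral band, and each slope is the change in demand per 1°C outside that band.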

Bayesian linear regression model
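The advantage of the Bayesian approach lies in the fact that one can create the posterior distributions and hence draw inferences from them. From the distributions of the parameters we are in a position to obtain quantiles, credible regions and perform other inferential tests. The methods can be generalized to more complicated situations. The normal multiple regression model is given by:

$$y = X\beta + \varepsilon \quad (5)$$

where $y$ is an $n \times 1$ vector of observations, $X$ is an $n \times k$ matrix of predictor variables, $\beta$ is a $k \times 1$ vector of regression coefficients and $\varepsilon$ is an $n \times 1$ vector of error terms. The error terms $\varepsilon_t$ are assumed to be normally and independently distributed, each with mean zero and common variance $\sigma^2$. The parameter vector is $\theta = (\beta, \sigma)$. We then estimate $\beta$ and $\sigma$ together with their respective probability density functions (pdfs) and also test hypotheses involving the same parameter vector. In the frequentist approach the coefficient vector is estimated by the ordinary least squares method using the Moore-Penrose pseudo inverse, $\hat{\beta} = (X'X)^{-1}X'y$. In the Bayesian approach the sample data are supplemented with additional information in the form of a prior distribution. Assuming that the random errors are independent and normally distributed, we obtain the joint likelihood for $y$ given $\beta$, $\sigma$ and $X$ as

$$l(\beta, \sigma \mid y, X) \propto \frac{1}{\sigma^{n}} \exp\left\{ -\frac{1}{2\sigma^{2}} \left[ \nu s^{2} + (\beta - \hat{\beta})' X'X (\beta - \hat{\beta}) \right] \right\} \quad (6)$$

where $\nu = n - k$ and $s^{2} = (y - X\hat{\beta})'(y - X\hat{\beta})/\nu$. In specifying our prior distribution we assume that our information is vague. We adopt the notation used in Zellner (1971). We use a non-informative Jeffreys' prior:

$$p(\beta, \sigma) \propto \frac{1}{\sigma} \quad (7)$$

Combining the prior distribution in (7) with the likelihood function in (6) we get the following joint posterior distribution:

$$p(\beta, \sigma \mid y, X) \propto \frac{1}{\sigma^{n+1}} \exp\left\{ -\frac{1}{2\sigma^{2}} \left[ \nu s^{2} + (\beta - \hat{\beta})' X'X (\beta - \hat{\beta}) \right] \right\} \quad (8)$$

Integrating (8) with respect to $\sigma$ we get the following marginal posterior for the elements of $\beta$:

$$p(\beta \mid y, X) \propto \left[ \nu s^{2} + (\beta - \hat{\beta})' X'X (\beta - \hat{\beta}) \right]^{-n/2} \quad (9)$$

The marginal posterior for $\sigma$ in (11) follows an inverse gamma distribution.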
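The sketch below shows one common way to draw from the joint posterior of a normal linear regression under the non-informative prior proportional to 1/sigma: first draw sigma squared as nu * s^2 divided by a chi-square variate with nu = n - k degrees of freedom, then draw the coefficients from a normal distribution centred on the least squares estimate with covariance sigma^2 (X'X)^{-1}. This is a simulation-based illustration under those standard results, not necessarily the computation used by the authors. The function name `posterior_draws`, the number of draws and the synthetic data are assumptions.

```python
import numpy as np


def posterior_draws(X, y, n_draws=5000, seed=0):
    """Sample (beta, sigma) from the posterior under the prior p(beta, sigma) ~ 1/sigma."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta_hat = np.linalg.pinv(X) @ y           # least squares via the pseudo inverse
    resid = y - X @ beta_hat
    nu = n - k
    s2 = resid @ resid / nu
    # sigma^2 | y : nu * s2 / chi-square(nu)  (scaled inverse chi-square)
    sigma2 = nu * s2 / rng.chisquare(nu, size=n_draws)
    # beta | sigma^2, y : Normal(beta_hat, sigma^2 * (X'X)^{-1})
    chol = np.linalg.cholesky(np.linalg.inv(X.T @ X))
    z = rng.standard_normal((n_draws, k))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ chol.T)
    return beta_hat, beta, np.sqrt(sigma2)


# Synthetic data for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = X @ np.array([2.0, 3.0]) + rng.normal(size=200)
beta_hat, beta, sigma = posterior_draws(X, y)
lower, upper = np.percentile(beta, [0.5, 99.5], axis=0)  # 99% credible intervals
print(beta_hat, lower, upper)
```

The percentiles of the draws give equal-tailed credible intervals for each coefficient, and the mean of the draws should sit close to the least squares estimate, which is consistent with the agreement between Bayesian posterior means and frequentist estimates noted in the discussion.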

EMPIRICAL RESULTS AND DISCUSSION
Electricity demand is generally subject to seasonal changes and an upward linear trend. The graphical plot of DPD in Figure 1 exhibits strong seasonality with a steep positive linear trend. A visual inspection of the graphical plot of detrended DPD in Figure 4 shows that there has been a gradual decrease in annual peaks since 2007. This is probably due to energy efficiency and demand side management strategies put in place by Eskom, South Africa's power utility.

Piecewise linear model
The empirical results are presented in this section. Our piecewise linear regression model is given in Equation 12, where all the variables are as defined in Equation 4.

Substituting the coefficients of the variables we get
The coefficient of the winter-sensitive temperature variable is negative, showing that if peak temperature decreases by one degree from 18°C, electricity demand will increase by 171.468 MW. The coefficient of the summer-sensitive temperature variable is positive, showing that if temperature increases by one degree from 22°C, electricity demand will increase by 24.85 MW. We conclude that electricity demand in South Africa is more sensitive to the winter period. All the coefficients of the dummy variables representing Friday, Saturday, Sunday, holiday, day before holiday and day after holiday are negative, indicating that there is a decrease in demand during these periods. Of the three days of the week, the largest decrease is on Sunday. Demand for electricity during holidays decreases significantly compared to a day before and after a holiday. The root mean square error (RMSE) and mean absolute percentage error (MAPE) are generally used for evaluating the predictive power of short-term load forecasting (STLF) models (Munoz et al., 2010). These accuracy measures are used in the in-sample forecasting evaluation of the model. Several models were run and the results of the best model are summarized in Table 1. For the period $t = 1, \ldots, n$, the RMSE and MAPE are given as:

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}, \qquad MAPE = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$$

where $y_t$ and $\hat{y}_t$ are the actual and predicted daily peak loads at time $t$.
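The two accuracy measures can be computed as in the short sketch below, which uses the standard definitions of RMSE and MAPE (the latter expressed in percent). The function names and the two-day example loads are hypothetical and serve only to show the calculation.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted loads."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical loads in MW, for illustration only.
actual_load = [30000.0, 31000.0]
predicted_load = [29500.0, 31400.0]
print(rmse(actual_load, predicted_load), mape(actual_load, predicted_load))
```

RMSE is in the units of the load (MW) and penalizes large misses more heavily, while MAPE is scale-free, which is why the two are usually reported together.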

Posterior distributions of the parameters
The posterior probability density functions (pdfs) for the piecewise linear regression parameters are given in this section. We discuss the results for the main parameters of interest; the posterior distributions of the remaining parameters are summarized in Table 2 and given in Figure 6.
The three demand-temperature lines for weekdays, without the trend and holiday effects, are discussed in turn. For the non-weather sensitive months (18°C ≤ $x_{pt}$ ≤ 22°C, $x_{1t} = x_{2t} = 0$) we get the posterior mean daily peak electricity demand for the non-weather sensitive period. The posterior shown in Figure 5(a) indicates that the non-weather sensitive mean daily peak demand could take any value between 23 500 MW and 28 000 MW.
There is a persistent increase in hourly electricity peak demand with time in South Africa. The slope is around 2 MW per year. It is highly unlikely that the slope exceeds 3 and highly unlikely that it is below 1. For the winter sensitive months (Equation 13), if temperature decreases by 1°C (for example, from 18 to 17°C) the marginal increase in daily peak demand will be 171.468 MW. Because of the uncertainties associated with demand, this increase could be any value between 140 and 200 MW. In other words, it is highly unlikely that this increase will be less than 140 MW and highly unlikely that it will exceed 200 MW. For the summer sensitive months, if temperature increases by 1°C (for example, from 22 to 23°C) the marginal increase in demand will be 24.85 MW. This marginal increase could be any value between -20 and 80 MW. The posterior for this parameter is shown in Figure 5(d). Table 2 summarizes the 99% Bayesian credible intervals of the posterior distributions of the parameters of the piecewise linear regression model.

DISCUSSION
A piecewise linear regression model was developed which related peak demand to temperature and the day of the week. This study has shown that a Bayesian analysis provides a way of taking into account uncertainty in the estimation of the parameters. Uncertainty about the true values of the piecewise linear regression parameter estimates was incorporated into the analysis through the choice of a non-informative prior distribution.
Empirical results show that if temperature decreases by 1°C in the winter sensitive period (that is, when temperature is less than 18°C) then the average increase in daily electricity load will be 171.47 MW. This increase in electricity daily load ranges between 140 and 200 MW. Similarly, if temperature increases by 1°C in the summer sensitive period (that is, when temperature is greater than 22°C) daily electricity load will increase by an average of 24.85 MW. This increase will range from -20 to 80 MW. These results show that demand for electricity is more sensitive to winter periods than summer periods in South Africa. The quantification of uncertainty about these parameters is important for load forecasters in Eskom as it helps them in the determination of consistent and reliable supply schedules. The developed model can be used either for predicting short-term daily peak demand or for weather normalization, that is, estimating daily peak demand under normal weather conditions. The mean values of the parameters from the Bayesian approach are the same as their frequentist analogues because of the nature of the non-informative prior employed. It therefore seems that the frequentist properties of the Bayesian inferences of the model based on this prior are adequate. However, the resultant posterior may be used in ways that are not always available to the frequentist.

CONCLUSION
The paper discussed the use of time series regression for modelling and predicting daily peak demand in South Africa. The developed piecewise linear regression model captures demand drivers of electricity such as temperature, calendar effects and lagged demand. The model also accounts for residual correlation that may occur as a result of day to day, week to week, monthly and seasonal variations in electricity peak demand. This empirical study provides an extension of point forecasting to density forecasting so as to take into account uncertainty in the estimation of the parameters.
Areas for further study would include the use of a full Bayesian MARS model in which we place a prior distribution on the whole MARS model space, treating the number of splines, knot points and all other parameters as unknown. Another interesting area of study would involve the use of robust Bayesian analysis and the incorporation of the impact of energy efficiency and demand side management strategies being put in place by Eskom, South Africa's power utility. These areas will be studied in our future research.

Figure 5. Posterior distributions of the parameters.

Figure
respectively, which is the weather neutral period. Within this range of temperature values residents would use neither a heater nor a cooling system. The model identifies the winter sensitive, weather neutral and summer sensitive periods.

Table 2. 99% Bayesian credible intervals for the posterior pdfs of the parameters.