Optimizing the monthly crude oil price forecasting accuracy via bagging ensemble models

The study investigates the accuracy of bagging ensemble models (i.e., bagged artificial neural networks (BANN) and bagged regression trees (BRT)) in monthly crude oil price forecasting. The two ensemble models are obtained by coupling bagging with two simple machine learning models (i.e., artificial neural networks (ANN) and classification and regression trees (CART)), and their results are compared with those of the single ANN and CART models. Analytical results suggest that ANN-based models (ANN and BANN) are superior to tree-based models (RT and BRT) and that the bagging ensemble method can improve the forecast accuracy of both the single ANN and CART models in monthly crude oil price forecasting.


INTRODUCTION
Oil is an important component of economic activity, and the adverse effect of crude oil prices on the level of output is widely recognized in numerous empirical studies (Hamilton, 1983; Hamilton and Herrera, 2004; Huntington, 2005; Barsky and Kilian, 2004; Kilian, 2008). Forecasting crude oil prices is therefore a very important topic because of its practical applications, although it is an extremely hard one because of its intrinsic difficulty. Crude oil prices are determined by supply and demand forces, which are influenced by factors such as gross domestic product, stock market activities, foreign exchange rates, weather conditions and political events (Bernabe et al., 2004; Yousefi and Wirjanto, 2004). These factors, among others, may cause the highly nonlinear and chaotic tendency of crude oil prices (Yang et al., 2002).
In the past decades, traditional statistical and econometric techniques have been widely applied to crude oil price forecasting. Abramson and Finizza (1991) utilized a probabilistic model for predicting oil prices. Gulen (1998) attempted to predict the West Texas Intermediate (WTI) price using co-integration analysis. Morana (2001) offered a semiparametric statistical method based on the GARCH properties of crude oil price to forecast short-term oil prices. Ye et al. (2002, 2005, 2006) presented a single equation model to forecast short-run WTI crude oil prices, using OECD petroleum inventory levels, relative inventories, and high- and low-inventory variables. Lanza et al. (2005) used error correction models to predict oil prices. Xie et al. (2006) employed a linear ARIMA model to forecast crude oil prices and argued that oil prices exhibit nonlinear behavior which cannot be captured by linear techniques.
*Corresponding author. E-mail: hacera@akdeniz.edu.tr. Authors agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
As the traditional and econometric models have some limitations, nonlinear and emerging artificial intelligence models such as artificial neural networks (ANN), support vector machines (SVM) and genetic programming (GP) can provide powerful solutions to nonlinear crude oil prediction. Abramson and Finizza (1991) attempted to predict crude oil prices using neural network models. Tang and Hammoudeh (2002) used a nonlinear regression model to forecast the OPEC basket price. Mirmirani and Li (2004) applied the VAR and ANN techniques to make ex-post forecasts of U.S. oil price movements. Their analysis suggests that the BPN-GA model noticeably outperforms the VAR model. Xie et al. (2006) proposed a support vector machine model to forecast WTI prices. To evaluate the forecasting ability of SVM, the authors compared its performance with those of ARIMA and BPNN. The experimental results showed that SVM outperforms the other two methods. Shambora and Rossiter (2007) and Yu et al. (2007) also used the ANN model to predict crude oil price. Gori et al. (2007) forecasted oil prices and consumption in the short term under three scenarios: parabolic, linear and chaotic behavior. Silva et al. (2010) used a wavelet decomposition to forecast oil price trends. Azadeh et al. (2010) applied an adaptive intelligent algorithm for forecasting gasoline demand based on artificial neural network (ANN), conventional regression and design of experiment (DOE).
In recent years, there has been a growing interest in ensemble methods for integrating multiple predictions. To our knowledge, there have been very few applications of ensemble models within energy economics. For example, Zhanga et al. (2008) used ensemble empirical mode decomposition (EEMD) for crude oil price analysis. Yu et al. (2008) proposed using an empirical mode decomposition (EMD) based neural network ensemble learning paradigm for crude oil forecasting. The authors found that, across different forecasting models and in terms of different criteria, the EMD-based neural network ensemble learning model performs the best for the two main crude oil prices, the WTI crude oil spot price and the Brent crude oil spot price. Ensemble methods enhance the forecasting accuracy of their individual constituent members, such as artificial neural networks and classification and regression trees. The most popular and widely used ensemble method is bagging. Thus, we employ bagging in constructing ensemble models in the present study.
The organization of this paper is as follows. Section two is devoted to bagging, classification and regression trees and artificial neural networks. Section three describes the data, performance statistics, application details and empirical results. Finally, some discussions, conclusions and future study directions are given in section four.

Bagging
Bagging (short for bootstrap aggregating) was proposed by Breiman (1996). Each bootstrap sample is drawn from the training data with replacement, so some observations appear multiple times, whereas others are not included. When a bootstrapped sample is drawn, approximately 37% of the data is excluded from the sample and the remaining data is replicated to bring the data to full size. The excluded data (roughly one third of the samples) are known as the out-of-bag (OOB) samples, while the replicated dataset is known as the in-bag samples (Ismail and Mutanga, 2010). A more detailed description of bagging is given in Breiman (1996). The model structure of the bagging ensemble developed in the present study is shown in Figure 1. Given a learning model $h$, bagging is defined for regression problems as follows (Pino-Mejias et al., 2008):
Definition 1. Bagging.

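Output:
The bagged predictor is
$$h_{bag}(x) = \frac{1}{B}\sum_{b=1}^{B} h_b(x) \qquad (1)$$
where $h_b$ is the learning model $h$ fitted to the $b$-th of $B$ bootstrap samples.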
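The bootstrap-and-average procedure can be illustrated with a minimal sketch. It is not the WEKA implementation used in the study: the function names are hypothetical, and the base learner is an ordinary least squares fit chosen only to keep the example short (the study uses ANN and CART). The defaults of 40 bags and seed 1 mirror the bagging settings reported in the experimental settings.

```python
import numpy as np

def bagged_predict(fit, X, y, X_new, n_bags=40, seed=1):
    """Average the predictions of models fitted to bootstrap samples.

    fit(X, y) must return a function mapping inputs to predictions.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        model = fit(X[idx], y[idx])       # base model for this bag
        preds.append(model(X_new))
    return np.mean(preds, axis=0)         # regression: average over bags

def fit_least_squares(X, y):
    """Hypothetical stand-in base learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def oob_fraction(n, seed=1):
    """Share of observations left out of one bootstrap sample of size n."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, n, size=n)
    return 1.0 - np.unique(idx).size / n
```

For large samples, `oob_fraction` returns a value close to 0.37, which corresponds to the out-of-bag share quoted above.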

Classification and Regression Trees
Classification and regression trees (CART), proposed by Breiman et al. (1984), is a nonlinear statistical technique (Cao et al., 2010). The CART method is based on binary recursive partitioning. A node, which is always partitioned into exactly two new nodes, is called a parent node. The new nodes are called child nodes. The method is recursive since the process can be repeated by treating each child node as a parent node (Grunwald et al., 2009). A terminal node is a node that has no child nodes. The main aim of CART is to estimate the response $y$ by selecting some appropriate variables from a large dataset. It works as follows (Hancock et al., 2005): Each node within the tree has a partitioning rule. For regression problems, the partitioning rule is determined through minimization of the relative error statistic (RE):
$$RE(d) = \sum_{l=1}^{L}\left(y_l - \bar{y}_L\right)^2 + \sum_{r=1}^{R}\left(y_r - \bar{y}_R\right)^2 \qquad (2)$$
where $y_l$ and $y_r$ are the left and right branches with $L$ and $R$ observations of $y$ in each, with respective means $\bar{y}_L$ and $\bar{y}_R$. The decision rule $d$ is a point in some predictor variable $x$ that is used to determine the left and right branches. The partitioning rule that minimizes the RE is then used to construct a node in the tree.
In the last decade, CART has gained popularity in the machine learning community. However, CART is very sensitive to small changes in the training dataset. More specifically, minor changes in the values of the training dataset can lead to significant changes in the selection of variables (Hastie et al., 2008; Ismail and Mutanga, 2010). Thus, CART is identified as an unstable predictor that is prone to overfitting (Breiman, 1996). A CART structure is depicted in Figure 2.
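The search for a partitioning rule at a single node can be sketched as follows. The sketch assumes that the RE statistic is the sum of squared deviations of the response from the branch means, and that candidate decision points are midpoints between consecutive sorted values of one predictor; the function names are hypothetical and this is not the WEKA routine used in the study.

```python
import numpy as np

def relative_error(y_left, y_right):
    """Sum of squared deviations from the mean within each branch."""
    return float(((y_left - y_left.mean()) ** 2).sum()
                 + ((y_right - y_right.mean()) ** 2).sum())

def best_split(x, y):
    """Return (decision point d, RE) minimizing RE for one predictor x."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_d, best_re = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate equal values
        d = (x[i] + x[i - 1]) / 2.0          # midpoint candidate
        re = relative_error(y[:i], y[i:])    # left branch: x < d
        if re < best_re:
            best_d, best_re = d, re
    return best_d, best_re

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(best_split(x, y))  # decision point 6.5 with RE 0.0
```

Growing a full tree repeats this search over every predictor and then recursively on each child node.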

Artificial Neural Networks
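This study uses a multilayer perceptron (MLP), which is a conventional back-propagation artificial neural network. The back-propagation process is applied in two phases. The first phase is the forward phase: input data are fed to the input layer and the signal is propagated as far as the output of the network to obtain the prediction. The second phase is the backward phase: the error is employed to adjust the weights of the connections from the hidden to the output neurons. The error is also back-propagated and used to adjust the weights of the connections from the input to the hidden neurons (Oliveira et al., 2010). The output signal for the $l$-th neuron in the $n$-th layer is given by
$$y_l^{n}(t) = \varphi\left(v_l^{n}(t)\right), \qquad v_l^{n}(t) = \sum_{i} w_{li}^{n}(t)\, y_i^{n-1}(t) \qquad (3)$$
where $w_{li}^{n}(t)$ is the synaptic weight of the connection from neuron $i$ in layer $n-1$ to neuron $l$ in layer $n$. The synaptic weight is updated as
$$w_{li}^{n}(t+1) = w_{li}^{n}(t) + \Delta w_{li}^{n}(t) \qquad (4)$$
and it can be revised as given by
$$\Delta w_{li}^{n}(t) = \eta\, \delta_l^{n}(t)\, y_i^{n-1}(t) \qquad (5)$$
where $\eta$ is the learning rate, and $\delta_l^{n}(t)$ is the local error gradient. For the output layer $N$, the local error gradient is given by
$$\delta_j^{N}(t) = \left[d_j(t) - y_j^{N}(t)\right]\varphi'\left(v_j^{N}(t)\right) \qquad (6)$$
where $d_j(t)$ is the goal output signal, and $\varphi(\cdot)$ is the activation function.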
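The two phases can be illustrated with a minimal sketch of one training step for a network with a single hidden layer. The sketch rests on simplifying assumptions that are not taken from the study: sigmoid hidden neurons, a linear output neuron, no bias terms and no momentum. The names `train_step`, `W1` and `W2` are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_step(W1, W2, x, d, eta=0.3):
    """One forward and one backward pass for a single training example.

    W1: (hidden, inputs) weights; W2: (outputs, hidden) weights.
    Returns the updated weights and the squared error before the update.
    """
    # Forward phase: propagate the input signal to the network output.
    h = sigmoid(W1 @ x)   # hidden-layer outputs
    y = W2 @ h            # linear output layer
    # Backward phase: local error gradients, output layer first.
    delta_out = d - y                               # derivative of a linear unit is 1
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # error propagated back to hidden layer
    W2_new = W2 + eta * np.outer(delta_out, h)      # hidden-to-output weights
    W1_new = W1 + eta * np.outer(delta_hid, x)      # input-to-hidden weights
    return W1_new, W2_new, float(np.sum((d - y) ** 2))
```

Calling `train_step` repeatedly on the same example should drive the returned squared error toward zero, provided the learning rate is small enough.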

Dataset and experimental settings
The data used in this analysis consist of the monthly West Texas Intermediate (WTI) spot price from January 1982 to November 2011, gathered from the Federal Reserve Economic Data (FRED) database of the Federal Reserve Bank of St. Louis. There are various data sets for oil price in the literature, but WTI data are the most common because they cover a long period and are provided continuously by FRED. The bagging ensemble models are applied to forecast the monthly WTI prices. Prices are forecast using the prices of preceding months as inputs. In this study, the results are obtained by using 10-fold cross-validation for each model. The 10-fold cross-validation procedure is applied as follows: first, the WTI dataset is randomized, and then the data are partitioned into three parts: a training set (8 distinct folds), a cross-validation set (1 fold) and a testing set (1 fold). The training set is employed for model training and the testing set is used to evaluate the accuracy of the models. The cross-validation set is used to apply an early stopping process to avoid overfitting of the training data. The data mining toolkit WEKA (Waikato Environment for Knowledge Analysis), version 3.7.4, is used for the experiments. WEKA is an open source toolkit that consists of a collection of machine learning algorithms for solving data mining problems (Witten and Frank, 2005).
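The partition into eight training folds, one cross-validation fold and one testing fold can be sketched as below. Which fold serves as the cross-validation set in each rotation is not stated in the text; the sketch assumes it is the fold following the testing fold. The function name is hypothetical, and the sample count of 359 is simply the number of months from January 1982 to November 2011.

```python
import numpy as np

def rotating_folds(n_samples, n_folds=10, seed=1):
    """Yield (train, validation, test) index arrays for each rotation."""
    rng = np.random.default_rng(seed)
    # Randomize the observations, then cut them into n_folds parts.
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        v = (k + 1) % n_folds  # assumed position of the validation fold
        train = np.concatenate(
            [folds[j] for j in range(n_folds) if j not in (k, v)]
        )
        yield train, folds[v], folds[k]

for train, val, test in rotating_folds(359):
    print(len(train), len(val), len(test))
```

Each observation appears in the testing fold exactly once across the ten rotations.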
In this study, the model-specific parameter values are as follows. For the MLP, the candidate values are: number of hidden layers, 5 and 10; learning rate, 0.3, 0.4 and 0.5; momentum factor, 0.3, 0.4 and 0.5; and training time, 300, 400 and 500. The experiments indicate that the best MLP parameters are 5 hidden layers, a learning rate of 0.3, a momentum factor of 0.4 and a training time of 500. The parameters for CART are the number of folds, the minimum total weight and the number of seeds; their values are 2, 2 and 1, respectively. The bagging parameters are the size of each bag (as a percentage), the number of iterations and the number of seeds; the best configuration is 100, 40 and 1, respectively. The parameters of the base models (i.e., CART, ANN) within the ensembles are identical to those used when the models are applied separately. Because the aim of this study is to offer a better forecasting method for oil price, we run the program for each of the parameter values specified above and select the values giving the best result. We examined the effects of all the model parameters, from the highest to the lowest values that can properly be applied in the method algorithms. The three best-performing values of each parameter are selected for further examination and analyzed to find the values that yield the lowest prediction error. Prediction results for each parameter value are compared by using the root mean squared error, mean absolute error, relative absolute error and root relative squared error accuracy measures.
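The selection over the MLP candidate values can be pictured as an exhaustive search over their combinations. The sketch below only enumerates the grid and picks the configuration with the lowest score; the `score` argument is a hypothetical placeholder for a routine that trains a model and returns an error measure such as RMSE, and the dictionary keys are illustrative names rather than WEKA option names.

```python
from itertools import product

# Candidate MLP values as listed in the text.
mlp_grid = {
    "hidden_layers": [5, 10],
    "learning_rate": [0.3, 0.4, 0.5],
    "momentum": [0.3, 0.4, 0.5],
    "training_time": [300, 400, 500],
}

def grid_search(grid, score):
    """Return (best configuration, number of configurations tried).

    score(config) is assumed to return an error to be minimized.
    """
    names = list(grid)
    configs = [dict(zip(names, values)) for values in product(*grid.values())]
    return min(configs, key=score), len(configs)
```

With two, three, three and three candidate values, the grid contains 54 configurations.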

APPLICATION AND EMPIRICAL RESULTS
The predictive models proposed in this study (i.e., ANN, RT, BRT and BANN) are evaluated by using four accuracy measures (i.e., the root mean squared error RMSE, the mean absolute error MAE, the relative absolute error RAE and the root relative squared error RRSE). In addition, six numerical descriptors (maximum, minimum, mean, variance, maximum under-prediction MUP and maximum over-prediction MOP) are computed to investigate the statistical relation between the original data and the predicted data.
Mean absolute error:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|p_i - a_i\right| \qquad (7)$$
Root mean squared error:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - a_i\right)^2} \qquad (8)$$
Relative absolute error:
$$RAE = \frac{\sum_{i=1}^{n}\left|p_i - a_i\right|}{\sum_{i=1}^{n}\left|a_i - \bar{a}\right|} \qquad (9)$$
Root relative squared error:
$$RRSE = \sqrt{\frac{\sum_{i=1}^{n}\left(p_i - a_i\right)^2}{\sum_{i=1}^{n}\left(a_i - \bar{a}\right)^2}} \qquad (10)$$
where $a$ = actual target, $\bar{a}$ = average of the actual target, $p$ = predicted target and $n$ = number of observations.
Table fragment (row and column labels lost in extraction): 13,193 113,394 33,861 1146,567 3,426 -11,996; WTI t-1, WTI t-2, WTI t-3: 13,439 116,530 34,267 1174,234 4,823 -8,860.
Three input combinations based on preceding monthly crude oil prices are developed to forecast the current monthly crude oil price. The input combinations evaluated in the study are: (1) WTIt-1; (2) WTIt-1, WTIt-2; and (3) WTIt-1, WTIt-2, WTIt-3. In all cases, the output is the WTIt for the current month. We purposely do not give the training performance statistics, because good training accuracy gives no guarantee of a low test error. The performance statistics of the ANN and BANN models in the test period are given in Table 1. The table indicates that the BANN model whose inputs are the prices of the three previous months (input combination 3) has the best accuracy. It can be seen from Table 1 that the BANN model performs better than the single ANN model on all of the performance criteria. The table shows that the relative MAE, RMSE, RAE and RRSE differences between the BANN (input combination 3) and ANN (input combination 2) models are 23.514%, 23.065%, 2.786% and 2.731% in the test period, respectively. Table 2 summarizes the numerical descriptors (maximum, minimum, mean, variance, maximum over-prediction and maximum under-prediction) for the ANN and BANN models. The numerical descriptors estimated for the ANN and BANN models indicate that the BANN model yields estimates and distributions closer to the actual WTI data.
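The lagged input combinations and the four accuracy measures can be sketched as follows. The function names are hypothetical, the toy numbers are not WTI data, and the measures are written as plain ratios under the assumption that RAE and RRSE are normalized by the deviations of the actual values from their mean; multiplying RAE and RRSE by 100 expresses them as percentages.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build inputs [WTI(t-1), ..., WTI(t-n_lags)] and target WTI(t)."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack(
        [s[n_lags - k: len(s) - k] for k in range(1, n_lags + 1)]
    )
    return X, s[n_lags:]

def accuracy_measures(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE for one set of predictions."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - a           # prediction errors
    dev = a - a.mean()    # deviations of actual values from their mean
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "RAE": np.sum(np.abs(err)) / np.sum(np.abs(dev)),
        "RRSE": np.sqrt(np.sum(err ** 2) / np.sum(dev ** 2)),
    }

X, y = make_lagged([1, 2, 3, 4, 5], n_lags=2)   # input combination 2
print(X.tolist(), y.tolist())  # [[2.0, 1.0], [3.0, 2.0], [4.0, 3.0]] [3.0, 4.0, 5.0]
print(accuracy_measures([2, 4, 6, 8], [3, 3, 7, 7]))
```

In the second call every error has magnitude 1, so MAE and RMSE both equal 1 and RAE equals 0.5.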
Table 3 indicates that the BRT model whose inputs are the prices of the two previous months (input combination 2) has the smallest MAE, RMSE, RAE and RRSE in the testing period. The RT model has its best accuracy for input combination 3. Compared with the RT models, the BRT models yield better accuracy in monthly crude oil price forecasting. The relative MAE, RMSE, RAE and RRSE differences between the BRT (input combination 2) and RT (input combination 3) models are 14.581%, 16.761%, 2.403% and 3.918% in the test period, respectively. The numerical descriptors shown in Table 4 for the RT and BRT models show that the BRT model provides estimates and distributions closer to the actual data than RT. The BANN, ANN, BRT and RT residuals in the test period are shown in Figure 3 for all input combinations. It can be seen from the residuals that BANN approximates the actual values better than the others. Underestimation is clearly visible for the tree-based models.
The direct relationship between the MAE, RMSE, RAE and RRSE is very clear according to Tables 1 and 3. The ensemble models (i.e., BANN, BRT) seem to be more adequate than the single ANN and RT models for forecasting monthly crude oil prices (Table 4).
The actual and predicted WTI distributions of input combinations 1, 2 and 3 for the testing period are depicted with the boxplots presented in Figures 4, 5 and 6. The box height corresponds to the interquartile range, the whiskers depict the 5th and 95th percentiles, and the horizontal line within each box indicates the median. Dots indicate values outside this range. The BANN model performed better than the ANN, RT and BRT models when compared with the distribution of the actual WTI data. Moreover, the distribution of WTI data predicted by the BANN model is similar to the distribution of the actual data, and the BANN model did the best job of capturing the actual data in the test phase.

DISCUSSION AND CONCLUSION
Ensemble learning is supervised learning from the information generated by the base predictors. The main goal is to build an ensemble model that provides base predictor functionality and to increase accuracy by combining the individual models (Chou et al., 2011). Integrating multiple instances of the same model type can reduce the variance and enhance prediction accuracy (Wang et al., 2009). In the present study, we have investigated the potential use of bagging ensemble models for monthly crude oil price forecasting. The ensemble models (i.e., bagged artificial neural networks BANN, bagged regression trees BRT) are obtained by coupling bagging with two single unstable machine learning models (i.e., ANN, CART). We have also employed the base models ANN and CART as benchmark models and used three input combinations to test the proposed predictive models. In general, the bagging method can be a very effective procedure when applied to unstable learning algorithms, such as classification and regression trees and artificial neural networks (Mejias et al., 2010). Moreover, bagging ensembles can inherit almost all advantages of their base models while overcoming their primary problem, which is inaccuracy. Breiman (1996) pointed out that the variance of the bagged model is smaller than or equal to the variance of a simple model (i.e., CART, ANN), leading to increased prediction accuracy (Louzada et al., 2011).
The results obtained from the study indicate that: (i) bagging provides a considerable enhancement in every case examined. Bagged models (i.e., BANN, BRT) reduce the mean absolute errors, root mean squared errors, relative absolute errors and root relative squared errors with respect to the single ANN and CART models by 23.514-14.581%, 23.065-16.761%, 2.786-2.403% and 3.918-2.731%, respectively; (ii) ANN-based predictive models (i.e., BANN, ANN) are found to be better than tree-based predictive models (i.e., BRT, RT); (iii) the BANN model is a promising approach for monthly crude oil price forecasting; and finally (iv) the numerical descriptors (maximum, minimum, mean, variance, maximum under-prediction and maximum over-prediction) estimated for the proposed predictive models indicate that the BANN model yields statistically similar estimates and distributions when compared with the actual WTI data. In this study, the bagging method is used in building ensemble models. Other ensemble methods (e.g., boosting, random forest) could also be used for the construction of ensemble models. We propose to investigate the use of other ensemble models in future work.
$y_l$ and $y_r$ are the left and right branches with $L$ and $R$ observations of $y$ in each, with respective means $\bar{y}_L$ and $\bar{y}_R$. The decision rule $d$ is a point in some predictor variable
weighted. For an n-layer network, the synaptic weight

Figure 3. Residuals for the ANN, BANN, RT and BRT models.

Figure 4. Box plots of actual and predicted WTI distributions for input combination 1.

Figure 5. Box plots of actual and predicted WTI distributions for input combination 2.

Figure 6. Box plots of actual and predicted WTI distributions for input combination 3.

Table 1. The comparison of performance statistics for ANN and BANN models.

Table 2. Numerical descriptors for ANN models and actual data.

Table 3. The comparison of performance statistics for RT and BRT models.

Table 4. Numerical descriptors for RT models and actual data.