Forecasting general insurance loss reserves in Egypt

Loss reserves are among the most important indicators for an insurer and inform many strategic decisions, such as rate making, underwriting, investment and corporate planning. The aim of this study is to identify a reliable time series forecasting model for the loss reserve estimates of Egyptian general insurance companies. An exponential smoothing model, Box-Jenkins analysis and a time series regression model are applied to actual reported loss reserve data for the general insurance sector for the period 1986 to 2006, and their accuracy is compared using several error measures. The series from 1986 to 2001 is used for estimation, and the remaining observations are used to evaluate the models as out-of-sample data. The exponential smoothing technique is identified as the best forecasting technique for the Egyptian general insurance sector at all steps ahead.


INTRODUCTION
One of the major tasks that insurance companies routinely perform is loss reserve estimation. Loss reserves, also referred to as claim reserves, are estimates of what claims will cost. The reserve represents money that is set aside to meet claims arising in the future on the policies already written (England and Verrall, 2002). In other words, the claim reserving process is a prospective estimation of what claims will cost the insurance company.
The principal aim of reserve estimation is to ensure the protection of policyholders (Cheung, 1997). Apart from that, loss reserving has important implications for insurers' pricing and competitive responses. If the estimate of loss reserves were too low, premiums would be inadequate to support the financial projections of future periods; in the worst case, rates would be insufficient to pay claims and the company would become insolvent. If the loss estimates were too high, insurance rates might be raised above competitive levels (Douglas, 1999; Calandro and O'Brien, 2004). Furthermore, general insurers need to be able to estimate loss reserves to make sure that they have sufficient assets to cover their liabilities. Insolvent insurance companies are not allowed to continue to sell insurance policies because they do not have the financial strength to keep their contractual obligations to their policyholders (Cheung, 1997).
*Corresponding author. E-mail: yibrahim@uum.edu.my. Fax: 006049286406.
Loss reserve estimation is also important for outsiders in determining the financial liability, financial strength and value of an insurance company (Cheung, 1997). Loss reserves are collectively the largest liability on a property and liability insurer's balance sheet; thus, any under-reserving of losses will decrease an insurer's liabilities and boost its net income (Grace and Laverty, 2006).
Loss reserves in the general insurance market are estimated using actuarial reserving techniques such as the chain ladder technique (CLT) (Harnek, 1966), the Bornhuetter-Ferguson technique (BFT) (Bornhuetter and Ferguson, 1972) and the Taylor separation technique (TST) (Taylor, 1977). Nevertheless, the data required by all these reserve estimation techniques are complex and ambiguous: it is assumed that the claims paid can be grouped according to accident years and development years, resulting in a claim run-off triangle (Panning, 2006).
Considering the importance of loss reserve figures for future planning and decision-making, and the complexity and ambiguity of actuarial loss reserve calculation, it is beneficial to explore non-actuarial approaches to estimating loss reserves; one of them is time series forecasting. Hence, this study aims to compare the accuracy of several time series forecasting techniques, namely the exponential smoothing model, Box-Jenkins analysis and the time series regression model, in order to identify the best forecasting model. To the authors' knowledge, only a few studies explore the use of time series forecasting models in reserve estimation.

PRIOR RELATED STUDIES
Studies that use time series techniques to forecast loss reserves are scarce. Harbey (1995) used multiple regression to estimate the outstanding loss reserve in the Egyptian general insurance market for the period 1980 to 1994. He found that outstanding claim reserve estimation is affected by two variables, namely underwriting premiums and paid claims, and concluded that multiple regression models could be applied to some general insurance segments but not to others.
Doray (1996) considered the problem of forecasting the number of claims incurred. After subtracting the number of claims reported to date, the number of claims incurred but not reported (IBNR), which is the main portion of loss reserves, can be forecast. The basic model assumes that the number of claims per accident period follows an autoregressive moving average time series process. Instead of assuming that the data are available in the usual claim run-off triangle format, he assumed that the only data available are the number of claims reported at the valuation date for each accident interval of an observation period. Box-Jenkins methods are used to forecast the ultimate number of claims incurred and to obtain approximate confidence intervals for it.
Chan et al. (2004) used the growing triangle (GT) technique to compare different models for loss reserve estimation, including the CLT based on weighted average mean square error, the CLT based on the simple average mean, the Bornhuetter-Ferguson technique and the lognormal model. In the GT technique, sub-triangles of losses, embedded in the full triangle of available data, are used to evaluate the predictive power of the various candidate estimation methods. The GT technique is illustrated using three data sets: (i) paid losses during 1978 to 1995; (ii) payments per claim during 1981 to 1995; and (iii) number of claims notified during 1985 to 1995. Based on the weighted mean square error (WMSE) for the three run-off triangles, Chan et al. (2004) found that the CLT based on the simple average provides the best prediction for the second and third data sets. They also found that the CLT produces relatively constant results in the growing triangle assessment, which is explained by its relatively simple and non-parametric nature. On the other hand, they found that the lognormal model is the best model for the first data set.
Work on dental insurance reserve estimation shows that the variation in error rates is very high and that it is difficult to tell in advance whether a prediction is going to be accurate. The authors also found that a model based on time series analysis using the Holt-Winters seasonal algorithm, which is designed to handle both trend and seasonal variation in the data, performed well on many data sets by decreasing the error rates. Many techniques did not estimate dental insurance reserves with consistent accuracy: some of the suggested techniques performed well for certain data sets and poorly on others. Some models, such as the CLT and time series, required 24 to 36 months of historical data, while the log-linear regression model required only 13 months of data. So, for new insurers, modeling with time series may not be possible. The authors concluded that there seems to be no model that is both accurate and consistent for all data sets.

Forecasting models
Three time series methods are used to estimate loss reserves. The forecasts from these models are compared with the actual values of the data, and the forecasting models are ranked according to their standard error.

Exponential smoothing technique
This technique is one of the most widely used classes of univariate time series modeling techniques and is a very popular scheme for producing a smoothed time series (Lazim, 2005). Whereas moving averages weight past observations equally, exponential smoothing assigns exponentially decreasing weights as observations get older. In other words, recent observations are given relatively more weight in forecasting than older observations. In exponential smoothing models, the forecast for the next and all subsequent periods is determined by adjusting the current period forecast by a portion of the difference between the current forecast and the current actual value. In Brown's method, only one smoothing constant is used and, as such, the estimated linear trend values obtained are sensitive to random influences. To overcome this problem, the study uses Holt's method, a technique frequently used to handle data with a linear trend. This method not only smooths the trend and the slope directly by using different smoothing constants, but also provides more flexibility in selecting the rates at which the trend and slope are tracked. The application of Holt's method requires three equations.
The exponentially smoothed series:
$$S_t = \alpha y_t + (1 - \alpha)(S_{t-1} + T_{t-1})$$
The trend estimate:
$$T_t = \beta (S_t - S_{t-1}) + (1 - \beta) T_{t-1}$$
The trend estimate is calculated by taking the difference between two successive exponentially smoothed values, $(S_t - S_{t-1})$. A second smoothing constant $\beta$ is used to smooth the trend estimate: the estimate for the trend $(S_t - S_{t-1})$ is multiplied by $\beta$ and added to the previous estimate of the trend, which has been adjusted by the factor $(1 - \beta)$. Because the smoothing is applied to the trend rather than to the actual data, the result is a smoothed trend with the random fluctuations damped.
The forecast for $m$ periods ahead:
$$F_{t+m} = S_t + m T_t$$
Here $y_t$ is the actual value in period $t$, $S_t$ is the smoothed value and $T_t$ is the trend estimate. The values of $\alpha$ and $\beta$ are the parameters to be determined, each ranging from 0 to 1.
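As a rough illustration of the recursions described above, the following Python sketch applies Holt's method to a short series. The function name `holt_forecast`, the symbol T for the trend estimate, the initial values (level set to the first observation, trend set to the first difference) and the toy numbers are assumptions made for illustration; the study does not state how the recursion was initialised, and the numbers are not the Egyptian data.

```python
def holt_forecast(series, alpha, beta, steps_ahead=1):
    """Holt's linear exponential smoothing (illustrative sketch).

    Assumed initialisation: level S_1 = y_1 and trend T_1 = y_2 - y_1.
    Returns the forecasts for 1..steps_ahead periods after the last observation.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        # Smoothed series: S_t = alpha*y_t + (1 - alpha)*(S_{t-1} + T_{t-1})
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # Trend estimate: T_t = beta*(S_t - S_{t-1}) + (1 - beta)*T_{t-1}
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Forecast m periods ahead: F_{t+m} = S_t + m*T_t
    return [level + m * trend for m in range(1, steps_ahead + 1)]


# Toy, perfectly linear series (illustrative only).
toy = [10.0, 12.0, 14.0, 16.0, 18.0]
print(holt_forecast(toy, alpha=0.5, beta=0.5, steps_ahead=3))  # [20.0, 22.0, 24.0]
```

On a perfectly linear series the level and trend track the data exactly, so the forecasts simply continue the line; real data would leave non-zero one-step errors that depend on the two smoothing constants.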

Box-Jenkins model (ARIMA)
ARIMA stands for the combination of autoregressive, integrated and moving average models. It is commonly applied to time series analysis, forecasting and control.
The Box-Jenkins (ARIMA) modeling approach consists of three main stages: model identification; model estimation and validation; and model application.

Model identification:
The first step in the application of the Box-Jenkins methodology is to identify the class of model most suitable for the given data set. This is done by computing, analyzing and plotting various statistics based on historical data.
Common statistics used to identify the model type are the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The common practice is to identify several highly likely model formulations and subsequently choose the best model that meets all statistical requirements. The process of identifying the models is thus summarized as follows:
1. Compute and analyze the various statistics based on the historical data, in particular the ACF and the PACF.
2. Based on the information obtained in step 1, identify the most suitable subclass of the general class of models.
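The two statistics named above can be computed directly. The sketch below is one possible way to do so in plain Python: the sample ACF uses the usual ratio of lagged cross-products to the total sum of squares, and the PACF is obtained from the ACF with the Durbin-Levinson recursion. The function names and the toy series are assumptions for illustration; the study does not say which software or estimator produced its ACF and PACF plots.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    deviations = [y - mean for y in series]
    denominator = sum(d * d for d in deviations)
    acf = []
    for k in range(1, max_lag + 1):
        numerator = sum(deviations[t] * deviations[t - k] for t in range(k, n))
        acf.append(numerator / denominator)
    return acf


def pacf_from_acf(acf):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    pacf = []
    previous = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, len(acf) + 1):
        if k == 1:
            phi_kk = acf[0]
            current = [phi_kk]
        else:
            numerator = acf[k - 1] - sum(
                previous[j] * acf[k - 2 - j] for j in range(k - 1))
            denominator = 1.0 - sum(previous[j] * acf[j] for j in range(k - 1))
            phi_kk = numerator / denominator
            current = [previous[j] - phi_kk * previous[k - 2 - j]
                       for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
        previous = current
    return pacf


# Toy series (illustrative only).
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = sample_acf(toy, max_lag=2)
print(acf)
print(pacf_from_acf(acf))
```

In practice each spike is usually judged against an approximate band of about two standard errors, roughly 2 divided by the square root of the number of observations, before it is counted as significant.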

Model estimation and validation:
The specific parameter values are estimated subject to the condition that the selected error measure is minimized. More specifically, the process searches for the estimated parameter values that minimize the differences between the actual and the forecast values.

Model application:
If all test criteria are met and the model's fitness has been confirmed, it is ready to be used to generate forecast values. At this stage, three possibilities may occur: new or latest data are collected and incorporated into the existing series; a new model is formulated and re-estimated; or a system is developed to monitor the forecast values produced.

Assumption of Box-Jenkins methodology:
The application of the Box-Jenkins methodology rests on an assumption about the characteristics of the initial data series (Lazim, 2005). Basically, it is assumed that the data series is stationary. Where this assumption is not met, the necessary procedures are performed in order to achieve stationarity in the series. A simple procedure used to remove non-stationarity in a time series is differencing, and a log transformation is commonly used to stabilize the variance.
There are four basic models: (i) the autoregressive (AR) model; (ii) the moving average (MA) model; (iii) the mixed autoregressive and moving average (ARMA) model; and (iv) the mixed autoregressive, integrated and moving average (ARIMA) model.

i. The autoregressive (AR) model
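In the AR model, the current value of the variable is defined as a function of its previous values plus an error term (Lazim, 2005). Mathematically, it is written as:
$$y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$
where $\mu$ and $\phi_1, \dots, \phi_p$ are the constant term and the autoregressive parameters to be estimated, $p$ is the order of the model and $\varepsilon_t$ is the error term in period $t$.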
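To make the autoregressive idea concrete, the sketch below fits the simplest case, AR(1), in the assumed form y_t = mu + phi * y_{t-1} + e_t, by regressing each observation on its predecessor, and then iterates the fitted equation to forecast. Ordinary least squares is an assumption made here for illustration; the study does not state which estimation routine was used, and the toy series is constructed to follow the recursion exactly rather than taken from the Egyptian data.

```python
def fit_ar1(series):
    """Least-squares estimates (mu, phi) for y_t = mu + phi*y_{t-1} + e_t."""
    lagged = series[:-1]
    current = series[1:]
    n = len(lagged)
    mean_lagged = sum(lagged) / n
    mean_current = sum(current) / n
    sxx = sum((x - mean_lagged) ** 2 for x in lagged)
    sxy = sum((x - mean_lagged) * (y - mean_current)
              for x, y in zip(lagged, current))
    phi = sxy / sxx
    mu = mean_current - phi * mean_lagged
    return mu, phi


def forecast_ar1(last_value, mu, phi, steps_ahead):
    """Iterate the fitted equation, feeding each forecast back in."""
    forecasts = []
    value = last_value
    for _ in range(steps_ahead):
        value = mu + phi * value
        forecasts.append(value)
    return forecasts


# Toy series generated from y_t = 2 + 0.5*y_{t-1} with no noise (illustrative only).
toy = [10.0, 7.0, 5.5, 4.75, 4.375]
mu, phi = fit_ar1(toy)
print(mu, phi)
print(forecast_ar1(toy[-1], mu, phi, steps_ahead=2))
```

Multi-step forecasts from an AR(1) model drift towards the long-run mean mu / (1 - phi) when the absolute value of phi is below 1, which is why the forecasts here approach 4.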

ii. The moving average (MA) model
The moving average is a function of the error terms; the moving average model links the current values of the time series to random errors that have occurred in previous periods rather than to the values of the actual series themselves. The moving average model can be written as:
$$y_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}$$
where $\mu$ is the mean about which the series fluctuates, the $\theta$'s are the moving average parameters to be estimated, $q$ ($q$ = 1, 2, 3, ...) is the order of the model, and the $\varepsilon$'s are the error terms, assumed to be independently distributed over time.

iii. Mixed autoregressive and moving average model
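Under the assumption of stationarity, the mixed autoregressive and moving average model of the Box-Jenkins methodology is known as the ARMA model. In other words, the series is assumed to be stationary (no differencing is needed) and the model is written as:
$$y_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}$$
where the $\phi$'s are the autoregressive parameters, the $\theta$'s are the moving average parameters and $\varepsilon_t$ is the error term. Since the AR and MA parts are of order $p$ and $q$ respectively, the model is referred to as ARMA (p, q).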

iv. Mixed autoregressive, integrated and moving average (ARIMA) model
When the data are not stationary, the Box-Jenkins model is written as ARIMA (p, d, q), where d denotes the degree of differencing needed to achieve stationarity in the series.
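The two transformations mentioned for achieving stationarity, differencing (the d in ARIMA (p, d, q)) and the log transformation, are simple to express in code. The sketch below is illustrative only; the function names and the toy series are assumptions and do not come from the study.

```python
import math


def difference(series, d=1):
    """Apply first differencing d times (the 'd' in ARIMA(p, d, q))."""
    result = list(series)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


def log_transform(series):
    """Natural logarithm, commonly used to stabilise the variance.

    Requires strictly positive values.
    """
    return [math.log(y) for y in series]


# Toy series with a quadratic trend (illustrative only).
toy = [1.0, 4.0, 9.0, 16.0, 25.0]
print(difference(toy, d=1))  # [3.0, 5.0, 7.0, 9.0]
print(difference(toy, d=2))  # [2.0, 2.0, 2.0]
```

Each round of differencing shortens the series by one observation, which matters for a sample as short as 21 annual values.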

Time series regression model
The time series regression model relates the variable $y_t$ to a function of time. The time series $y_t$ can be described using a trend model, defined as follows:
$$y_t = TR_t + \varepsilon_t$$
where $y_t$ is the value of the time series in period $t$, $TR_t$ is the trend in period $t$, and $\varepsilon_t$ is the error term in period $t$.
Any one value of the error term is assumed to be statistically independent of any other value; in other words, there must be no serial correlation. Where serial correlation is present, the Cochrane-Orcutt procedure is applied to produce better estimates. The F-test statistic is used to test the overall fit of the model, while the t-test is used to identify which variables to include in the final model. Adjusted R-squared is then used to measure the goodness of fit of the model.
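A minimal sketch of how a linear trend might be fitted and then corrected for first-order serial correlation is given below. It assumes a linear trend TR_t = b0 + b1 * t and AR(1) errors, and iterates the Cochrane-Orcutt steps a fixed number of times; the autocorrelation coefficient rho is estimated as the ratio of lagged residual cross-products to the total residual sum of squares, which is one common variant. The function names, the iteration count and the toy data are assumptions for illustration, not the study's implementation or data.

```python
def ols_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cochrane_orcutt_trend(y, iterations=10):
    """Linear trend y_t = b0 + b1*t with an AR(1) error correction."""
    t = [float(i) for i in range(1, len(y) + 1)]
    b0, b1 = ols_line(t, y)  # start from the uncorrected trend line
    rho = 0.0
    for _ in range(iterations):
        residuals = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]
        # Estimate the first-order autocorrelation of the residuals.
        rho = (sum(residuals[i] * residuals[i - 1]
                   for i in range(1, len(residuals)))
               / sum(r * r for r in residuals))
        # Quasi-difference both sides: y_t - rho*y_{t-1} and t - rho*(t-1).
        y_star = [y[i] - rho * y[i - 1] for i in range(1, len(y))]
        t_star = [t[i] - rho * t[i - 1] for i in range(1, len(t))]
        a_star, b1 = ols_line(t_star, y_star)
        # The transformed intercept equals b0*(1 - rho).
        b0 = a_star / (1.0 - rho)
    return b0, b1, rho


# Toy series: slope 2 plus an alternating disturbance (illustrative only).
toy = [1.5, 4.5, 5.5, 8.5, 9.5, 12.5, 13.5, 16.5]
print(cochrane_orcutt_trend(toy))
```

The procedure leaves the form of the trend unchanged; it only re-estimates the coefficients after removing the estimated dependence between successive errors.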

Models evaluation
The accuracy of a model's performance is measured by the size of the forecast error. Operationally, an error is defined as the difference between the actual value and the fitted value generated from a given model, $e_t = y_t - \hat{y}_t$. The best method is therefore the one that gives the smallest error value.

Error measures
This study uses four measures of error that are commonly discussed and used by researchers and practitioners, namely mean squared error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE) and geometric root mean squared error (GRMSE) (Armstrong and Collopy, 1992; Lazim, 2005).

Mean squared error (MSE)
$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} e_t^2$$
where $e_t = y_t - \hat{y}_t$, $y_t$ is the actual observation, $\hat{y}_t$ is the estimated value and $n$ is the number of observations. Its strength lies in its mathematical simplicity: it is easy to understand and to calculate. It penalizes large forecast errors more severely than other common accuracy measures, which helps to determine which method avoids large errors. In other words, a single large error significantly influences the value of the MSE. For the same reason, its main disadvantage is that it is easily affected by extreme values.

Root mean square error (RMSE)
The RMSE is the square root of the MSE. It is the most favored measure among practitioners and is preferred even more strongly by academics.
The RMSE gives equal weight to every squared error, so a single large forecast error can dominate it; this can be disadvantageous to a model that has one large forecast error. Thus, when an analyst ranks forecasting models by RMSE, the presence of one or two extreme errors may alter the ranking of the models.

Mean absolute percentage error (MAPE)
$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right| \times 100$$
The main disadvantage of this measure lies in its relevancy, as it is valid only for ratio-scaled data (that is, data with a meaningful zero). Moreover, the MAPE is potentially explosive for large forecast errors when the actual observations are close to zero.

Geometric root mean squared error (GRMSE)
The existence of an outlier greatly affects the accuracy of an error measure. The geometric root mean squared error is the most useful alternative in this case.
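The four error measures can be sketched in a few lines of Python. The MSE, RMSE and MAPE follow their standard definitions; for the GRMSE the sketch assumes the usual definition as the geometric mean of the squared errors raised to the power one half, which is undefined when any error is exactly zero. The function names and the toy numbers are assumptions for illustration and are not results from the study.

```python
import math


def forecast_errors(actual, fitted):
    """Errors e_t = y_t - yhat_t."""
    return [y - f for y, f in zip(actual, fitted)]


def mse(actual, fitted):
    errors = forecast_errors(actual, fitted)
    return sum(e * e for e in errors) / len(errors)


def rmse(actual, fitted):
    return math.sqrt(mse(actual, fitted))


def mape(actual, fitted):
    """Mean absolute percentage error; actual values must be non-zero."""
    errors = forecast_errors(actual, fitted)
    return 100.0 * sum(abs(e / y) for e, y in zip(errors, actual)) / len(errors)


def grmse(actual, fitted):
    """Geometric root mean squared error (assumed definition).

    Computed in logs as exp(sum(log(e_t^2)) / (2n)); errors must be non-zero.
    """
    errors = forecast_errors(actual, fitted)
    return math.exp(sum(math.log(e * e) for e in errors) / (2 * len(errors)))


# Toy actual and fitted values (illustrative only).
actual = [100.0, 200.0, 400.0]
fitted = [110.0, 190.0, 380.0]
print(mse(actual, fitted), rmse(actual, fitted))
print(mape(actual, fitted), grmse(actual, fitted))
```

Because the GRMSE multiplies rather than adds the squared errors, one very large error moves it far less than it moves the MSE or RMSE, which is the robustness to outliers referred to above.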

Data collection
The Egyptian market is the subject of this study, which concentrates on loss reserves in the general insurance segment for the years 1986/1987 to 2006/2007. The data comprise the loss reserve values for the total general insurance industry. A total of 21 annual observations are used.

FINDINGS
Figure 1 shows a plot of the yearly observations of loss reserves in the Egyptian market from 1986/1987 to 2006/2007. The plot exhibits an upward trend in the data.
There are also rises and falls in the series within the period 1995/1996 to 1998/1999, after which it continues to rise steadily until 2006/2007.

Exponential smoothing technique
Based on Holt's method, the best combination of smoothing constants is alpha (A) = 1.00 and gamma (G) = 0.00. The fitted model is presented in Table 1.
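The study does not describe how the best combination of smoothing constants was found. One common approach, sketched here purely as an assumption, is a grid search that evaluates the sum of squared one-step-ahead errors for each pair of constants and keeps the smallest. The function names, the grid spacing, the initialisation and the toy series are hypothetical.

```python
def holt_sse(series, alpha, beta):
    """Sum of squared one-step-ahead errors for Holt's method.

    Assumed initialisation: level = first observation, trend = first difference.
    """
    level = series[0]
    trend = series[1] - series[0]
    sse = 0.0
    for y in series[1:]:
        one_step_forecast = level + trend
        sse += (y - one_step_forecast) ** 2
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return sse


def grid_search(series, steps=10):
    """Try every (alpha, beta) pair on an evenly spaced grid over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for alpha in grid:
        for beta in grid:
            sse = holt_sse(series, alpha, beta)
            if best is None or sse < best[0]:
                best = (sse, alpha, beta)
    return best  # (smallest SSE, alpha, beta)


# Toy series (illustrative only).
toy = [1.0, 2.0, 4.0, 7.0, 11.0]
print(grid_search(toy))
```

If the reported gamma plays the role of the trend smoothing constant, then a level constant of 1 and a trend constant of 0 reduce the recursions to the last observation plus a fixed trend increment.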

Box-Jenkins model (ARIMA)
Autocorrelation function (ACF) inspection: As shown in Figure 2, the first spike of the ACF is large and the values decline quickly to zero; we can therefore conclude that the series is stationary and does not need differencing. The number of significant spikes equals 4, so an MA(4) model can represent this data series.

Partial autocorrelation function (PACF) inspection:
Figure 3 shows a significantly large spike followed by smaller spikes at lags higher than 1. This means that the series is stationary with one significant spike, so an AR(1) model can represent this data series.
Hence, it is concluded that the ARIMA (1, 0, 2) model is relatively the best model, since it has the smallest AIC and SBC. T-tests executed on the ARIMA (1, 0, 2) model indicate that AR(1) can represent the model at the 1% significance level (Table 4). Applying the AR(1) forecasting model to the actual loss reserve data resulted in the estimated data ($\hat{y}$) in Table 4.

Time series regression
Since the data series has a trend and no seasonality (the data are yearly), the model that best represents the data is $y_t = TR_t + \varepsilon_t$. The data series is therefore represented by a linear trend model. Normality analysis, as presented in Table 5 and Figure 4, indicates that the distribution of loss reserves is close to a normal distribution. After correcting for serial correlation using the Cochrane-Orcutt method, the final estimation model is presented in Table 6.
In summary, the three methods are evaluated for one-step-ahead, two-step-ahead and three-step-ahead forecasts in order to identify which method predicts better. This evaluation is presented in Tables 11a, b and c. Table 11a shows that the best forecasting technique for 1-step ahead is the exponential smoothing technique. Tables 11b and c show that the best forecasting technique for 2-step and 3-step ahead is also the exponential smoothing technique.
These comparisons show that the exponential smoothing technique is the best time series forecasting model for general insurance loss reserves in Egypt.
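The out-of-sample comparison described above can be sketched as a rolling-origin evaluation: a model is given the observations up to some origin, asked for a forecast h steps ahead, and the forecast is compared with the actual value once the origin moves through the hold-out period. Whether the study rolled the origin forward in this way is an assumption; the function names, the simple drift forecaster used as a stand-in model and the toy series are hypothetical. Only the split of 21 annual observations into 16 for estimation and 5 for evaluation comes from the study.

```python
def drift_forecaster(history, horizon):
    """Stand-in model: extend the last observed change for each step ahead."""
    step = history[-1] - history[-2]
    return [history[-1] + m * step for m in range(1, horizon + 1)]


def rolling_origin_errors(series, n_train, horizon, forecaster):
    """h-step-ahead errors over the hold-out period.

    For each origin, the forecaster sees series[:origin] and its last
    forecast (h steps ahead) is compared with the actual value.
    """
    errors = []
    for origin in range(n_train, len(series) - horizon + 1):
        history = series[:origin]
        forecast = forecaster(history, horizon)[-1]
        actual = series[origin + horizon - 1]
        errors.append(actual - forecast)
    return errors


# Toy linear series with 21 values: 16 for estimation, 5 held out.
toy = [float(v) for v in range(1, 22)]
for h in (1, 2, 3):
    errs = rolling_origin_errors(toy, n_train=16, horizon=h,
                                 forecaster=drift_forecaster)
    print(h, len(errs), errs)
```

With five hold-out observations there are five 1-step errors but only three 3-step errors, so accuracy rankings at longer horizons rest on very few data points.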

CONCLUSION
The aim of this study is to apply time series analysis to forecast loss reserves in general insurance in the Egyptian market. Forecasted loss reserves are among the most important indicators for strategic decisions. For all the adopted models, estimation uses the loss reserve data for general insurance in the Egyptian market for the period 1986 to 2006. The series from 1986 to 2001 is used for estimation, and the remaining observations (2002 to 2006) are used to evaluate the models as out-of-sample data. The models are evaluated on their accuracy in predicting the out-of-sample data. The study concludes that the best model is the exponential smoothing technique at all steps ahead.

Figure. Loss reserves in general insurance in the Egyptian market. The vertical axis measures loss reserves in the Egyptian market (dependent variable); the horizontal axis corresponds to the time periods (independent variable). Values in E.P. 000.
Figure 2. Technical reserves in general insurance in the Egyptian market.

Table 1. Fitted model based on exponential smoothing technique.

Table 2. AIC and SBC summary table.

Table 3. Variables in the model.

Table 4. Fitted model based on Box-Jenkins model.

Table 6. Time series regression models of loss reserve.

Table 7. Fitting the model.

Table 10. Time series regression.