Genetic Programming Approach for Testing Credit Risk Box Method

In this paper, the Genetic Programming (GP) technique is applied to the empirical analysis of a new geometric approach to credit risk, financial ratios and bankruptcy prediction. Using financial ratios to predict corporate bankruptcy and to identify firms' impending failure is desirable for investors, creditors, borrowing firms and governments. This paper presents a new geometric technique for the empirical analysis of credit and bankruptcy risk using financial ratios. Within this framework, we propose a new ratio representation, named the Risk Box (RB) measure. We demonstrate the application of this geometric approach to variable representation, data visualization and financial ratios at different stages of corporate bankruptcy prediction models based on balance sheet ratios. These stages are the selection of variables (predictors), the accuracy of each estimation model, and the representation of each model for transformed and common ratios. Over time, several methods have been applied to predicting bankruptcy from financial ratios, but some of them suffer from underlying shortcomings. Recently, GP has received great attention in academic and empirical fields for solving highly complex problems. The results of GP as a statistical classification methodology are compared for common and transformed ratios, and better accuracy is obtained with the transformed ratios.


INTRODUCTION
In classical prediction models, a convenient representation of ratios is a closed form of graphical presentation of data. In contrast, achieving better accuracy often relies on the visualization of predictors. It is at this stage that the selection of a proper graphical presentation scheme becomes essential for a correctly scaled visualization. Since a numerical presentation of ratios cannot adequately represent the characteristics of companies, other ways of displaying them must be found. Graphical tools offer this possibility. This paper presents a complementary perspective on the study of ratios and bankruptcy. With the global increase in bankruptcies, research on the accurate prediction of company distress and bankruptcy has become extensive. One of the most well-known anomalies among the risk factors is the effect of some ratios on bankruptcy risk and firm returns. One possible explanation for this effect, consistent with the "efficient market hypothesis", is that the ratio is a proxy for risk. In banking, too, the ratios are taken to be a proxy for the charter value of banks (Landskroner et al., 2006). Applications of statistical techniques to corporate bankruptcy started in the 1960s with the development of computers.
*Corresponding author. E-mail: alirezab@um.edu.my.
The first technique introduced was discriminant analysis (DA) for univariate and multivariate models (Beaver, 1966). Altman (1968) then used multiple discriminant analysis (MDA) and applied it to the prediction of business failure. Altman (1971) examined railroad bankruptcy propensity, Deakin (1972) replicated the study of Beaver (1966), and Edmister (1972) tested the usefulness of financial ratios for predicting small business failure. Altman et al. (1974) developed a model to determine the creditworthiness of commercial loan applicants in the cotton and wool textile sector in France. Altman et al. (1977) developed their classical Z model and named it Zeta Analysis. After DA and MDA, the Logit and Probit models were introduced by Martin (1977) and Ohlson (1980). Nowadays, these models are widely used in practice. The solution in the traditional framework is a linear function separating successful and failing companies. A company score is computed as a value of that function (Aziz and Dar, 2006; Chauhan et al., 2009). Norton and Smith (1979) compared the prediction of bankruptcy using ratios computed from General Price Level (GPL) financial statements with the prediction using ratios computed from traditional historical cost financial statements, and Taffler (1982) used linear discriminant analysis to predict bankruptcies in the UK with financial ratios. Moreover, recursive partitioning, also known as classification and regression trees (CART), performs classification by dividing the data space.
Moreover, Genetic Programming (GP) is a population of linear classifiers (genes) that are connected with one another in a pre-specified way. The outputs of some of the genes are inputs for others. The performance of GP depends greatly on its structure, which must be adapted to different problems. However, as there is no widely accepted economic theory, every study has based its model specification on an empirical framework. This results in different accounting ratios being used in different models. Generally, these multivariate models follow a procedure in which either an equal number of bankrupt and non-bankrupt firms is chosen randomly, matched by company size or industry, or large and small samples are used without a matching procedure. According to the literature, the predictors used in various studies generally exhibit non-normal distributions and high standard errors (Barnes, 1982; McLeay and Omar, 2000; Ooghe et al., 2005; Andres et al., 2005). Some researchers corrected for univariate non-normality and tried to approximate univariate normality by transforming the variables prior to estimating their models (see footnote 1). Deakin (1976) used a log transformation; square root and log-normal transformations of financial ratios were later used by Foster (1986) and Gu (2002). Other researchers approximate univariate normality by 'trimming' or 'outlier deletion', which involves segregating outliers by reference to the normal distribution (Ezzamel et al., 1987). Furthermore, rank transformation has been used by Perry et al. (1986) and Kane et al. (1996). Recently, Bahiraie et al. (2008) used a geometric transformation of ratios, and general guidelines concerning the transformation details were discussed.
1 For more details about the properties of financial ratios, see Watson (1990) and Tippett (1990).
2 Some exposition of the weaknesses in the use of common ratios, such as scaling, proportionality and symmetric effects, is provided in Bahiraie et al. (2008) and Azhar and Elliott (2006).
Bahiraie et al. 3585
Our objective in this paper is to discuss a new geometric approach to ratios, which involves data transformation, and to illustrate the use of this methodology for bankruptcy prediction. To illustrate the methodology, the numerator and denominator of common ratio values are represented as Cartesian coordinates in our constructed modification box, in which we derive the isoclines of the associated components of bankruptcy and credit risk. This study is regarded as one of the fundamental studies in this field.

Genetic programming (GP)
Genetic programming (GP) is a search methodology belonging to the family of evolutionary computation (EC). GP can be considered an extension of genetic algorithms (GA) (Koza, 1992). GAs are stochastic search techniques, based on ideas from natural genetics and evolutionary principles, that can search large and complicated spaces. They have been demonstrated to be effective and robust in searching very large spaces in a wide range of applications. GP is basically a GA applied to a population of computer programs (CP). While a GA usually operates on strings of numbers, GP has to operate on CP. Compared with GA, GP allows the optimization of much more complicated structures and can therefore be applied to a greater diversity of problems (Nwogugu, 2006). Since bankruptcy prediction can be considered a classification problem, we describe GP with emphasis on its application to classification (Koza, 1992). Genetic programming models were inspired by the Darwinian theory of evolution. In the most common implementations, a population of candidate solutions is maintained, and after each generation the population is better fitted to the given problem. Genetic programming uses tree-like individuals that can represent mathematical expressions. Three genetic operators are mostly used in these algorithms: 1) reproduction, 2) crossover, and 3) mutation.
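The idea of a tree-like individual representing a mathematical expression can be illustrated with a minimal Python sketch. The nested-tuple encoding, the ratio names `r1` to `r3` and the protected division are illustrative assumptions, not the encoding used by the authors' software.

```python
import operator


def protected_div(a, b):
    # Protected division is a common GP convention (assumed here):
    # return 1.0 instead of failing when the denominator is ~0.
    return a / b if abs(b) > 1e-12 else 1.0


# Function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, protected_div),
}


def evaluate(tree, ratios):
    """Evaluate an expression tree for one firm.

    tree   : nested tuple (symbol, child, ...), a ratio name, or a constant
    ratios : dict mapping ratio name -> value for the firm
    """
    if isinstance(tree, tuple):
        symbol, *children = tree
        arity, fn = FUNCTIONS[symbol]
        assert len(children) == arity
        return fn(*(evaluate(child, ratios) for child in children))
    if isinstance(tree, str):      # terminal: a financial ratio
        return ratios[tree]
    return float(tree)             # terminal: a numeric constant


# Hypothetical individual encoding (r1 * 2.0) + (r2 / r3).
example_tree = ("+", ("*", "r1", 2.0), ("/", "r2", "r3"))
example_value = evaluate(example_tree, {"r1": 0.5, "r2": 3.0, "r3": 1.5})
```

Internal nodes hold functions and leaves hold ratios or constants, which is what allows the genetic operators to act on whole sub-expressions.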
First, the reproduction operator simply chooses an individual in the current population and copies it without changes into the new population. Reproduction here also stands for the selection of two parents, the production of two new children and their inclusion in the population. Our study demonstrates the use of standardized quantitative gene expression values from primary individual indicators and data sets from different companies' balance sheets as inputs to a genetic programming system that generates classifier rules. In the second step, two parent individuals are selected and a sub-tree is picked on each one. Crossover then swaps the nodes and their relative sub-trees from one parent to the other. If a size condition is violated, the too-large offspring is simply replaced by one of the parents. Other parameters specify the frequency with which internal or external points are selected as crossover points (Ravikumar and Ravi, 2007).
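A minimal sketch of sub-tree crossover with the fallback to a parent may help. It assumes trees encoded as nested tuples `(function, child, ...)` and takes the size condition to be a maximum depth; both are assumptions made for illustration, and crossover points are chosen uniformly rather than with separate internal/external frequencies.

```python
import random


def depth(tree):
    """Depth of a tree; a single terminal has depth 1."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + max(depth(child) for child in tree[1:])


def paths(tree, prefix=()):
    """Yield the index path of every node (root is the empty path)."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(parent_a, parent_b, max_depth, rng):
    """Swap one randomly chosen sub-tree between the two parents."""
    path_a = rng.choice(list(paths(parent_a)))
    path_b = rng.choice(list(paths(parent_b)))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    child_a = replace_subtree(parent_a, path_a, sub_b)
    child_b = replace_subtree(parent_b, path_b, sub_a)
    # A too-large offspring is replaced by its parent.
    if depth(child_a) > max_depth:
        child_a = parent_a
    if depth(child_b) > max_depth:
        child_b = parent_b
    return child_a, child_b


rng = random.Random(0)
parent_a = ("+", "r1", ("*", "r2", "r3"))
parent_b = ("-", ("/", "r4", "r5"), "r6")
child_a, child_b = crossover(parent_a, parent_b, max_depth=3, rng=rng)
```

Because tuples are immutable, the parents are never modified, so falling back to a parent is simply a matter of returning it.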
The mutation operator can be applied to either a function node or a terminal node, which is randomly selected in the tree. If the chosen node is a terminal node, it is simply replaced by another terminal. If it is a function and point mutation is to be performed, it is replaced by a new function with the same arity (Lee, 2004). When tree mutation is to be carried out, a new function node is chosen, and the original node together with its relative sub-tree is substituted by a new randomly generated sub-tree. A depth ramp is used to set bounds on size when generating the replacement sub-tree. Naturally, it must be checked that this replacement does not violate the depth limit. If it does, mutation just reproduces the original tree into the new generation. Further parameters specify the probability with which internal or external points are selected as mutation points. The major steps of this genetic programming process are outlined in Figure 1. In the last step, which applies to all classification problems, a particular fitness function can only be applied once the learning algorithm has converted the value returned by the evolved model into "1" or "0" using the 0/1 rounding threshold. If the value returned by the evolved model is equal to or greater than the rounding threshold, the record is classified as "1", and as "0" otherwise. Many fitness functions, such as number of hits, sensitivity/specificity, relative squared error (RSE) and mean squared error (MSE), can be applied to evaluate the performance of the generated classification rules.
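Point mutation, as described above, can be sketched in a few lines of Python. The function set, the terminal names and the nested-tuple tree encoding are hypothetical choices for illustration; the mutation point is chosen uniformly over all nodes, which is a simplification of the internal/external probabilities mentioned in the text.

```python
import random

# Hypothetical function set (symbol -> number of arguments) and terminals.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "sqrt": 1, "neg": 1}
TERMINALS = ["r1", "r2", "r3"]


def node_paths(tree, prefix=()):
    """Yield the index path of every node (root is the empty path)."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from node_paths(child, prefix + (i,))


def mutate_at(tree, path, rng):
    """Return a copy of tree with the node at path point-mutated."""
    if path:
        i = path[0]
        return tree[:i] + (mutate_at(tree[i], path[1:], rng),) + tree[i + 1:]
    if isinstance(tree, tuple):
        # Function node: swap in a different function taking the same
        # number of arguments, keeping the children untouched.
        candidates = [f for f, k in ARITY.items()
                      if k == ARITY[tree[0]] and f != tree[0]]
        return (rng.choice(candidates),) + tree[1:]
    # Terminal node: replace it by another terminal.
    return rng.choice([t for t in TERMINALS if t != tree])


def point_mutation(tree, rng):
    path = rng.choice(list(node_paths(tree)))
    return mutate_at(tree, path, rng)


rng = random.Random(1)
original = ("+", ("sqrt", "r1"), ("*", "r2", "r3"))
mutant = point_mutation(original, rng)
```

Since only one symbol changes and the argument count is preserved, the mutant always has the same shape as the original tree, so no depth check is needed for this variant.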
We used "number of hits" as the fitness function because of its simplicity and efficiency. It is based on the number of samples that are correctly classified, so that the search over all possible solutions becomes an optimization of the fitness function, for which optimization techniques can be used. The genetic model is implemented to extract automatically an intelligible classification rule that predicts the classes of bankrupt and non-bankrupt firms in a sample from the given values of some financial ratios, called predicting variables. Each rule consists of a logical combination of these ratios. The combination determines a class description, which is used to construct the classification rule. Given the number of variables describing each firm and their related domains, it is easy to see that the number of possible solutions to a bankruptcy prediction problem is enormous.
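The 0/1 rounding threshold and the "number of hits" fitness can be sketched as follows. The sample outputs and labels are made up for illustration, and the default threshold of 0.5 is the benchmark value quoted in the empirical section.

```python
ROUNDING_THRESHOLD = 0.5


def to_class(model_output, threshold=ROUNDING_THRESHOLD):
    """Convert the raw value returned by an evolved model into 1 or 0."""
    return 1 if model_output >= threshold else 0


def number_of_hits(outputs, labels, threshold=ROUNDING_THRESHOLD):
    """Fitness: how many records the model classifies correctly."""
    return sum(to_class(o, threshold) == y for o, y in zip(outputs, labels))


# Illustrative raw model outputs and true classes (1 = bankrupt).
outputs = [0.9, 0.5, 0.49, -0.2, 1.3]
labels = [1, 1, 1, 0, 0]
hits = number_of_hits(outputs, labels)
```

Here the third and fifth records are misclassified, so the fitness is 3 out of 5. A value exactly equal to the threshold is assigned to class 1, as stated in the text.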

The risk box
Here we present the new graphical technique for ratio transformation in more detail. The framework is a two-dimensional box in which the paired values of a risk ratio's numerator (X) and denominator (Y) are represented as Cartesian coordinates. Assume a hypothetical study of risk covering n years for sector j (see footnote 2). For expositional purposes, suppose our chosen proxy for risk is given by the X and Y values. The measures TR, NR and OR and, lastly, the proposed Share Measure of Risk (SR), as we define below, are linear functions of X and Y.
Following Bahiraie et al. (2008), we can construct a two-dimensional box that encapsulates all of these variables for n years. The dimensions of the risk box are generated by the maximum value of either X or Y during the period of study. From the definitions of TR, NR, OR and SR, each respective risk box has sides equal to this maximum. Hence, the locus of equi-TR is perpendicular to the axis of symmetry (Figure 2); with respect to the vertical intercepts in Figure 2, the equi-TR lines have gradient $m = -1$.

A line of constant NR can be written as $Y = X + NR$. Comparing with $y = mx + c$, we have, for a net book value, $m = 1$ with a vertical intercept $c = NR$. Since the central balanced line is the axis of symmetry for NR, $m = 1$ and $c = NR$ (see Figure 3). Consequently, the locus of equi-NR values is perpendicular to the lines of equi-TR (that is, the product of their gradients is $-1$).
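Two statements in this section can be checked numerically with a short sketch: the side of the box is the largest X or Y value observed over the study period, and lines of constant NR have gradient 1 with vertical intercept c = NR. The data values are made up, and reading the intercept literally as Y - X is an assumption based on the comparison with y = mx + c.

```python
def box_side(xs, ys):
    """Side of the square risk box: the largest X or Y in the period."""
    return max(max(xs), max(ys))


def equi_nr_intercept(x, y):
    """Vertical intercept c of the gradient-1 line y = x + c through (x, y)."""
    return y - x


def on_same_equi_nr_line(p, q, tol=1e-12):
    """True if points p and q lie on the same gradient-1 line."""
    return abs(equi_nr_intercept(*p) - equi_nr_intercept(*q)) <= tol


# Illustrative numerator (X) and denominator (Y) values for three years.
xs = [2.0, 3.5, 1.0]
ys = [1.0, 4.0, 2.5]

side = box_side(xs, ys)
inside = all(0.0 <= v <= side for v in xs + ys)

# A gradient-1 line and a line perpendicular to the 45-degree axis
# (gradient -1) have gradients whose product is -1.
perpendicular = (1.0 * -1.0) == -1.0
```

Every observation falls inside the box by construction, which is what makes the representation comparable across years.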

Locus of Equi OR
Recall the Overlapping Risk (OR). From Figure 4, we have the locus of equi-OR.

Proposed share measure of risk and locus of Equi SR
Consider our proposed unit-free share measure of risk (SR). The equi-SR locus corresponding to a constant value $SR^*$ is defined by a relation between X and Y, which can be solved for X.

EMPIRICAL RESULTS
The database used in our illustrative empirical study consists of 200 Malaysian companies listed on the Kuala Lumpur Stock Exchange (KLSE). Of these, 60 companies went bankrupt under section 167 of the Malaysian Companies Act 1965, under which a firm is bankrupt when its total value of retained earnings is equal to or greater than 50% of its listed capital, and 140 are "matched" companies from the same period of listing. On the basis of the financial ratios successfully identified by past studies, 40 indices were built using balance-sheet data. The ratios and the significance of the mean differences between the groups are tested and presented in Table 1. Since normality of the data is not met, we employ a non-parametric method to examine the disparity in financial ratios between the bankrupt and non-bankrupt groups. To test the difference in means of a variable between the two groups, this study uses the Mann-Whitney test.
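As a minimal sketch of the test used here, the Mann-Whitney U statistic can be computed by comparing every bankrupt firm's ratio with every non-bankrupt firm's ratio. The sample values are invented, and the p-value uses the plain normal approximation without tie or continuity correction, which is only a rough guide for samples this small; a statistical package would normally be used instead.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def normal_approx(u, n_a, n_b):
    """z score and two-sided p-value from the normal approximation."""
    mean = n_a * n_b / 2.0
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Illustrative values of one ratio in each group.
bankrupt = [0.1, 0.2, 0.3]
healthy = [0.4, 0.5, 0.6, 0.7]

u_bankrupt = mann_whitney_u(bankrupt, healthy)
u_healthy = mann_whitney_u(healthy, bankrupt)
z, p = normal_approx(u_bankrupt, len(bankrupt), len(healthy))
```

The two U values always sum to the product of the sample sizes, which gives a quick sanity check on the computation.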

In Table 1, the Mann-Whitney test shows whether there is a significant mean difference between the groups for each specific variable. These indices reflect different aspects of firm structure and performance: liquidity, turnover, operating structure and efficiency, capitalization and, finally, profitability. The indices have been calculated as ratios one year prior to bankruptcy. Subsequent testing of variable selection using Genetic Programming (GP) is done to illustrate that this new transformation produces statistically more accurate predictions and can be used as an alternative to common ratios.

Genetic programming
Following recent research by Etemadi et al. (2008), we tested these selected variables with Genetic Programming (GP) to obtain the fitness function tree and to illustrate that this new transformation predicts more accurately and can be used as an alternative to common ratios, even with GP. As expected, the final models retained fewer significant variables in the different classification trees, and we observed that different variables from the selected list were identified as significant indicators for each procedure. To implement the GP process and develop the bankruptcy model, GeneXproTools version 4.1, released in 2008, was used with C# output. The crossover and mutation rates were set to 0.44 and 0.05, respectively. Tables 2 and 3 show the best GP model obtained for each approach. Each model is divided into three sub-trees, each representing a gene, meaning that the model is a chromosome consisting of three genes. The sum of the returns of the sub-trees for a firm is compared with the "Rounding Threshold" to determine the class of the firm. From the classification sub-trees depicted in Table 2, decision trees for the SR approach were obtained with a 95% accuracy rate.
From the classification sub-trees in Table 3, decision trees for the common ratios approach were obtained with an 89% accuracy rate. To decide whether a firm is bankrupt or non-bankrupt through the genetic programming decision tree, a benchmark value of 0.5 is used. If the value of the GP model for a specific training or test firm is greater than or equal to 0.5, the firm is marked as a "bankrupt firm". If the value is less than 0.5, the firm is classified as a "non-bankrupt firm". Comparing the real class of each firm with the class predicted by the GP model determines the accuracy of the model, as reported in Table 4.
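The decision rule for a three-gene chromosome can be sketched as below. The three gene functions and the ratio names are hypothetical stand-ins, since the evolved sub-trees of Tables 2 and 3 are not reproduced here; only the summation and the 0.5 benchmark follow the text.

```python
BENCHMARK = 0.5


def classify(firm, genes, benchmark=BENCHMARK):
    """Sum the outputs of the gene sub-trees and compare with the benchmark."""
    score = sum(gene(firm) for gene in genes)
    return "bankrupt" if score >= benchmark else "non-bankrupt"


def accuracy(firms, real_classes, genes):
    """Share of firms whose predicted class matches the real class."""
    predicted = [classify(firm, genes) for firm in firms]
    hits = sum(p == r for p, r in zip(predicted, real_classes))
    return hits / len(firms)


# Hypothetical genes operating on a dict of (transformed) ratios.
genes = [
    lambda f: f["r1"] - f["r2"],
    lambda f: 0.5 * f["r3"],
    lambda f: -f["r1"] * f["r3"],
]

firm_a = {"r1": 0.9, "r2": 0.1, "r3": 0.4}   # score about 0.64
firm_b = {"r1": 0.2, "r2": 0.5, "r3": 0.6}   # score about -0.12

class_a = classify(firm_a, genes)
class_b = classify(firm_b, genes)
acc = accuracy([firm_a, firm_b], ["bankrupt", "non-bankrupt"], genes)
```

Applying `accuracy` separately to the training and test firms gives the kind of comparison between real and predicted classes that the section describes.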

Misclassification error
An alternative to the error rate is a misclassification error, which is simply a number assigned as a penalty for making a particular type of mistake. An average rate of misclassification can be obtained by weighting each of the errors by the respective error rate. Table 4 shows the possible classifications and misclassifications, and Table 5 compares the accuracy of each classification model with respect to the different data representations. Table 5 summarizes the accuracy levels for the GP procedures, and the results clearly improved under the data transformation procedure. Given the better performance observed with this new transformation, the data set was not collected from a particular industry type or from firms of similar size, and no outlier deletion was applied. Thus, our process is free of any potential explanatory effect errors that may be caused by the distribution of the independent variables (see footnote 4). The properties of the new model are briefly explained in the methodology.
4 Deakin (1967) found that financial ratios might be more normally distributed within a specific industry group.
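The idea of penalising each type of mistake can be sketched with a small confusion-count example. The predictions, the real classes and the two penalty values are invented for illustration; the paper does not state the penalties it used.

```python
def confusion_counts(predicted, actual):
    """Count the four outcomes, with 1 = bankrupt and 0 = non-bankrupt."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            counts["tp"] += 1
        elif p == 1 and a == 0:
            counts["fp"] += 1      # healthy firm flagged as bankrupt
        elif p == 0 and a == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1      # bankrupt firm missed
    return counts


def error_rate(counts):
    total = sum(counts.values())
    return (counts["fp"] + counts["fn"]) / total


def average_misclassification_cost(counts, cost_fp, cost_fn):
    """Weight each type of error by its penalty and average over all firms."""
    total = sum(counts.values())
    return (cost_fp * counts["fp"] + cost_fn * counts["fn"]) / total


predicted = [1, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(predicted, actual)

# Assumed penalties: missing a bankruptcy costs five times a false alarm.
avg_cost = average_misclassification_cost(counts, cost_fp=1.0, cost_fn=5.0)
```

With equal penalties of 1 the average cost reduces to the ordinary error rate, so the two measures differ only when one type of mistake is considered more serious than the other.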
Among the operating structure ratios, active companies show a lower incidence of interest charges on sales and value added, while failed companies show higher depreciation charges over gross fixed assets. The results suggest that some indicators traditionally considered in empirical analysis, such as earnings to total debt, are not significant in each of the three models considered. Profitability ratios emphasize the overall higher profitability of active enterprises. Finally, additional indices such as market shareholders' dividend, sales, return and operating assets are significantly higher for healthy companies.

Conclusions
In this paper we demonstrated the application of a new graphical geometric approach to variable representation, data visualization and financial ratios. We believe that graphical analysis will gain importance as it becomes more and more popular. Moreover, graphical ratio representation can facilitate the acceptance of prediction models in various areas, e.g. finance, medicine, and sound and image processing. This will contribute to the development of those areas, since such models better represent reality and provide higher forecasting accuracy. Within our new transformation methodology, each company is described by a set of variables $X_i$, namely transformed financial ratios instead of the original ratios. Financial ratios, such as the debt ratio (leverage) or interest coverage (earnings before interest and taxes), characterize different sides of a company's operation. They are constructed on the basis of balance sheets and income statements. We used 40 ratios (predictors) computed from the company statements in the corporate bankruptcy database. The predictors and basic statistics are given in Table 1.
Initially, an unknown classifier function $f: x \to y$ is estimated on a training set of companies $(x_i, y_i)$, $i = 1, 2, \ldots, n$. The classifier estimated on the training sample is then used to predict, for the companies in the testing sample, whether they survive or go bankrupt. As demonstrated, this paper presented a complementary perspective on the study of risk and bankruptcy using financial ratios. A new dimension to risk measurement, bankruptcy and ratio transformation was proposed with the advent of the share risk. We briefly derived the properties of the components of the new risk approach, which overcome the limitations of using common ratios. Our simple methodology, called the Risk Box index, provided a geometric illustration of our newly proposed risk measure and its transformation behavior. Our study employed 60 distressed companies with a matched sample of another 140 non-failed companies listed on the Kuala Lumpur Stock Exchange (KLSE). We found a rise in classification accuracy when this new transformation of the independent variables was applied using Genetic Programming (GP).
Given the properties of the new Share Risk method discussed in section 3 and the better numerical results compared with other studies in section 4, we strongly suggest the use of this new methodology for ratio analysis, as it provides a conceptual and complementary methodological solution to many of the problems associated with the use of ratios. Alternatively, the Share Risk model (Risk Box) can be employed as a tool of analysis, providing a crucial first stage for studies of changes in risk patterns, in particular those assumed to be linked with potential bankruptcies. The adaptability of our proposed methodology is emphasised by its applicability to any number of years in sectoral or cross-country studies of risk and bankruptcy.
Since previous studies used data from one and two years prior to bankruptcy, testing the generalizability of the model by extending it to an additional year is recommended for further studies. Furthermore, as reported by the IMF, research is needed to understand capital structures together with other financial indicators, such as macroeconomic and microeconomic variables, that might affect firms' performance and could eventually improve prediction. Testing the above model with respect to this issue will therefore be important in future work.

Figure 1. Overview of genetic programming process.
have a gradient $m$ equal to minus unity.
4), the kink occurring along the central 45° line. As $OR^*$ increases, the kink moves up the line, away from the origin.
otherwise. Our exposition of the dimensions of the box, which follows, confirms the elasticity and unit-free nature of the SR measure. A 45° line from the origin bisects the box into two equal triangles and will be the measure of $\tan\theta$. This positive-slope diagonal (as demonstrated in the next sections) is the locus of balanced risk where i YX > in the net X value (NX) plane and points

Table 1. Variables used and comparison of means in two groups.

Table 2. The best GP model obtained for RB method.

Table 3. The best GP model obtained for common ratios.

Table 5. Comparison accuracy of GP trees.

Table 6 represents the comparison of 5-fold accuracy results.

Table 6. The transformed ratios still outperform original ratios.