Enhancing credit scoring model performance by a hybrid scoring matrix

Competition in the consumer credit market in Taiwan has become severe recently. Therefore, most financial institutions actively develop credit scoring models to support the credit approval of new customers and the credit risk management of existing customers. This study uses a genetic algorithm for feature selection and decision trees for customer segmentation. Moreover, it utilizes logistic regression to build the application and credit bureau scoring models, and the two scoring models are combined to construct the scoring matrix. The scoring matrix provides more accurate risk judgment and segmentation to further identify the parts of a personal loan portfolio requiring enhanced management or control. The analytical results demonstrate that the predictive ability of the scoring matrix outperforms both the application and credit bureau scoring models. Regarding the K-S value, the scoring matrix increases the prediction accuracy compared to the application and credit bureau scoring models by 18.40 and 5.70 percentage points, respectively. Regarding the AUC value, the scoring matrix increases the prediction accuracy compared to the application and credit bureau scoring models by 10.90 and 6.40 percentage points, respectively. Furthermore, this study applies the scoring matrix to the credit approval decisions for corresponding risk groups to strengthen banks' risk management practices.


INTRODUCTION
With the rapid growth in the credit industry and the management of large loan portfolios, application and behavioral scoring models have been extensively used for the credit risk evaluation decisions by the finance industry.
Application scoring models help banks determine whether credit should be granted to new applicants based on customer characteristics such as income, education, age, and so on (Akhavein, 2005). Behavioral scoring models help banks predict the probability that existing customers will default or become delinquent based on their repayment and usage behavior (Boyer and Hult, 2005).
In this paper, we utilize a hybrid mining approach in the design of credit scoring models to support credit approval decisions, based on four main steps: (1) using a genetic algorithm (GA) to select input features, (2) using decision trees for customer segmentation, (3) using logistic regression (LR) to build the application and credit bureau scoring models based on important input variables of the bank's internal application data and credit bureau data, and (4) combining the application and credit bureau scoring models to construct the scoring matrix.
Previous studies have focused on creating more accurate classifiers with various hybrid architectures. However, there is scant research on the practical application of combined classifiers because it is difficult to implement functional composition or to explain the underlying principle behind a decision to reject a credit application when applying the hybrid approach to banks' risk management practices.
Therefore, this study develops a GA-based feature selection and a hybrid model that combines two credit risk modeling approaches, the application and credit bureau scoring models, to construct the scoring matrix for credit risk management of personal loan customers. The scoring matrix provides more accurate risk judgment and segmentation to further identify the parts of a personal loan portfolio requiring enhanced management or control. Furthermore, this study applies the scoring matrix to the credit approval decisions for corresponding risk groups to strengthen the bank's risk management practices.

SCORING MATRIX DEVELOPMENT
Figure 1 illustrates the development framework used for scoring matrix in this study, where the detailed development process is shown.

Data preprocessing
Data preprocessing is an important but often neglected step in the data mining process. The phrase "Garbage In, Garbage Out" is particularly applicable to typical data mining projects. Thus, the representation and quality of the data come first and foremost before running an analysis (Kotsiantis et al., 2006).
Data preprocessing tasks in this study include data cleaning, data integration, data transformation, and data reduction. Data cleaning is the process of smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.
Data integration is a procedure to integrate multiple databases or files. In data transformation, the data are converted into forms suitable for the data mining process. Data reduction is the process of removing irrelevant attributes from the data and reducing the number of attribute values by grouping them into intervals (binning). After preprocessing, the data are passed to the data mining process.

Feature selection
Holland (1975) proposed GA as a heuristic combinatorial optimization search technique. Compared to the traditional statistical approach, GA has the advantage of not being bounded by the form of functions. This study exploits the nature of the GA fitness function to analyze the input variables influencing the personal loan payment status, and converts the hidden feature rules of the variables into importance values.
The relative importance values range from 0 to 1 and are normalized so that all inputs add up to approximately 1. A variable with a greater value is more capable of predicting results. Using GA to rank the importance of variables enables systematic identification of the usefulness of variables and objective ranking of their importance, which is very helpful in model input selection, namely for eliminating ineffective inputs while retaining useful ones (Chi and Tang, 2007).

Customer segmentation
Customer segmentation involves partitioning customers into homogeneous segments based on their behavior, characteristics, and the nature of loan products. Customers belonging to the same segment possess similar risk characteristics or primary risk drivers.
Building individual credit scoring models for each subpopulation enables us to separate the good customers from the bad customers more accurately than building one credit scoring model to handle the whole population. Selected segments should be sufficiently large to enable meaningful sampling for separate credit scoring model development.
Generally, the accuracy of the resulting score increases with the numbers of Goods and Bads available (Mays, 2001). This study uses decision trees for segmentation. Decision trees isolate segments based on performance criteria (that is, they differentiate between Goods and Bads) and are simple to understand and interpret. Besides identifying characteristics for use in segmentation, decision trees also identify optimal breakpoints for each characteristic (Ouyang et al., 2011).
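As an illustration of how a decision tree locates a breakpoint for a single characteristic, the following minimal Python sketch scans candidate thresholds and keeps the one with the lowest weighted Gini impurity. The splitting criterion, the function names and the toy data are assumptions made for illustration; the study does not state which criterion its decision trees use.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (1 = Bad, 0 = Good)."""
    if not labels:
        return 0.0
    p_bad = sum(labels) / len(labels)
    return 2.0 * p_bad * (1.0 - p_bad)


def best_breakpoint(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), gini(labels))  # no split found yet
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((lo + hi) / 2.0, impurity)
    return best


# Toy data (hypothetical): a revolving ratio and a Bad flag per customer.
ratio = [0.0, 0.0, 0.1, 0.2, 0.6, 0.7, 0.8, 0.9]
is_bad = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, impurity = best_breakpoint(ratio, is_bad)
print(threshold, impurity)
```

In this toy example the split falls between 0.2 and 0.6, which separates the four low-ratio Goods from a segment that is mostly Bad.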

Building credit scoring models
The literature has outlined the theoretical background for using LR for classification in credit scoring, and also shows that LR usually performs well in determining good and bad loans in tasks similar to the one examined here (Charitou et al., 2004; Kočenda and Vojtek, 2009). This study uses LR to build application and credit bureau scoring models based on important input variables of the bank's internal application data and credit bureau data, respectively. LR uses a set of predictor variables to predict the probability of a binary outcome. The equation for the logit transformation of the probability of an event is as follows:

$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

where $p$ is the posterior probability of Goods, $x_1, \ldots, x_k$ are the input variables, $\beta_0$ is the intercept of the regression line, and $\beta_1, \ldots, \beta_k$ are the parameters.
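A minimal Python sketch of the logit relationship is given below. The coefficient values and the two-variable applicant are hypothetical; they serve only to show that the linear predictor is the log of the G/B odds and that inverting it yields a probability between 0 and 1.

```python
import math
from typing import Sequence


def log_odds(x: Sequence[float], intercept: float, coefs: Sequence[float]) -> float:
    """Linear predictor: intercept plus the weighted sum of the inputs."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def probability_good(x: Sequence[float], intercept: float, coefs: Sequence[float]) -> float:
    """Inverse logit: maps the log odds to a probability in (0, 1)."""
    z = log_odds(x, intercept, coefs)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept, coefficients and applicant (not taken from the study).
beta0 = 0.5
betas = [0.8, -0.4]
applicant = [1.0, 2.0]

z = log_odds(applicant, beta0, betas)
p = probability_good(applicant, beta0, betas)
print(f"log odds = {z:.3f}, P(Good) = {p:.3f}")
```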
The logit transformation is the log of the G/B odds and is used to linearize the posterior probability and limit estimated probability outcomes in the model to between 0 and 1. Maximum likelihood is used to estimate parameters $\beta_1$ to $\beta_k$. These parameter estimates measure the rate of change of the logit for a one-unit change in the input variable; that is, they are the slopes of the regression line between the target and their respective input variables.
To facilitate the use and interpretation of credit scoring models, credit scores are commonly scaled linearly to take integer point values. This study scales the points such that a total credit score of 300 points corresponds to G/B odds of 1 to 1, and that an increase in the credit score of 20 points corresponds to a doubling of G/B odds. Equations 1 and 2 show the derivation of the scaling rule that transforms the credit scores of each attribute.
Where woe is the weight of evidence for each grouped attribute, β is the regression coefficient for each variable, a is the intercept term from LR, n is the number of variables and k is the number of attributes for each variable.
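The scaling constants implied by the stated anchor points can be worked out as follows. This is a sketch of the commonly used linear score-to-log-odds scaling, with Offset and Factor as assumed names; it is not a reproduction of the study's Equations 1 and 2.

```latex
\begin{align*}
\text{Score} &= \text{Offset} + \text{Factor}\cdot\ln(\text{odds}) \\
\text{Score} + 20 &= \text{Offset} + \text{Factor}\cdot\ln(2\cdot\text{odds}) \\
\Rightarrow\quad \text{Factor} &= \frac{20}{\ln 2} \approx 28.85 \\
300 &= \text{Offset} + \text{Factor}\cdot\ln(1) \quad\Rightarrow\quad \text{Offset} = 300
\end{align*}
```

Under this scaling, G/B odds of 2 to 1 correspond to 320 points and G/B odds of 4 to 1 correspond to 340 points.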
Owing to the score-to-odds relationship having different meanings in different segments, this investigation applies calibration to standardize the relationship between the score and G/B odds. Credit scores in different segments thus can be compared directly.
To facilitate the use of credit strategies, credit scores are generally divided into different risk ranks according to the degree of risk score. This investigation classifies customers into five risk ranks, ranging from APS1 to APS5 and CBS1 to CBS5, based on application and credit bureau scores, respectively.
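The assignment of risk ranks can be sketched in Python as follows. The scaling constants follow from the anchor points stated earlier (300 points at G/B odds of 1 to 1, 20 points to double the odds). The four cut-off scores are hypothetical, since the actual rank boundaries are not given in the text, and rank 1 is taken to be the highest-risk (lowest-score) band.

```python
import math
from bisect import bisect_right

PDO = 20.0           # points to double the odds (from the text)
BASE_SCORE = 300.0   # score at G/B odds of 1 to 1 (from the text)
FACTOR = PDO / math.log(2.0)
OFFSET = BASE_SCORE  # because ln(1) = 0

# Hypothetical cut-off scores separating the five ranks (assumption).
CUTOFFS = [340, 360, 380, 400]


def score_from_odds(gb_odds: float) -> int:
    """Scaled integer credit score for a given G/B odds value."""
    return round(OFFSET + FACTOR * math.log(gb_odds))


def risk_rank(score: float, prefix: str = "APS") -> str:
    """Rank 1 = lowest scores (highest risk), rank 5 = highest scores."""
    return f"{prefix}{bisect_right(CUTOFFS, score) + 1}"


for odds in (1.0, 2.0, 8.0, 40.0):
    s = score_from_odds(odds)
    print(f"odds {odds:>5}: score {s}, rank {risk_rank(s)}")
```

The same function serves both models by switching the prefix, for example `risk_rank(405, "CBS")`.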

Scoring matrix
In this process, the five risk ranks of the personal loan application scoring model and the credit bureau scoring model are combined to construct a 5x5 scoring matrix.
The purpose is to present the likelihood of customer payment using objective and specific data. This scoring matrix can better differentiate customer risk and support the design of corresponding credit strategies.
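A minimal Python sketch of the matrix construction is shown below. Each customer is assumed to carry an application rank (1 to 5), a credit bureau rank (1 to 5) and a Bad flag; the function names and the toy records are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (APS rank, CBS rank), each from 1 to 5


def build_scoring_matrix(
    customers: Iterable[Tuple[int, int, bool]]
) -> Dict[Cell, Dict[str, int]]:
    """Count Goods and Bads in each of the 25 cells."""
    matrix = {(a, c): {"good": 0, "bad": 0}
              for a in range(1, 6) for c in range(1, 6)}
    for aps, cbs, is_bad in customers:
        matrix[(aps, cbs)]["bad" if is_bad else "good"] += 1
    return matrix


def gb_odds(cell: Dict[str, int]) -> float:
    """G/B odds of a cell; infinite when the cell contains no Bads."""
    return cell["good"] / cell["bad"] if cell["bad"] else float("inf")


# Toy records (hypothetical): (APS rank, CBS rank, is_bad).
toy = [(1, 1, True), (1, 1, False), (1, 1, False),
       (3, 4, False), (3, 4, True),
       (5, 5, False), (5, 5, False)]
matrix = build_scoring_matrix(toy)
print(gb_odds(matrix[(1, 1)]), gb_odds(matrix[(3, 4)]), gb_odds(matrix[(5, 5)]))
```

Cells are keyed by the pair of ranks, so a customer who looks acceptable on application data alone can still land in a low-odds cell when the credit bureau rank is poor.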

Credit strategy applications
Banks can design and implement related credit strategies based on the application scoring model they apply to the personal loan application process. However, incorporating the credit bureau scoring model into the application scoring model enables banks to undertake more refined risk segmentation. This study applies the scoring matrix to credit approval decisions.

METHODOLOGY

Data collection
The internal application data contains various socio-demographic characteristics and other information collected by a major bank in Taipei, Taiwan.
The sample comprises 16,040 individual customers who were granted loans between 2009/11/1 and 2010/10/1. The internal application data are incomplete because they lack data on interactions between the bank's customers and other financial institutions, namely the proprietary information resulting from business competition.
To improve the credit scoring model's performance, this study also collects credit-related information on personal loan customers from the public credit registers, in addition to the bank's internal data.
In this study, the internal application data are based on borrower characteristics and are used in addition to the credit bureau data. Meanwhile, the credit bureau data comprise five major dimensions: payment history, recent searches for credit, length of credit history, types of credit used and credit utilization.

Figure 2. The K-S statistic (axis labels: cumulative percent of sample; K-S statistic, %).

Sample
Customers are classified as either good or bad based on their payment performance on the loan. Those who are two or more installments in arrears are classified as bad. Of the 16,040 total personal loan customers, 15,532 are good while 508 are bad. Therefore, the G/B odds ratio is 15532/508 = 30.6. To avoid overfitting the model, this study uses a G/B odds ratio of approximately 3 (Chuang and Chen, 2006). That is, 1,492 Goods are randomly selected and combined with the 508 Bads to form the development sample. To validate the stability and accuracy of the application scoring model, the data set of 2,000 customers is split into training and testing data sets using a ratio of 8:2 (Lee et al., 2006).
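The sampling step can be sketched in Python as follows. The counts (15,532 Goods, 508 Bads, 1,492 sampled Goods, 8:2 split) come from the text; the function name, the fixed seed and the use of a simple (non-stratified) random split are assumptions, since the study does not describe how the split was drawn.

```python
import random
from typing import List, Sequence, Tuple


def build_development_sample(
    goods: Sequence, bads: Sequence, n_good: int,
    train_share: float = 0.8, seed: int = 42,
) -> Tuple[List, List]:
    """Undersample Goods, keep all Bads, shuffle, then split train/test."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    sample = rng.sample(list(goods), n_good) + list(bads)
    rng.shuffle(sample)
    cut = round(train_share * len(sample))
    return sample[:cut], sample[cut:]


# Placeholder customer records: (label, id).
goods = [("good", i) for i in range(15532)]
bads = [("bad", i) for i in range(508)]

train, test = build_development_sample(goods, bads, n_good=1492)
print(len(train), len(test))  # 1600 400
```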

Evaluation of model performance
When a statistical model is used as a predictive tool, doubts can exist regarding the generalization of the model over time and new observations. Several methods exist for measuring the performance of statistical prediction models.
Two of the most widely applied methods are the Kolmogorov-Smirnov (K-S) statistic and the receiver operating characteristic (ROC) curve analysis. The K-S statistic measures how far apart the cumulative distribution functions of the scores of Good and Bad are. The credit scoring model generating the greatest separability between the two distributions is considered the better model. The equation is as follows:

$$KS = \max_{s} \left| F_B(s) - F_G(s) \right|$$

where $F_G(s)$ and $F_B(s)$ are the cumulative distribution functions of the scores of Good and Bad, and $s$ is the score.

Figure 2 shows that Bads accumulate rapidly at low scores while Goods accumulate more rapidly at high scores. Additionally, the cumulative distribution function curve of the Goods lies to the right of that of the Bads.
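The K-S statistic can be computed directly from two lists of scores. The sketch below is a plain-Python illustration with made-up scores; the function name `ks_statistic` is an assumption.

```python
from typing import Sequence


def ks_statistic(good_scores: Sequence[float], bad_scores: Sequence[float]) -> float:
    """Maximum gap between the empirical score CDFs of Bads and Goods."""
    thresholds = sorted(set(good_scores) | set(bad_scores))
    best = 0.0
    for s in thresholds:
        cdf_good = sum(1 for g in good_scores if g <= s) / len(good_scores)
        cdf_bad = sum(1 for b in bad_scores if b <= s) / len(bad_scores)
        best = max(best, abs(cdf_bad - cdf_good))
    return best


# Toy scores (hypothetical): Bads cluster at the low end.
good = [310, 330, 350, 370, 390]
bad = [290, 300, 320, 340]
ks = ks_statistic(good, bad)
print(f"K-S = {ks:.2f}")
```

For these toy scores the largest gap occurs at a score of 340, where all Bads but only two of the five Goods have accumulated.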
The ROC curve analysis is commonly used for assessing the performance of various classification tools, including biological markers, diagnostic tests and binary outcome models (Medema et al., 2009; Yu, 2009). The ROC curve, as depicted in Figure 3, displays the full picture of the trade-off between the percentage of hits (that is, sensitivity) of a credit scoring model on the y-axis and the percentage of false alarms (that is, 1 - specificity) on the x-axis for all possible classification thresholds.
If high scores are defined to represent a low default probability, then x-values represent the error rate with which Goods are classified as Bad by the credit scoring model (that is, Type II error) and y-values represent one minus the error rate with which Bads are classified as Good (that is, Type I error). The ROC curve thus also completely represents Type I and Type II errors.
The area under the ROC curve (AUC) is widely used for assessing the discriminatory ability of a credit scoring model and can be interpreted as the probability that a classifier ranks a randomly chosen good customer above a randomly chosen bad customer. The AUC value is equivalent, up to a linear transformation, to the Gini coefficient (Gini = 2 × AUC - 1) (Thomas et al., 2002) and, after normalization, to the Wilcoxon-Mann-Whitney test statistic (Hanley and McNeil, 1982). For a model that discriminates at least as well as chance, the AUC value ranges from 0.5 to 1, where a larger AUC value indicates a more accurate credit scoring model. In most cases where good data are used, an AUC value exceeding 0.7 represents good discrimination capacity (Cholongitas et al., 2006).
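The probabilistic interpretation of the AUC leads to a direct pairwise computation, sketched below in plain Python. The scores are made up, the function name `auc` is an assumption, and ties are counted as one half, as in the Wilcoxon-Mann-Whitney statistic.

```python
from typing import Sequence


def auc(good_scores: Sequence[float], bad_scores: Sequence[float]) -> float:
    """Share of (Good, Bad) pairs in which the Good has the higher score."""
    wins = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(good_scores) * len(bad_scores))


# Toy scores (hypothetical).
good = [310, 330, 350, 370, 390]
bad = [290, 300, 320, 340]

auc_value = auc(good, bad)
gini = 2.0 * auc_value - 1.0
print(f"AUC = {auc_value:.2f}, Gini = {gini:.2f}")
```

The pairwise loop is quadratic in the sample size, which is adequate for an illustration; a rank-based computation would be preferable for large portfolios.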

Input feature selection
GA is employed to eliminate ineffective variables according to the importance of each input, measured by the change in modeling performance that occurs if the variable is no longer available to the model. The input variables are selected by the rule importance value > 0.05 (Wang et al., 2010), and the retained variables are then used to construct the credit scoring models.
The importance of each input variable for the application and credit bureau scoring models is listed in Tables 1 and 2, respectively.
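The selection rule can be sketched in Python as follows. The raw importance numbers are invented for illustration and do not come from Tables 1 or 2; `phone_type` and `branch_code` are hypothetical variable names, and the GA that would produce the raw values is not reproduced here.

```python
from typing import Dict


def select_features(raw_importance: Dict[str, float],
                    threshold: float = 0.05) -> Dict[str, float]:
    """Normalize importances to sum to 1 and keep those above the threshold."""
    total = sum(raw_importance.values())
    normalized = {name: value / total for name, value in raw_importance.items()}
    kept = {name: value for name, value in normalized.items() if value > threshold}
    # Rank the retained variables from most to least important.
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


# Invented raw importance values (assumption).
raw = {"age": 3.0, "education": 2.5, "loan_amount": 4.0,
       "phone_type": 0.3, "branch_code": 0.2}

selected = select_features(raw)
for name, value in selected.items():
    print(f"{name}: {value:.2f}")
```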

APPLICATION SCORING MODEL RESULTS
This study employs LR to build the application scoring model based on important input variables of the bank's internal data. Table 3 shows the nine significant variables of the application scoring model, as well as the attributes, G/B odds ratio, and attribute points of each variable.
These variables include age, gender, marital status, education, occupation, years of work experience, home ownership, term of loan and loan amount.
The empirical results listed in Table 4 show that the K-S and AUC values of the application scoring model are 29.90 and 70.50%, respectively. To facilitate credit strategy applications, customers are classified into five risk ranks ranging from APS1 to APS5 in accordance with the degree of risk score, where APS1 indicates the highest risk and APS5 indicates the lowest risk.

Credit bureau scoring model results
Because of differences in characteristic behaviors between revolving and transactor customers, this investigation first uses decision trees to partition the customers into two segments according to their payment behaviors.
This study then applies LR to build the revolving and transactor credit bureau scoring models for the two segments based on important input variables of credit bureau data, respectively. Table 5 indicates the eight significant variables of the revolving credit bureau scoring model, as well as the attributes, G/B odds ratio, and attribute points of each variable. The eight variables are outstanding amount of cash cards, maximum consecutive months of cash advance more than 0 in the last 12 months, number of credit cards more than 30 days past due in the last 6 months, worst days past due among unsecured products in the last 6 months, number of inquiring banks in the last 3 months, bureau abnormal credit record, average revolving ratio in the last 3 months and average utilization ratio of credit cards in the last 6 months. Table 6 indicates the seven significant variables of the transactor credit bureau scoring model, as well as the attributes, G/B odds ratio, and attribute points of each variable. Additionally, this study classifies customers into five risk ranks, ranging from CBS1 to CBS5, based on the degree of risk score, where CBS1 has the highest risk and CBS5 has the lowest risk.

Scoring matrix results
After building the application and credit bureau scoring models, this study constructs a 5x5 scoring matrix based on the five risk ranks of the two models. Table 4 shows that the K-S and AUC values of the scoring matrix are 48.30 and 81.40%, respectively.

Comparative model performance
Regarding the K-S value, the scoring matrix increases the prediction accuracy compared to the application and credit bureau scoring models by 18.40 and 5.70 percentage points, respectively. Regarding the AUC value, the scoring matrix increases the prediction accuracy compared to the application and credit bureau scoring models by 10.90 and 6.40 percentage points, respectively.
The empirical results demonstrate that the predictive ability of the scoring matrix outperforms both the application and credit bureau scoring models. The scoring matrix thus enables more refined risk segmentation.
To further compare the prediction accuracy of the three models built on different data sets, Figures 4 to 7 show that the scoring matrix has higher K-S and AUC values in both the training and testing sets than the application and credit bureau scoring models. Additionally, the empirical results listed in Table 4 show that the testing set has only slightly lower accuracy than the training set, indicating that the scoring matrix is stable.

CREDIT STRATEGY APPLICATION
The analysis results indicate that the K-S and AUC values of the scoring matrix are significantly higher than those of the application and credit bureau scoring models. By applying the scoring matrix to personal loan portfolio management, this investigation classifies 16,040 personal loan customers into the 25 cells, thus allowing more accurate segmentation of customer risk. Furthermore, the cells can be grouped into three risk groups based on the G/B odds ratio of each cell. Table 7 lists the details, where cells with a G/B odds ratio below 30 are categorized as the high-risk group, cells with a G/B odds ratio between 30 and 50 are categorized as the medium-risk group, and cells with a G/B odds ratio exceeding 50 are categorized as the low-risk group. The purple zone indicates the high-risk group with an average G/B odds ratio of 11.6. The yellow zone indicates the medium-risk group with an average G/B odds ratio of 38.0, and the green zone indicates the low-risk group with an average G/B odds ratio of 125.4. These three risk groups reveal significant risk segmentation. The bank can then adopt the credit approval decision for different risk groups as listed in Table 8.
Consumers in the low-risk group are viewed as a very low credit risk by the banks. Banks can offer their best rates and terms to borrowers in this group. If customers belong to the medium-risk group, the banks can extend credit but require much higher interest payments to compensate for the increased risk associated with this group. If customers belong to the high-risk group, they will find it hard to obtain financing from the banks.
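The grouping rule and the associated decisions can be sketched in Python as follows. The thresholds (30 and 50) and the three average G/B odds ratios come from the text; treating both boundaries as part of the medium-risk group is an assumption, and the decision wording paraphrases the paragraph above rather than reproducing Table 8.

```python
def risk_group(gb_odds: float) -> str:
    """Map a cell's G/B odds ratio to a risk group."""
    if gb_odds < 30:
        return "high"
    if gb_odds <= 50:   # boundaries 30 and 50 assumed to be medium-risk
        return "medium"
    return "low"


# Paraphrased decisions (assumption: Table 8 is not reproduced here).
DECISION = {
    "low": "extend credit at the best rates and terms",
    "medium": "extend credit at a higher interest rate",
    "high": "financing is hard to obtain",
}

# Average G/B odds ratios of the three zones reported in the text.
for avg_odds in (11.6, 38.0, 125.4):
    group = risk_group(avg_odds)
    print(f"G/B odds {avg_odds:>6}: {group}-risk -> {DECISION[group]}")
```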

CONCLUSIONS
This study develops a GA-based feature selection and a hybrid model that combines two credit risk modeling approaches, the application and credit bureau scoring models, to construct the scoring matrix for credit risk management of personal loan customers.
Additionally, this study classifies the personal loan portfolio into three risk groups based on the degree of customer risk. Focusing attention on different risk groups makes it possible to design corresponding credit strategies. For model validation, this study applies the K-S statistic and ROC curve to measure the predictability of the credit scoring model.
Regarding the K-S value, the scoring matrix increases the prediction accuracy by 18.40 and 5.70 percentage points, respectively, compared to the application and credit bureau scoring models. Regarding the AUC value, the scoring matrix increases the prediction accuracy by 10.90 and 6.40 percentage points, respectively, compared to the application and credit bureau scoring models. Overall, using the scoring matrix can more precisely and efficiently strengthen risk identification, assessment and management, making it an indispensable risk management tool for financial institutions.

Figure 1. The development process of scoring matrix.

In the K-S equation, the two functions compared are the cumulative distribution functions of Good and Bad, and s is the corresponding score for the individual loan.

Figure 7 .
Figure 6. K-S for model results (training set).

Table 1. Results of the GA for input feature selection of application scoring model.

Table 2. Results of the GA for input feature selection of credit bureau scoring model.

Table 3. Results of application scoring model.

Table 4. Credit scoring results of three construction models.

Table 5. Results of revolving credit bureau scoring model.

Table 6. Results of transactor credit bureau scoring model.
