This paper investigates the use of batch and incremental classifiers, namely logistic regression, neural networks, C5, Naïve Bayes updateable, IBk (instance-based learner, k-nearest neighbour) and raced incremental logit boost, to identify the classifier that best improves the predictive accuracy of consumers’ credit card risk assessment at a bank in Malaysia. Before the models are generated for comparison, the initial set of data is loaded into an ETL (extraction, transformation, loading) system developed to perform feature selection, or attribute relevancy analysis, using the ID3 algorithm, compiling the subset of data with the highest information gain and gain ratio. An extended test applies equal-length binning to some attributes to determine whether it affects the relevancy of each attribute. The selected 24-month subset of data is used to generate various data mining models with different training and testing sizes and binning sizes. C5 consistently emerged as the technique that generated the best models, with an average predictive accuracy as high as 94.68%. Sample sizes, equal-length binning sizes, and training and testing sizes are all shown to affect accuracy, to differing degrees.
Key words: Data mining techniques, predictive accuracy, incremental learning schemes.
Copyright © 2021 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0
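The attribute relevancy analysis summarised in the abstract ranks attributes by information gain and gain ratio, optionally after binning numeric attributes. The following Python sketch illustrates those two measures under stated assumptions; it is not the authors' ETL system. The function names, the sample data, and the reading of "equal-length binning" as equal-width binning are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    counts = Counter(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) of one discrete attribute.

    values: attribute value for each record
    labels: class label for each record (same length as values)
    """
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def equal_width_bins(values, n_bins):
    """Map numeric values to bin indices 0..n_bins-1 of equal width.

    Assumption: "equal-length binning" is read here as equal-width binning.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical records: a numeric attribute and a binary risk label.
incomes = [1000, 2000, 3000, 4000]
risk = ["bad", "bad", "good", "good"]
binned = equal_width_bins(incomes, 2)
gain, ratio = info_gain_and_ratio(binned, risk)
```

Changing `n_bins` changes how records are grouped, which in turn changes the computed gain and gain ratio; this is one way that binning size can alter an attribute's apparent relevancy.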