Scientific Research and Essays

  • Abbreviation: Sci. Res. Essays
  • Language: English
  • ISSN: 1992-2248
  • DOI: 10.5897/SRE
  • Start Year: 2006
  • Published Articles: 2768

A new algorithm for knowledge discovery from data sets using cross-entropy measurement

Ömer AKGÖBEK
Department of Industrial Engineering, Engineering Faculty, Zirve University, 27260, Gaziantep, Turkey.
Email: [email protected]

  •  Accepted: 16 August 2011
  •  Published: 19 September 2011

Abstract

This study proposes a new method for selecting attributes in rule-generating algorithms for data mining. The most common measure used for attribute selection is entropy, which is defined as a measure of uncertainty: the higher the uncertainty in a system, the higher its entropy. Entropy is usually used to measure the uncertainty of attributes in data mining algorithms such as C4.5, CN2 and CART, whereas cross-entropy is not frequently used. Therefore, a new algorithm named REX-1C was derived from the entropy-based REX-1 algorithm in order to test the effect of cross-entropy on learning, measured by accuracy and number of rules. Twenty real-life data sets of different specifications and sizes, commonly used in the machine learning field, were chosen to test the algorithm. Using those data sets, the effects of norms on the accuracy of the algorithm and on the number of rules it produces were calculated, and the results were compared with those of the Rules-3 Plus, Rules-6, REX-1 and C5.0 algorithms. The results showed that REX-1C achieved better accuracy than Rules-3 Plus, Rules-6, REX-1 and C5.0.

 

Key words: Data mining, entropy, cross-entropy, classification, rule extraction.
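
The abstract contrasts entropy with cross-entropy without stating either formula. As a minimal illustration, the sketch below uses the standard Shannon definitions, H(p) = -Σ p(x) log2 p(x) and H(p, q) = -Σ p(x) log2 q(x). It is not the REX-1C attribute-selection procedure, which the abstract does not specify; the function names `entropy` and `cross_entropy` and the toy labels are assumptions made for this example only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    # p * log2(1 / p) is the same as -p * log2(p), written to avoid a negative zero.
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def cross_entropy(p, q):
    """Cross-entropy H(p, q), in bits, between two distributions given as dicts.

    Assumes q[x] > 0 for every x with p[x] > 0 (illustrative simplification).
    """
    return sum(px * math.log2(1 / q[x]) for x, px in p.items() if px > 0)


# Toy class labels (hypothetical data, not from the paper).
mixed = ["yes", "yes", "no", "no"]    # maximally uncertain for two classes
pure = ["yes", "yes", "yes", "yes"]   # no uncertainty

p = {"yes": 0.5, "no": 0.5}           # reference distribution
q = {"yes": 0.9, "no": 0.1}           # a mismatched estimate of p

print(entropy(mixed), entropy(pure))
print(cross_entropy(p, p), cross_entropy(p, q))
```

A pure label set has zero entropy and an evenly split two-class set has one bit. The cross-entropy of a distribution with itself equals its entropy, and it grows as the second distribution moves away from the first, which is what makes it usable as a measure of mismatch.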