Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (20): 103-105. DOI: 10.3778/j.issn.1002-8331.2010.20.029

• Database, Signal and Information Processing •

Discretization of continuous attributes using information divergence

YUE Hai-liang, YAN De-qin

  1. Department of Computer Science, Liaoning Normal University, Dalian, Liaoning 116081, China
  • Received: 2009-04-14 Revised: 2009-06-01 Online: 2010-07-11 Published: 2010-07-11
  • Contact: YUE Hai-liang

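Application of information divergence to the discretization of continuous attributes

岳海亮 (YUE Hai-liang), 闫德勤 (YAN De-qin)

  1. Department of Computer Science, Liaoning Normal University, Dalian, Liaoning 116081, China

  • Corresponding author: 岳海亮 (YUE Hai-liang)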

Abstract: The discretization of continuous attributes contributes greatly to the subsequent machine learning or data mining process. A new discretization algorithm based on information divergence is proposed. An inconsistency check controls the discretization procedure. Experiments are run with C4.5 and SVM on the discretized data. The results show that the proposed algorithm is effective.

Key words: discretization of continuous attributes, decision table, information divergence, inconsistency

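Abstract (Chinese version): The family of information-theoretic discretization algorithms is analyzed, and on that basis a new method for discretizing continuous attributes is proposed. The algorithm uses information divergence to measure the importance of cut points, and uses the inconsistency rate to control the discretization process so that the consistency of the decision table does not change. Finally, C4.5 and a support vector machine (SVM) are used to compare the performance of the algorithm with that of other algorithms, which verifies the effectiveness of the algorithm.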

Key words (Chinese version): discretization of continuous attributes, decision table, information divergence, inconsistency rate
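
The abstract says that an inconsistency rate controls discretization so that the consistency of the decision table is preserved, but it does not give the formula. The sketch below illustrates one common definition of that rate, which is an assumption here and not necessarily the one used in the paper: objects with identical condition-attribute values are grouped, and every object outside the majority decision class of its group counts as inconsistent. The function names `discretize` and `inconsistency_rate` are hypothetical, and the information-divergence measure itself is not reproduced because the abstract does not define it.

```python
import bisect
from collections import Counter, defaultdict


def discretize(values, cuts):
    """Map each continuous value to the index of its interval.

    `cuts` must be sorted in ascending order; a value equal to a cut
    point falls into the interval to the right of that cut.
    """
    return [bisect.bisect_right(cuts, v) for v in values]


def inconsistency_rate(rows, labels):
    """Share of objects that disagree with the majority decision of their group.

    Assumed definition (not taken from the paper): rows with identical
    condition-attribute values form a group, and each group contributes
    its size minus the count of its most frequent decision label.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    total = sum(sum(counts.values()) for counts in groups.values())
    if total == 0:
        return 0.0
    inconsistent = sum(
        sum(counts.values()) - max(counts.values()) for counts in groups.values()
    )
    return inconsistent / total


# Toy decision table: one continuous condition attribute, one decision attribute.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]

for cuts in ([2.5], [1.5], []):
    rows = [[code] for code in discretize(values, cuts)]
    print(cuts, inconsistency_rate(rows, labels))
```

In this toy table the cut at 2.5 separates the two decision classes and keeps the table consistent, while dropping it or moving it to 1.5 merges objects with different decisions. A discretizer controlled this way would reject a cut set whose rate exceeds that of the original table.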
