Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (20): 103-105. DOI: 10.3778/j.issn.1002-8331.2010.20.029

• Database, Signal and Information Processing •

Discretization of continuous attributes using information divergence

YUE Hai-liang, YAN De-qin

  1. Department of Computer Science, Liaoning Normal University, Dalian, Liaoning 116081, China

  • Received: 2009-04-14 Revised: 2009-06-01 Online: 2010-07-11 Published: 2010-07-11
  • Contact: YUE Hai-liang

Abstract: Discretizing continuous attributes is an important preprocessing step for subsequent machine learning and data mining. This paper analyzes the family of discretization algorithms based on information theory and, on that basis, proposes a new method for discretizing continuous attributes. The algorithm uses information divergence to measure the importance of each candidate cut point, and it uses the inconsistency rate to control the discretization process so that the consistency of the decision table is preserved. The algorithm is compared with other discretization algorithms by training C4.5 and support vector machine (SVM) classifiers on the discretized data, and the results verify its effectiveness.

Key words: discretization of continuous attributes, decision table, information divergence, inconsistency rate
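
The abstract names two ingredients: a divergence-based score for candidate cut points and an inconsistency rate that guards the decision table. The Python sketch below illustrates both under stated assumptions and is not the authors' algorithm. The function names `inconsistency_rate` and `cut_point_score` are hypothetical, the inconsistency rate follows the common majority-class definition, and the weighted Jensen-Shannon divergence is swapped in as a stand-in because this page does not give the paper's exact formula for information divergence.

```python
from collections import Counter, defaultdict
from math import log2


def inconsistency_rate(rows, labels):
    """Share of objects that disagree with the majority decision among
    objects having identical condition-attribute values (assumed definition)."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    # Within each group, every object outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(labels)


def _distribution(labels, classes):
    """Relative class frequencies, in the order given by `classes`."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def _entropy(p):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(x * log2(x) for x in p if x > 0)


def cut_point_score(values, labels, cut):
    """Weighted Jensen-Shannon divergence between the class distributions
    on the two sides of `cut` (a stand-in, not the paper's measure)."""
    classes = sorted(set(labels))
    left = [lab for v, lab in zip(values, labels) if v < cut]
    right = [lab for v, lab in zip(values, labels) if v >= cut]
    if not left or not right:
        return 0.0  # a cut that separates nothing carries no information
    w = len(left) / len(labels)
    p = _distribution(left, classes)
    q = _distribution(right, classes)
    mix = [w * a + (1 - w) * b for a, b in zip(p, q)]
    return _entropy(mix) - w * _entropy(p) - (1 - w) * _entropy(q)
```

With these weights the score coincides with the entropy-based information gain of the cut. A greedy discretizer built on this sketch could keep the highest-scoring cut points and stop removing cuts as soon as `inconsistency_rate` on the discretized table rises above its value on the original table.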

CLC Number: