Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (12): 153-156.DOI: 10.3778/j.issn.1002-8331.2009.12.050

• Database, Signal and Information Processing •

Decision tree algorithm using attribute frequency splitting and information entropy discretization

LI Chun-gui,WANG Meng,SUN Zi-guang,WANG Xiao-rong,ZHANG Zeng-fang   

  1. Department of Computer Engineering,Guangxi University of Technology,Liuzhou,Guangxi 545006,China
  • Received:2008-03-27 Revised:2008-06-19 Online:2009-04-21 Published:2009-04-21
  • Contact: LI Chun-gui


Abstract: The decision tree is a common classification method in data mining. When constructing a decision tree, the criterion used to select partition attributes directly affects classification performance. Based on an attribute-importance measure defined as a function of attribute frequency in Rough Set theory, which is used both to select partition attributes and to pre-prune the decision tree, a new decision tree algorithm is proposed. In addition, using statistical properties of the data set as heuristic information, a novel information-entropy-based algorithm for discretizing numerical attributes is proposed. Experimental results show that the new discretization algorithm improves computational efficiency, and that, compared with the entropy-based method, the new decision tree algorithm yields simpler tree structures and better classification performance.
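The entropy-based discretization step described in the abstract can be sketched as follows. This is a minimal illustration of choosing a single binary cut point that minimizes the weighted class entropy of the two resulting intervals; the function names and the binary-cut simplification are assumptions, not the authors' exact algorithm (which additionally uses statistical properties of the data set as heuristics):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_entropy_cut(values, labels):
    """Find the cut point on a numeric attribute that minimizes the
    size-weighted entropy of the two intervals it induces."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_ent = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # candidate cuts lie only between distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        ent = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if ent < best_ent:
            best_ent = ent
            # midpoint between the two neighboring values
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut, best_ent
```

Applied recursively to each resulting interval (with a stopping criterion such as MDL), this yields a multi-interval discretization of a numerical attribute.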
