Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (31): 178-181.

• Database and Information Processing •

Algorithm of decision trees based on rough set and clustering attribute’s values

WANG Chun-nian, LIANG Ji-ye

  1. School of Computer & Information Technology, Shanxi University, Taiyuan 030006, China
  • Online: 2007-11-01 Published: 2007-11-01
  • Contact: WANG Chun-nian

Abstract: This paper combines rough set theory with attribute-value clustering to optimize decision trees with respect to three principles of decision tree optimality. First, the reduction capability of rough set theory is used to compute the relative core, and information entropy serves as heuristic information for computing a relative reduct; this keeps the paths of the generated tree short and reduces the number of nodes. Second, when a splitting attribute is selected, subject to maximal information gain, its values are clustered according to the dissimilarity distance between them so that the resulting distribution approaches a unimodal one. Experiments on UCI data sets show that the algorithm substantially reduces both the number of nodes and the depth of the decision tree.
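The first step described in the abstract, computing a relative core and then growing it into a relative reduct with an entropy heuristic, can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract does not give the exact entropy measure or stopping rule, so the sketch assumes the conditional entropy H(D | B) of the decision attribute D given an attribute subset B as the heuristic. The function names and the four-row decision table are hypothetical, invented for the example.

```python
from collections import Counter, defaultdict
from math import log2

EPS = 1e-12  # tolerance for floating-point comparisons of entropies


def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs) for a decision table given as a list of dicts."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in attrs)  # equivalence class under attrs
        groups[key][row[decision]] += 1
    n = len(rows)
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            p = c / size
            h -= (size / n) * p * log2(p)
    return h


def relative_core(rows, attrs, decision):
    """Attributes whose removal raises H(decision | attrs) (assumed criterion)."""
    full = conditional_entropy(rows, attrs, decision)
    core = []
    for a in attrs:
        rest = [b for b in attrs if b != a]
        if conditional_entropy(rows, rest, decision) > full + EPS:
            core.append(a)
    return core


def relative_reduct(rows, attrs, decision):
    """Greedy reduct: start from the core, add the attribute that lowers
    the conditional entropy most, until it matches the full attribute set."""
    full = conditional_entropy(rows, attrs, decision)
    reduct = relative_core(rows, attrs, decision)
    while conditional_entropy(rows, reduct, decision) > full + EPS:
        candidates = [a for a in attrs if a not in reduct]
        best = min(
            candidates,
            key=lambda a: conditional_entropy(rows, reduct + [a], decision),
        )
        reduct.append(best)
    return reduct


# Hypothetical toy table: d = a XOR b, and c is a redundant copy of b.
table = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
core = relative_core(table, ["a", "b", "c"], "d")
reduct = relative_reduct(table, ["a", "b", "c"], "d")
print("core:", core)
print("reduct:", reduct)
```

In this toy table only `a` is indispensable, because `b` and `c` can stand in for each other, so the reduct keeps `a` plus one of the two. A tree built on the reduct then has fewer attributes to test, which is the mechanism the abstract credits for the shorter paths and smaller node count. The second step, clustering attribute values by a dissimilarity distance, is left out because the abstract does not define that distance.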