Computer Engineering and Applications, 2009, Vol. 45, Issue (18): 45-47. DOI: 10.3778/j.issn.1002-8331.2009.18.014

• Research and Discussion •

Induction of decision tree based on rough sets technique

ZHAI Jun-hai¹, WANG Xi-zhao¹, ZHANG Cang-sheng²

  1. Key Lab of Machine Learning and Computational Intelligence, College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China
  2. Computing Center, Hebei University, Baoding, Hebei 071002, China
  • Received: 2008-12-15  Revised: 2009-02-18  Online: 2009-06-21  Published: 2009-06-21
  • Contact: ZHAI Jun-hai

基于粗糙集技术的决策树归纳

翟俊海¹, 王熙照¹, 张沧生²

  1. 河北大学 数学与计算机学院 河北省机器学习与计算智能重点实验室,河北 保定 071002
  2. 河北大学 计算中心,河北 保定 071002
  • Corresponding author: 翟俊海

Abstract: The ID3 algorithm is a typical decision tree induction method. At each node it uses the information gain measure to select the expanded attribute, namely the one that leaves the minimum entropy, and it generates the decision tree recursively. However, information gain has a natural bias toward attributes with many values over those with few values. Moreover, ID3 assumes that the class distribution of the instances in the training set is the same as in the real problem domain. This paper presents a novel decision tree induction method based on rough sets technique; it is driven purely by the data and can overcome the two drawbacks mentioned above.
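The bias of information gain toward many-valued attributes can be illustrated with a small sketch. The toy data set and the names `entropy`, `information_gain`, `id`, `windy`, and `play` are assumptions made for this illustration and do not come from the paper; the code shows only the standard ID3 measure, not the proposed rough-set method.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    labels = [row[target] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "id" is unique per instance, so it has the most values.
rows = [
    {"id": 1, "windy": "no", "play": "yes"},
    {"id": 2, "windy": "no", "play": "yes"},
    {"id": 3, "windy": "yes", "play": "no"},
    {"id": 4, "windy": "no", "play": "no"},
]

gain_id = information_gain(rows, "id", "play")
gain_windy = information_gain(rows, "windy", "play")
print(f"gain(id) = {gain_id:.4f}, gain(windy) = {gain_windy:.4f}")
```

In this toy example the identifier-like attribute splits the data into single-instance partitions and therefore receives the largest possible gain, even though it would be of no use for classifying unseen instances.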

Key words: decision tree, ID3 algorithm, rough sets, upper approximations, lower approximations
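For readers unfamiliar with the last two key words, the following minimal sketch computes the standard rough-set lower and upper approximations of a concept. It is not the induction algorithm of the paper, which the abstract does not detail; the function name `approximations` and the toy table are assumptions made for this illustration.

```python
from collections import defaultdict


def approximations(rows, attributes, target, target_value):
    """Return (lower, upper) approximations of the concept target == target_value.

    Instances are referred to by their index in rows. Two instances are
    indiscernible when they agree on every attribute in attributes.
    """
    # Group instances into equivalence classes of the indiscernibility relation.
    classes = defaultdict(set)
    for index, row in enumerate(rows):
        key = tuple(row[a] for a in attributes)
        classes[key].add(index)

    concept = {i for i, row in enumerate(rows) if row[target] == target_value}

    lower, upper = set(), set()
    for block in classes.values():
        if block <= concept:   # block lies entirely inside the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return lower, upper


# Hypothetical toy table: instances 0 and 1 are indiscernible but differ in class.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

lower, upper = approximations(rows, ["outlook", "windy"], "play", "yes")
print(sorted(lower), sorted(upper))
```

The lower approximation contains the instances that certainly belong to the concept given the chosen attributes, and the upper approximation contains those that possibly belong to it; both are computed from the data alone.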

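摘要 (Abstract): The ID3 algorithm is a typical decision tree induction algorithm. It uses information gain as the criterion for selecting the expanded attribute (root node) and generates the decision tree recursively. However, ID3 tends to select attributes with many values as the root node, and it assumes that the proportion of instances of each class in the training set is the same as the proportion in the real problem domain. A new decision tree induction algorithm based on rough sets technique is proposed. It is a completely data-driven induction algorithm and can overcome the above shortcomings of ID3.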

关键词 (Key words): decision tree, ID3 algorithm, rough sets, upper approximations, lower approximations