Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (12): 169-173. DOI: 10.3778/j.issn.1002-8331.1805-0482

• Pattern Recognition and Artificial Intelligence •

决策树C4.5算法的改进与分析

安葳鹏,尚家泽   

  1. 河南理工大学 计算机科学与技术学院,河南 焦作 454000
  • Publication date: 2019-06-15 Release date: 2019-06-13

Improvement and Analysis of C4.5 Decision Tree Algorithm

AN Weipeng, SHANG Jiaze   

  1. College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
  • Online: 2019-06-15 Published: 2019-06-13

摘要: C4.5算法在选择分裂属性时只考虑了每个条件属性和决策属性之间的关系,而没有考虑到条件属性间的相关性,直接影响构建树的准确率。提出一种基于Kendall和谐系数的C4.5决策树优化算法,用于解决条件属性之间相关性的问题,提高算法属性选择的准确性。在引入系数的基础上运用等价无穷小原理对计算公式进行简化,提高了算法的效率。对改进后的C4.5算法和传统的算法进行仿真实验,结果表明,改进的C4.5算法在准确度和效率上都有较大提高。

关键词: C4.5算法, Kendall和谐系数, 决策树

Abstract: When selecting a splitting attribute, the C4.5 algorithm considers only the relationship between each condition attribute and the decision attribute and ignores the correlation among the condition attributes, which directly affects the accuracy of the resulting tree. An optimized C4.5 decision tree algorithm based on Kendall's coefficient of concordance is proposed to account for the correlation among condition attributes and to improve the accuracy of attribute selection. After the coefficient is introduced, the principle of equivalent infinitesimals is used to simplify the calculation formula, which improves the efficiency of the algorithm. Simulation experiments comparing the improved C4.5 algorithm with the traditional algorithm show that the improved algorithm achieves considerably higher accuracy and efficiency.

Key words: C4.5 algorithm, Kendall’s coefficient of concordance, decision tree
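
The two quantities named in the abstract can be illustrated with a minimal sketch. It shows the standard C4.5 gain ratio for a discrete attribute and the textbook form of Kendall's coefficient of concordance W = 12S / (m²(n³ − n)), computed without tie correction. The abstract does not state how the paper combines the two or how the equivalent-infinitesimal simplification is applied, so neither is reproduced here; the function names `entropy`, `gain_ratio`, `average_ranks`, and `kendall_w` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Standard C4.5 gain ratio of one discrete attribute (not the paper's variant)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0


def average_ranks(xs):
    """Ranks 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendall_w(columns):
    """Kendall's W for m attribute columns over the same n samples (no tie correction)."""
    m, n = len(columns), len(columns[0])
    ranked = [average_ranks(col) for col in columns]
    totals = [sum(r[i] for r in ranked) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    print(gain_ratio(["a", "a", "b", "b"], labels))   # attribute that separates the classes
    print(gain_ratio(["a", "b", "a", "b"], labels))   # attribute unrelated to the classes
    print(kendall_w([[1, 2, 3, 4], [10, 20, 30, 40]]))  # two columns ranked identically
    print(kendall_w([[1, 2, 3, 4], [4, 3, 2, 1]]))      # two columns ranked in reverse
```

In this sketch W ranges from 0 (no agreement among the attribute rankings) to 1 (identical rankings), so a value near 1 would indicate strongly correlated condition attributes. How that value should adjust the gain ratio is left open because the abstract does not specify it.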