Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (7): 137-140. DOI: 10.3778/j.issn.1002-8331.1812-0147

• Pattern Recognition and Artificial Intelligence •

Application of Fleiss’ Kappa Coefficient in Bayesian Decision Tree Algorithm

AN Weipeng, CHENG Xiaobo, LIU Yu   

  1. College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
  • Published: 2020-04-01 Online: 2020-03-28

Abstract:

The C4.5 decision tree algorithm is unstable and inefficient when handling small-scale missing data and ambiguous data, and it does not account for the relationships between conditional attributes when splitting nodes. To address these problems, an improved algorithm is proposed that introduces the Fleiss’ Kappa coefficient into a combination of the C4.5 decision tree algorithm and the naive Bayes algorithm. The improved algorithm addresses the low efficiency of C4.5 on small-scale missing data and ambiguous data, as well as the correlation between conditional attributes. Theoretical analysis and experiments on standard UCI data sets show that the algorithm significantly improves classification accuracy at the cost of some execution efficiency.

Key words: C4.5 algorithm, ambiguous data, Fleiss’ Kappa coefficient, naive Bayesian algorithm
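
For readers unfamiliar with the statistic named in the abstract, the following is a minimal Python sketch of the standard Fleiss' Kappa computation for agreement among a fixed number of raters. It is not the paper's algorithm: the abstract does not say how the coefficient enters the C4.5 and naive Bayes combination, so nothing here should be read as the authors' method. The function name `fleiss_kappa` and the toy rating tables are illustrative assumptions.

```python
import math


def fleiss_kappa(counts):
    """Standard Fleiss' Kappa (illustrative helper, not from the paper).

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same rater count n >= 2.
    """
    num_subjects = len(counts)
    if num_subjects == 0:
        raise ValueError("at least one subject is required")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    num_categories = len(counts[0])

    # p_j: share of all assignments that went to category j.
    p = [
        sum(row[j] for row in counts) / (num_subjects * n)
        for j in range(num_categories)
    ]
    # P_i: proportion of agreeing rater pairs for subject i.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / num_subjects   # mean observed agreement
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if math.isclose(P_e, 1.0):
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (P_bar - P_e) / (1.0 - P_e)


if __name__ == "__main__":
    # Toy tables (made up for illustration): 2 subjects, 3 raters, 2 categories.
    print(fleiss_kappa([[3, 0], [0, 3]]))  # raters always agree
    print(fleiss_kappa([[2, 1], [1, 2]]))  # raters split on every subject
```

A value of 1 indicates complete agreement, while values at or below 0 indicate agreement no better than chance.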