Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (5): 69-72. DOI: 10.3778/j.issn.1002-8331.1507-0161

• Big Data and Cloud Computing •

用平滑方法改进多关系朴素贝叶斯分类

徐光美1,刘宏哲2,张敬尊1,王金华1   

  1. 北京联合大学 信息学院,北京 100101
  2. 北京联合大学 信息服务工程重点实验室,北京 100101
  • 出版日期:2017-03-01 发布日期:2017-03-03

Improving multi-relational Naive Bayesian classifier using smoothing methods

XU Guangmei1, LIU Hongzhe2, ZHANG Jingzun1, WANG Jinhua1   

  1. College of Information Technology, Beijing Union University, Beijing 100101, China
  2. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  • Online: 2017-03-01 Published: 2017-03-03

摘要: 为消除朴素贝叶斯分类时的零概率以及过度拟合问题,分析了各种概率平滑方法,给出了基于M估计的多关系朴素贝叶斯分类方法(MRNBC-M)和基于Laplace估计的多关系朴素贝叶斯分类方法(MRNBC-L),分析探讨了M平滑和Laplace平滑方法对多关系分类的影响情况,为进一步优化分类,方法基于扩展互信息标准对数据进行属性过滤。多关系标准数据集上的实验显示,MRNBC-M可以有效改进分类性能。

关键词: 多关系数据挖掘, 朴素贝叶斯, 参数平滑, 互信息

Abstract: To eliminate the zero-probability and overfitting problems in naive Bayesian classification, this paper analyzes several probability smoothing methods and presents two classifiers: MRNBC-M (Multi-Relational Naive Bayesian Classifier based on M-estimation) and MRNBC-L (Multi-Relational Naive Bayesian Classifier based on Laplace estimation). It then analyzes how M-estimate smoothing and Laplace smoothing affect classification in the multi-relational setting. To further improve classification, the method filters attributes using an extended mutual information criterion. Experiments on standard multi-relational datasets show that MRNBC-M can effectively improve classification performance.

Key words: Multi-Relational Data Mining (MRDM), Naive Bayes, smoothing methods, mutual information
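
To make the two smoothing schemes named in the abstract concrete, the sketch below shows the standard textbook forms of the Laplace estimate and the M-estimate for a conditional probability P(value | class). It is a minimal single-table illustration, not the paper's multi-relational formulation, which this abstract does not give. The function names `laplace_estimate` and `m_estimate` and the example counts are hypothetical.

```python
def laplace_estimate(count, class_total, n_values):
    """Laplace (add-one) estimate: (n_ac + 1) / (n_c + k).

    count       -- n_ac, training tuples of the class with this attribute value
    class_total -- n_c, training tuples of the class
    n_values    -- k, number of distinct values the attribute can take
    """
    return (count + 1) / (class_total + n_values)


def m_estimate(count, class_total, prior, m):
    """M-estimate: (n_ac + m * p) / (n_c + m).

    prior -- p, prior probability assumed for the attribute value
    m     -- equivalent sample size, i.e. how strongly the prior is weighted
    """
    return (count + m * prior) / (class_total + m)


if __name__ == "__main__":
    # Hypothetical counts: the value never occurs among 10 tuples of the class,
    # and the attribute has 4 possible values.
    n_ac, n_c, k = 0, 10, 4

    # The unsmoothed relative frequency would be 0 here, which zeroes out the
    # whole naive Bayes product; both smoothed estimates stay positive.
    print(laplace_estimate(n_ac, n_c, k))
    print(m_estimate(n_ac, n_c, prior=1 / k, m=2))
```

With a uniform prior p = 1/k and m = k, the M-estimate reduces to the Laplace estimate, so the M-estimate can be read as the more general form in which the prior and its weight are tunable.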