Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (12): 147-148. DOI: 10.3778/j.issn.1002-8331.2009.12.048

• Database, Signal and Information Processing •

Research and improvement on Naive Bayes text classifier

SHI Rui-fang   

  1. Shanxi Economic Management Institute, Taiyuan 030024, China
  • Received: 2008-09-25 Revised: 2009-01-08 Online: 2009-04-21 Published: 2009-04-21
  • Contact: SHI Rui-fang

Abstract: Naive Bayes text classification is widely regarded as a simple and effective probabilistic classification method. However, it suffers from data sparsity, and the Laplace smoothing it uses is not optimal. The author therefore applies a smoothing method from unigram statistical language models to mitigate data sparsity, which improves classification performance.

Key words: Bayes text categorization, data sparsity, smoothing
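
The contrast drawn in the abstract, between Laplace smoothing and a smoothing method taken from unigram language models, can be illustrated with a small sketch. The abstract does not say which language-model smoothing the paper uses, so the example below assumes Jelinek-Mercer linear interpolation with a collection-wide unigram model purely for illustration. All function names, the weight `lam`, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train(docs):
    """Collect word counts per class, document counts per class,
    and word counts over the whole collection.

    docs is a list of (label, tokens) pairs.
    """
    class_counts = {}
    doc_counts = Counter()
    collection = Counter()
    for label, tokens in docs:
        class_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        collection.update(tokens)
    return class_counts, doc_counts, collection


def laplace_prob(word, counts, vocab_size):
    """Laplace (add-one) estimate of P(word | class)."""
    return (counts[word] + 1) / (sum(counts.values()) + vocab_size)


def jm_prob(word, counts, collection, lam=0.5):
    """Jelinek-Mercer estimate (an assumed stand-in for the paper's method):
    interpolate the class maximum-likelihood estimate with the
    collection-wide unigram model, weighted by lam.
    """
    p_class = counts[word] / sum(counts.values())
    p_coll = collection[word] / sum(collection.values())
    return (1 - lam) * p_class + lam * p_coll


def classify(tokens, model, smoothing="laplace", lam=0.5):
    """Return the class with the highest log posterior score."""
    class_counts, doc_counts, collection = model
    vocab_size = len(collection)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in class_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        for word in tokens:
            if word not in collection:
                continue  # words never seen in training are skipped
            if smoothing == "laplace":
                p = laplace_prob(word, counts, vocab_size)
            else:
                p = jm_prob(word, counts, collection, lam)
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With Laplace smoothing every unseen word receives the same pseudo-count, whereas the interpolated estimate gives an unseen word a probability proportional to its frequency in the whole collection. Whether this matches the gain reported in the paper depends on the specific smoothing method the author chose, which the abstract does not specify.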