Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (35): 126-128. DOI: 10.3778/j.issn.1002-8331.2009.35.038

• Database, Signal and Information Processing •

Naive Bayes based criminal text classification of unbalanced classes

CHENG Chun-hui, HE Qin-ming

  1. College of Computer Science, Zhejiang University, Hangzhou 310027, China
  • Received: 2009-05-27 Revised: 2009-07-03 Online: 2009-12-11 Published: 2009-12-11
  • Contact: CHENG Chun-hui

Naive Bayes text classification of criminal cases for unbalanced classes

程春惠,何钦铭   

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027
  • Corresponding author: 程春惠

Abstract: Based on the characteristics of criminal case text, this paper proposes a tailored text preprocessing method and compares two effective feature selection methods. Because criminal case categories are unevenly distributed, an improved model based on the multi-variate Bernoulli model is proposed. Experiments indicate that the improved Naive Bayes method achieves higher accuracy in case text classification.

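Abstract (from the Chinese original): In view of the characteristics of case text, a targeted text preprocessing method is proposed, and two effective feature selection methods are compared. To address the unbalanced distribution of case categories, an improved multi-variate Bernoulli model is proposed. Experimental results show that the improved multi-variate Bernoulli model effectively raises the accuracy of case text classification.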
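
For orientation, the standard multi-variate Bernoulli Naive Bayes model that the abstract builds on can be sketched as follows. This is only the textbook baseline: the abstract does not describe the authors' improvement for unbalanced classes, their preprocessing, or their feature selection, so none of those are reproduced here. The function names, the add-one (Laplace) smoothing, and the toy documents and labels are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels, vocabulary):
    """Estimate log-priors and smoothed P(term present | class).

    Hypothetical helper: add-one smoothing is an assumption,
    not something stated in the abstract.
    """
    class_docs = defaultdict(list)
    for tokens, label in zip(docs, labels):
        class_docs[label].append(set(tokens))

    n_total = len(docs)
    model = {}
    for label, doc_sets in class_docs.items():
        n_c = len(doc_sets)
        # Fraction of class documents containing each term, smoothed.
        cond = {
            t: (sum(t in d for d in doc_sets) + 1) / (n_c + 2)
            for t in vocabulary
        }
        model[label] = (math.log(n_c / n_total), cond)
    return model


def predict(model, tokens, vocabulary):
    """Return the class with the highest log-posterior score."""
    present = set(tokens)
    best_label, best_score = None, -math.inf
    for label, (log_prior, cond) in model.items():
        score = log_prior
        # The Bernoulli model scores absent terms as well as present ones.
        for t in vocabulary:
            p = cond[t]
            score += math.log(p if t in present else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, deliberately unbalanced data (three "theft" documents, one "fraud").
docs = [
    ["stolen", "wallet"],
    ["stolen", "bicycle"],
    ["wallet", "pickpocket"],
    ["forged", "contract"],
]
labels = ["theft", "theft", "theft", "fraud"]
vocabulary = sorted({t for d in docs for t in d})

model = train_bernoulli_nb(docs, labels, vocabulary)
print(predict(model, ["forged", "contract"], vocabulary))
print(predict(model, ["stolen", "wallet"], vocabulary))
```

In this baseline the class prior is the plain document fraction, so a rare category starts with a lower score and has fewer documents behind its term estimates; this is the kind of imbalance the abstract says the improved model is designed to handle.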
