Computer Engineering and Applications, 2009, Vol. 45, Issue (22): 53-55. DOI: 10.3778/j.issn.1002-8331.2009.22.018

• Research, Design, and Testing •



Design and implementation of a new Chinese text classifier

CHEN Yan-qiu, XIONG Yao-hua

  1. Department of Computer Science and Technology, Neusoft Institute of Information, Northeastern University, Dalian, Liaoning 100623, China
  • Received: 2008-06-18 Revised: 2008-09-18 Online: 2009-08-01 Published: 2009-08-01
  • Contact: CHEN Yan-qiu

Abstract: To improve the efficiency and accuracy of Chinese text categorization, this paper presents a new Chinese text classifier. Features are selected with a composite evaluation function that combines word frequency, mutual information, and class information. For feature weighting, the traditional TF-IDF method does not account for how a feature is distributed between and within classes, so a weighting method is proposed that combines term frequency with the value of the composite evaluation function, in effect replacing the IDF factor with the function used for feature selection. Finally, a fast classifier based on Bayes' theorem is designed. Experiments show that the classifier is simple and effective.

Key words: Chinese text categorization, feature selection, feature weighting, classification algorithm
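
The weighting idea in the abstract, replacing the IDF factor with a class-aware feature score, can be illustrated with a minimal sketch. The abstract does not give the paper's composite evaluation function, so the code below substitutes a simple stand-in: the largest smoothed pointwise mutual information between a term and any class, clipped at zero. The function names `term_scores` and `weight_document`, the smoothing constant, and the toy corpus are all assumptions made for illustration, and documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def term_scores(docs, labels, smoothing=1.0):
    """Score each term by its largest smoothed PMI with any class.

    This is a stand-in for the paper's composite evaluation function,
    which the abstract does not specify.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()        # term -> number of documents containing it
    df_class = Counter()  # (term, class) -> documents of that class containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_class[(term, label)] += 1

    scores = {}
    for term in df:
        p_term = (df[term] + smoothing) / (n_docs + 2 * smoothing)
        best = 0.0  # clip negative PMI at zero
        for cls, size in class_sizes.items():
            p_term_given_cls = (df_class[(term, cls)] + smoothing) / (size + 2 * smoothing)
            best = max(best, math.log(p_term_given_cls / p_term))
        scores[term] = best
    return scores


def weight_document(tokens, scores):
    """Weight each known term as term frequency times its class-aware score."""
    tf = Counter(tokens)
    return {term: tf[term] * scores[term] for term in tf if term in scores}


# Toy corpus of pre-segmented documents (hypothetical data).
docs = [
    ["match", "goal", "of"],
    ["match", "team", "of"],
    ["stock", "market", "of"],
    ["market", "rate", "of"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = term_scores(docs, labels)
weights = weight_document(["match", "match", "of"], scores)
```

In this toy corpus, a term that appears in every class, such as "of", receives a score of zero and so contributes no weight, while a term concentrated in one class, such as "match", keeps a positive weight that grows with its frequency in the document. Plain IDF would also discount "of", but only because it is frequent overall, not because of how it is spread across classes.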