Computer Engineering and Applications, 2009, Vol. 45, Issue (26): 135-137. DOI: 10.3778/j.issn.1002-8331.2009.26.039

• Databases and Information Processing •

C4.5Bagging algorithm for Chinese text categorization

ZHANG Xiang(1,2), ZHOU Ming-quan(1,3), GENG Guo-hua(1), HOU Fan(1)

  1. Visualization Technology Institute, Northwest University, Xi'an 710127, China
    2. College of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
    3. College of Information Science and Technology, Beijing Normal University, Beijing 100875, China
  • Received: 2008-05-20  Revised: 2008-08-01  Online: 2009-09-11  Published: 2009-09-11
  • Contact: ZHANG Xiang

Abstract: To address the problem of Chinese text classification, a new Bagging method is proposed. The C4.5 decision tree is used as the weak classifier, and multiple training sets are obtained by resampling the training instances. The outputs of the resulting classifiers are then combined by voting to produce the final classification. Experimental results show that the C4.5Bagging classifier achieves higher precision, recall, and F-measure than C4.5, kNN, and Naive Bayes, and therefore better overall performance.

Key words: Bagging, C4.5, Chinese text categorization
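The abstract does not give implementation details, so the following is only a minimal Python sketch of the general scheme it describes: bootstrap resampling of training instances, one C4.5-style tree per resample, and majority voting over the trees' outputs. It assumes documents have already been segmented and converted to binary term-presence vectors. The tree is a simplified C4.5 (gain-ratio splits on discrete features only, with no pruning and no handling of continuous attributes or missing values). All function names, the default of 11 trees, and the toy vocabulary are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass, field


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feat):
    """C4.5 split criterion: information gain divided by split information."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feat], []).append(y)
    if len(groups) < 2:  # feature is constant at this node, so it cannot split
        return 0.0
    n = len(labels)
    weights = [len(g) / n for g in groups.values()]
    cond = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    split_info = -sum(w * math.log2(w) for w in weights)  # > 0 with >= 2 groups
    return (entropy(labels) - cond) / split_info


@dataclass
class Node:
    label: str          # majority class: the leaf output, or fallback for unseen values
    feat: int = -1      # index of the tested feature; -1 marks a leaf
    children: dict = field(default_factory=dict)


def build_tree(rows, labels, features):
    """Grow an unpruned, simplified C4.5-style tree on discrete features."""
    node = Node(label=Counter(labels).most_common(1)[0][0])
    if len(set(labels)) == 1 or not features:
        return node
    scores = {f: gain_ratio(rows, labels, f) for f in features}
    best = max(scores, key=scores.get)
    if scores[best] <= 1e-12:  # no useful split left
        return node
    node.feat = best
    parts = {}
    for row, y in zip(rows, labels):
        sub_rows, sub_labels = parts.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(y)
    remaining = [f for f in features if f != best]
    for value, (sub_rows, sub_labels) in parts.items():
        node.children[value] = build_tree(sub_rows, sub_labels, remaining)
    return node


def predict_tree(node, row):
    """Follow the tested features down to a leaf."""
    while node.feat != -1:
        child = node.children.get(row[node.feat])
        if child is None:  # value never seen at this node during training
            return node.label
        node = child
    return node.label


def bagging_fit(rows, labels, n_estimators=11, seed=0):
    """Train n_estimators trees, each on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    features = list(range(len(rows[0])))
    trees = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n instances with replacement
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        trees.append(build_tree(sample_rows, sample_labels, features))
    return trees


def bagging_predict(trees, row):
    """Combine the individual trees' outputs by simple majority vote."""
    votes = Counter(predict_tree(t, row) for t in trees)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy vocabulary: 比赛 (match), 球队 (team), 股票 (stock), 市场 (market)
    X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
         [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
    y = ["sports", "sports", "sports", "finance", "finance", "finance"]
    trees = bagging_fit(X, y)
    print(bagging_predict(trees, [1, 1, 0, 0]), bagging_predict(trees, [0, 0, 1, 1]))
```

An odd number of trees avoids tied votes in the two-class case. With more classes, `Counter.most_common` resolves ties in favor of the label that was counted first, which is an arbitrary but deterministic choice made here for simplicity.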
