Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (25): 138-140.DOI: 10.3778/j.issn.1002-8331.2009.25.042

• 数据库、信息处理 • Previous Articles     Next Articles

Research on classification algorithms in imbalanced data based on boosting

PAN Jun,LI Hong,LI Bo   

  1. School of Information Science and Engineering,Central South University,Changsha 410083,China
  • Received:2008-05-13 Revised:2008-08-15 Online:2009-09-01 Published:2009-09-01
  • Contact: PAN Jun

基于推进的非平衡数据分类算法研究

潘 俊,李 宏,李 博   

  1. 中南大学 信息科学与工程学院,长沙 410083
  • 通讯作者: 潘 俊

Abstract: The application of data classification in reality usually confronts to a problem named imbalanced data that the amount of one class is larger than another class.At the present time,as one of the solutions to classification of imbalanced data,Boosting has a great prospect because the whole performance of classification can be improved by increasing the minority class’s F-Measure in the form of iteration.This paper will analyze the reason that the performance of imbalanced data is weak,and improve this classical Boosting algorithm by restraining from overfitting and controlling the F-Measure of minority class,and propose an improved algorithm named RIFBoost,and then compare this new algorithm with some traditional classified algorithms on WEKA system.The experiment result indicates that RIFBoost can increase the F-Measure of minority class while maintaining the whole of classification’s precision.

Key words: imbalanced data, boosting algorithm, Waikato Environment for Knowledge Analysis(WEKA) system, F-measure

摘要: 在现实世界的数据分类应用中,通常会遇到数据不平衡的问题,即数据中一类数据的数量要大于另一类数据的数量。在目前针对非平衡数据的分类问题的解决方案中,推进算法因其能通过多次迭代提高少数类的分类指标来提高分类器的整体性能而有着较好的应用前景。从分析非平衡数据分类性能差的原因入手,通过抑制过度拟合与对少数类的F度量的控制对经典推进算法进行改进,提出了一种改进算法RIFBoost,然后将算法在WEKA系统上与几个传统的分类算法进行了比较。实验结果表明,RIFBoost算法在保留整体精度的同时对少数类的F度量的性能有了一定的提高。

关键词: 非平衡数据, 推进算法, 怀卡托智能分析环境(WEKA)系统, F度量

CLC Number: