Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (13): 115-117. DOI: 10.3778/j.issn.1002-8331.2010.13.034

• Database, Signal and Information Processing •

Study of imbalance dataset classification based on cascade structure

WANG Xiao-qin¹, ZHANG Hua-xiang¹, CHAI Qing²

  1. College of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
  2. Shandong Dean Information Technology Co., Ltd., Jinan 250014, China
  • Received: 2008-12-15  Revised: 2009-02-24  Online: 2010-05-01  Published: 2010-05-01
  • Contact: WANG Xiao-qin

Abstract: Inspired by the cascade structure, this paper proposes a new method for classifying imbalanced datasets: Bagging classification based on a cascade structure. At each cascade level the method removes a portion of the majority-class samples, so that the dataset gradually becomes balanced. It then applies under-sampling to obtain the training set and trains a classifier with the Bagging algorithm. Finally, the classifiers trained at all levels are combined into a single new classifier. Experiments on 10 UCI datasets show that the method outperforms Bagging and AdaBoost in both recall and F-value.

Key words: imbalanced dataset, cascade structure, Bagging algorithm, Receiver Operating Characteristic (ROC) curve

CLC Number:
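
The abstract outlines the procedure only at a high level, so the following is a minimal sketch of one way the described steps (removing majority samples level by level, under-sampling, Bagging, and combining the per-level classifiers) could fit together. It is not the authors' implementation. Everything not stated in the abstract is an assumption made for illustration: the function names, the nearest-centroid base learner, the rule of discarding the majority samples that the current level finds easiest, the `drop_frac` parameter, and the averaging of votes across levels.

```python
import random

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def fit_bagging(minority, majority, n_estimators, rng):
    """Bagging with an assumed nearest-centroid base learner.

    Each member is a (minority_centroid, majority_centroid) pair fitted on a
    bootstrap sample drawn per class, so both classes are always present.
    """
    members = []
    for _ in range(n_estimators):
        boot_min = rng.choices(minority, k=len(minority))
        boot_maj = rng.choices(majority, k=len(majority))
        members.append((centroid(boot_min), centroid(boot_maj)))
    return members

def minority_vote(members, x):
    """Fraction of ensemble members that label x as minority."""
    votes = sum(sq_dist(x, c_min) < sq_dist(x, c_maj) for c_min, c_maj in members)
    return votes / len(members)

def fit_cascade(minority, majority, levels=3, n_estimators=5, drop_frac=0.3, seed=0):
    """Train one Bagging ensemble per cascade level."""
    rng = random.Random(seed)
    remaining = list(majority)
    stages = []
    for _ in range(levels):
        # Under-sample the remaining majority class to the minority size.
        sampled = rng.sample(remaining, min(len(minority), len(remaining)))
        bag = fit_bagging(minority, sampled, n_estimators, rng)
        stages.append(bag)
        # Assumed removal rule: drop the majority samples this level finds
        # easiest (lowest minority vote), never going below the minority size.
        remaining.sort(key=lambda x: minority_vote(bag, x), reverse=True)
        keep = max(len(minority), int(len(remaining) * (1 - drop_frac)))
        remaining = remaining[:keep]
    return stages

def predict(stages, x):
    """Combine all levels by averaging their minority votes (assumed rule)."""
    score = sum(minority_vote(bag, x) for bag in stages) / len(stages)
    return 1 if score >= 0.5 else 0

# Toy data: 10 minority points near (3, 3), 100 majority points near (0, 0).
data_rng = random.Random(1)
minority = [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(10)]
majority = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(100)]
stages = fit_cascade(minority, majority)
labels = [predict(stages, p) for p in ([3.0, 3.0], [0.0, 0.0])]
```

In this sketch `predict` returns 1 for the minority class and 0 for the majority class. The base learner is deliberately trivial so that the cascade logic stays visible, and any classifier that can be trained on a bootstrap sample could take its place.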