Computer Engineering and Applications, 2012, Vol. 48, Issue 29: 119-123.


Ensemble learning model for imbalanced data classification

JIAO Shenglan, YANG Bingru, ZHAI Yun, ZHAO Wanli   

  1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
  Online: 2012-10-11; Published: 2012-10-22

Abstract: To address the classification of imbalanced datasets, this paper presents an improved SVM-KNN classification algorithm and, building on it, proposes an ensemble learning model. The model uses limited sampling to partition the majority-class samples into subsets, recombines each majority-class subset with the minority-class samples, and trains each combined subset separately with the improved SVM-KNN to obtain several base classifiers, which are then integrated. Experimental results on UCI data show that the model performs satisfactorily on imbalanced data classification.
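The abstract does not say what the "improved" SVM-KNN changes or how "limited sampling" chooses the subsets, so the following is only a minimal Python sketch of the general scheme under stated assumptions. Each base learner is a classical-style SVM-KNN hybrid with a linear kernel: a linear SVM trained by Pegasos-style subgradient descent, where test points whose decision value satisfies |f(x)| < eps are relabeled by a k-nearest-neighbor vote over the support vectors. The majority class is split into disjoint random subsets of roughly minority-class size, and the base classifiers are combined by unweighted majority voting. The names `SVMKNN`, `train_partition_ensemble`, `predict_ensemble` and the parameters `lam`, `eps`, `k`, `sv_tol` are hypothetical, and labels are assumed to be +1 for the minority class and -1 for the majority class.

```python
import math
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the bias is learned as the last weight.
    return list(x) + [1.0]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


class SVMKNN:
    """Hypothetical linear SVM-KNN hybrid for labels +1 (minority) / -1 (majority)."""

    def __init__(self, lam=0.01, epochs=1000, eps=0.5, k=1, sv_tol=0.5, seed=0):
        self.lam, self.epochs, self.eps = lam, epochs, eps
        self.k, self.sv_tol, self.seed = k, sv_tol, seed
        self.w: List[float] = []
        self.sv_X: List[List[float]] = []
        self.sv_y: List[int] = []

    def decision(self, x: Sequence[float]) -> float:
        return _dot(self.w, _augment(x))

    def fit(self, X, y):
        # Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.
        data = [_augment(x) for x in X]
        self.w = [0.0] * len(data[0])
        rng = random.Random(self.seed)
        t = 0
        for _ in range(self.epochs):
            for i in rng.sample(range(len(data)), len(data)):
                t += 1
                eta = 1.0 / (self.lam * t)
                violated = y[i] * _dot(self.w, data[i]) < 1.0
                self.w = [(1.0 - eta * self.lam) * wj for wj in self.w]
                if violated:
                    self.w = [wj + eta * y[i] * xj for wj, xj in zip(self.w, data[i])]
        # Approximate support vectors: training points on or inside the margin.
        sv = [i for i in range(len(X)) if y[i] * self.decision(X[i]) <= 1.0 + self.sv_tol]
        sv = sv or list(range(len(X)))
        self.sv_X = [list(X[i]) for i in sv]
        self.sv_y = [y[i] for i in sv]
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        f = self.decision(x)
        if abs(f) >= self.eps:
            # Far from the hyperplane: trust the SVM decision.
            return 1 if f > 0 else -1
        # Close to the hyperplane: vote among the k nearest support vectors.
        order = sorted(range(len(self.sv_X)), key=lambda i: math.dist(x, self.sv_X[i]))
        vote = sum(self.sv_y[i] for i in order[: self.k])
        if vote == 0:
            return 1 if f > 0 else -1
        return 1 if vote > 0 else -1


def train_partition_ensemble(X, y, seed=0, **base_kwargs):
    # Split the majority class (-1) into disjoint, near-equal subsets and pair
    # each subset with every minority (+1) sample.
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == -1]
    if not minority or not majority:
        raise ValueError("need samples from both classes")
    random.Random(seed).shuffle(majority)
    n_parts = max(1, round(len(majority) / len(minority)))
    models = []
    for p in range(n_parts):
        idx = majority[p::n_parts] + minority
        base = SVMKNN(seed=seed + p, **base_kwargs)
        models.append(base.fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_ensemble(models, x) -> int:
    # Unweighted majority vote; a tie goes to the majority class (-1).
    vote = sum(m.predict_one(x) for m in models)
    return 1 if vote > 0 else -1
```

For instance, 16 majority samples and 4 minority samples yield round(16/4) = 4 base classifiers, each trained on 4 majority samples plus all 4 minority samples, so every base learner sees a balanced subset. Unweighted voting is the simplest combination rule; the abstract does not state which rule the paper actually uses.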

Key words: imbalanced data, ensemble learning model, base classifier, improved Support Vector Machine-K-Nearest Neighbor (SVM-KNN), UCI dataset