Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (24): 97-99.

• Database, Signal and Information Processing •

A Granular SVM Learning Method for Handling Imbalanced Data

徐 乾1,王文剑2,张文浩1   

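  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of the Ministry of Education, Shanxi University, Taiyuan 030006
  • Received: 1900-01-01 Revised: 1900-01-01 Publication date: 2011-08-21 Release date: 2011-08-21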

Granular support vector machine approach used for imbalanced data

XU Qian1,WANG Wenjian2,ZHANG Wenhao1   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Lab of Computational Intelligence & Chinese Information Processing of MoE, Shanxi University, Taiyuan 030006, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-08-21 Published:2011-08-21

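Abstract: Through multi-dimensional association rule mining, granular computing (GrC) and the support vector machine (SVM) are effectively combined, and a granular support vector machine (GSVM) learning method, called AR-GSVM, is proposed. When applied to imbalanced data, the method not only effectively reduces the complexity of the classifier but is also inherently suited to parallel computation, which improves learning efficiency, while also improving the generalization ability of the classifier. To keep the data distribution consistent between the original space and the feature space, a granular SVM learning method in kernel space, called AR-KGSVM, is further proposed on the basis of AR-GSVM; it achieves better generalization performance. Experiments on UCI datasets show that the generalization ability of AR-GSVM and AR-KGSVM is better than that of several commonly used methods for handling imbalanced data.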

Key words: support vector machine, granular computing, granular support vector machine, association rules, imbalanced data

Abstract: Through the mining of multi-dimensional association rules, Granular Computing (GrC) and the Support Vector Machine (SVM) are effectively combined, and a Granular Support Vector Machine (GSVM) learning approach, named AR-GSVM, is proposed. For imbalanced datasets, AR-GSVM not only reduces the complexity of the classifier but also improves learning efficiency and generalization performance. To keep the data distribution consistent between the input space and the kernel space, a second granular SVM model that operates in kernel space, named AR-KGSVM, is built on AR-GSVM. AR-KGSVM obtains better generalization performance than AR-GSVM. Experimental results on UCI datasets demonstrate that the generalization performance of AR-GSVM and AR-KGSVM is superior to that of several commonly used methods for dealing with imbalanced datasets.

Key words: support vector machine, granular computing, granular support vector machine, association rules, imbalanced data
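
The abstract says that granules are obtained by mining multi-dimensional association rules, but it does not say how rules are scored or how records are assigned to granules. The sketch below is therefore only an illustration under stated assumptions: it computes the standard support and confidence of a rule of the form "attribute conditions => class label" and then groups records by the first rule antecedent they match. The toy records, the rule names `g1` and `g2`, and the helpers `matches`, `rule_stats`, and `granulate` are all hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Toy, made-up records: discretized attribute values plus a class label.
# "pos" plays the role of the minority class (2 of 8 records).
RECORDS = [
    {"age": "young", "income": "low", "label": "pos"},
    {"age": "young", "income": "low", "label": "pos"},
    {"age": "young", "income": "high", "label": "neg"},
    {"age": "old", "income": "low", "label": "neg"},
    {"age": "old", "income": "high", "label": "neg"},
    {"age": "old", "income": "high", "label": "neg"},
    {"age": "young", "income": "low", "label": "neg"},
    {"age": "old", "income": "high", "label": "neg"},
]

# Hypothetical rule antecedents, each a conjunction over several attributes.
RULES = {
    "g1": {"age": "young", "income": "low"},
    "g2": {"age": "old", "income": "high"},
}


def matches(record, conditions):
    """Return True if the record satisfies every attribute=value condition."""
    return all(record.get(key) == value for key, value in conditions.items())


def rule_stats(records, antecedent, label):
    """Return (support, confidence) of the rule antecedent => label."""
    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(matches(r, antecedent) and r["label"] == label for r in records)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def granulate(records, rules):
    """Assign each record index to the first rule it matches (assumed policy)."""
    granules = defaultdict(list)
    for index, record in enumerate(records):
        for name, antecedent in rules.items():
            if matches(record, antecedent):
                granules[name].append(index)
                break
        else:
            # No rule antecedent matched this record.
            granules["unmatched"].append(index)
    return dict(granules)


if __name__ == "__main__":
    for name, antecedent in RULES.items():
        for label in ("pos", "neg"):
            print(name, label, rule_stats(RECORDS, antecedent, label))
    print(granulate(RECORDS, RULES))
```

In a granular SVM setting, each resulting group could be handed to its own SVM and trained independently, which is one way to read the abstract's remark that the approach is inherently suited to parallel computation; that training step is not shown here.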