Computer Engineering and Applications, 2008, Vol. 44, Issue 5: 166-169.

• Network, Communication and Security •

Research on classification of imbalanced data based on Sparse Least Squares Support Vector Machines

YAO Quan-zhu¹, TIAN Yuan¹, WANG Ji², YANG Zeng-hui¹, ZHANG Nan¹

  1. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
  2. College of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2007-06-08 Revised: 2007-09-28 Online: 2008-02-11 Published: 2008-02-11
  • Contact: YAO Quan-zhu

Chinese title (translated): Classification of imbalanced data based on least squares support vector machines

姚全珠1,田 元1,王 季2,杨增辉1,张 楠1

  

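  1. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048
  2. College of Computer Science, Northwestern Polytechnical University, Xi’an 710072
  • Corresponding author: 姚全珠 (YAO Quan-zhu)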

Abstract: The Support Vector Machine (SVM) is an efficient classification technique grounded in statistical learning theory. However, when the two classes of a binary problem are highly imbalanced, SVM performs poorly. To improve classification performance on imbalanced datasets, this paper analyzes the essential characteristics of the Sparse Least Squares SVM and proposes an algorithm for imbalanced samples. Experiments with this algorithm are carried out on datasets from the UCI repository. The results indicate that the method significantly improves the classification accuracy of SVM on imbalanced samples. Classification is also much faster than with a conventional SVM, with no decline in the correct rate, especially when the number of support vectors is large.

Key words: Support Vector Machine, unbalanced data classification, machine learning

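Abstract (translated from the Chinese): The support vector machine is a highly effective classification method developed on the basis of statistical learning theory. However, when the numbers of samples in the two classes differ greatly, the classification ability of the support vector machine declines. To improve its ability to classify imbalanced data, this paper analyzes the essential characteristics of the least squares support vector machine and proposes an algorithm for classifying imbalanced data. Experiments on standard UCI datasets show that the algorithm effectively improves the correctness of support vector machine classification on imbalanced data. In particular, for large training sets, the algorithm considerably increases training speed without sacrificing training accuracy.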

Key words (translated from the Chinese): support vector machine, imbalanced data classification, machine learning
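
The abstract names the least squares SVM but this page does not give the algorithm itself. As a rough illustration only, the sketch below trains a standard least squares SVM by solving its usual linear system, and handles imbalance with a separate regularization constant per class. That per-class weighting, the RBF kernel, the toy data, and every name here (`train_lssvm`, `gamma_pos`, `gamma_neg`, `sigma`) are assumptions made for illustration, not the authors' method, and the sparsification step is not shown.

```python
import math

def rbf(a, b, sigma=1.0):
    """RBF kernel between two points given as lists of floats."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def train_lssvm(X, y, gamma_pos=10.0, gamma_neg=10.0, sigma=1.0):
    """Solve the LS-SVM system [[0, y^T], [y, Omega + D]] [b; alpha] = [0; 1].

    Omega[i][j] = y_i * y_j * K(x_i, x_j). D is diagonal with 1/gamma_i,
    where gamma_i depends on the class of sample i (illustrative assumption).
    """
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = y[i]
        A[i + 1][0] = y[i]
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf(X[i], X[j], sigma)
        g = gamma_pos if y[i] > 0 else gamma_neg
        A[i + 1][i + 1] += 1.0 / g
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def predict(x, X, y, b, alpha, sigma=1.0):
    """Sign of the decision function sum_i alpha_i * y_i * K(x, x_i) + b."""
    s = b + sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1

# Toy imbalanced set: 2 positive samples, 6 negative samples (made-up data).
X = [[2.0, 2.0], [2.2, 1.9],
     [0.0, 0.0], [0.1, 0.2], [-0.2, 0.1], [0.3, -0.1], [0.0, 0.3], [-0.1, -0.2]]
y = [1, 1, -1, -1, -1, -1, -1, -1]

# A larger constant on the minority class penalizes its errors more heavily.
b, alpha = train_lssvm(X, y, gamma_pos=30.0, gamma_neg=10.0)
print(predict([2.0, 2.0], X, y, b, alpha), predict([0.0, 0.0], X, y, b, alpha))
```

Because training reduces to one linear solve, every sample normally receives a nonzero multiplier. This is why sparse variants prune samples with small multipliers to speed up classification, a step omitted from this sketch.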