Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (7): 20-22.

• Doctoral Forum •

Algorithm for reduction SVM training sample based on sample dissimilarity

CHEN Shengbing1, WANG Xiaofeng1,2

  1. Key Lab of Network and Intelligent Information Processing, Department of Computer Science and Technology, Hefei University, Hefei 230601, China
  2. Intelligent Computing Lab, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2012-03-01 Published: 2012-03-01

Abstract: To reduce large-scale training sample sets, the paper introduces the concept of the k-nearest-neighbor vector and, based on it, a new measure of sample dissimilarity. It proves several properties of this dissimilarity that relate to noise identification and to the distance between a sample and the class boundary. Building on these properties, it proposes an efficient algorithm for reducing SVM training samples. The algorithm first removes noise samples according to the properties of the dissimilarity. It then uses the between-class dissimilarity as an approximation of the distance to the class boundary and, combined with sample similarity, removes the less important training samples directly in the original sample space. Simulation results indicate that the algorithm effectively reduces the sample set and improves training efficiency.
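The two-stage procedure outlined in the abstract (remove noise first, then discard samples far from the class boundary) can be illustrated with a rough sketch. The abstract does not define the k-nearest-neighbor vector or the dissimilarity measure, so the code below substitutes a simple stand-in: the fraction of a sample's k nearest neighbors that carry a different label. The function names, the `noise_at` threshold, and the "keep any sample with at least one other-class neighbor" rule are all assumptions made for illustration, not the authors' method.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def other_class_fraction(points, labels, i, k):
    """Stand-in dissimilarity (an assumption, not the paper's measure):
    share of the k nearest neighbours whose label differs from labels[i]."""
    nbrs = k_nearest(points, i, k)
    if not nbrs:
        return 0.0
    return sum(labels[j] != labels[i] for j in nbrs) / len(nbrs)


def reduce_samples(points, labels, k=3, noise_at=1.0):
    """Return the indices of the samples kept after both stages."""
    n = len(points)
    # Stage 1: treat a sample as noise when (almost) all neighbours disagree with it.
    clean = [i for i in range(n)
             if other_class_fraction(points, labels, i, k) < noise_at]
    # Stage 2: recompute on the cleaned set and keep only samples that still
    # have an other-class neighbour, i.e. those likely to lie near the boundary.
    pts = [points[i] for i in clean]
    lbs = [labels[i] for i in clean]
    return [clean[m] for m in range(len(clean))
            if other_class_fraction(pts, lbs, m, k) > 0]


if __name__ == "__main__":
    # Two 1-D classes plus one mislabelled point (index 8) inside class 0.
    points = [(0.0,), (1.0,), (2.1,), (3.3,), (4.0,), (5.2,), (6.3,), (7.5,), (0.4,)]
    labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(reduce_samples(points, labels, k=3))
```

The retained indices could then be passed to any SVM trainer in place of the full set. The brute-force neighbor search here is quadratic in the number of samples, so a spatial index would be needed before this idea scales to the large sample sets the paper targets.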