Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (3): 124-128.

• Database, Data Mining, Machine Learning •

Optimized incremental SVM algorithm based on support vector threshold control

LIU Wei, XIE Xingsheng, XIAO Chaofeng

  1. Department of Automation, University of Science and Technology of China, Hefei 230027, China
  • Online: 2015-02-01 Published: 2015-01-28

Abstract: To address the long training time and low classification efficiency of the incremental SVM (I-SVM) algorithm in text classification, this paper proposes an optimized I-SVM algorithm based on support vector (SV) threshold control, called TI-SVM. Because incremental training sets contain a large number of non-SVs, TI-SVM uses the historical training model and the KKT conditions to preprocess both the newly added sample set and the historical sample set, removing most of the non-SVs. A new SVM model is then trained on the preprocessed sample set, and the redundant SVs in that model are further pruned using text similarity and a preset SV threshold to improve classification performance. Experiments on classifying a set of customer news items show that the algorithm effectively improves training and classification efficiency while maintaining classification accuracy.

Key words: Support Vector Machine (SVM), machine learning, text classification, classification model, Karush-Kuhn-Tucker (KKT) conditions
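The two steps named in the abstract, KKT-based preprocessing and similarity-based pruning of redundant SVs, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact rules, so the linear decision function, the function names, and the parameters `max_sv` and `sim_threshold` are all assumptions made for illustration. The KKT check uses the standard soft-margin fact that a sample with y·f(x) ≥ 1 under the existing model has a zero multiplier and is therefore not an SV of that model.

```python
import math


def decision_value(w, b, x):
    """Linear decision function f(x) = w . x + b of the previously trained model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def violates_kkt(w, b, x, y, tol=1e-6):
    """True if (x, y) lies inside the margin or is misclassified by the old model.

    Samples with y * f(x) >= 1 satisfy the KKT conditions with alpha = 0,
    so they are non-SVs with respect to the old model.
    """
    return y * decision_value(w, b, x) < 1.0 - tol


def kkt_prefilter(w, b, samples):
    """Keep only the (x, y) pairs that violate the old model's KKT conditions."""
    return [(x, y) for x, y in samples if violates_kkt(w, b, x, y)]


def cosine_similarity(u, v):
    """Cosine similarity of two dense feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * c for a, c in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(c * c for c in v))
    return dot / norm if norm > 0 else 0.0


def prune_redundant_svs(svs, max_sv, sim_threshold):
    """Assumed pruning rule, not taken from the paper.

    If the number of SVs exceeds max_sv, greedily drop any SV whose cosine
    similarity to an already kept SV of the same class is >= sim_threshold.
    """
    if len(svs) <= max_sv:
        return list(svs)
    kept = []
    for x, y in svs:
        redundant = any(
            y == ky and cosine_similarity(x, kx) >= sim_threshold
            for kx, ky in kept
        )
        if not redundant:
            kept.append((x, y))
    return kept


# Hypothetical old model and new samples, for illustration only.
w_old, b_old = [1.0, 0.0], 0.0
new_samples = [([2.0, 0.0], 1), ([0.5, 0.0], 1), ([-2.0, 0.0], -1), ([0.5, 1.0], -1)]
candidates = kkt_prefilter(w_old, b_old, new_samples)
```

In this sketch the same check can be applied to any sample set once a model is available. How TI-SVM treats the historical sample set in detail, and whether the SV threshold bounds the SV count as assumed here, is not stated in the abstract.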