Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (22): 103-106. DOI: 10.3778/j.issn.1002-8331.2009.22.034

• Database and Information Processing •

Semi-supervised SVM based on Tri-training

LI Kun-lun, ZHANG Wei, DAI Yun-na

  1. College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071002, China
  • Received: 2008-06-24 Revised: 2008-09-16 Online: 2009-08-01 Published: 2009-08-01
  • Contact: LI Kun-lun

基于Tri-training的半监督SVM

李昆仑,张 伟,代运娜   

  1. College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071002
  • Corresponding author: 李昆仑

Abstract: One of the main difficulties in machine learning is solving large-scale problems effectively when labeled data are limited and fairly expensive to obtain. This paper proposes a new semi-supervised SVM that applies Tri-training to improve SVM. The method trains a few initial SVM classifiers on a small amount of labeled data and then uses a large amount of unlabeled data to refine the classifiers iteratively. Experiments on UCI data show that Tri-training improves the classification accuracy of SVM, and that increasing the diversity among the classifiers yields a more accurate final classifier. Although Tri-training places no constraints on the supervised learning algorithm, the proposed method uses SVMs with three different kernel functions as the base learners. The different kernels increase the diversity among the three SVMs, which improves the performance of co-training. Theoretical analysis and experiments show that the proposed algorithm performs well in both classification accuracy and speed.
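
The iterative scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: a kernel perceptron stands in for each SVM so that the example runs with the Python standard library alone, the three kernels (linear, polynomial, RBF) and their parameters are arbitrary choices, the toy data are invented, and the error-rate checks that full Tri-training applies before accepting pseudo-labels are omitted. All function and class names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Three different kernels give the three base learners their diversity.
def linear_kernel(a, b):
    return 1.0 + dot(a, b)  # the constant 1 acts as a bias term

def poly_kernel(a, b, degree=2):
    return (1.0 + dot(a, b)) ** degree

def rbf_kernel(a, b, gamma=0.5):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

class KernelPerceptron:
    """Stand-in for an SVM (assumption): binary kernel classifier, labels in {-1, +1}."""

    def __init__(self, kernel, epochs=20):
        self.kernel = kernel
        self.epochs = epochs

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        self.alpha = [0] * len(self.X)  # mistake count per training point
        for _ in range(self.epochs):
            mistakes = 0
            for i, (xi, yi) in enumerate(zip(self.X, self.y)):
                if self.predict(xi) != yi:
                    self.alpha[i] += 1
                    mistakes += 1
            if mistakes == 0:  # every training point is classified correctly
                break
        return self

    def predict(self, x):
        score = sum(a * yj * self.kernel(xj, x)
                    for a, xj, yj in zip(self.alpha, self.X, self.y) if a)
        return 1 if score > 0 else -1

def tri_train(X_lab, y_lab, X_unl, kernels, max_rounds=10):
    """Simplified Tri-training: each learner is retrained on the labeled data plus
    the unlabeled points on which the other two learners agree."""
    learners = [KernelPerceptron(k).fit(X_lab, y_lab) for k in kernels]
    pseudo = [{} for _ in learners]  # per learner: unlabeled index -> pseudo-label
    for _ in range(max_rounds):
        changed = False
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            agreed = {}
            for idx, x in enumerate(X_unl):
                label = learners[j].predict(x)
                if label == learners[k].predict(x):
                    agreed[idx] = label
            if agreed != pseudo[i]:
                pseudo[i], changed = agreed, True
        if not changed:  # pseudo-label sets are stable, so stop iterating
            break
        for i, learner in enumerate(learners):
            extra = sorted(pseudo[i].items())
            learner.fit(list(X_lab) + [X_unl[idx] for idx, _ in extra],
                        list(y_lab) + [label for _, label in extra])
    return learners

def majority_vote(learners, x):
    """Final classifier: majority vote of the three learners."""
    return 1 if sum(h.predict(x) for h in learners) > 0 else -1

# Invented toy data: two well-separated clusters, few labels, more unlabeled points.
X_lab = [(2.0, 2.0), (1.5, 2.5), (-2.0, -2.0), (-2.5, -1.5)]
y_lab = [1, 1, -1, -1]
X_unl = [(1.0, 1.0), (2.5, 1.5), (3.0, 2.0), (-1.0, -1.0), (-3.0, -2.0), (-1.5, -2.5)]
learners = tri_train(X_lab, y_lab, X_unl, [linear_kernel, poly_kernel, rbf_kernel])
print(majority_vote(learners, (3.0, 3.0)), majority_vote(learners, (-3.0, -3.0)))
```

In this sketch the pseudo-labels for all three learners are computed before any learner is retrained, so each round uses a consistent snapshot of the classifiers. Replacing `KernelPerceptron` with an actual SVM trainer that exposes the same `fit` and `predict` methods would bring the sketch closer to the setting the abstract describes.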

Key words: semi-supervised learning, co-training, Tri-training, Support Vector Machine (SVM), least squares support vector machine

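Abstract (translated from the Chinese version): One of the main problems facing machine learning today is how to process massive data effectively, while labeled training data are very limited and hard to obtain. A new semi-supervised SVM algorithm is proposed. It requires only a small amount of labeled data to train the SVMs and uses a large amount of unlabeled data to correct the classifiers repeatedly. Experiments show that applying Tri-training does improve the classification accuracy of SVM, and that increasing the diversity among the classifiers gives better classification results. Because Tri-training places very loose requirements on the classifiers, the diversity among them is realized through different SVM kernel functions, which further improves the performance of co-training. Theoretical analysis and experiments show that the algorithm learns well.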

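Key words (translated from the Chinese version): semi-supervised learning, co-training, Tri-training, support vector machine, least squares support vector machine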