Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (17): 44-50. DOI: 10.3778/j.issn.1002-8331.1805-0332


Improved TSVM Learning Algorithm Under Noise Labeling

HE Li, LIU Ying, HAN Keping   

  1. School of Science and Technology, Tianjin University of Finance & Economics, Tianjin 300222, China
  • Online: 2019-09-01 Published: 2019-08-30


Abstract: The rapid development of deep learning depends on large amounts of labeled data, but real-world data often contain an unknown proportion of noisy labels, which directly degrades the final classifier. To address mislabeled samples in datasets, this paper proposes an improved TSVM algorithm for data with noisy labels. The method uses clustering to identify the clusters with higher misclassification rates and then exchanges the labels of the two clusters with the higher misclassification rates, which reduces the propagation and accumulation of noisy labels within the TSVM algorithm. This improves labeling accuracy and makes the TSVM classifier more robust to different proportions of noise. To verify the effectiveness of the proposed algorithm, experiments are performed on selected UCI datasets with different proportions of noisy labels added. The experimental results show that the proposed algorithm is more robust than SVM and TSVM on datasets with different noise ratios.

Key words: noisy label, transductive support vector machines, clustering algorithm, robustness
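
The experimental setup in the abstract, adding different proportions of noisy labels to a dataset, can be illustrated with a minimal sketch. This is not the authors' code: the function name `inject_label_noise`, the binary ±1 label encoding, the uniform random choice of which labels to flip, and the example ratios are all assumptions made for illustration.

```python
import random


def inject_label_noise(labels, noise_ratio, seed=0):
    """Flip a fixed fraction of binary labels (+1 / -1), chosen uniformly at random.

    Hypothetical helper: the abstract does not say how noise was injected.
    Returns the noisy label list and the sorted indices that were flipped.
    """
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_ratio * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    noisy = [-y if i in flip_idx else y for i, y in enumerate(labels)]
    return noisy, sorted(flip_idx)


if __name__ == "__main__":
    # Toy labels standing in for a UCI dataset's class column (assumed binary).
    clean = [1] * 10 + [-1] * 10
    for ratio in (0.1, 0.2, 0.3):  # illustrative ratios, not the paper's
        noisy, flipped = inject_label_noise(clean, ratio)
        observed = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
        print(f"target ratio={ratio:.1f} observed ratio={observed:.2f} flipped={flipped}")
```

Flipping an exact count of labels, rather than flipping each label independently with some probability, keeps the realized noise ratio equal to the target on small datasets, which makes comparisons across classifiers easier to read.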
