Computer Engineering and Applications, 2019, Vol. 55, Issue (22): 114-118. DOI: 10.3778/j.issn.1002-8331.1807-0159


Co-Training Method Combined with Semi-Supervised Clustering and Weighted K-Nearest Neighbor

GONG Yanlu, LV Jia   

  1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
  • Online: 2019-11-15 Published: 2019-11-13



  1. 重庆师范大学 计算机与信息科学学院,重庆 401331

Abstract: During co-training iterations, the unlabeled samples selected for addition to the training set may carry too little useful information, and disagreement among the labels assigned by multiple classifiers can cause unlabeled samples to be mislabeled. To address these problems, this paper proposes a co-training method that combines semi-supervised clustering with the weighted K-nearest neighbor algorithm. In each iteration, the method first performs semi-supervised clustering on the training set and passes the unlabeled samples with a high membership degree to naive Bayes for classification; it then uses the weighted K-nearest neighbor algorithm to reclassify the unlabeled samples on which the classifiers disagree. Semi-supervised clustering selects samples that better represent the spatial structure of the data, and relabeling the inconsistently labeled samples with weighted K-nearest neighbor addresses the loss of classification accuracy caused by inconsistent labeling. Comparative experiments on UCI datasets verify the effectiveness of the algorithm.
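
The reclassification step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting scheme, so inverse-distance weighting is assumed here; the function names `weighted_knn_label` and `resolve_label`, the parameter `k=3`, and the use of two view classifiers are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def weighted_knn_label(query, labeled_points, labels, k=3, eps=1e-9):
    """Return the label with the largest inverse-distance-weighted vote.

    Assumption: weight = 1 / (distance + eps); the paper may use another scheme.
    """
    # Euclidean distance from the query to every labeled sample.
    distances = [
        (math.dist(query, point), label)
        for point, label in zip(labeled_points, labels)
    ]
    # Keep the k closest labeled samples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Closer neighbors cast larger votes; eps avoids division by zero.
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


def resolve_label(query, pred_a, pred_b, labeled_points, labels, k=3):
    """Keep the label when both view classifiers agree; otherwise
    fall back to weighted KNN over the labeled samples."""
    if pred_a == pred_b:
        return pred_a
    return weighted_knn_label(query, labeled_points, labels, k=k)
```

With inverse-distance weights, a single very close neighbor can outvote several distant ones, which is where this rule differs from plain majority-vote K-nearest neighbor.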

Key words: co-training, semi-supervised clustering, weighted K-nearest neighbor, view

摘要: 针对协同训练方法在迭代时选择加入的无标记样本所隐含的有用信息不够,以及协同训练方法多个分类器标记不一致带来错误标记无标记样本的问题,提出了一种结合半监督聚类和加权[K]最近邻的协同训练方法。该方法在每次迭代过程中,先对训练集进行半监督聚类,选择隶属度高的无标记样本给朴素贝叶斯分类,再用加权[K]最近邻算法对多个分类器分类不一致的无标记样本重新分类。利用半监督聚类能够选择出较好表现数据空间结构的样本,而采用加权[K]最近邻算法为标记不一致的无标记样本重新标记能够解决标记不一致带来的分类精度降低问题。在UCI数据集上的对比实验验证了该算法的有效性。

关键词: 协同训练, 半监督聚类, 加权[K]最近邻, 视图