Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (20): 132-138. DOI: 10.3778/j.issn.1002-8331.1706-0340

• Pattern Recognition and Artificial Intelligence •


Integrated self-training method based on neighborhood density and semi-supervised KNN

LI Junnan, LV Jia   

  1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
  • Online: 2018-10-15  Published: 2018-10-19


Abstract: The integrated self-training algorithm is prone to local overfitting during iteration when the initial labeled samples are chosen at random, so it generalizes poorly to the original structure of the sample space. Moreover, when the integrated self-training algorithm uses a WKNN classifier for data editing, it does not take into account the influence of unlabeled samples on the class labels of test samples. To address these problems, an integrated self-training algorithm combining nearest neighbor density and semi-supervised KNN is proposed. The algorithm uses nearest neighbor density to select the initial labeled samples: the k nearest neighbors of each chosen sample are excluded from the labeled candidate set, so that the initial labeled samples are spread out and better reflect the original structure of the sample space, and among the remaining candidates the sample with the highest density is chosen as the next labeled sample. To improve the performance of data editing, semi-supervised KNN replaces WKNN, remedying the fact that WKNN considers only labeled samples when deciding the class of a test sample and ignores the unlabeled samples around it. Comparative experiments on UCI datasets verify the effectiveness of the proposed algorithm.
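The density-based initialization described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the density measure (inverse of the mean distance to the k nearest neighbors) and all function names are assumptions.

```python
import numpy as np

def select_initial_labeled(X, n_select, k=5):
    """Pick initial labeled samples by nearest-neighbor density.

    Density of a point is taken here as the inverse of the mean
    distance to its k nearest neighbors (an assumed measure). After a
    point is selected, it and its k nearest neighbors are removed from
    the candidate set, so the chosen points stay spread out over the
    sample space rather than clustering together.
    """
    n = len(X)
    # pairwise Euclidean distance matrix, shape (n, n)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # indices of the k nearest neighbors of each point (excluding itself)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    # higher density = smaller mean distance to the k nearest neighbors
    density = 1.0 / (d[np.arange(n)[:, None], nn].mean(axis=1) + 1e-12)

    candidates = set(range(n))
    chosen = []
    while candidates and len(chosen) < n_select:
        # highest-density remaining candidate
        i = max(candidates, key=lambda j: density[j])
        chosen.append(i)
        # drop the point and its k nearest neighbors from the candidates
        candidates -= {i, *nn[i]}
    return chosen
```

Removing each pick's neighborhood from the candidate set is what keeps the initial labeled set dispersed, which is the property the abstract credits for better reflecting the original sample-space structure.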

Key words: integrated self-training, nearest neighbor density, semi-supervised, K-Nearest Neighbor (KNN)
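As a rough illustration of the data-editing idea, the semi-supervised KNN below classifies a sample using both its labeled neighbors and the unlabeled neighbors around it, which is exactly what plain WKNN ignores. The specific scheme (unlabeled neighbors receive a tentative label from their own labeled k-NN and then vote with a reduced weight) and all names are assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np
from collections import Counter

def knn_label(x, X_lab, y_lab, k):
    """Plain majority vote over the k nearest labeled neighbors."""
    d = np.linalg.norm(X_lab - x, axis=1)
    idx = np.argsort(d)[:k]
    return Counter(y_lab[idx]).most_common(1)[0][0]

def semi_supervised_knn(x, X_lab, y_lab, X_unl, k=5, unl_weight=0.5):
    """Classify x using both labeled and unlabeled neighbors.

    Unlabeled neighbors first receive a tentative label from their own
    labeled k-NN and then vote with a reduced weight (an assumed
    down-weighting scheme), so nearby unlabeled samples still influence
    the decision instead of being discarded as in plain WKNN.
    """
    X_all = np.vstack([X_lab, X_unl])
    d = np.linalg.norm(X_all - x, axis=1)
    idx = np.argsort(d)[:k]
    n_lab = len(X_lab)
    votes = {}
    for i in idx:
        if i < n_lab:
            # labeled neighbor: full-weight vote with its true label
            votes[y_lab[i]] = votes.get(y_lab[i], 0.0) + 1.0
        else:
            # unlabeled neighbor: tentative label, down-weighted vote
            lbl = knn_label(X_all[i], X_lab, y_lab, k)
            votes[lbl] = votes.get(lbl, 0.0) + unl_weight
    return max(votes, key=votes.get)
```

In a data-editing step, a self-labeled sample whose semi-supervised KNN decision disagrees with its assigned label would be treated as noise and removed before retraining.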