Review of Semi-Supervised Learning Research

doi:10.3778/j.issn.1002-8331.1911-0083

Abstract

Supervised learning requires a large number of labeled samples to train a model, but in practice collecting labeled samples is time-consuming and laborious. Unsupervised learning uses no prior information, but its accuracy is hard to guarantee. Semi-supervised learning overcomes the limitation of traditional methods, which consider only one type of sample (labeled or unlabeled): it mines the information hidden in large amounts of unlabeled data to assist training with a small number of labeled samples, and it has become a research hotspot in machine learning. This paper reviews the overall trend and the specific research content of domestic semi-supervised learning research, and surveys six areas: semi-supervised clustering, classification, regression, and dimension reduction, as well as imbalanced data classification and noise reduction. Semi-supervised methods are numerous, but the following shortcomings remain: (1) Some newly proposed methods are effective but have been validated only empirically on specific data sets and lack theoretical proof. (2) Semi-supervised models built on complex data have many parameters, give unstable results, and lack practical guidance for parameter selection. (3) Supervision information mostly takes the form of sample labels or pairwise constraints; semi-supervised learning with mixed constraints needs further study. (4) Research on semi-supervised regression is scarce, and little work addresses how to use supervision information carried by continuous variables.

Key words: semi-supervised learning, semi-supervised clustering, semi-supervised classification, semi-supervised dimension reduction, semi-supervised regression
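
To make concrete how unlabeled data can assist a small number of labeled samples, the following is a minimal self-training sketch. It is not a method taken from the paper: the function names (`self_train_centroids`, `predict`), the `margin` confidence threshold, and the toy one-dimensional data are all assumptions chosen for illustration. A nearest-centroid classifier is fitted on the labeled points, confidently classified unlabeled points are added as pseudo-labeled samples, and the centroids are re-estimated.

```python
from statistics import mean


def self_train_centroids(labeled, unlabeled, rounds=10, margin=1.0):
    """Self-training with a 1-D nearest-centroid classifier (illustrative only).

    labeled:   list of (x, y) pairs with y in {0, 1}; both classes must appear
    unlabeled: list of x values without labels
    margin:    hypothetical confidence threshold; a point is pseudo-labeled only
               if its distances to the two centroids differ by at least this much
    Returns the final (centroid_0, centroid_1).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        c0 = mean([x for x, y in labeled if y == 0])
        c1 = mean([x for x, y in labeled if y == 1])
        confident, rest = [], []
        for x in pool:
            d0, d1 = abs(x - c0), abs(x - c1)
            if abs(d0 - d1) >= margin:
                # Confident prediction: adopt it as a pseudo-label.
                confident.append((x, 0 if d0 < d1 else 1))
            else:
                # Ambiguous point: leave it unlabeled for now.
                rest.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = rest
    c0 = mean([x for x, y in labeled if y == 0])
    c1 = mean([x for x, y in labeled if y == 1])
    return c0, c1


def predict(x, c0, c1):
    """Assign x to the class of the nearer centroid (ties go to class 0)."""
    return 0 if abs(x - c0) <= abs(x - c1) else 1


# Two labeled samples plus seven unlabeled ones (toy data).
c0, c1 = self_train_centroids(
    labeled=[(0.0, 0), (10.0, 1)],
    unlabeled=[1.0, 2.0, 1.5, 8.0, 9.0, 8.5, 5.2],
)
print(c0, c1)
```

In this toy run the point 5.2 lies too close to the decision boundary and is never pseudo-labeled, while the other six unlabeled points pull the centroids toward the actual cluster centers. Pseudo-labeling of this kind can also reinforce early mistakes, which relates to the noise-reduction topic the abstract lists.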

HAN Song, HAN Qiuhong. Review of Semi-Supervised Learning Research[J]. Computer Engineering and Applications, 2020, 56(6): 19-27.

