Self-Training Algorithm Combining Density Peak and Cut Edge Weight

Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (2): 70-76. DOI: 10.3778/j.issn.1002-8331.1912-0357

WEI Danni, YANG Youlong, QIU Haiquan

1. School of Mathematics and Statistics, Xidian University, Xi’an 710071, China
2. College of Information & Network Engineering, Anhui Science and Technology University, Bengbu, Anhui 233030, China

Publication date: 2021-01-15 Release date: 2021-01-14

Abstract:

To reduce the impact of mislabeled samples on performance during the iterations of self-training, a self-training algorithm based on density peaks and cut edge weight is proposed. First, a density-based clustering method is used to discover the spatial structure of the data set, and representative unlabeled samples are selected for label prediction. Second, the cut edge weight is used as the test statistic in a hypothesis test that judges whether each sample has been labeled correctly. The correctly labeled samples are then used to enlarge the labeled set step by step until every unlabeled sample has received a predicted label. The proposed algorithm makes full use of the spatial structure of the sample data and addresses the problem of mislabeled samples, which improves classification accuracy. Experiments on real data sets verify the effectiveness of the proposed algorithm.

Key words: self-training, density peak, cut edge weight, hypothesis testing
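The hypothesis-testing step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption rather than the authors' implementation: the neighborhood graph is a k-nearest-neighbor graph, edge weights are 1 / (1 + distance), the null hypothesis is that neighbor labels are drawn independently according to the class proportions, and a normal approximation is used for the statistic. The names `knn_graph` and `cut_edge_p_values` are hypothetical.

```python
import math

def knn_graph(points, k):
    """For each point, list (neighbor_index, weight) for its k nearest neighbors.
    Assumed weight: 1 / (1 + Euclidean distance)."""
    graph = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([(j, 1.0 / (1.0 + d)) for d, j in dists[:k]])
    return graph

def cut_edge_p_values(points, labels, k=3):
    """Left-tail p-value of the cut edge weight statistic for every sample.
    A cut edge joins two samples with different labels. A small p-value means
    far fewer cut edges than chance would give, i.e. the label agrees with
    its neighborhood; a large p-value flags a suspicious label."""
    n = len(points)
    proportion = {c: labels.count(c) / n for c in set(labels)}
    graph = knn_graph(points, k)
    p_values = []
    for i in range(n):
        p_same = proportion[labels[i]]
        p_cut = 1.0 - p_same  # chance that a random neighbor has another label
        observed = sum(w for j, w in graph[i] if labels[j] != labels[i])
        mean = p_cut * sum(w for _, w in graph[i])
        var = p_same * p_cut * sum(w * w for _, w in graph[i])
        if var == 0.0:  # only one class present: the test is uninformative
            p_values.append(1.0)
            continue
        z = (observed - mean) / math.sqrt(var)
        p_values.append(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    return p_values

# Two well-separated groups; the sample at index 4 sits in the first group
# but carries the label of the second one.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
p_values = cut_edge_p_values(points, labels, k=3)
```

In a self-training loop, one plausible use is to add a newly labeled sample to the labeled set only when its p-value falls below a chosen significance level, and to return it to the unlabeled pool otherwise. The threshold and the handling of rejected samples are design choices that the abstract does not specify.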

WEI Danni, YANG Youlong, QIU Haiquan. Self-Training Algorithm Combining Density Peak and Cut Edge Weight[J]. Computer Engineering and Applications, 2021, 57(2): 70-76.

