Self-Training Algorithm Combining Density Peak and Cut Edge Weight

doi:10.3778/j.issn.1002-8331.1912-0357

Computer Engineering and Applications, 2021, Vol. 57, Issue 2: 70-76

WEI Danni, YANG Youlong, QIU Haiquan

1. School of Mathematics and Statistics, Xidian University, Xi’an 710071, China
2. College of Information & Network Engineering, Anhui Science and Technology University, Bengbu, Anhui 233030, China

Online: 2021-01-15; Published: 2021-01-14

Abstract

To reduce the adverse effect of mislabeled samples on the performance of self-training during iteration, a self-training algorithm combining density peak and cut edge weight is proposed. First, a density-based clustering method is used to discover the spatial structure of the dataset, and representative unlabeled samples are selected for label prediction. Second, the cut edge weight is used as the statistic in a hypothesis test that determines whether each sample has been labeled correctly. The correctly labeled samples are then used to enlarge the labeled set step by step, until the labels of all unlabeled samples have been predicted. The proposed method makes full use of the spatial structure of the data and also addresses the problem of mislabeled samples, which improves the classification accuracy of the algorithm. Experiments on real datasets verify the effectiveness of the proposed method.
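The density-peak step described above can be illustrated with a minimal pure-Python sketch. It assumes the standard density peaks quantities of Rodriguez and Laio:

- a local density rho_i, computed here with a Gaussian kernel and cutoff distance d_c;
- the distance delta_i from each point to its nearest denser point.

The link to that nearest denser point ("parent") exposes the spatial structure of the data. The function names `density_peaks` and `candidates`, the kernel choice, and the rule that an unlabeled point becomes a candidate once its parent is labeled are illustrative assumptions. The abstract does not state exactly how representative samples are chosen.

```python
import math


def density_peaks(points, d_c):
    """Density peaks quantities for a list of coordinate tuples.

    Returns (rho, delta, parent):
      rho[i]    local density (Gaussian kernel with cutoff distance d_c)
      delta[i]  distance from point i to the nearest point of higher density
      parent[i] index of that nearest denser point, or None for the global peak
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    # Rank points from densest to sparsest; the index breaks exact ties.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [None] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Global peak: by convention delta is its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        # Nearest point among those ranked denser than i.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    return rho, delta, parent


def candidates(parent, labeled):
    """Hypothetical selection rule: unlabeled points whose denser parent is labeled."""
    return [i for i, p in enumerate(parent) if i not in labeled and p in labeled]
```

On two well-separated clusters, the cluster centres receive the largest delta values and the other points link to the centre of their own cluster. Labeling the centres first therefore lets `candidates` spread labels outward along the parent links, one layer per iteration.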

Key words: self-training, density peak, cut edge weight, hypothesis testing
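The hypothesis test on the cut edge weight can be sketched as follows, assuming a SETRED-style formulation. Take a sample i with label y_i in a neighbourhood graph. Its cut edge weight J_i is the sum of the weights w_ij of edges to neighbours whose label differs from y_i. The null hypothesis is that neighbouring labels are drawn independently with the class proportions p, so each neighbour disagrees with probability 1 - p_{y_i}. Under this null, J_i has mean (1 - p_{y_i}) Σ w_ij and variance p_{y_i}(1 - p_{y_i}) Σ w_ij², and it is approximated by a normal distribution.

The following are assumptions for illustration, not details taken from the paper:

- the function names;
- the dict-based graph representation (weights such as w_ij = 1/(1 + d_ij) are one common choice);
- the left-tailed decision rule;
- the default alpha = 0.05.

```python
import math


def cut_edge_z(i, labels, neighbors, weights, class_prior):
    """Standardized cut edge weight of sample i under the independence null hypothesis.

    labels      dict: sample index -> class label (initial plus self-labeled samples)
    neighbors   dict: sample index -> neighbour indices in a neighbourhood graph
    weights     dict: (i, j) -> edge weight w_ij
    class_prior dict: class label -> proportion of that class among labeled samples
    """
    y = labels[i]
    w = [weights[(i, j)] for j in neighbors[i]]
    # Observed cut edge weight: total weight of edges to differently labeled neighbours.
    cut = sum(weights[(i, j)] for j in neighbors[i] if labels[j] != y)
    # Under H0 each neighbour disagrees with y independently with probability 1 - p.
    p = class_prior[y]
    mean = (1 - p) * sum(w)
    var = p * (1 - p) * sum(x * x for x in w)
    if var == 0:
        raise ValueError("test undefined: no neighbours or a degenerate class proportion")
    return (cut - mean) / math.sqrt(var)


def is_reliable(z, alpha=0.05):
    """Left-tailed test: keep the label only if the cut weight is significantly low."""
    # Standard normal CDF expressed with the error function.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return p_value < alpha
```

In a full self-training loop, `cut_edge_z` could be evaluated for each newly self-labeled sample. Only samples for which `is_reliable` returns True would be added to the labeled set before the next round, and the rest could be returned to the unlabeled pool.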

WEI Danni, YANG Youlong, QIU Haiquan. Self-Training Algorithm Combining Density Peak and Cut Edge Weight[J]. Computer Engineering and Applications, 2021, 57(2): 70-76.

