Computer Engineering and Applications (计算机工程与应用), 2022, Vol. 58, Issue 14: 144-152. DOI: 10.3778/j.issn.1002-8331.2012-0124

• Pattern Recognition and Artificial Intelligence •

近亲结点图编辑的Self-Training算法

刘学文,王继奎,杨正国,易纪海,李冰,聂飞平   

  1. 兰州财经大学 信息工程学院,兰州 730020
  2. 西北工业大学 计算机学院,光学影像分析与学习中心,西安 710072
  • 出版日期:2022-07-15 发布日期:2022-07-15

Self-Training Algorithm with Editing Direct Relative Node Graph

LIU Xuewen, WANG Jikui, YANG Zhengguo, YI Jihai, LI Bing, NIE Feiping   

  1. School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou 730020, China
  2. School of Computer Science, Center for Optical Imagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi’an 710072, China
  • Online: 2022-07-15 Published: 2022-07-15

摘要: Self-Training算法的性能很大程度上取决于高置信度样本的识别准确度。受DPC算法启发,利用密度峰值定义样本间的原型关系,并构造出近亲结点图这一新型数据结构。在此基础上,提出了一种近亲结点图编辑的Self-Training算法(self-training algorithm with editing direct relative node graph-DRNG)。DRNG采用假设检验的方法选择高置信度样本,将其加入有标签样本集进行迭代训练。因误分的高密度样本点对Self-Training算法的分类性能影响较大,所以,DRNG综合考虑距离和密度两个方面定义了近亲结点图中割边的非对称权重,增大了高密度点的割边权重,使其落在拒绝域外的概率增加,减小了因其误分类而产生的风险。为了验证DRNG的性能,在8个基准数据集上与类似算法进行对比实验,实验结果验证了DRNG的有效性。

关键词: 近亲结点图, 半监督分类, 密度峰值, 自训练

Abstract: The performance of the Self-Training algorithm depends largely on how accurately high-confidence samples are identified. Inspired by the DPC algorithm, this paper uses density peaks to define the prototype relationship between samples and constructs a new data structure, the direct relative node graph. On this basis, a self-training algorithm with editing direct relative node graph (DRNG) is proposed. DRNG uses a hypothesis test to select high-confidence samples and adds them to the labeled sample set for iterative training. Because misclassified high-density sample points have a large impact on the classification performance of the Self-Training algorithm, DRNG defines an asymmetric weight for each cut edge in the direct relative node graph that accounts for both distance and density. This increases the cut edge weight of high-density points, which raises the probability that they fall outside the rejection region and thereby reduces the risk arising from their misclassification. To verify the performance of DRNG, comparative experiments against 4 state-of-the-art algorithms are carried out on 8 benchmark datasets, and the results verify its effectiveness.

Key words: direct relative node graph, semi-supervised classification, density peak, self-training
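
The abstract does not spell out how the direct relative node graph is built, so the following is only a minimal sketch of the DPC-style step it draws on: estimate a local density for each sample and link each sample to its nearest neighbor of higher density (its "prototype"). The function name `prototype_links`, the parameter `k`, and the kNN-based density estimate are assumptions made for illustration. They are not taken from the paper, and the asymmetric cut edge weights and the hypothesis test of DRNG are not reproduced here.

```python
import numpy as np


def prototype_links(X, k=3):
    """Return (rho, parent) for samples X.

    rho[i]    : local density of sample i (kNN-based estimate, an assumption).
    parent[i] : index of the nearest sample ranked denser than i,
                or -1 for the single densest sample (the root).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Density as the inverse mean distance to the k nearest neighbors
    # (column 0 after sorting is the zero distance to the sample itself).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = 1.0 / (knn.mean(axis=1) + 1e-12)
    # Rank samples from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    parent = np.full(n, -1)
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # samples ranked denser than i
        parent[i] = higher[np.argmin(dist[i, higher])]
    return rho, parent


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
    rho, parent = prototype_links(X, k=2)
    print(rho)
    print(parent)
```

Under these assumptions, the pairs `(i, parent[i])` form a tree rooted at the densest sample. An edge whose endpoints carry different labels would be a candidate cut edge, although how DRNG weights and tests such edges is defined in the full paper, not in this sketch.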