Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (6): 92-100. DOI: 10.3778/j.issn.1002-8331.2109-0502

• Pattern Recognition and Artificial Intelligence •

Application of Label-Bias Network in Datasets with Noisy Labels

JIANG Qianyin, YU Zhi, LI Xiying   

  1. Research Center of Intelligent Transportation System, School of Intelligent Systems Engineering, Sun Yat-Sen University, Guangzhou 510006, China
  2. Guangdong Provincial Key Laboratory of Intelligent Transportation System, Guangzhou 510006, China
  3. Key Laboratory of Video and Image Intelligent Analysis and Application Technology, Ministry of Public Security, Guangzhou 510006, China
  • Online: 2023-03-15  Published: 2023-03-15

Abstract: Noisy labels are common in real-world datasets and seriously degrade the learning performance of deep neural networks (DNNs). To address this problem, a method for identifying and re-labeling noisily labeled data based on label-bias learning is proposed. Two pseudo-label generation strategies are designed to build an artificial noisy dataset from the clean data identified by a basic network, and the label-bias vectors or label-bias matrices of this dataset are calculated. To strengthen the relevance between similar classes, two noise learning networks, a label-bias vector network and a label-bias matrix network, are built from fully connected layers and single-row convolution kernels and learn the noise probability of each sample directly. Thresholds that are linearly related to the noise ratio are designed to separate clean data from noisy data. Experiments analyze the factors that affect recognition performance, including the pseudo-label generation strategy, the network structure, and the number of training iterations. Tests on a public dataset show that, under various noise distributions, the proposed method significantly improves the precision and recall of noisy data while keeping the precision and recall of clean data essentially stable, with maximum improvements of 16.45% and 21.01%, respectively.

Key words: noisy dataset, noisy label, label bias, noise learning, deep learning
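
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, since the abstract does not give the formulas: the label-bias vector is taken to be the one-hot given label minus the predicted class probabilities, the pseudo-label strategy shown is a simple uniform random flip (the paper's two strategies are not specified here), the noise probabilities are passed in rather than learned by a network, and the `slope` and `intercept` of the threshold are hypothetical parameters.

```python
import numpy as np


def make_artificial_noise(labels, num_classes, flip_fraction, rng):
    """Flip a fraction of clean labels to a different class (assumed strategy).

    Returns the pseudo-labels and a boolean mask marking the flipped samples.
    """
    labels = np.asarray(labels)
    n = labels.shape[0]
    n_flip = int(round(flip_fraction * n))
    idx = rng.choice(n, size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] guarantees the new label differs.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy = labels.copy()
    noisy[idx] = (labels[idx] + offsets) % num_classes
    is_noisy = np.zeros(n, dtype=bool)
    is_noisy[idx] = True
    return noisy, is_noisy


def label_bias_vector(probs, labels):
    """Assumed definition: one-hot(given label) minus predicted probabilities."""
    probs = np.asarray(probs, dtype=float)
    one_hot = np.eye(probs.shape[1])[np.asarray(labels)]
    return one_hot - probs


def linear_threshold(noise_ratio, slope, intercept):
    """Threshold that depends linearly on the noise ratio (hypothetical form)."""
    return slope * noise_ratio + intercept


def flag_noisy(noise_prob, noise_ratio, slope, intercept):
    """Mark samples whose noise probability reaches the threshold as noisy."""
    threshold = linear_threshold(noise_ratio, slope, intercept)
    return np.asarray(noise_prob) >= threshold
```

In this sketch, `label_bias_vector` would supply the input features and `make_artificial_noise` the training targets for a noise learning network; `flag_noisy` then applies the linear threshold to that network's output.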
