Computer Engineering and Applications ›› 2020, Vol. 56 ›› Issue (5): 65-73. DOI: 10.3778/j.issn.1002-8331.1903-0078

• Theory, Research and Development •

Multi-Label Classification Algorithm for Weak-Label Data

WANG Jingjing, YANG Youlong   

  1. School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
  • Online: 2020-03-01 Published: 2020-03-06

Abstract:

This paper proposes MCWD, a new multi-label algorithm for classification problems in which the label information is incomplete. By recovering the missing labels in the training data, MCWD produces better classification results. In the training phase, it recovers the missing labels by iteratively updating the weight of each training instance and exploiting the correlation between every pair of labels. Once the labels are recovered, a classification model is trained on the resulting training set and then used to predict the labels of the test set. Experiments on fourteen multi-label datasets show that the algorithm has certain advantages.

Key words: multi-label classification, missing labels, weak-label learning, label correlation
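
The abstract does not state MCWD's update rules, so the following is only a minimal sketch of the general idea it describes: recovering missing labels from pairwise label correlation while iteratively re-weighting training instances. It is not the authors' algorithm. The function name `recover_labels`, the parameters `threshold` and `decay`, the conditional co-occurrence score, and the weight update are all assumptions made for illustration.

```python
def recover_labels(Y, n_iter=5, threshold=0.8, decay=0.5):
    """Illustrative weak-label recovery (NOT the published MCWD rules).

    Y is a list of rows of 0/1 labels, where 0 means "irrelevant or missing".
    Every row is assumed to carry at least one observed label or to be left as-is.
    """
    Y = [list(row) for row in Y]          # work on a copy
    n, q = len(Y), len(Y[0])
    w = [1.0 / n] * n                     # instance weights, uniform start

    for _ in range(n_iter):
        # Weighted pairwise co-occurrence: co[i][j] = sum of weights of
        # instances that carry both label i and label j.
        co = [[0.0] * q for _ in range(q)]
        for k in range(n):
            present = [i for i in range(q) if Y[k][i] == 1]
            for i in present:
                for j in present:
                    co[i][j] += w[k]

        # Score each absent label j of instance k by the strongest weighted
        # estimate of P(j | i) over the labels i that the instance has.
        updates = []
        for k in range(n):
            present = [i for i in range(q) if Y[k][i] == 1]
            for j in range(q):
                if Y[k][j] == 1:
                    continue
                score = max((co[i][j] / co[i][i] for i in present), default=0.0)
                if score >= threshold:
                    updates.append((k, j))

        if not updates:
            break                         # nothing left to recover
        for k, j in updates:
            Y[k][j] = 1

        # Hypothetical weight update: trust instances with imputed labels less,
        # then renormalise so the weights sum to 1.
        touched = {k for k, _ in updates}
        w = [w[k] * decay if k in touched else w[k] for k in range(n)]
        total = sum(w)
        w = [x / total for x in w]

    return Y


if __name__ == "__main__":
    Y_train = [[1, 1, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
    print(recover_labels(Y_train, threshold=0.6))
```

In this toy input the third instance has the second label but not the first, although the two co-occur in most of the other instances, so the first label is filled in when `threshold=0.6`. The recovered matrix would then serve as the training set for any multi-label classifier, as the abstract outlines.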