Large-Scale Multi-Label Classification Algorithm with Missing Labels

doi:10.3778/j.issn.1002-8331.2112-0192

Abstract

Abstract: Manual annotation of large-scale multi-label data very easily leaves labels missing. Most existing algorithms address this problem with global label correlations shared by all instances; that is, they assume the correlations between labels are the same for every instance. In practice, however, label correlations are not identical across instances, and correlations obtained locally are more accurate. This paper therefore proposes a method based on local label correlations. The method exploits local label correlations to recover missing labels and uses low-rank matrix factorization to construct a classifier suitable for large-scale data. To speed up model training, the two processes are integrated into a unified framework that is solved by iterative optimization. Extensive experiments show that the method is at least 2 percentage points more accurate in prediction than existing algorithms and improves training speed by at least 5 percentage points.

Key words: multi-label classification, missing labels, large-scale labels, local label correlations, low-rank matrix factorization
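
The two ingredients named in the abstract, label recovery from local label correlations and a low-rank factorized classifier, can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the abstract does not give their objective function or update rules, so everything below is an assumption made for illustration. In particular, the cosine-similarity correlation, the precomputed `groups` assignment, the ridge-regularized squared loss, and the function names `recover_labels` and `fit_low_rank` are all hypothetical. The sketch also runs recovery once before fitting, whereas the paper fuses both steps into one jointly optimized framework.

```python
import numpy as np

def recover_labels(Y, groups):
    """Fill in soft scores for possibly missing labels, one group at a time."""
    Y = Y.astype(float)
    Y_rec = Y.copy()
    for g in np.unique(groups):
        idx = groups == g
        Yg = Y[idx]
        # Local label correlation (assumed form): cosine similarity between
        # label columns, computed only from the instances in this group.
        norms = np.linalg.norm(Yg, axis=0) + 1e-12
        C = (Yg.T @ Yg) / np.outer(norms, norms)
        # Score each label by its mean correlation with the observed labels.
        counts = Yg.sum(axis=1, keepdims=True) + 1e-12
        scores = (Yg @ C) / counts
        # Observed positives stay at 1; zero entries become soft scores.
        Y_rec[idx] = np.maximum(Yg, scores)
    return Y_rec

def fit_low_rank(X, Y, rank=2, lam=0.1, n_iter=20, seed=0):
    """Fit a classifier W = U @ V.T by alternating exact ridge updates."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    U = rng.normal(scale=0.1, size=(d, rank))
    G = X.T @ X
    I_k, I_d = np.eye(rank), np.eye(d)
    losses = []
    for _ in range(n_iter):
        # V-step: ridge regression of Y on the projected features Z = X U.
        Z = X @ U
        V = np.linalg.solve(Z.T @ Z + lam * I_k, Z.T @ Y).T
        # U-step: solve G U (V^T V) + lam U = X^T Y V by diagonalizing V^T V,
        # which decouples the system into one linear solve per column.
        evals, Q = np.linalg.eigh(V.T @ V)
        B = X.T @ Y @ V @ Q
        U_rot = np.column_stack([
            np.linalg.solve(max(e, 0.0) * G + lam * I_d, B[:, i])
            for i, e in enumerate(evals)
        ])
        U = U_rot @ Q.T
        resid = X @ U @ V.T - Y
        losses.append(np.sum(resid**2) + lam * (np.sum(U**2) + np.sum(V**2)))
    return U, V, losses

# Synthetic usage example (all data here is made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
Y_full = (X @ rng.normal(size=(8, 5)) > 0.5).astype(float)
# Simulate missing labels by hiding roughly 30% of the positives.
Y_obs = Y_full * (rng.random(Y_full.shape) > 0.3)
# Hypothetical grouping: split instances by the sign of the first feature.
groups = (X[:, 0] > 0).astype(int)

Y_rec = recover_labels(Y_obs, groups)
U, V, losses = fit_low_rank(X, Y_rec, rank=3)
scores = X @ U @ V.T  # predicted label scores, shape (60, 5)
```

Storing `U` and `V` instead of the full weight matrix reduces the parameter count from d × L to (d + L) × rank, which is what makes a factorized classifier attractive when the number of labels L is large. Because each update solves its subproblem exactly, the regularized loss recorded in `losses` does not increase from one iteration to the next.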

LIU Yilu, CAO Fuyuan. Large-Scale Multi-Label Classification Algorithm with Missing Labels[J]. Computer Engineering and Applications, 2022, 58(17): 148-157.
