Review and Prospect of Multi-Label Text Classification Research

doi:10.3778/j.issn.1002-8331.2210-0446

Abstract

Text classification (TC) is a fundamental task in natural language processing (NLP), and multi-label text classification (MLTC) is an important branch of TC. To give a thorough understanding of the field, this review first introduces the concept and workflow of multi-label text classification. Recent multi-label text classification methods are divided into those based on traditional machine learning and those based on deep learning. The datasets and evaluation metrics commonly used in the field are surveyed, and the strengths and weaknesses of selected multi-label text classification models are analyzed. The main research directions of multi-label text classification are then introduced: label correlation, label-specific features, class imbalance, missing labels, and label compression. Finally, the difficulties of multi-label text classification are summarized and directions for future work are discussed.

Key words: multi-label text classification, deep learning, label correlation, label-specific features, class imbalance

ZHANG Wenfeng, XI Xuefeng, CUI Zhiming, ZOU Yichen, LUAN Jinquan. Review and Prospect of Multi-Label Text Classification Research[J]. Computer Engineering and Applications, 2023, 59(18): 28-48.

