Study on Hierarchical Multi-Label Text Classification Method of MSML-BERT Model

doi:10.3778/j.issn.1002-8331.2111-0176

Abstract

Hierarchical multi-label text classification is more challenging than ordinary multi-label text classification because the multiple labels of a text are organized into a tree-like hierarchy. Current methods use the same model structure to predict labels at every layer, which ignores the differences and diversity among layers. They also do not fully model hierarchical dependencies. This leads to poor prediction performance at all layers, especially for lower-layer long-tail labels, and can cause label inconsistency. To address these problems, a multi-task learning architecture is introduced and the MSML-BERT model is proposed. The model treats the label classification network of each layer in the label hierarchy as a separate learning task and improves performance at every layer by sharing and transferring knowledge between tasks. On this basis, a multi-scale feature extraction module is designed to capture features at multiple scales and granularities, forming the different kinds of knowledge required at different layers. Further, a multi-layer information propagation module is designed to fully model hierarchical dependencies and to transfer knowledge across layers in support of lower-layer tasks. Within this module, a hierarchical gating mechanism filters the knowledge flowing between tasks at different layers. Extensive experiments on the RCV1-V2, NYT and WOS datasets show that the overall performance of the model, especially on lower-layer long-tail labels, surpasses that of other mainstream models while maintaining a low label inconsistency ratio.
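The abstract reports a low label inconsistency ratio but does not define the metric. As a minimal sketch, assuming the common definition that a sample's prediction is inconsistent when some predicted label appears without its parent label, the ratio can be computed as below. The function names, the `PARENT` mapping and the example taxonomy are hypothetical and not taken from the paper, which may define the metric differently (for example, per label rather than per sample).

```python
from typing import Dict, Iterable, List, Optional, Set


def is_consistent(predicted: Set[str], parent: Dict[str, Optional[str]]) -> bool:
    """Return True if every predicted label's parent is also predicted.

    Top-layer labels map to None in `parent` and need no ancestor.
    Labels missing from `parent` are also treated as top-layer labels.
    """
    for label in predicted:
        p = parent.get(label)
        if p is not None and p not in predicted:
            return False
    return True


def label_inconsistency_ratio(
    predictions: Iterable[Set[str]], parent: Dict[str, Optional[str]]
) -> float:
    """Fraction of samples whose predicted label set violates the hierarchy.

    Assumed (hypothetical) per-sample definition, not the paper's own formula.
    """
    preds: List[Set[str]] = list(predictions)
    if not preds:
        return 0.0
    inconsistent = sum(1 for p in preds if not is_consistent(p, parent))
    return inconsistent / len(preds)


# Hypothetical two-layer taxonomy (child -> parent; top-layer labels -> None).
PARENT: Dict[str, Optional[str]] = {
    "Sports": None,
    "Science": None,
    "Football": "Sports",
    "Tennis": "Sports",
    "Physics": "Science",
}

if __name__ == "__main__":
    preds = [
        {"Sports", "Football"},  # consistent
        {"Science", "Physics"},  # consistent
        {"Tennis"},              # inconsistent: parent "Sports" missing
        {"Sports", "Physics"},   # inconsistent: parent "Science" missing
    ]
    print(label_inconsistency_ratio(preds, PARENT))  # 0.5
```

Checking only the immediate parent is sufficient for deeper trees: if every predicted label's parent is also predicted, then by induction every ancestor is predicted as well.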

Key words: hierarchical multi-label text classification, multi-task learning architecture, BERT, multi-scale feature extraction module, multi-layer information propagation module

HUANG Wei, LIU Guiquan. Study on Hierarchical Multi-Label Text Classification Method of MSML-BERT Model[J]. Computer Engineering and Applications, 2022, 58(15): 191-201.
