Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (15): 191-201. DOI: 10.3778/j.issn.1002-8331.2111-0176

• Pattern Recognition and Artificial Intelligence •

Study on Hierarchical Multi-Label Text Classification Method of MSML-BERT Model

HUANG Wei, LIU Guiquan   

  1.School of Data Science, University of Science and Technology of China, Hefei 230027, China
    2.School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
    3.Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China, Hefei 230027, China
  • Online: 2022-08-01  Published: 2022-08-01

Abstract: Hierarchical multi-label text classification is more challenging than ordinary multi-label text classification because the multiple labels of a text are organized into a tree-like hierarchy. Current methods use the same model structure to predict labels at different layers, ignoring the differences and diversity among layers. They also fail to fully model hierarchical dependencies, which results in poor prediction performance for labels at every layer, especially lower-layer long-tail labels, and can cause label inconsistency. To address these problems, a multi-task learning architecture is introduced and the MSML-BERT model is proposed. The model treats the label classification network of each layer in the label hierarchy as a learning task, and improves the performance of the tasks at every layer by sharing and transferring knowledge between tasks. On this basis, a multi-scale feature extraction module is designed to capture features at different scales and granularities, forming the varied knowledge required by different layers. Further, a multi-layer information propagation module is designed to fully model hierarchical dependencies and transfer knowledge across layers to support lower-layer tasks. Within this module, a hierarchical gating mechanism is designed to filter the knowledge flowing between the tasks of different layers. Extensive experiments on the RCV1-V2, NYT and WOS datasets show that the overall performance of the model, especially on lower-layer long-tail labels, surpasses that of other prevailing models while maintaining a low label inconsistency ratio.
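
Since the abstract names the modules but gives no implementation detail, the following is a minimal, hypothetical PyTorch sketch of how such an architecture could be wired together. Every identifier below (MultiScaleFeatureExtraction, HierarchicalGate, MSMLBert), the convolutional kernel sizes, the gating formula, and the bert-base-uncased backbone are illustrative assumptions, not the authors' actual code.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultiScaleFeatureExtraction(nn.Module):
    """Parallel 1-D convolutions over BERT token states, one branch per
    scale; the kernel sizes are an illustrative choice, not from the paper."""
    def __init__(self, hidden_size, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(hidden_size, hidden_size, k, padding=k // 2)
            for k in kernel_sizes
        ])

    def forward(self, token_states):            # (batch, seq_len, hidden)
        x = token_states.transpose(1, 2)         # (batch, hidden, seq_len)
        # Max-pool each branch into one feature vector per scale.
        return [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]

class HierarchicalGate(nn.Module):
    """Sigmoid gate controlling how much upper-level knowledge flows into
    the next level's task (a common gating form, assumed here)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, upper, lower):
        g = torch.sigmoid(self.proj(torch.cat([upper, lower], dim=-1)))
        return g * upper + (1 - g) * lower       # filtered knowledge mix

class MSMLBert(nn.Module):
    """One classification head per hierarchy level (one task per level),
    with top-down knowledge propagation through hierarchical gates."""
    def __init__(self, labels_per_level, hidden_size=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.features = MultiScaleFeatureExtraction(hidden_size)
        self.gates = nn.ModuleList([
            HierarchicalGate(hidden_size) for _ in labels_per_level[1:]
        ])
        self.heads = nn.ModuleList([
            nn.Linear(hidden_size, n) for n in labels_per_level
        ])

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        scale_feats = self.features(states)      # one vector per scale
        logits, carried = [], None
        for level, head in enumerate(self.heads):
            # Assign a scale to each level (the exact mapping is assumed).
            feat = scale_feats[min(level, len(scale_feats) - 1)]
            if carried is not None:              # gate in upper-level knowledge
                feat = self.gates[level - 1](carried, feat)
            carried = feat
            logits.append(head(feat))            # per-level multi-label scores
        return logits
```

Training would then sum a multi-label loss such as nn.BCEWithLogitsLoss over the per-level logits so that all level-specific tasks are optimized jointly; the abstract does not specify how the per-level losses are weighted, so equal weighting is assumed in this sketch.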

Key words: hierarchical multi-label text classification, multi-task learning architecture, BERT, multi-scale feature extraction module, multi-layer information propagation module