Computer Engineering and Applications, 2024, Vol. 60, Issue (10): 105-112. DOI: 10.3778/j.issn.1002-8331.2302-0237

Pattern Recognition and Artificial Intelligence

Hierarchical Label Text Classification Method with Deep Label Assisted Classification Task

CAO Yukun, WEI Ziyue, TANG Yijia, JIN Chengkun, LI Yunfeng   

1. School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China
2. IT Center, COMAC Shanghai Aviation Industrial (Group) Co., Ltd., Shanghai 201203, China
Online: 2024-05-15; Published: 2024-05-15

Abstract: Hierarchical label text classification is a challenging task in natural language processing in which each document must be correctly assigned to multiple labels organized in a hierarchical structure. However, the labels in the label set carry insufficient semantic information, and few documents are assigned to deep-level labels. As a result, deep-level labels are inadequately trained, which causes a significant imbalance in label training. To address these challenges, a two-channel hierarchical label text classification method with a deep label assisted classification task (DLAC) is proposed. Building on label semantic enhancement, the method introduces a deep-level label assisted classifier that combines text features with the parent label nodes of deep-level labels (i.e., the rich features of shallow-level labels) to improve classification performance on deep-level labels. Comparative experiments against eleven algorithms on three datasets show that the proposed model effectively improves classification performance on deep-level labels and achieves good results.

Key words: hierarchical label text classification, label hierarchy, global label classification channel, deep label assisted classification channel
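The abstract describes the two channels only at a high level. As a rough illustration of the idea, here is a toy sketch of a global channel that scores every label from the text features and an assisted channel that rescores deep-level labels after fusing the text features with the parent label's representation. It rests on several assumptions that are not from the paper: a hand-built two-level hierarchy, fixed label embeddings, dot-product scoring with a sigmoid, element-wise addition as the fusion rule, and plain averaging of the two channels for deep-level labels. All names (`PARENT`, `LABEL_EMB`, `global_channel`, `assisted_channel`, `predict`) and vectors are hypothetical. This is not the authors' DLAC model, which learns its representations and applies label semantic enhancement.

```python
import math

# Hypothetical two-level label hierarchy: child label -> parent (None = top level).
PARENT = {
    "Science": None,
    "Sports": None,
    "Physics": "Science",
    "Biology": "Science",
    "Football": "Sports",
}

# Hypothetical fixed label embeddings; a trained model would learn these.
LABEL_EMB = {
    "Science": [1.0, 0.0, 0.0],
    "Sports": [0.0, 1.0, 0.0],
    "Physics": [0.5, 0.0, 0.5],
    "Biology": [0.5, 0.0, -0.5],
    "Football": [0.0, 0.5, 0.5],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_channel(text_vec):
    """Channel 1: score every label directly against the text features."""
    return {lab: sigmoid(dot(text_vec, emb)) for lab, emb in LABEL_EMB.items()}


def assisted_channel(text_vec):
    """Channel 2: score only deep-level labels (those with a parent) after
    fusing the text features with the parent label's embedding.
    Element-wise addition is an assumed fusion rule, not the paper's."""
    scores = {}
    for lab, emb in LABEL_EMB.items():
        parent = PARENT[lab]
        if parent is None:
            continue  # shallow labels are left to the global channel
        fused = [t + p for t, p in zip(text_vec, LABEL_EMB[parent])]
        scores[lab] = sigmoid(dot(fused, emb))
    return scores


def predict(text_vec, use_assisted=True, threshold=0.5):
    """Return the sorted labels whose final score reaches the threshold."""
    g = global_channel(text_vec)
    a = assisted_channel(text_vec) if use_assisted else {}
    # Deep-level labels average both channels; shallow labels use channel 1 only.
    final = {lab: (g[lab] + a[lab]) / 2 if lab in a else g[lab] for lab in g}
    return sorted(lab for lab, p in final.items() if p >= threshold)


doc = [0.2, -1.0, -0.4]  # hypothetical text feature vector
print(predict(doc, use_assisted=False))  # ['Biology', 'Science']
print(predict(doc))                      # ['Biology', 'Physics', 'Science']
```

With this input the global channel alone scores Physics at about 0.475 and drops it. The parent-aware channel, which adds the Science embedding to the text features, raises the averaged Physics score to about 0.537. That is the kind of deep-label gain the method aims for, shown here only in a toy setting.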