Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (21): 134-141. DOI: 10.3778/j.issn.1002-8331.2307-0129

• Pattern Recognition and Artificial Intelligence •

Fine-Grained Text Classification Based on Label Augmentation

GUO Ruiqiang, YANG Shilong, JIA Xiaowen, WEI Qianqiang   

  1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
  2. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Hebei Normal University, Shijiazhuang 050024, China
  3. Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China
  • Online: 2024-11-01 Published: 2024-10-25

Abstract: Text classification is an important branch of natural language processing that aims to assign labels to data using trained models. However, existing methods consider only the most obvious semantic relationship between labels and text and ignore the additional semantic information carried by the labels themselves, which limits classification accuracy. To address this problem, this paper proposes a label-enhanced fine-grained text classification model (FGTC), which interprets labels using known information and thereby enriches the semantic links between labels and documents. In addition, FGTC models the sequential relationships among the phrases in a label and applies fine-grained label attention at the word level to fully exploit the information the labels contain. Comparative experiments on four benchmark datasets show that the proposed model improves text classification accuracy.

Key words: text classification, label interpretation, fine-grained label attention
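
As a rough illustration of what attention between text tokens and the individual words of a label can look like, the sketch below scores each token by its best cosine match among a label's words and uses the softmax of those scores to pool the tokens into a label-aware document vector. This is not the FGTC architecture, which the abstract does not specify; the function names, the max-then-softmax scoring, and the toy 2-dimensional embeddings are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(token_vecs, label_word_vecs):
    """Return (attention weights over tokens, label-aware document vector).

    Hypothetical helper: each token is scored by its best cosine match
    among the label's word vectors, then scores are normalized by softmax.
    """
    scores = [max(cosine(t, w) for w in label_word_vecs) for t in token_vecs]
    weights = softmax(scores)
    dim = len(token_vecs[0])
    doc_vec = [
        sum(wt * t[i] for wt, t in zip(weights, token_vecs)) for i in range(dim)
    ]
    return weights, doc_vec


# Made-up embeddings: three text tokens and two words describing one label
# (for example, the label name plus a word taken from its interpretation).
tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
label_words = [[1.0, 0.1], [0.9, 0.0]]

weights, doc_vec = label_attention(tokens, label_words)
print(weights)  # the first token gets the largest weight
print(doc_vec)
```

Treating a label as several word vectors rather than one pooled vector lets a token match any single word of the label or of its interpretation, which is the general intuition behind word-level label attention.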
