Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (10): 48-56. DOI: 10.3778/j.issn.1002-8331.2101-0096

• Hot Topics and Reviews •

Research Progress of Multi-label Text Classification

HAO Chao, QIU Hangping, SUN Yi, ZHANG Chaoran

  1. Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China
  • Online: 2021-05-15 Published: 2021-05-10

Abstract:

Text classification is a basic task in natural language processing, and its algorithms have been studied since the 1950s. Single-label text classification algorithms are now relatively mature, but there is still considerable room for improvement in multi-label text classification. Firstly, the basic concepts and basic workflow of multi-label text classification are introduced, including data set acquisition, text preprocessing, model training and prediction. Secondly, the methods of multi-label text classification are introduced. These methods fall into two main categories: traditional machine learning methods and deep learning-based methods. Traditional machine learning methods mainly include problem transformation methods and algorithm adaptation methods. Deep learning-based methods use various neural network models to handle multi-label text classification problems; according to the model structure, they are divided into methods based on the CNN structure, the RNN structure and the Transformer structure. Thirdly, the data sets commonly used in multi-label text classification are summarized. Finally, future development trends are analyzed and discussed.

Key words: natural language processing, multi-label text classification, deep learning
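
The problem transformation methods mentioned in the abstract can be illustrated with a minimal sketch. It assumes two commonly cited strategies, binary relevance and label powerset; the abstract itself does not say which strategies the survey covers. The toy corpus and the function names `binary_relevance` and `label_powerset` are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple

# Toy corpus (hypothetical data): each document carries a *set* of labels,
# which is what distinguishes multi-label from single-label classification.
DOCS: List[Tuple[str, FrozenSet[str]]] = [
    ("stocks rally as tech earnings beat forecasts",
     frozenset({"business", "technology"})),
    ("new vaccine trial shows promise",
     frozenset({"health"})),
    ("startup builds AI tool for hospitals",
     frozenset({"business", "technology", "health"})),
]


def binary_relevance(
    docs: List[Tuple[str, FrozenSet[str]]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Build one binary (text, 0/1) data set per label.

    Each label is treated independently, so any binary classifier can be
    trained on each data set; correlations between labels are ignored.
    """
    labels = sorted(set().union(*(y for _, y in docs)))
    return {
        label: [(text, int(label in y)) for text, y in docs]
        for label in labels
    }


def label_powerset(
    docs: List[Tuple[str, FrozenSet[str]]],
) -> Tuple[List[Tuple[str, int]], Dict[FrozenSet[str], int]]:
    """Build one multi-class data set in which each distinct label set
    becomes a single class id, so label co-occurrence is preserved."""
    class_ids: Dict[FrozenSet[str], int] = {}
    data: List[Tuple[str, int]] = []
    for text, y in docs:
        # Assign the next free id the first time a label set is seen.
        cid = class_ids.setdefault(y, len(class_ids))
        data.append((text, cid))
    return data, class_ids


if __name__ == "__main__":
    for label, rows in binary_relevance(DOCS).items():
        print(label, [target for _, target in rows])
    lp_rows, lp_classes = label_powerset(DOCS)
    print([cid for _, cid in lp_rows], len(lp_classes))
```

Binary relevance keeps the number of subproblems equal to the number of labels but discards label dependencies, whereas label powerset keeps those dependencies at the cost of a class count that can grow with every distinct label combination in the data.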