Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (18): 28-48. DOI: 10.3778/j.issn.1002-8331.2210-0446

• Hot Topics and Reviews •

Review and Prospect of Multi-Label Text Classification Research

ZHANG Wenfeng, XI Xuefeng, CUI Zhiming, ZOU Yichen, LUAN Jinquan   

  1. School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China
  2. Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215000, China
  3. China Suzhou Smart City Research Institute, Suzhou, Jiangsu 215000, China
  • Online: 2023-09-15 Published: 2023-09-15

Abstract: Text classification (TC) is a fundamental task in natural language processing (NLP), and multi-label text classification (MLTC) is an important branch of TC. To provide a thorough understanding of the field, this paper introduces the concept and workflow of multi-label text classification. It divides recent multi-label text classification methods into those based on traditional machine learning and those based on deep learning, surveys the datasets and evaluation metrics commonly used in the field, and analyzes the strengths and weaknesses of selected models. It then introduces the main research directions in multi-label text classification: label correlation, label-specific features, class imbalance, missing labels, and label compression. Finally, it summarizes the difficulties of multi-label text classification and discusses future directions.

Key words: multi-label text classification, deep learning, label correlation, label-specific features, class imbalance