Computer Engineering and Applications, 2020, Vol. 56, Issue (19): 94-98. DOI: 10.3778/j.issn.1002-8331.1911-0033

• Big Data and Cloud Computing •

Dynamic Multi-label Text Classification Algorithm Based on Label Semantic Similarity

YAO Jiaqi, XU Zhengguo, YAN Jikun, XIONG Gang, LI Zhixiang

  1. National Key Laboratory of Science and Technology on Blind Signal Processing, Chengdu 610041, China
  • Online: 2020-10-01 Published: 2020-09-29

Abstract:

To address dynamic multi-label text classification, in which the label set changes over time, a dynamic multi-label text classification algorithm based on label semantic similarity is proposed. In the training phase, a multi-label text classifier based on a convolutional neural network is first trained with the label set held fixed, and the output of the classifier's penultimate layer is then taken as the feature vector of a text. Because this feature vector is learned under label supervision, it carries label semantic information that a feature vector built only from the character strings of the text content does not. In the test phase, the test document is fed into the classifier obtained in the training phase to get its feature vector, and the cosine similarity between feature vectors is then computed. The similarity is multiplied by a time attenuation factor, so that more recent texts receive higher similarity values. Finally, the nearest neighbor algorithm is used for classification. Experimental results show that the proposed algorithm performs comparatively well on the dynamic multi-label text classification problem.

Key words: dynamic multi-label, text classification, neural networks, label semantic similarity
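
The test-phase steps described in the abstract (cosine similarity, time attenuation, nearest neighbor) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the exponential form of the attenuation factor, the `decay_rate` parameter, the function names, and the toy data are all hypothetical, and the feature vectors stand in for the penultimate-layer outputs of the trained classifier.

```python
import numpy as np

def decayed_similarity(query_vec, query_time, doc_vecs, doc_times, decay_rate=0.1):
    """Cosine similarity between a query and stored documents,
    multiplied by a time attenuation factor."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    cosine = d @ q
    # Assumption: attenuation is exp(-decay_rate * age). The abstract only
    # says that more recent texts should receive higher similarity values.
    age = np.abs(query_time - doc_times)
    return cosine * np.exp(-decay_rate * age)

def nearest_neighbor_labels(query_vec, query_time, doc_vecs, doc_times,
                            doc_labels, decay_rate=0.1):
    """Return the label set of the stored document with the highest
    time-attenuated similarity to the query."""
    scores = decayed_similarity(query_vec, query_time, doc_vecs, doc_times, decay_rate)
    return doc_labels[int(np.argmax(scores))]

# Toy data: 2-dimensional vectors standing in for penultimate-layer features.
doc_vecs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
doc_times = np.array([0.0, 9.0, 9.0])
# Label sets are stored per document, so a label that appears later
# (here "election") needs no retraining of the feature extractor.
doc_labels = [{"politics"}, {"politics", "election"}, {"sports"}]

query_vec = np.array([1.0, 0.1])
scores = decayed_similarity(query_vec, 10.0, doc_vecs, doc_times)
predicted = nearest_neighbor_labels(query_vec, 10.0, doc_vecs, doc_times, doc_labels)
print(predicted)
```

In this toy example the first two stored documents have identical feature vectors, so the time attenuation factor alone decides between them, and the more recent document supplies the predicted labels.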