Computer Engineering and Applications, 2022, Vol. 58, Issue (2): 161-168. DOI: 10.3778/j.issn.1002-8331.2007-0500

• Pattern Recognition and Artificial Intelligence •

Research on Text Topic Recognition Algorithm Based on Improved Convolutional Neural Network

QIU Ningjia, YANG Changgeng, WANG Peng, REN Tao

  1. College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
  • Online: 2022-01-15 Published: 2022-01-18

Abstract: To address the weak text feature representation and low topic recognition accuracy of traditional methods, a text topic recognition method that combines SENet with a convolutional neural network is proposed. First, the Word2vec word vector of each word is fused with its LDA topic vector, and documents are vectorized with weights determined by each word's contribution to the topic. Next, the SECNN topic recognition model is constructed: SENet recalibrates the weights of the feature maps output by the convolutional layer, using its ability to strengthen important features and suppress uninformative ones so that topics are recognized efficiently. Finally, FDA is used to evaluate how well samples represent their categories, and an FDA-SGD algorithm is proposed to tune the model parameters and complete the topic recognition task. Experiments on a news text dataset verify the effectiveness of the improved algorithm; comparison with traditional models shows that it effectively speeds up model convergence and achieves good topic recognition performance.

Key words: topic recognition, SENet, convolutional neural network, Word2vec, latent Dirichlet allocation (LDA)
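
The SENet recalibration step mentioned in the abstract can be illustrated with a minimal sketch of a generic squeeze-and-excitation operation over one-dimensional convolutional feature maps. This is not the paper's SECNN implementation: the function name `se_recalibrate`, the reduction ratio `r`, the omission of biases, and the random weights are all assumptions chosen only to show the squeeze, excitation, and scale stages.

```python
import math
import random


def se_recalibrate(feature_maps, w1, w2):
    """Generic squeeze-and-excitation over 1-D feature maps (illustrative only).

    feature_maps: list of C channels, each a list of activations.
    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights.
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h))))
         for row in w2]
    # Scale: reweight every activation in a channel by that channel's weight.
    out = [[s_c * a for a in ch] for s_c, ch in zip(s, feature_maps)]
    return out, s


# Hypothetical sizes: 8 channels, 20 sequence positions, reduction ratio 4.
random.seed(0)
C, L, r = 8, 20, 4
feature_maps = [[random.gauss(0, 1) for _ in range(L)] for _ in range(C)]
w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]

recalibrated, channel_weights = se_recalibrate(feature_maps, w1, w2)
```

Each entry of `channel_weights` lies strictly between 0 and 1, so channels the excitation branch scores highly keep most of their activation while the others are attenuated. In a trained model `w1` and `w2` would be learned jointly with the convolutional filters rather than sampled at random.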