Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (20): 103-110. DOI: 10.3778/j.issn.1002-8331.2209-0093

• Pattern Recognition and Artificial Intelligence •


Multiscale Double-Layer Convolution and Global Feature Text Classification Model

SONG Zhongshan, NIU Yue, ZHENG Lu, TIE Jun, JIANG Hai   

1. College of Computer Science, South-Central Minzu University, Wuhan 430070, China
    2.Hubei Provincial Engineering Research Center for Intelligent Management of Manufacturing Enterprise, Wuhan 430070, China
    3.Hubei Provincial Engineering Research Center of Agricultural Blockchain and Intelligent Management, Wuhan 430070, China
• Online: 2023-10-15 Published: 2023-10-15


Abstract: To address the limited classification accuracy caused by the feature-extraction constraints of bi-directional long short-term memory (BiLSTM) networks and convolutional neural networks (CNN), a joint model is proposed that combines an improved double-layer CNN with a BiLSTM equipped with an attention mechanism. Because a single-layer CNN captures only limited local features, the model applies up-sampling to multi-scale combined convolutions and adds a skip connection to the original text, enlarging the local receptive field of subsequent convolutions and thereby enriching the extracted semantic features. Meanwhile, an attention mechanism is introduced into the BiLSTM so that it focuses on salient word features. The features from the improved CNN and the BiLSTM are fused and fed into a fully connected layer for multi-class classification. The model is evaluated on two public Chinese news classification datasets. The results show that the proposed model reaches an accuracy of 93.62% on the public THUCNews dataset, 3.01 percentage points higher than a single-channel plain CNN and 2.2 percentage points higher than a dual-channel CNN-LSTM model.
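The data flow described above can be sketched at the shape level. The following NumPy sketch is illustrative only, not the authors' implementation: the kernel sizes (2, 3, 4), dimensions, nearest-neighbour up-sampling, max-over-time pooling, and random untrained weights are all assumptions, and a random projection stands in for the BiLSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    """Valid 1-D convolution with ReLU: x (T, d_in), w (k, d_in, d_out) -> (T-k+1, d_out)."""
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, w.shape[2]))
    for t in range(out.shape[0]):
        out[t] = np.einsum('kd,kdo->o', x[t:t + k], w)
    return np.maximum(out, 0.0)

def upsample(x, T):
    """Nearest-neighbour up-sampling along the time axis back to length T."""
    idx = np.round(np.linspace(0, x.shape[0] - 1, T)).astype(int)
    return x[idx]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

T, d, n_cls = 20, 32, 10                 # sequence length, embedding dim, classes (assumed)
x = rng.normal(size=(T, d))              # word embeddings of one sentence (stand-in)

# Multi-scale first-layer convolutions, up-sampled back to the input length
branches = [upsample(conv1d(x, rng.normal(size=(k, d, d)) * 0.1), T)
            for k in (2, 3, 4)]
# Skip connection: concatenate the original embeddings with the conv branches
merged = np.concatenate([x] + branches, axis=1)                # (T, 4d)
# The second convolution now sees an enlarged effective receptive field
local = conv1d(merged, rng.normal(size=(3, 4 * d, d)) * 0.1)   # (T-2, d)
local_vec = local.max(axis=0)                                  # max-over-time pooling

# BiLSTM branch (placeholder: a tanh projection stands in for hidden states)
h = np.tanh(x @ (rng.normal(size=(d, d)) * 0.1))               # (T, d)
attn_w = softmax(h @ rng.normal(size=(d,)))                    # attention over time steps
global_vec = attn_w @ h                                        # weighted sum, (d,)

# Fuse both channels and classify with a fully connected layer
fused = np.concatenate([local_vec, global_vec])                # (2d,)
probs = softmax(fused @ rng.normal(size=(2 * d, n_cls)))
```

The skip connection keeps the raw embeddings available to the second convolution, so its kernel covers both original words and multi-scale features, which is how the enlarged receptive field is obtained without deepening the network.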

Key words: text classification, bi-directional long short-term memory network (BiLSTM), convolutional neural network (CNN), attention mechanism