Computer Engineering and Applications, 2022, Vol. 58, Issue 19: 158-165. DOI: 10.3778/j.issn.1002-8331.2102-0221

• Pattern Recognition and Artificial Intelligence •

Text Similarity Analysis Algorithm Combining Attention Mechanism and MatchPyramid

DAI Xiang, SUN Haichun, ZHU Rongchen, SUN Tianyang

  1. School of Information Network Security, People’s Public Security University of China, Beijing 100038, China
  • Online: 2022-10-01 Published: 2022-10-01

Abstract: Text similarity analysis is a core task in natural language processing, and deep text matching models are currently the mainstream approach to it. To address the shortcomings of the traditional MatchPyramid model in text feature extraction, a text similarity analysis method based on an enhanced MatchPyramid model is proposed. A multi-head self-attention mechanism and a mutual attention mechanism are added to the input encoding layer, and an autoencoder reduces the dimension of the word vectors fed to this dual attention mechanism in order to lower the computational cost of the model. The output of the dual attention mechanism is then concatenated with the original word vectors, which improves the ability of the word vectors to represent the key information of the text. Finally, the single-channel map formed by the dot product of the word vector matrices of the two texts is mapped to multiple feature subspaces to form a multi-channel map, and a densely connected convolutional neural network extracts features from the multi-channel map. Experimental results show that, compared with the traditional MatchPyramid model, the proposed model improves accuracy by 1.59 percentage points and the F1 value by 2.49 percentage points.

Key words: text similarity, attention mechanism, MatchPyramid, convolutional neural network
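
The encoding and matching steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`attend`, `encode`, `matching_channels`), the random stand-in for the trained autoencoder encoder (`w_reduce`), the weight-free attention heads, and the per-channel affine map followed by `tanh` are all assumptions made for illustration. The densely connected convolutional network that consumes the multi-channel map is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    """Scaled dot-product attention without learned weights (assumption)."""
    scores = q @ kv.T / np.sqrt(q.shape[1])
    return softmax(scores, axis=1) @ kv

def multi_head_self_attention(x, heads):
    """Split the feature axis into equal heads and let each head attend to itself."""
    parts = np.split(x, heads, axis=1)  # feature dimension must be divisible by heads
    return np.concatenate([attend(p, p) for p in parts], axis=1)

def encode(x, y, w_reduce, heads):
    """Reduce dimension, apply both attentions, concatenate with the original vectors.

    w_reduce stands in for the encoder half of a trained autoencoder (assumption).
    """
    x_low, y_low = x @ w_reduce, y @ w_reduce
    self_out = multi_head_self_attention(x_low, heads)
    mutual_out = attend(x_low, y_low)  # words of x attend over words of y
    return np.concatenate([x, self_out, mutual_out], axis=1)

def matching_channels(a, b, scale, bias):
    """Dot-product matching map expanded to several channels.

    Each channel is an affine transform of the single-channel map followed
    by tanh, a hypothetical choice equivalent to a 1x1 convolution.
    """
    single = a @ b.T  # shape (len_a, len_b)
    return np.tanh(scale[:, None, None] * single[None, :, :] + bias[:, None, None])

# Hypothetical usage with random word vectors for two short texts.
rng = np.random.default_rng(0)
text_a = rng.normal(size=(5, 8))   # 5 words, 8-dimensional vectors
text_b = rng.normal(size=(7, 8))   # 7 words
w_reduce = rng.normal(size=(8, 4)) # 8 -> 4 dimensions before attention
enc_a = encode(text_a, text_b, w_reduce, heads=2)
enc_b = encode(text_b, text_a, w_reduce, heads=2)
channels = matching_channels(enc_a, enc_b, scale=np.ones(3), bias=np.zeros(3))
```

Running attention on the reduced vectors keeps the score computation cheaper, while the concatenation preserves the original word vectors alongside the attention outputs, which matches the motivation given in the abstract.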
