BSLA: Improved Text Similarity Model for Siamese-LSTM

doi:10.3778/j.issn.1002-8331.2105-0220

Abstract

To address the weak ability of the Siamese-LSTM model to extract features from similar texts, an improved Siamese-LSTM text similarity model is proposed. The method introduces an attention mechanism that assigns greater weight to similar words, which strengthens the model's ability to recognize similar words in a text. It also introduces the pre-trained model BERT to improve the interaction between different words in the context of similar texts and to strengthen the correlation between words, so that similar and dissimilar texts can be told apart. Experimental results show that, compared with the popular text similarity models Siamese-LSTM, ABCNN, ESIM, and BiMPM, and with Siamese-LSTM variants that add only BERT or only the attention mechanism, the Siamese-LSTM model that combines both BERT and attention performs well on accuracy, precision, recall, and F1. It reaches the best F1 values of 86.18% on the LCQMC dataset and 89.08% on the Quora Question Pairs dataset.

Key words: Siamese-LSTM, text similarity, attention mechanism, BERT
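The four evaluation metrics named in the abstract can be illustrated with a minimal sketch for binary similarity labels (1 = similar, 0 = dissimilar). The function name `binary_metrics` and the toy labels below are assumptions made for illustration only; they are not taken from the paper and do not reproduce its experiments.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = similar)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # similar, predicted similar
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # dissimilar, predicted similar
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # similar, predicted dissimilar
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # dissimilar, predicted dissimilar

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions for eight sentence pairs.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a common headline metric for sentence-pair datasets such as LCQMC and Quora Question Pairs.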

MENG Jinxu, SHAN Hongtao, WAN Junjie, JIA Renxiang. BSLA: Improved Text Similarity Model for Siamese-LSTM[J]. Computer Engineering and Applications, 2022, 58(23): 178-185.
