Computer Engineering and Applications, 2021, Vol. 57, Issue (14): 158-163. DOI: 10.3778/j.issn.1002-8331.2004-0096

• Pattern Recognition and Artificial Intelligence •

Semantic Similarity Calculation Based on Transformer Encoder

QIAO Weitao, HUANG Haiyan, WANG Shan

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Online: 2021-07-15 Published: 2021-07-14

Abstract:

Semantic similarity calculation measures how similar two texts are at the semantic level and is an important task in natural language processing. To address the problem that existing methods cannot fully represent the semantic features of sentences, a Transformer-encoder-based semantic feature extraction model, TEAM, is proposed. The model uses the contextual semantic encoding ability of the Transformer to extract the semantic information within each sentence and to produce a deep semantic encoding of it. In addition, an interactive attention mechanism is introduced: while the two sentences are being encoded, it extracts the related, similar features between them, so that the model better captures the important semantic information within each sentence, which improves the model's semantic understanding and generalization ability. Experimental results show that the model improves accuracy on both English and Chinese semantic similarity tasks and achieves better results than existing methods.

Key words: semantic similarity, Transformer encoder, interactive attention mechanism, semantic representation
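
The abstract does not specify how TEAM's interactive attention is computed. Purely as an illustration, the sketch below shows one common form of cross-sentence attention (scaled dot-product soft alignment), followed by mean pooling and cosine similarity. The function names, the pooling step, the scoring rule, and the toy token vectors are all assumptions made for this example and are not the authors' implementation; in a real system the token vectors would come from a Transformer encoder.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys):
    """For each query token, return the attention-weighted average of the
    key tokens (scaled dot-product attention across the two sentences)."""
    dim = len(keys[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        attended.append(
            [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of token vectors into one sentence vector."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def similarity(sent_a, sent_b):
    """Hypothetical scoring rule: compare each sentence's pooled vector with
    the pooled view of the other sentence aligned to it, then average."""
    a_aligned = cross_attend(sent_a, sent_b)  # sentence B as seen from A's tokens
    b_aligned = cross_attend(sent_b, sent_a)  # sentence A as seen from B's tokens
    score_a = cosine(mean_pool(sent_a), mean_pool(a_aligned))
    score_b = cosine(mean_pool(sent_b), mean_pool(b_aligned))
    return 0.5 * (score_a + score_b)


# Toy 2-dimensional "encoder outputs" (made up for illustration only).
sent_a = [[1.0, 0.0], [0.0, 1.0]]
sent_b = [[1.0, 0.0], [0.0, 1.0]]      # same content as sent_a
sent_c = [[-1.0, 0.0], [0.0, -1.0]]    # points the opposite way

print(similarity(sent_a, sent_b))
print(similarity(sent_a, sent_c))
```

With these toy inputs the matching pair scores higher than the opposed pair. The score is symmetric in its two arguments by construction, which is usually desirable for a similarity measure.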