Semantic Similarity Calculation Based on Transformer Encoder

doi:10.3778/j.issn.1002-8331.2004-0096

Computer Engineering and Applications, 2021, Vol. 57, Issue (14): 158-163. DOI: 10.3778/j.issn.1002-8331.2004-0096

QIAO Weitao, HUANG Haiyan, WANG Shan

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

Online: 2021-07-15 Published: 2021-07-14

Abstract:

Semantic similarity calculation measures how similar two texts are at the semantic level and is an important task in natural language processing. To address the problem that existing methods cannot fully represent the semantic features of sentences, this paper proposes TEAM, a semantic feature extraction model based on the Transformer encoder. TEAM uses the contextual semantic encoding ability of the Transformer to extract the semantic information within each sentence and produce a deep semantic encoding of it. In addition, an interactive attention mechanism is introduced: while the two sentences are encoded, it extracts the similarity features that relate them, which makes the model better at capturing the important semantic information within each sentence and improves its semantic understanding and generalization ability. Experimental results show that the model improves accuracy on English and Chinese semantic similarity tasks and outperforms existing methods.

Key words: semantic similarity, Transformer encoder, interactive attention mechanism, semantic representation
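The abstract does not give TEAM's equations, so the following is only a minimal sketch of the general idea of interactive attention between two sentences, not the authors' implementation. It assumes each sentence is already a list of token vectors (in the paper these would come from the Transformer encoder), uses scaled dot-product attention with one sentence as queries and the other as keys and values, and scores the pair by cosine similarity of the mean-pooled results. All function names and the pooling and scoring choices are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attend(queries, keys):
    """For each query vector, return the attention-weighted average of `keys`.

    Scaled dot-product attention in which queries come from one sentence and
    keys/values from the other. This is an assumed stand-in for the paper's
    interactive attention, whose exact form is not stated in the abstract.
    """
    dim = len(keys[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        attended.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of token vectors into one sentence vector."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def interactive_similarity(sent_a, sent_b):
    """Score two sentences, each given as a list of token vectors."""
    a_given_b = mean_pool(cross_attend(sent_a, sent_b))  # A attends to B
    b_given_a = mean_pool(cross_attend(sent_b, sent_a))  # B attends to A
    return cosine(a_given_b, b_given_a)


if __name__ == "__main__":
    # Toy 2-dimensional "token encodings", invented for illustration only.
    sent_a = [[1.0, 0.0], [0.0, 1.0]]
    sent_b = [[1.0, 0.0], [0.9, 0.1]]
    print(interactive_similarity(sent_a, sent_b))
```

A trained model would learn projection matrices for the queries, keys, and values and would typically feed the interaction features into a classifier; the sketch omits both to keep the attention step visible.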

QIAO Weitao, HUANG Haiyan, WANG Shan. Semantic Similarity Calculation Based on Transformer Encoder[J]. Computer Engineering and Applications, 2021, 57(14): 158-163.

