Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (2): 198-203.

Sentence similarity computing based on relation vector model

YIN Yaoming, ZHANG Dongzhan   

  1. School of Information Science and Engineering, Xiamen University, Xiamen, Fujian 361005, China
  • Online: 2014-01-15 Published: 2014-01-26

基于关系向量模型的句子相似度计算

殷耀明,张东站   

  1. 厦门大学 信息科学与技术学院,福建 厦门 361005

Abstract: Sentence similarity computation plays an important role across natural language processing. Some traditional methods compare sentences only by surface information such as word form, sentence length, and word order, and ignore deeper semantic information; other methods that do consider sentence semantics perform unsatisfactorily in practice. Building on the vector space model, this paper proposes a relation vector model that takes both sentence structure and semantic information into account. The model considers the collocation relations between the key words of a sentence and the synonym information of those key words. This information reflects the local structural components of the sentence and the correlations among them, and therefore better captures the sentence's structure and semantics. A sentence similarity algorithm built on the relation vector model is then presented. The algorithm is also applied to automatic summary generation for hot online news, where it removes sentences with similar meaning to avoid redundancy in the summary. Experimental results show that, for sentence similarity on online news, the relation vector model algorithm improves the accuracy of the similarity calculation and reduces its time complexity compared with an algorithm that considers word order and semantics.

Key words: sentence similarity, relation vector model, sentence syntax, sentence semantics
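
The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: each sentence is already reduced to a list of key words, a "relation" is taken to be an unordered pair of key words from the same sentence, synonyms are merged through a small hypothetical lookup table (`SYNONYMS`), and two relation vectors are compared with cosine similarity. The names `relation_vector`, `similarity`, and `drop_redundant`, and the threshold of 0.8, are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical synonym table (assumption): maps a word to one canonical form.
SYNONYMS = {
    "automobile": "car",
    "bought": "buy",
    "purchased": "buy",
}


def canonical(word):
    """Lower-case a key word and replace it with its canonical synonym, if any."""
    word = word.lower()
    return SYNONYMS.get(word, word)


def relation_vector(keywords):
    """Count unordered pairs of canonical key words within one sentence.

    Assumption: a keyword pair stands in for a collocation relation.
    """
    canon = [canonical(w) for w in keywords]
    pairs = Counter()
    for a, b in combinations(canon, 2):
        if a != b:
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(keywords_a, keywords_b):
    """Similarity of two sentences given as key word lists."""
    return cosine(relation_vector(keywords_a), relation_vector(keywords_b))


def drop_redundant(sentences, threshold=0.8):
    """Keep a sentence only if it is not too similar to one already kept."""
    kept = []
    for keywords in sentences:
        if all(similarity(keywords, other) < threshold for other in kept):
            kept.append(keywords)
    return kept


if __name__ == "__main__":
    s1 = ["John", "bought", "car"]
    s2 = ["John", "purchased", "automobile"]
    s3 = ["Mary", "sold", "house"]
    print(similarity(s1, s2))
    print(similarity(s1, s3))
    print(drop_redundant([s1, s2, s3]))
```

In this sketch, two sentences that differ only by synonyms produce the same relation vector, and `drop_redundant` mirrors the summary use case by discarding a candidate sentence that is too similar to one already selected. Key word extraction and a real synonym resource are left out.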

摘要: 句子相似度的计算在自然语言处理的各个领域占有很重要的地位,一些传统的计算方法只考虑句子的词形、句长、词序等表面信息,并没有考虑句子更深层次的语义信息,另一些考虑句子语义的方法在实用性上的表现不太理想。在空间向量模型的基础上提出了一种同时考虑句子结构和语义信息的关系向量模型,这种模型考虑了组成句子的关键词之间的搭配关系和关键词的同义信息,这些信息反应了句子的局部结构成分以及各局部之间的关联关系,因此更能体现句子的结构和语义信息。以关系向量模型为核心,提出了基于关系向量模型的句子相似度计算方法。同时将该算法应用到网络热点新闻自动摘要生成算法中,排除文摘中意思相近的句子从而避免文摘的冗余。实验结果表明,在考虑网络新闻中的句子相似度时,与考虑词序与语义的算法相比,关系向量模型算法不但提高了句子相似度计算的准确率,计算的时间复杂度也得到了降低。

关键词: 句子相似度, 关系向量模型, 句子语法, 句子语义