Computer Engineering and Applications, 2009, Vol. 45, Issue (31): 15-17. DOI: 10.3778/j.issn.1002-8331.2009.31.005

• Doctoral Forum •

Sentence similarity measurement based on information category it contains

LI Lin, ZHOU Yi-min

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Received: 2009-08-20 Revised: 2009-09-11 Online: 2009-11-01 Published: 2009-11-01
  • Contact: LI Lin

传递信息分类的句子间相似性度量 (Sentence similarity measurement based on the categories of information conveyed)

李 林,周一民   

  1. School of Computer Science, Beihang University (北京航空航天大学), Beijing 100191
  • Corresponding author: 李 林

Abstract: A method is proposed for measuring the similarity between English sentences. It is based on the three kinds of information a sentence conveys: objects, properties, and actions. The two sentences being compared are first chunked, and this information is extracted from the chunks. The similarities between the objects, the properties, and the actions of the two sentences are then calculated with a semantic vector method. Finally, the overall sentence similarity is defined as a combination of these three similarities, with the combination parameters obtained by training. Experiments show that the similarity judgments produced by the proposed method are close to human understanding of sentence meaning, and that the method outperforms current sentence similarity methods in accuracy.

Key words: sentence similarity, word semantic similarity, chunking, semantic vector

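Abstract (Chinese version): A method for computing the similarity between English sentences is proposed. It is based on the information a sentence conveys: the objects it describes, and the properties and actions of those objects. The two sentences to be compared are first chunked, and these three kinds of information are extracted from them. A semantic vector method is then used to compute the similarity of the two sentences in each of the three aspects. Finally, the three similarities are combined into the overall similarity of the two sentences, with the optimal combination parameters obtained through training. Experiments show that, compared with current sentence similarity methods, the proposed method agrees more closely with the way humans judge sentence similarity, is more accurate, and achieves relatively high performance figures.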

Key words (Chinese version): sentence similarity, word semantic similarity, chunking, semantic vector
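
The pipeline outlined in the abstract (per-category similarity via semantic vectors, then a weighted combination) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the objects, properties, and actions have already been extracted by a chunker, it uses a hypothetical toy table `TOY_SIM` in place of a real word semantic similarity measure, it builds each semantic vector over the joint word set and compares vectors by cosine similarity, and it takes the overall score to be a weighted sum whose default weights are placeholders rather than trained values.

```python
import math

# Hypothetical toy word-similarity table (an assumption for illustration);
# a real system would derive these scores from a lexical resource.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("drive", "ride")): 0.6,
}


def word_sim(a, b):
    """Similarity of two words in [0, 1]; identical words score 1."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def semantic_vector(words, joint):
    """For each word in the joint set, keep its best match among `words`."""
    return [max((word_sim(j, w) for w in words), default=0.0) for j in joint]


def group_sim(words1, words2):
    """Cosine similarity of the two semantic vectors over the joint word set."""
    joint = sorted(set(words1) | set(words2))
    v1 = semantic_vector(words1, joint)
    v2 = semantic_vector(words2, joint)
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: an empty or unmatched group contributes nothing
    return sum(x * y for x, y in zip(v1, v2)) / (norm1 * norm2)


def sentence_sim(info1, info2, weights=(0.4, 0.2, 0.4)):
    """Weighted sum of object, property, and action similarities.

    The default weights are placeholders; the abstract states that the
    combination parameters are obtained by training.
    """
    keys = ("objects", "properties", "actions")
    return sum(w * group_sim(info1[k], info2[k]) for w, k in zip(weights, keys))


if __name__ == "__main__":
    s1 = {"objects": ["car"], "properties": ["red"], "actions": ["drive"]}
    s2 = {"objects": ["automobile"], "properties": ["red"], "actions": ["ride"]}
    print(sentence_sim(s1, s2))
```

Keeping the three categories separate until the final step means each weight can be tuned independently against human similarity ratings, which is the role the abstract assigns to parameter training.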
