Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (9): 161-163.

• Database and Information Processing •

Chinese FAQ System Based on Sentence Similarity

Zheng Ye, Hongfei Lin, Zhihao Yang

  • Received:2006-07-26 Revised:1900-01-01 Online:2007-03-21 Published:2007-03-21
  • Contact: Hongfei Lin

Research on a Chinese FAQ Question-Answering System Based on Question Similarity

Ye Zheng (叶正), Lin Hongfei (林鸿飞), Yang Zhihao (杨志豪)

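  1. Department of Computer Science, Dalian University of Technology; Computer Architecture Teaching and Research Section, Dalian University of Technology
  • Corresponding author: Lin Hongfei (林鸿飞)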

Abstract: An FAQ system is a question-answering retrieval system that finds, in a set of "question-answer" pairs, the question that matches the user's question and returns the corresponding answer to the user. The key problem is computing the similarity between the user's question and the questions in the FAQ collection, finding the closest FAQ question, and returning its stored answer. Based on a study of the characteristics of common questions, this paper presents a question similarity computation approach that combines a split vector space model with semantic concepts. The main idea is to split a question vector into its three main components (question point, keyword, and interrogative), represent each component as a sub-vector, compute the semantic similarity of each pair of corresponding sub-vectors using the “Synonym Word Dictionary”, and obtain the semantic similarity of the two questions by linear weighting. Experiments indicate that this approach improves question-matching precision compared with the traditional question similarity computation based on the TF-IDF vector space model.

Key words: FAQ, Sentence Similarity, Semantic Similarity, Vector Space Model
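
The linear weighting described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the toy `SYNONYM_CLASS` table stands in for the synonym dictionary, the questions are assumed to be already split into their three components, `component_sim` uses a simple best-match average, and the weights `(0.5, 0.3, 0.2)` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy synonym table: word -> synonym-class code.
# A real system would load the synonym dictionary named in the abstract.
SYNONYM_CLASS = {
    "price": "C01", "cost": "C01",
    "laptop": "C02", "notebook": "C02",
    "repair": "C03",
    "how much": "Q01", "what": "Q02",
}

@dataclass
class Question:
    """A question already split into its three components (assumed input)."""
    question_point: List[str]
    keywords: List[str]
    interrogative: List[str]

def word_sim(a: str, b: str) -> float:
    """1.0 for identical words or words in the same synonym class, else 0.0."""
    if a == b:
        return 1.0
    class_a, class_b = SYNONYM_CLASS.get(a), SYNONYM_CLASS.get(b)
    return 1.0 if class_a is not None and class_a == class_b else 0.0

def component_sim(xs: List[str], ys: List[str]) -> float:
    """Symmetric average of best-match word similarities (assumed measure)."""
    if not xs and not ys:
        return 1.0  # both components empty: treated as a match (assumption)
    if not xs or not ys:
        return 0.0
    forward = sum(max(word_sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(word_sim(y, x) for x in xs) for y in ys) / len(ys)
    return (forward + backward) / 2

# Hypothetical weights for question point, keywords, and interrogative.
WEIGHTS = (0.5, 0.3, 0.2)

def question_sim(q1: Question, q2: Question, weights=WEIGHTS) -> float:
    """Linear weighting of the three component similarities."""
    w_point, w_key, w_int = weights
    return (w_point * component_sim(q1.question_point, q2.question_point)
            + w_key * component_sim(q1.keywords, q2.keywords)
            + w_int * component_sim(q1.interrogative, q2.interrogative))

def best_match(user_q: Question, faq: List[Tuple[Question, str]]) -> Tuple[Question, str]:
    """Return the (question, answer) pair whose question is most similar."""
    return max(faq, key=lambda pair: question_sim(user_q, pair[0]))
```

Calling `best_match(user_question, faq_pairs)` on a list of `(Question, answer)` tuples returns the stored pair with the highest weighted similarity. In practice the weights would be tuned on held-out data; the values above only show the shape of the computation.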

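Abstract (translated from Chinese): A frequently asked questions (FAQ) question-answering system is a retrieval system that finds, in an existing set of "question-answer" pairs, the question matching the user's query and returns its corresponding answer to the user. The key problem is computing the similarity between the user's question and the questions in the FAQ collection, finding the most similar question in the collection, and returning the answer stored in advance. Through a study of the characteristics of common questions, this paper gives a question similarity computation method based on a decomposed vector space model and semantic concepts. The main idea is to decompose a question vector, extract its three key parts (question point, topic words, and interrogative word), represent them as three sub-vectors, and then compute for each sub-vector a semantic similarity based on the HIT-IRLab Tongyici Cilin thesaurus (《HIT-IRLab同义词词林》); linear weighting then yields the semantic similarity of the two questions. Experiments show that, compared with the traditional TF-IDF question similarity computation based on the vector space model, the method improves question-matching precision.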

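Key words (translated from Chinese): frequently asked questions (FAQ) collection, question similarity, semantic similarity, vector space model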
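
For comparison, the TF-IDF vector space baseline mentioned in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the questions are assumed to be already segmented into tokens, the smoothed IDF formula is one common variant chosen here for convenience, and the function names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors for a list of tokenized questions."""
    n = len(docs)
    # Document frequency: number of questions containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed variant) keeps weights positive for common terms.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy tokenized questions (hypothetical data).
docs = [
    ["price", "laptop"],
    ["cost", "notebook"],            # synonyms of the first, no shared tokens
    ["price", "laptop", "repair"],
]
vecs = tfidf_vectors(docs)
```

With these toy questions, `cosine(vecs[0], vecs[1])` is zero because the two questions share no surface tokens even though they are synonymous. This is the kind of mismatch that a synonym-based semantic similarity is meant to address.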