Chinese FAQ System Based on Sentence Similarity

Computer Engineering and Applications, 2007, Vol. 43, Issue (9): 161-163.

Section: Database and Information Processing

Ye Zheng, Hongfei Lin, Yang Zhihao

Received: 2006-07-26   Online: 2007-03-21   Published: 2007-03-21
Contact: Hongfei Lin

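Affiliations: Department of Computer Science, Dalian University of Technology; Computer Architecture Teaching and Research Section, Dalian University of Technology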

Abstract

An FAQ system is a question-answering retrieval system that searches an existing collection of question-answer pairs for the stored question that best matches the user's question and returns the corresponding answer to the user. Its key problem is computing the similarity between the user's question and the questions in the FAQ collection, finding the closest FAQ question, and returning the answer stored with it in advance. Based on a study of the characteristics of common questions, this paper presents a question similarity method built on a decomposed vector space model and semantic concepts. The main idea is to split a question vector into three key components (the question point, the keywords, and the interrogative word), represent each component as a sub-vector, compute the semantic similarity of each sub-vector using the HIT-IRLab Tongyici Cilin synonym thesaurus, and combine the component similarities by linear weighting to obtain the semantic similarity of the two questions. Experiments show that, compared with the traditional question similarity computation based on TF-IDF weighting in the vector space model, the method improves the precision of question matching.

Key words: FAQ, Sentence Similarity, Semantic Similarity, Vector Space Model
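The component-wise matching described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: each question is assumed to be already decomposed into its three components; word similarity is approximated by how many of the five hierarchy levels two Cilin-style semantic codes share (a simplification chosen here, not necessarily the paper's formula); component similarity uses a symmetric best-match average; and the thesaurus entries, codes, and weights (`TOY_THESAURUS`, `WEIGHTS`) are hypothetical placeholders rather than real Cilin entries or the paper's tuned values.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Prefix lengths at which the five levels of a Cilin-style code end,
# e.g. "Aa01A01=" -> "A", "Aa", "Aa01", "Aa01A", "Aa01A01".
LEVEL_ENDS = (1, 2, 4, 5, 7)

# HYPOTHETICAL toy thesaurus: word -> list of Cilin-style codes.
# The codes are invented for illustration; they are not real Cilin entries.
TOY_THESAURUS: Dict[str, List[str]] = {
    "price": ["Dj05A01="],
    "cost": ["Dj05A01="],
    "fee": ["Dj05A02="],
    "ticket": ["Bp12B01="],
    "how much": ["Ed03A01="],
    "what": ["Ed01A01="],
}

# HYPOTHETICAL linear weights for (question point, keywords, interrogative).
WEIGHTS = (0.4, 0.4, 0.2)


def code_similarity(c1: str, c2: str) -> float:
    """Fraction of the five hierarchy levels shared by two codes (0.0 to 1.0)."""
    shared = 0
    for end in LEVEL_ENDS:
        if c1[:end] != c2[:end]:
            break
        shared += 1
    return shared / len(LEVEL_ENDS)


def word_similarity(w1: str, w2: str, thesaurus: Dict[str, List[str]]) -> float:
    """Identical words score 1.0; otherwise take the best-matching code pair."""
    if w1 == w2:
        return 1.0
    codes1 = thesaurus.get(w1, [])
    codes2 = thesaurus.get(w2, [])
    if not codes1 or not codes2:
        return 0.0  # word missing from the thesaurus
    return max(code_similarity(a, b) for a in codes1 for b in codes2)


def component_similarity(words1: Sequence[str], words2: Sequence[str],
                         thesaurus: Dict[str, List[str]]) -> float:
    """Symmetric best-match average between the words of two components."""
    if not words1 and not words2:
        return 1.0
    if not words1 or not words2:
        return 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        # For each source word, keep its best match in the other component.
        return sum(max(word_similarity(s, d, thesaurus) for d in dst)
                   for s in src) / len(src)

    return (directed(words1, words2) + directed(words2, words1)) / 2


@dataclass
class DecomposedQuestion:
    focus: List[str]          # the "question point"
    keywords: List[str]       # topic words
    interrogative: List[str]  # interrogative word(s)


def question_similarity(q1: DecomposedQuestion, q2: DecomposedQuestion,
                        thesaurus: Dict[str, List[str]] = TOY_THESAURUS,
                        weights=WEIGHTS) -> float:
    """Linearly weighted sum of the three component similarities."""
    parts = (
        component_similarity(q1.focus, q2.focus, thesaurus),
        component_similarity(q1.keywords, q2.keywords, thesaurus),
        component_similarity(q1.interrogative, q2.interrogative, thesaurus),
    )
    return sum(w * s for w, s in zip(weights, parts))
```

With the toy data, a user question decomposed as focus `["price"]`, keywords `["ticket"]`, interrogative `["how much"]` scores about 0.88 against an FAQ question with focus `["cost"]`, keywords `["ticket"]`, interrogative `["what"]`: the first two components match fully, and the two interrogatives share two of five code levels (0.4). In a full system, the FAQ question with the highest score would be selected and its stored answer returned.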

Ye Zheng, Hongfei Lin, Yang Zhihao. Chinese FAQ System Based on Sentence Similarity[J]. Computer Engineering and Applications, 2007, 43(9): 161-163.