Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (1): 116-120.

• Databases, Data Mining, Machine Learning •

问答系统中问题模式分类与相似度计算方法

周建政1,谌志群2,李  治1,王荣波2,冯  凯2   

  1. 天格科技(杭州)有限公司,杭州 310005
  2. 杭州电子科技大学 认知与智能计算研究所,杭州 310018
  • Publication date: 2014-01-01 Release date: 2013-12-30

Methods of questions pattern classification and similarity measure for question answering system

ZHOU Jianzheng1, CHEN Zhiqun2, LI Zhi1, WANG Rongbo2, FENG Kai2   

  1. Tiange Technology (Hangzhou) Limited Company, Hangzhou 310005, China
  2. Institute of Cognitive and Intelligent Computing, Hangzhou Dianzi University, Hangzhou 310018, China
  • Online: 2014-01-01 Published: 2013-12-30

Abstract: Question answering systems built on Frequently Asked Questions (FAQ) libraries for restricted domains have become a research focus in natural language processing because of their practicality, and computing the similarity between questions is the most critical technique in such systems. Existing question similarity measures perform poorly on questions that carry a description of their context. To address this, user questions are divided into two categories, Simple Mode Questions (SMQs) and Context Mode Questions (CMQs), and a rule-based question pattern classification algorithm is proposed. On this basis, a similarity measure for CMQs is presented that combines the similarity between contexts with the similarity between the question sentences. Experimental results show that the classification algorithm achieves precision and recall both above 90%, and that the CMQ similarity measure reaches an accuracy of 74.3% while keeping time complexity low.

Key words: similarity measure, pattern classification, context information, question answering system
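
The abstract does not give the classification rules or the similarity formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy rule (a question with more than one sentence is treated as a CMQ, and its last sentence is the question proper), Jaccard overlap of word sets as a stand-in similarity, and a hypothetical weight `alpha` for combining context similarity with question similarity. All function names are illustrative.

```python
import re


def split_sentences(text):
    """Split text into sentences on common English and Chinese end marks."""
    parts = re.findall(r"[^.?!。？！]+[.?!。？！]?", text)
    return [p.strip() for p in parts if p.strip()]


def is_context_mode(text):
    """Toy rule (assumption): more than one sentence means a context mode question."""
    return len(split_sentences(text)) > 1


def split_question(text):
    """Return (context, query); the last sentence is assumed to be the query."""
    sentences = split_sentences(text)
    if len(sentences) <= 1:
        return "", text.strip()
    return " ".join(sentences[:-1]), sentences[-1]


def jaccard(a, b):
    """Jaccard overlap of lowercase word sets; 0.0 when both sets are empty."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cmq_similarity(q1, q2, alpha=0.5):
    """Weighted sum of context similarity and query similarity.

    alpha is a hypothetical weight; the paper's actual combination is not
    stated in the abstract.
    """
    ctx1, query1 = split_question(q1)
    ctx2, query2 = split_question(q2)
    return alpha * jaccard(ctx1, ctx2) + (1 - alpha) * jaccard(query1, query2)


if __name__ == "__main__":
    a = "My phone fell in water. It will not turn on. How do I fix it?"
    b = "My phone fell in water. The screen is black. How do I fix it?"
    print(is_context_mode(a), round(cmq_similarity(a, b), 3))
```

Separating the context from the question sentence lets two questions with the same final query but different background descriptions score lower than exact matches, which is the behaviour the abstract attributes to the CMQ measure. A real system would replace the Jaccard stand-in with a proper sentence similarity and tune the weight on labelled data.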