Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (15): 120-125. DOI: 10.3778/j.issn.1002-8331.1704-0022

• Pattern Recognition and Artificial Intelligence •

Research on similarity problem discovery for multiple choice questions solving

YU Feng1, ZHENG Yuqing2, ZHENG Dequan1,2, ZHAO Shanshan2

  1. School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
  2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Publication date: 2018-08-01 Release date: 2018-07-26

Abstract: With artificial intelligence (AI) attracting wide attention, intelligent problem solving has gradually become a major research focus. This paper studies the solving of multiple-choice questions based on knowledge association and reasoning, and addresses two aspects: question comprehension and similar question discovery. For question comprehension, TextRank and part-of-speech (POS) tagging are used to extract key information, and word2vec word clustering results are used to expand that key information. For similar question discovery, a candidate question set is first extracted from an existing question bank according to the key information produced by question comprehension. Sentence similarity is then computed with three methods, each combined with word vectors generated by word2vec: a BM25 variant, weighted term vectors, and an improved edit distance. The answer is selected according to the similarity scores, which completes the solution. In experiments on geography multiple-choice questions, the highest average accuracy obtained was 75.88%, which supports the feasibility of the approach.

Key words: problem solving, word vector (word2vec), similarity computation, problem discovery
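
As an illustration only, and not the authors' implementation, the weighted term-vector variant of the similarity step described in the abstract might be sketched as follows. The toy two-dimensional vectors, the key-term weights, the small question bank, and the names `sentence_vector` and `answer_by_similarity` are all hypothetical; a real system would load trained word2vec vectors and tokenize Chinese text first.

```python
import math

# Hypothetical toy 2-d word vectors; real ones would come from a trained word2vec model.
VECTORS = {
    "longest": (0.1, 1.0),
    "largest": (0.3, 0.9),
    "river": (1.0, 0.1),
    "desert": (-1.0, 0.2),
    "world": (0.5, 0.5),
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(tokens, weights):
    """Weighted sum of the vectors of tokens that have a vector (default weight 1.0)."""
    total = [0.0, 0.0]
    for tok in tokens:
        if tok in VECTORS:
            w = weights.get(tok, 1.0)
            total[0] += w * VECTORS[tok][0]
            total[1] += w * VECTORS[tok][1]
    return tuple(total)


def answer_by_similarity(question, keywords, bank, weights):
    """Return (answer, score) of the most similar candidate, or None if there is none."""
    # Candidate set: bank questions that share at least one key term with the query.
    candidates = [(q, a) for q, a in bank if set(q) & set(keywords)]
    if not candidates:
        return None
    q_vec = sentence_vector(question, weights)
    scored = [(cosine(q_vec, sentence_vector(q, weights)), a) for q, a in candidates]
    score, answer = max(scored, key=lambda pair: pair[0])
    return answer, score


# Assumed question bank: (tokenized question, known answer).
BANK = [
    (["longest", "river", "world"], "Nile"),
    (["largest", "desert", "world"], "Sahara"),
]
WEIGHTS = {"river": 2.0, "desert": 2.0}  # assumed weights that emphasize key terms

result = answer_by_similarity(["largest", "river"], ["largest", "river"], BANK, WEIGHTS)
print(result)
```

Both bank questions pass the keyword filter here, so the choice is made by the similarity score alone: the up-weighted term "river" pulls the query toward the first question. The BM25 variant and the improved edit distance named in the abstract would replace only the scoring line.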