Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (17): 206-212.DOI: 10.3778/j.issn.1002-8331.2101-0465

• Pattern Recognition and Artificial Intelligence •

Open Domain Chinese Knowledge Based Question Answering Based on Feature Enhancement

LI Shuaichi, YANG Zhihao, WANG Xinlei, HAN Qinyu, LIN Hongfei   

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Online: 2022-09-01  Published: 2022-09-01

Abstract: Entity disambiguation and predicate matching are two important subtasks in Chinese knowledge based question answering (CKBQA). Considering the huge number of entities and predicates in an open-domain knowledge base, and the differences in expression between questions and the knowledge stored in the knowledge base, a pipeline CKBQA system called BERT-CKBQA is proposed, which uses feature-enhanced BERT models to improve these two subtasks. First, a BERT-CRF model recognizes the entity mention in the question and produces a candidate entity set. Then, the question and each candidate entity concatenated with its predicate features are fed into a BERT-CNN model for entity disambiguation, and a candidate predicate set is generated from the selected entity. Next, a BERT-BiLSTM-CNN model that introduces the predicate features of candidate answer entities through an attention mechanism is proposed for predicate matching. Finally, the system combines the entity and predicate scores to determine the query path and retrieve the final answer. The method builds an open-domain CKBQA system for simple Chinese questions and introduces pre-trained models with feature enhancement to improve subtask performance. It achieves an average F1-score of 88.75% on the NLPCC-ICCPOL-2016 KBQA dataset, further improving the system's answer accuracy.
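To make the pipeline described above concrete, the sketch below shows how such a system can chain mention recognition, entity disambiguation, and predicate matching, and then combine the two subtask scores to select a query path. It is a minimal illustration only: the scorer callables stand in for the paper's BERT-CRF, BERT-CNN, and BERT-BiLSTM-CNN models, and the weighted-sum combination is an assumption, since the abstract only states that the entity and predicate scores are combined.

```python
# Minimal sketch of a pipeline CKBQA query-path selection step.
# The callables (recognize_mention, link_candidates, score_entity,
# candidate_predicates, score_predicate) are hypothetical stand-ins for the
# paper's neural models; the weighted-sum score combination is an assumption.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class QueryPath:
    entity: str       # disambiguated KB entity
    predicate: str    # matched KB predicate
    score: float      # combined entity + predicate score


def answer_question(
    question: str,
    kb: Dict[Tuple[str, str], str],                       # (entity, predicate) -> answer
    recognize_mention: Callable[[str], str],              # mention recognition (BERT-CRF in the paper)
    link_candidates: Callable[[str], List[str]],          # mention -> candidate KB entities
    score_entity: Callable[[str, str], float],            # entity disambiguation (BERT-CNN in the paper)
    candidate_predicates: Callable[[str], List[str]],     # entity -> its KB predicates
    score_predicate: Callable[[str, str], float],         # predicate matching (BERT-BiLSTM-CNN in the paper)
    alpha: float = 0.5,                                    # assumed weight for combining the two scores
) -> Tuple[str, QueryPath]:
    """Run mention recognition -> entity disambiguation -> predicate matching,
    pick the highest-scoring (entity, predicate) query path, and look up the answer."""
    mention = recognize_mention(question)
    paths: List[QueryPath] = []
    for entity in link_candidates(mention):
        e_score = score_entity(question, entity)
        for predicate in candidate_predicates(entity):
            p_score = score_predicate(question, predicate)
            # Weighted sum of the subtask scores; one plausible way to "combine" them.
            paths.append(QueryPath(entity, predicate, alpha * e_score + (1 - alpha) * p_score))
    best = max(paths, key=lambda p: p.score)
    return kb.get((best.entity, best.predicate), ""), best


# Toy usage with stub scorers and a one-triple knowledge base.
if __name__ == "__main__":
    kb = {("刘德华", "出生日期"): "1961年9月27日"}
    answer, path = answer_question(
        "刘德华的出生日期是什么时候?",
        kb,
        recognize_mention=lambda q: "刘德华",
        link_candidates=lambda m: ["刘德华"],
        score_entity=lambda q, e: 1.0,
        candidate_predicates=lambda e: ["出生日期"],
        score_predicate=lambda q, p: 0.9,
    )
    print(answer, path)
```

The weight alpha simply balances how much the final ranking trusts entity disambiguation versus predicate matching; in practice such a weight would be tuned on a development set.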

Key words: open domain knowledge based question answering, entity mention recognition, entity disambiguation, predicate matching, BERT, feature enhancement
