Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (17): 206-212. DOI: 10.3778/j.issn.1002-8331.2101-0465

• Pattern Recognition and Artificial Intelligence •


Open Domain Chinese Knowledge Based Question Answering Based on Feature Enhancement

LI Shuaichi, YANG Zhihao, WANG Xinlei, HAN Qinyu, LIN Hongfei   

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Online: 2022-09-01  Published: 2022-09-01



Abstract: Entity disambiguation and predicate matching are two core subtasks in Chinese knowledge based question answering (CKBQA). To address the huge number of entities and predicates in an open-domain knowledge base, and the differences in surface form between Chinese questions and the knowledge stored in the knowledge base, a pipeline question answering system based on BERT with feature enhancement, called BERT-CKBQA, is proposed to improve these two subtasks. First, a BERT-CRF model recognizes entity mentions in the question and produces a candidate entity set. Then, the question and the candidate entities concatenated with predicate features are fed into a BERT-CNN model for entity disambiguation, and a candidate predicate set is generated from the selected entities. Next, a BERT-BiLSTM-CNN model that introduces answer-entity predicate features through an attention mechanism is proposed for predicate matching. Finally, the system combines the entity and predicate scores to determine the query path and retrieve the final answer. This method builds an open-domain CKBQA system for simple Chinese questions, introducing a pre-trained model and predicate features to enhance the subtasks and improve performance. It achieves an averaged F1-score of 88.75% on the NLPCC-ICCPOL-2016 KBQA dataset, improving the system's answer accuracy.
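The final step described above — combining entity-disambiguation and predicate-matching scores to select a query path — can be sketched in plain Python. This is an illustrative sketch only: the function name, the dictionary-based inputs, and the linear weighting `alpha` are assumptions, not the paper's actual scoring formula.

```python
# Hedged sketch of the query-path selection step in a pipeline KBQA system:
# each candidate path is an (entity, predicate) pair, scored by combining
# the entity disambiguation score with the predicate matching score.
# The linear combination with weight `alpha` is a hypothetical choice.

def rank_query_paths(entity_scores, predicate_scores, alpha=0.5):
    """Return candidate (entity, predicate) paths sorted by combined score.

    entity_scores: dict mapping candidate entity -> disambiguation score
    predicate_scores: dict mapping (entity, predicate) -> matching score
    alpha: weight balancing entity vs. predicate scores (assumed, not from the paper)
    """
    paths = []
    for (entity, predicate), p_score in predicate_scores.items():
        e_score = entity_scores.get(entity, 0.0)
        combined = alpha * e_score + (1.0 - alpha) * p_score
        paths.append(((entity, predicate), combined))
    # Highest combined score first; its (entity, predicate) pair forms the
    # query path used to retrieve the answer from the knowledge base.
    paths.sort(key=lambda item: item[1], reverse=True)
    return paths


if __name__ == "__main__":
    entity_scores = {"刘德华": 0.9, "刘德华(歌手)": 0.6}
    predicate_scores = {
        ("刘德华", "出生日期"): 0.8,
        ("刘德华(歌手)", "出生日期"): 0.7,
        ("刘德华", "国籍"): 0.2,
    }
    best_path, best_score = rank_query_paths(entity_scores, predicate_scores)[0]
    print(best_path)  # top-ranked (entity, predicate) query path
```

In a real system the two score dictionaries would come from the BERT-CNN disambiguation model and the BERT-BiLSTM-CNN matching model respectively; here they are hard-coded for illustration.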

Key words: open domain knowledge based question answering, entity mention recognition, entity disambiguation, predicate matching, BERT, feature enhancement