Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (22): 184-196. DOI: 10.3778/j.issn.1002-8331.2311-0459

• Pattern Recognition and Artificial Intelligence •

Multi-Hop Knowledge Base Question Answering with Pre-Trained Language Model Feature Enhancement

WEI Qianqiang, ZHAO Shuliang, LU Danqi, JIA Xiaowen, YANG Shilong   

  1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
  2. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China
  3. Hebei Provincial Key Laboratory of Network & Information Security, Shijiazhuang 050024, China
  • Online: 2024-11-15  Published: 2024-11-14

Abstract: Knowledge base question answering (KBQA) is a challenging and popular research direction. The main challenge in multi-hop KBQA is the inconsistency between unstructured natural language questions and structured reasoning paths in the knowledge base. Multi-hop KBQA models based on graph retrieval capture the topological structure of the graph well, but ignore the textual information carried by its nodes and edges. To fully learn the textual information of knowledge base triples, this paper constructs a text form for the triples and proposes three non-graph-retrieval feature enhancement models: RBERT, CBERT, and GBERT, which enhance features with a feedforward neural network, a deep pyramid convolutional network, and a graph attention network, respectively. All three models significantly improve feature representation ability and question answering accuracy: RBERT has the simplest structure, CBERT trains fastest, and GBERT performs best. In experimental comparisons on the MetaQA, WebQSP, and CWQ datasets, the three models clearly outperform current mainstream models on both Hits@1 and F1, and also clearly outperform other improved BERT models.
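The abstract describes the approach only at a high level. As a minimal illustrative sketch (not the authors' released code), the following PyTorch fragment shows the shared idea behind the three models: verbalize a knowledge base triple into plain text, encode the question together with the verbalized reasoning path using a pre-trained BERT encoder, and score the pair after a feature-enhancement head. The pair-scoring formulation, the verbalization rule, and all class and function names here are assumptions; the enhancement head shown is the simplest, RBERT-style feedforward variant.

```python
# Minimal sketch (assumptions, not the paper's code): score a
# (question, verbalized reasoning path) pair with BERT plus a
# feedforward feature-enhancement head, in the spirit of RBERT.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

def verbalize_triple(head: str, relation: str, tail: str) -> str:
    """Turn a KB triple into text, e.g. ('Titanic', 'directed_by',
    'James Cameron') -> 'Titanic directed by James Cameron'."""
    return f"{head} {relation.replace('_', ' ')} {tail}"

class FeedForwardEnhancedBERT(nn.Module):
    """BERT encoder followed by a small feedforward head that refines
    the [CLS] feature before scoring (hypothetical RBERT-style head)."""
    def __init__(self, model_name: str = "bert-base-uncased", hidden: int = 768):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        self.enhance = nn.Sequential(          # feedforward feature enhancement
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        self.scorer = nn.Linear(hidden, 1)     # matching score for the pair

    def forward(self, input_ids, attention_mask):
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.scorer(self.enhance(cls)).squeeze(-1)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = FeedForwardEnhancedBERT()

question = "who wrote the films directed by James Cameron"
path_text = verbalize_triple("Titanic", "written_by", "James Cameron")
batch = tokenizer(question, path_text, return_tensors="pt",
                  truncation=True, padding=True)
score = model(batch["input_ids"], batch["attention_mask"])  # higher = better match
```

Under this reading, CBERT and GBERT would swap the feedforward head for a deep pyramid convolutional network or a graph attention network over candidate paths; those details are beyond what the abstract specifies.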

Key words: multi-hop knowledge base question answering, pre-trained language model, feature enhancement