Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (23): 126-135. DOI: 10.3778/j.issn.1002-8331.2307-0115

• Pattern Recognition and Artificial Intelligence •

SQCG-KGLP: Stepped Question Chain Generator Based on Knowledge Graph Link Prediction

FANG Yulong, WANG Huazhen, WANG Xiaofeng, ZHOU Hao, ZHANG Hengzhang

  1. College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  2. Institute of Chinese Language Education, Huaqiao University, Xiamen, Fujian 361021, China
  • Publication date: 2024-12-01 Release date: 2024-11-29

Abstract: This research aims to have machines automatically generate stepped question chains that match human questioning habits, in support of the natural language understanding and generation needs of applications such as intelligent reading and questioning, intelligent customer service, and intelligent medical inquiry. To this end, a stepped question chain generator based on knowledge graph link prediction (SQCG-KGLP) is proposed. The SQCG-KGLP model consists of an encoding part and a decoding part. In the encoding part, features are initialized for the question entities of the question chain, and a feature fusion method produces the initial vectors of the fused head entity and of the candidate tail entity to be predicted. These initial vectors are fed into the Graph Attention representation learning module of SQCG-KGLP to obtain the representation vectors of the fused head entity and the candidate tail entity. In the decoding part, these representation vectors are input into the convKB module of SQCG-KGLP for link prediction. Experiments at different hop counts are carried out on a self-built question chain dataset, with MRR, Hits@1, Hits@3 and Hits@10 as evaluation metrics. The results show that SQCG-KGLP outperforms the baseline models on all of these metrics. As the number of question chain generation iterations increases, the number of knowledge graph hops supporting relation prediction increases with it, and the naturalness of the semantic expression of the next question node declines. In addition, it is difficult to obtain enough high-hop question chain training data in practical application scenarios. Consequently, this research currently supports only the generation of question chains with fewer than three hops. By exploiting the semantic associations of the knowledge graph and its relation prediction mechanism, the generated stepped question chains progress layer by layer, move from easy to difficult, and are closely interlinked, which gives them better stepwise logical quality.

Key words: stepped question chain, knowledge graph, link prediction, intelligent generation
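
The abstract evaluates link prediction with MRR, Hits@1, Hits@3 and Hits@10. As a minimal sketch of how such ranking metrics are commonly computed (this is not the authors' evaluation code; the function names, the tie-handling rule, and the sample ranks are assumptions made for illustration), each test question chain yields the rank of the true tail entity among all candidates, and the metrics are averages over those ranks:

```python
from typing import Sequence


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """1-based rank of the true candidate; higher score means more plausible.

    Ties are resolved optimistically (assumption): only candidates with a
    strictly higher score push the true candidate down.
    """
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """MRR: the mean of 1/rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Hits@k: the fraction of queries whose true entity ranks in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the true tail entity for four test queries.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)      # (1 + 1/2 + 1/4 + 1/10) / 4
hits = {k: hits_at_k(ranks, k) for k in (1, 3, 10)}
print(mrr, hits)
```

Both metrics lie in (0, 1] and higher is better; MRR rewards placing the true entity near the top, while Hits@k only checks whether it falls within the first k candidates.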