Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (18): 252-262. DOI: 10.3778/j.issn.1002-8331.2406-0113

• Graphics and Image Processing •

Knowledge-Guided Graph Conjoint Reasoning Object Detection Method

XIE Binhong, WANG Wenbo, ZHANG Rui   

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Online:2025-09-15 Published:2025-09-15

Abstract: Mainstream object detection methods typically handle each region in isolation, neglecting crucial global context information and the relationships between object categories. To address this, this paper proposes a knowledge-guided graph conjoint reasoning object detection method (GCRKG), which comprises a global relational reasoning (GRR) module and a global knowledge mapping (GKM) module, and aims to improve detection performance by emulating the human reasoning process. First, the GRR module employs graph conjoint attention networks (GCAT) to perform category relationship reasoning, comprehensively weighing the relative importance of feature, co-occurrence, and semantic-relevance knowledge among categories. Second, the GKM module uses multi-label image classification probabilities together with the category probabilities of the detection classifier to map category relationship knowledge onto visual regions. Finally, the mapped features are concatenated with the original visual region features so that more reasonable predictions can be made. Comparisons with baseline models on the VOC and COCO datasets demonstrate the effectiveness and superiority of the proposed method.
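The knowledge-mapping and feature-enhancement step described in the abstract can be sketched as follows. This is a minimal illustrative reconstruction, not the paper's implementation: the array names, dimensions, and random inputs (`category_knowledge`, `probs`, `region_feats`) are all invented stand-ins for the GRR output, the classifier probabilities, and the detector's region features, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, embed_dim, num_regions, feat_dim = 5, 8, 3, 16

# Stand-in for the category relation knowledge produced by the GRR module
# (in the paper this would come from graph conjoint attention reasoning).
category_knowledge = rng.standard_normal((num_classes, embed_dim))

# Stand-in for per-region class probabilities (in the paper these combine
# multi-label image classification and detection classifier probabilities).
logits = rng.standard_normal((num_regions, num_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

# GKM-style mapping: weight category knowledge by class probabilities,
# projecting knowledge onto each visual region.
mapped = probs @ category_knowledge            # (num_regions, embed_dim)

# Concatenate mapped knowledge with the original visual region features
# to form the enhanced features used for prediction.
region_feats = rng.standard_normal((num_regions, feat_dim))
enhanced = np.concatenate([region_feats, mapped], axis=1)
print(enhanced.shape)
```

The key design point is that the probability-weighted matrix product turns per-category graph knowledge into a per-region feature, so the concatenation enlarges each region descriptor by the knowledge embedding dimension.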

Key words: object detection, knowledge-guided, graph conjoint attention, multi-label image classification