Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (21): 287-296. DOI: 10.3778/j.issn.1002-8331.2407-0534

• Graphics and Image Processing •

Multimodal Real-Time 3D Object Detection Based on Edge Differential Information Fusion

ZHANG Zhitian, ZHAO Hongdong, ZHANG Ke, CHEN Dan, LI Yanqi   

  1. School of Electronic Information and Engineering, Hebei University of Technology, Tianjin 300401, China
  2. Innovation and Research Institute of Hebei University of Technology in Shijiazhuang, Shijiazhuang 050299, China
  • Online: 2025-11-01 Published: 2025-10-31

Abstract: Multimodal 3D object detection can exploit both the geometric information of point clouds and the semantic information of images. To address the insufficient use of edge information, the difficulty of fusing heterogeneous data, and the slow inference speed of multimodal 3D object detection, an efficient multimodal real-time 3D object detection algorithm based on edge differential information fusion (EDMR-Net) is proposed. In the fusion stage, a differential feature enhancement fusion (DEF) module uses a diffusion function to enhance the semantic expression of the point cloud with the differential information of the image, so that the heterogeneous data complement each other, and it locates small objects precisely using the rich edge information and the steady state of the features. An adaptive context awareness (ACA) network then assigns adaptive weights to the multimodal features and further refines the multi-scale context information. To improve the model's ability to capture detail, a multi-scale cross-axis attention mechanism is introduced into the shallow features. Extensive experiments on the KITTI dataset show that the proposed method outperforms mainstream methods in both speed and accuracy and effectively addresses the insufficient use of edge information and slow multimodal inference. EDMR-Net greatly improves detection performance in difficult scenes while maintaining performance at the easy and moderate difficulty levels.

Key words: 3D object detection, multimodal fusion, edge differential information, diffusion function, attention mechanism
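
The abstract does not specify the diffusion function used in the DEF module, so the sketch below is only an illustration of the general idea of combining difference (edge) information with diffusion toward a steady state. It swaps in the classical Perona-Malik edge-stopping diffusion on a single row of pixel intensities; it is not the EDMR-Net implementation, and the names `conductance`, `diffuse_row`, and `differences`, as well as the parameters `kappa` and `rate`, are assumptions made for this example.

```python
import math


def conductance(diff, kappa):
    """Perona-Malik edge-stopping function (a stand-in, not the paper's function).

    Close to 1 for small differences (flat regions), close to 0 across strong edges.
    """
    return math.exp(-(diff / kappa) ** 2)


def diffuse_row(row, steps=50, kappa=0.2, rate=0.25):
    """Explicit edge-preserving diffusion along one row of pixel intensities.

    rate should stay <= 0.5 so this two-neighbour scheme remains stable.
    """
    u = [float(v) for v in row]
    n = len(u)
    for _ in range(steps):
        nxt = u[:]
        for i in range(n):
            # Differences to the left and right neighbours (zero flux at the borders).
            left = u[i - 1] - u[i] if i > 0 else 0.0
            right = u[i + 1] - u[i] if i < n - 1 else 0.0
            nxt[i] = u[i] + rate * (
                conductance(left, kappa) * left
                + conductance(right, kappa) * right
            )
        u = nxt
    return u


def differences(row):
    """Forward differences between adjacent pixels: the edge signal of the row."""
    return [b - a for a, b in zip(row, row[1:])]


# A noisy step: flat dark region, one sharp edge, flat bright region.
row = [0.02, -0.03, 0.01, 0.00, -0.02, 1.03, 0.98, 1.01, 0.97, 1.02]
smoothed = diffuse_row(row)
```

In this toy setting the small differences inside each flat region are smoothed away as the iteration approaches a steady state, while the large difference at the step suppresses diffusion across it, so the edge stays sharp. How the paper transfers such information from the image to point-cloud features is not described in the abstract and is not modeled here.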