Computer Engineering and Applications, 2024, Vol. 60, Issue (17): 233-242. DOI: 10.3778/j.issn.1002-8331.2305-0412

• Graphics and Image Processing •

Dual-Modal Object Detection Method with Multiscale Feature Fusion

张睿,李允臣,王家宝,陈瑶,王梓祺,李阳   

  1. College of Command and Control Engineering, Army Engineering University, Nanjing 210007
  • Issue date: 2024-09-01 Release date: 2024-08-30

Multiscale Feature Fusion Approach for Dual-Modal Object Detection

ZHANG Rui, LI Yunchen, WANG Jiabao, CHEN Yao, WANG Ziqi, LI Yang   

  1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Online: 2024-09-01 Published: 2024-08-30

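Abstract: Object detection based on visible-light images adapts poorly to complex illumination such as low light, no light, and strong light, while object detection based on infrared images is strongly affected by background noise; infrared targets also lack color information and have weak texture detail, which makes detection considerably harder. To address this, a dual-modal object detection method that effectively fuses visible-light and infrared image features is proposed. First, primary features are extracted separately from each image of the paired dual-modal input. Next, a multiscale feature attention module is proposed that extracts multiscale local features from the infrared and visible-light images separately and introduces channel attention and spatial pixel attention, focusing on the multiscale feature information of the dual-modal images along both the channel and pixel dimensions. Finally, a dual-modal feature fusion module is proposed that adaptively fuses the dual-modal feature information to obtain multiscale fused features of the dual-modal images. On the large-scale dual-modal image dataset DroneVehicle, compared with the baseline YOLOv5s detecting on visible-light or infrared single-modal images, the proposed algorithm improves detection accuracy by 13.42 and 2.27 percentage points, respectively, while reaching a detection speed of 164 frame/s, giving it end-to-end real-time detection capability. The proposed algorithm effectively improves the robustness and accuracy of object detection in complex scenes and has good application prospects.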

Keywords: object detection, multiscale feature fusion, dual-modal, attention mechanism

Abstract: Object detection based on visible images struggles to adapt to complex lighting conditions such as low light, no light, and strong light, while object detection based on infrared images is strongly affected by background noise. Infrared objects also lack color information and have weak texture features, which makes detection more challenging. To address these problems, a dual-modal object detection approach that effectively fuses the features of visible and infrared images is proposed. First, a multiscale feature attention module is proposed, which extracts the multiscale features of the input IR and RGB images separately. Channel attention and spatial pixel attention are introduced to focus on the multiscale feature information of the dual-modal images along both the channel and pixel dimensions. Finally, a dual-modal feature fusion module is proposed to adaptively fuse the feature information of the dual-modal images. On the large-scale dual-modal image dataset DroneVehicle, compared with the baseline algorithm YOLOv5s detecting on visible or infrared single-modal images, the proposed algorithm improves detection accuracy by 13.42 and 2.27 percentage points, respectively, and reaches a detection speed of 164 frame/s, giving it real-time end-to-end detection capability. The proposed algorithm effectively improves the robustness and accuracy of object detection in complex scenes and has good application prospects.

Key words: object detection, multiscale feature fusion, dual-modal image, attention mechanism
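
The abstract names three stages (per-modality feature extraction, multiscale attention over channels and pixels, and adaptive fusion) without giving their internals. The NumPy sketch below only illustrates how such a data flow could be wired together. Everything inside it is an assumption made for illustration and not the authors' design: the function names `multiscale_attention` and `adaptive_fusion`, the use of average-pooling windows as a stand-in for learned multiscale convolutions, the sigmoid gates, and the per-pixel softmax fusion weights are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def avg_pool_same(x, k):
    """Average a (C, H, W) map over a k x k window, stride 1, edge padding."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = x.shape[1:]
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def multiscale_attention(x, scales=(1, 3, 5)):
    """Hypothetical stand-in for a multiscale feature attention module."""
    # Multiscale local features: mean over several window sizes
    # (assumption: a real model would use learned convolutions here).
    ms = sum(avg_pool_same(x, k) for k in scales) / len(scales)
    # Channel attention: one gate per channel from global average pooling.
    channel_gate = sigmoid(ms.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    # Spatial pixel attention: one gate per pixel from the channel mean.
    pixel_gate = sigmoid(ms.mean(axis=0, keepdims=True))         # (1, H, W)
    return ms * channel_gate * pixel_gate

def adaptive_fusion(f_rgb, f_ir):
    """Hypothetical fusion: per-pixel softmax weights over the two modalities."""
    e_rgb = f_rgb.mean(axis=0, keepdims=True)
    e_ir = f_ir.mean(axis=0, keepdims=True)
    m = np.maximum(e_rgb, e_ir)  # subtract the max for numerical stability
    w_rgb, w_ir = np.exp(e_rgb - m), np.exp(e_ir - m)
    return (w_rgb * f_rgb + w_ir * f_ir) / (w_rgb + w_ir)

# Random arrays stand in for primary features of a paired RGB/IR input.
rng = np.random.default_rng(0)
rgb = rng.random((8, 16, 16))
ir = rng.random((8, 16, 16))
fused = adaptive_fusion(multiscale_attention(rgb), multiscale_attention(ir))
print(fused.shape)  # (8, 16, 16)
```

Because the fusion weights are a per-pixel softmax, each fused value is a convex combination of the two attended features, so neither modality can be amplified beyond its own response. In a trained detector the gates and fusion weights would be learned rather than computed from fixed means.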