Multiscale Feature Fusion Approach for Dual-Modal Object Detection

doi:10.3778/j.issn.1002-8331.2305-0412

Abstract

Abstract: Object detection based on visible-light images adapts poorly to complex lighting conditions such as low light, no light, and strong light, while object detection based on infrared images is strongly affected by background noise. Infrared objects also lack color information and have weak texture detail, which makes detection considerably more challenging. To address these problems, a dual-modal object detection approach that effectively fuses the features of visible and infrared images is proposed. First, primary features are extracted separately from each image of the paired dual-modal input. Second, a multiscale feature attention module is proposed, which extracts multiscale local features from the infrared and visible images separately and introduces channel attention and spatial pixel attention to focus on the multiscale feature information of the dual-modal images along both the channel and pixel dimensions. Finally, a dual-modal feature fusion module is proposed to adaptively fuse the dual-modal feature information, yielding multiscale fused features of the dual-modal images. On the large-scale dual-modal image dataset DroneVehicle, compared with the baseline algorithm YOLOv5s detecting on visible-only or infrared-only images, the proposed algorithm improves detection accuracy by 13.42 and 2.27 percentage points, respectively, and reaches a detection speed of 164 frame/s, giving it end-to-end real-time detection capability. The proposed algorithm effectively improves the robustness and accuracy of object detection in complex scenes and has good application prospects.

Key words: object detection, multiscale feature fusion, dual-modal, attention mechanism

ZHANG Rui, LI Yunchen, WANG Jiabao, CHEN Yao, WANG Ziqi, LI Yang. Multiscale Feature Fusion Approach for Dual-Modal Object Detection[J]. Computer Engineering and Applications, 2024, 60(17): 233-242.
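
The pipeline summarized in the abstract (multiscale local features, channel attention, spatial pixel attention, then adaptive fusion of the two modalities) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer details, so every name and design choice here is an assumption made for illustration. In particular, `box_filter` mean filters stand in for learned multiscale convolution branches, the attention gates are parameter-free sigmoids rather than trained layers, and `adaptive_fusion` uses a per-element softmax over the two modalities as one plausible form of adaptive weighting.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def box_filter(x, k):
    """Mean filter with odd window size k over a (C, H, W) array, edge-padded.

    Hypothetical stand-in for a learned k x k convolution branch.
    """
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = x.shape[1:]
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def multiscale_attention(x, scales=(1, 3, 5)):
    """Illustrative multiscale feature attention for one modality."""
    # Multiscale local features: average the responses of several window sizes.
    multi = np.mean([box_filter(x, k) for k in scales], axis=0)
    # Channel attention: one gate per channel from global average pooling.
    channel_gate = sigmoid(multi.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    # Spatial pixel attention: one gate per pixel from the channel-wise mean.
    pixel_gate = sigmoid(multi.mean(axis=0, keepdims=True))  # (1, H, W)
    return multi * channel_gate * pixel_gate


def adaptive_fusion(f_rgb, f_ir):
    """Illustrative adaptive fusion: per-element softmax weights over modalities."""
    stacked = np.stack([f_rgb, f_ir])  # (2, C, H, W)
    e = np.exp(stacked - stacked.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)  # weights sum to 1 per element
    return (weights * stacked).sum(axis=0)


# Random arrays stand in for the primary features of a paired RGB/IR input.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 16, 16))
ir_feat = rng.standard_normal((8, 16, 16))

fused = adaptive_fusion(multiscale_attention(rgb_feat),
                        multiscale_attention(ir_feat))
print(fused.shape)  # prints (8, 16, 16)
```

Because the fusion weights form a convex combination, each fused value lies between the corresponding visible and infrared feature values. A trained fusion module would instead learn these weights from data.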
