
Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (24): 216-227. DOI: 10.3778/j.issn.1002-8331.2409-0328
• Graphics and Image Processing •
ZHENG Shangpo1, LIU Junfeng1, ZENG Jun2+, XU Shikang1, LIAO Dingding1
Online: 2025-12-15
Published: 2025-12-15
ZHENG Shangpo, LIU Junfeng, ZENG Jun, XU Shikang, LIAO Dingding. Multispectral Object Detection Based on Cross-Modality Adaptive Fusion Network[J]. Computer Engineering and Applications, 2025, 61(24): 216-227.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2409-0328
| [1] | PENG Xiaohong, DENG Feng, YU Yinghuai. Construction of Chinese Named Entity Recognition Dataset in Penaeus Vannamei Farming Field [J]. Computer Engineering and Applications, 2025, 61(9): 353-362. |
| [2] | XIANG Yiwei, JIANG Yu, WANG Qikai, LUO Rongrong. Research on Real-Time Transformer for Multi-Scale Feature Optimization in Drone Aerial Imaging [J]. Computer Engineering and Applications, 2025, 61(9): 221-229. |
| [3] | LUO Yuxuan, WU Gaochang, GAO Ming. Remote Sensing Image Super-Resolution Network with Adaptive Convolution and Lightweight Transformer [J]. Computer Engineering and Applications, 2025, 61(9): 263-276. |
| [4] | XING Suxia, LI Kexian, FANG Junze, GUO Zheng, ZHAO Shihang. Survey of Medical Image Segmentation in Deep Learning [J]. Computer Engineering and Applications, 2025, 61(7): 25-41. |
| [5] | JIANG Wangyu, WANG Le, YAO Yepeng, MAO Guojun. Multi-Scale Feature Aggregation Diffusion and Edge Information Enhancement Small Object Detection Algorithm [J]. Computer Engineering and Applications, 2025, 61(7): 105-116. |
| [6] | BAI Xuebing, CHE Jin, WU Jinman. Image Captioning Algorithm for Multi-Scale Features Fusion [J]. Computer Engineering and Applications, 2025, 61(7): 288-296. |
| [7] | GONG Xiaomei, ZHANG Yi, HU Shu. Target Tracking Algorithm with Feature Fusion and Transformer Based Model Predictor [J]. Computer Engineering and Applications, 2025, 61(6): 254-262. |
| [8] | DU Xiaogang, LU Wenjie, LEI Tao, WANG Yingbo. Low-Light Image Enhancement Using Brightness and Signal-to-Noise Ratio Guided Transformer [J]. Computer Engineering and Applications, 2025, 61(6): 263-272. |
| [9] | LI Xin, ZHANG Dan, GUO Xin, WANG Song, CHEN Enqing. Human Pose Estimation Based on Dual-Stream Fusion of CNN and Transformer [J]. Computer Engineering and Applications, 2025, 61(5): 187-199. |
| [10] | HUANG Shan, FAN Huijie, LIN Sen, CAO Jinghan, TANG Yandong. Feature Dynamic Library Based on Diffusion Method [J]. Computer Engineering and Applications, 2025, 61(5): 241-249. |
| [11] | JIN Jiali, YU Lu. Continual Image Captioning with Dynamic Token-Used Fusion Feature [J]. Computer Engineering and Applications, 2025, 61(4): 176-191. |
| [12] | WANG Weihang, ZHANG Yi. MLDAC: Multi-Task Dense Attention Computation Self-Supervised Few-Shot Semantic Segmentation Method [J]. Computer Engineering and Applications, 2025, 61(4): 211-221. |
| [13] | PAN Weilan, ZHANG Rongfen, LIU Yuhong, ZHANG Jiyou, SUN Long. Cross-Modal Transparent Object Segmentation Combining CNN-Transformer [J]. Computer Engineering and Applications, 2025, 61(4): 222-229. |
| [14] | JIANG Yuehan, CHEN Junjie, LI Hongjun. Review of Human Action Recognition Based on Skeletal Graph Neural Networks [J]. Computer Engineering and Applications, 2025, 61(3): 34-47. |
| [15] | FENG Xingyu, ZHU Linglong, ZHANG Yonghong, KAN Xi, CAO Haixiao, MA Guangyi. Change Detection Algorithm Based on Multilateral Feature Guided Aggregation Network [J]. Computer Engineering and Applications, 2025, 61(3): 264-274. |