Low-Light Object Detection Combining Transformer and Dynamic Feature Fusion

doi:10.3778/j.issn.1002-8331.2310-0131

Abstract

Existing low-light object detection algorithms have large parameter counts and computational costs, poor real-time performance, and limited applicability to mobile devices. To address these issues, this paper proposes DarkYOLOv8, an improved lightweight model for low-light object detection based on YOLOv8. First, MobileNet v2 replaces the backbone network of YOLOv8 to enhance the feature extraction capability of the model. Second, a Transformer attention mechanism captures global information from the image, and the parameters of the Transformer module are trained using the target annotations as labels. This strengthens the weights within target regions and thereby improves the model's ability to extract target features under low-light conditions. Finally, a dynamic feature fusion attention (DFFA) module performs feature fusion in the neck network, dynamically fusing shallow and deep features. At the same time, YOLOv8X combined with CBAM (convolutional block attention module) is used to supervise the training of the spatial attention weights in the CBAM module of DFFA. Experimental results show that on the ExDark dataset, DarkYOLOv8 reaches 70.1% mAP50 with only 8.53 GFLOPs, an improvement of 3.9 percentage points over YOLOv8n.
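The abstract describes two places where attention weights receive extra supervision. The Transformer attention is trained against labels derived from the target annotations, and the CBAM spatial attention inside DFFA is supervised by a YOLOv8X+CBAM model. The pure-Python sketch below illustrates one plausible reading of both ideas on toy 2-D maps. It is not the authors' implementation. The box-to-grid rasterization, the binary cross-entropy and mean-squared-error losses, the per-pixel weighted sum standing in for CBAM's 7x7 convolution, and the blending rule in `dynamic_fuse` are all assumptions. Every function name is hypothetical.

```python
import math

# Hypothetical helpers. The abstract does not specify grid sizes, loss functions,
# or the exact fusion rule, so each choice below is an illustrative assumption.


def boxes_to_mask(boxes, img_w, img_h, grid_w, grid_h):
    """Rasterize pixel boxes (x1, y1, x2, y2) onto a grid_h x grid_w label map.
    A cell is 1.0 when its centre lies inside any box, otherwise 0.0."""
    mask = [[0.0] * grid_w for _ in range(grid_h)]
    for r in range(grid_h):
        cy = (r + 0.5) * img_h / grid_h
        for c in range(grid_w):
            cx = (c + 0.5) * img_w / grid_w
            if any(x1 <= cx <= x2 and y1 <= cy <= y2 for x1, y1, x2, y2 in boxes):
                mask[r][c] = 1.0
    return mask


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two same-sized 2-D maps."""
    vals = []
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
            vals.append(-(t * math.log(p) + (1.0 - t) * math.log(1.0 - p)))
    return sum(vals) / len(vals)


def mse(a, b):
    """Mean squared error between two same-sized 2-D maps."""
    vals = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(vals) / len(vals)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(feat, w_avg=1.0, w_max=1.0, bias=0.0):
    """Simplified CBAM-style spatial attention for feat[C][H][W].
    CBAM pools across channels (mean and max), applies a 7x7 conv, then a
    sigmoid; here the conv is replaced by a per-pixel weighted sum."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    att = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            col = [feat[c][i][j] for c in range(C)]
            att[i][j] = sigmoid(w_avg * sum(col) / C + w_max * max(col) + bias)
    return att


def dynamic_fuse(shallow, deep, att):
    """Per-pixel blend of two [C][H][W] maps already resized to the same shape:
    att weights the shallow features and (1 - att) the deep features."""
    return [[[att[i][j] * s + (1.0 - att[i][j]) * d
              for j, (s, d) in enumerate(zip(srow, drow))]
             for i, (srow, drow) in enumerate(zip(sch, dch))]
            for sch, dch in zip(shallow, deep)]


if __name__ == "__main__":
    # (a) Annotation-supervised attention: one box covering the top-left
    # quarter of a 64x64 image, projected onto a 4x4 attention grid.
    mask = boxes_to_mask([(0, 0, 31, 31)], img_w=64, img_h=64, grid_w=4, grid_h=4)
    uniform = [[0.5] * 4 for _ in range(4)]                        # unfocused map
    focused = [[0.9 if m else 0.1 for m in row] for row in mask]   # target-focused map
    print(bce(uniform, mask), bce(focused, mask))  # focused map gives the lower loss

    # (b) DFFA-style fusion; attention is computed from both inputs stacked
    # along the channel axis, and a teacher map supervises it.
    shallow = [[[1.0, -1.0], [0.0, 2.0]], [[3.0, 0.0], [-2.0, 1.0]]]
    deep = [[[0.5, 0.5], [0.5, 0.5]], [[0.0, 1.0], [1.0, 0.0]]]
    student = spatial_attention(shallow + deep)
    teacher = spatial_attention(shallow + deep, w_avg=2.0)  # stand-in for a YOLOv8X+CBAM map
    fused = dynamic_fuse(shallow, deep, student)
    print(fused[0])
    print(mse(student, teacher))  # distillation loss on the spatial attention weights
```

In a real detector these maps would be tensors produced by the network, and the two losses would be added to the detection loss as auxiliary terms. How the paper weights them, and whether the teacher is kept frozen, is not stated in the abstract.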

Key words: low-light object detection, attention mechanism, lightweight, Transformer, deformable convolution

CAI Teng, CHEN Cifa, DONG Fangmin. Low-Light Object Detection Combining Transformer and Dynamic Feature Fusion[J]. Computer Engineering and Applications, 2024, 60(9): 135-141.

蔡腾, 陈慈发, 董方敏. 结合Transformer和动态特征融合的低照度目标检测[J]. 计算机工程与应用, 2024, 60(9): 135-141.

References

[1] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[2] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems, 2015.
[3] CAI Z, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 6154-6162.
[4] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[5] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, October 11-14, 2016. [S.l.]: Springer International Publishing, 2016: 21-37.
[6] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980-2988.
[7] TERVEN J, CORDOVA-ESPARZA D. A comprehensive review of YOLO: from YOLOv1 to YOLOv8 and beyond[J]. arXiv:2304.00501, 2023.
[8] 张蕊, 高诗博, 赵霞, 等. 基于改进YOLOv5s的无人驾驶夜间车辆目标检测算法[J]. 电子测量技术, 2023, 46(17): 87-93.
ZHANG R, GAO S B, ZHAO X, et al. Algorithm on nighttime target detection for unmanned vehicles based on an improved YOLOv5s[J]. Electronic Measurement Technology, 2023, 46(17): 87-93.
[9] KALWAR S, PATEL D, AANEGOLA A, et al. GDIP: gated differentiable image processing for object-detection in adverse conditions[J]. arXiv:2209.14922, 2022.
[10] QIN Q, CHANG K, HUANG M, et al. DENet: detection-driven enhancement network for object detection under adverse weather conditions[C]//Proceedings of the Asian Conference on Computer Vision, 2022: 2813-2829.
[11] REDMON J, FARHADI A. Yolov3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[12] 麦锦文, 李浩, 康雁. 基于特征交互结构的弱光目标检测[J/OL]. 计算机工程与应用: 1-11[2023-10-10]. http://kns.cnki.net/kcms/detail/11.2127.TP.20230403.1553.022.html.
MAI J W, LI H, KANG Y. Low-light object detection based on feature interaction structure[J/OL]. Computer Engineering and Applications: 1-11[2023-10-10]. http://kns.cnki.net/kcms/detail/11.2127.TP.20230403.1553.022.html.
[13] 舒子婷, 张泽斌, 宋尧哲, 等. 基于改进YOLOv5的低光照图像目标检测[J]. 激光与光电子学进展, 2023, 60(4): 77-84.
SHU Z T, ZHANG Z B, SONG Y Z, et al. Low-light image object detection based on improved YOLOv5 algorithm[J]. Laser & Optoelectronics Progress, 2023, 60(4): 77-84.
[14] 陈永麟, 王恒涛, 张上. 基于YOLO v7的轻量级红外目标检测算法[J/OL]. 红外技术: 1-9[2023-10-13]. http://kns.cnki.net/kcms/detail/53.1053.TN.20230911.1613.002.html.
CHEN Y L, WANG H T, ZHANG S. Lightweight infrared target detection algorithm based on YOLO v7[J/OL]. Infrared Technology: 1-9[2023-10-13]. http://kns.cnki.net/kcms/detail/53.1053.TN.20230911.1613.002.html.
[15] HU M, WANG S, LI B, et al. Penet: towards precise and efficient image guided depth completion[C]//2021 IEEE International Conference on Robotics and Automation (ICRA), 2021: 13656-13662.
[16] SASAGAWA Y, NAGAHARA H. Yolo in the dark-domain adaptation method for merging multiple models[C]//Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, August 23-28, 2020. [S.l.]: Springer International Publishing, 2020: 345-359.
[17] YIN X, YU Z, GAO X, et al. DEFormer: DCT-driven enhancement transformer for low-light image and dark vision[J]. arXiv:2309.06941, 2023.
[18] ALI M, YIN B, BILAL H, et al. Advanced efficient strategy for detection of dark objects based on spiking network with multi-box detection[J]. Multimedia Tools and Applications, 2023: 1-21.
[19] SANDLER M, HOWARD A, ZHU M, et al. Mobilenetv2: inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520.
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017.
[21] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[22] LOH Y P, CHAN C S. Getting to know low-light images with the exclusively dark dataset[J]. Computer Vision and Image Understanding, 2019, 178: 30-42.
[23] LIU W, REN G, YU R, et al. Image-adaptive YOLO for object detection in adverse weather conditions[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 1792-1800.
[24] WANG J, YANG P, LIU Y, et al. Research on improved yolov5 for low-light environment object detection[J]. Electronics, 2023, 12(14): 3089.
[25] GUO C, LI C, GUO J, et al. Zero-reference deep curve estimation for low-light image enhancement[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1780-1789.
[26] LIU R, MA L, ZHANG J, et al. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 10561-10570.