Computer Engineering and Applications, 2024, Vol. 60, Issue (9): 135-141. DOI: 10.3778/j.issn.1002-8331.2310-0131

• Special Topic: Improvements and Applications of YOLOv8 •

Low-Light Object Detection Combining Transformer and Dynamic Feature Fusion

CAI Teng, CHEN Cifa, DONG Fangmin   

  1. College of Computer and Information, China Three Gorges University, Yichang, Hubei 443002, China
    2.Hubei Province Engineering Technology Research Center for Construction Quality Testing Equipment, China Three Gorges University, Yichang, Hubei 443002, China
  • Online:2024-05-01 Published:2024-04-29

Abstract: Existing low-light object detection algorithms have large parameter counts and computational costs, poor real-time performance, and limited applicability to mobile devices. To address these issues, this paper proposes DarkYOLOv8, an improved lightweight model based on YOLOv8 for low-light object detection. First, MobileNet v2 replaces the backbone network of YOLOv8 to enhance the feature extraction capability of the model. Second, a Transformer attention mechanism is used to capture global information from the image, and the parameters of the Transformer module are trained with the target annotations as labels so that the weights within target regions are increased, which improves the model's ability to extract target features under low-light conditions. Finally, a dynamic feature fusion attention (DFFA) module is applied in the neck network to dynamically fuse shallow and deep features; in addition, YOLOv8X combined with CBAM is used to supervise the training of the CBAM spatial attention weights in the DFFA module. Experimental results on the ExDark dataset show that DarkYOLOv8 reaches 70.1% mAP50 with only 8.53 GFLOPs, 3.9 percentage points higher than YOLOv8n.
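The idea of training the Transformer module "with target annotations as labels" can be illustrated with a minimal sketch. The abstract does not specify how the label is built or which loss is used, so the following are assumptions: annotations are axis-aligned boxes `(x1, y1, x2, y2)` in pixels, the supervision target is a binary mask on the attention-map grid (1 where a cell centre lies inside any box, 0 elsewhere), and the loss is pixel-wise binary cross-entropy. The names `boxes_to_mask` and `attention_bce_loss` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple

# Assumed annotation format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def boxes_to_mask(boxes: List[Box], img_w: int, img_h: int,
                  grid_w: int, grid_h: int) -> List[List[float]]:
    """Rasterize annotation boxes onto a grid_h x grid_w attention grid.

    A cell is 1.0 if its centre falls inside any box, otherwise 0.0.
    """
    mask = [[0.0] * grid_w for _ in range(grid_h)]
    cell_w = img_w / grid_w
    cell_h = img_h / grid_h
    for row in range(grid_h):
        cy = (row + 0.5) * cell_h  # y coordinate of the cell centre
        for col in range(grid_w):
            cx = (col + 0.5) * cell_w  # x coordinate of the cell centre
            for x1, y1, x2, y2 in boxes:
                if x1 <= cx <= x2 and y1 <= cy <= y2:
                    mask[row][col] = 1.0
                    break
    return mask


def attention_bce_loss(attn: List[List[float]], mask: List[List[float]],
                       eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between an attention map in (0, 1) and the box mask."""
    total = 0.0
    count = 0
    for attn_row, mask_row in zip(attn, mask):
        for p, t in zip(attn_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count
```

For example, a single 32x32 box in the top-left corner of a 64x64 image marks the top-left 2x2 cells of a 4x4 grid, and an attention map that is 0.5 everywhere then gives a loss of ln 2 (about 0.693). Minimizing such a loss pushes attention weights up inside annotated regions and down elsewhere, which matches the stated goal of increasing weights within target regions.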

Key words: low-light object detection, attention mechanism, lightweight, Transformer, deformable convolution
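The neck-side idea in the abstract, fusing shallow and deep features through a CBAM spatial attention map whose weights are supervised by a larger YOLOv8X+CBAM model, can be sketched under several assumptions. This is not the DFFA design, which the abstract does not detail. The spatial attention follows the standard CBAM recipe (average and max pooling across channels, a convolution, then a sigmoid), except that CBAM's 7x7 convolution is replaced here by a per-location linear mix with hypothetical weights `w_avg`, `w_max`, and `bias` to keep the example dependency-free. The fusion rule `a * shallow + (1 - a) * deep` and the mean-squared-error supervision against a teacher attention map are also hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Map2D = List[List[float]]             # indexed as [row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(feat: FeatureMap, w_avg: float = 1.0,
                      w_max: float = 1.0, bias: float = 0.0) -> Map2D:
    """CBAM-style spatial attention, with a 1x1 mix instead of CBAM's 7x7 conv.

    At each location: pool across channels (mean and max), combine the two
    pooled values linearly, and squash the result into (0, 1) with a sigmoid.
    """
    channels = len(feat)
    h, w = len(feat[0]), len(feat[0][0])
    attn = []
    for r in range(h):
        row = []
        for c in range(w):
            values = [feat[k][r][c] for k in range(channels)]
            avg_pool = sum(values) / channels
            max_pool = max(values)
            row.append(sigmoid(w_avg * avg_pool + w_max * max_pool + bias))
        attn.append(row)
    return attn


def fuse(shallow: FeatureMap, deep: FeatureMap, attn: Map2D) -> FeatureMap:
    """Per-location convex blend a * shallow + (1 - a) * deep (hypothetical rule).

    Assumes both maps already share channel count and spatial size,
    e.g. the deep map has been upsampled and projected beforehand.
    """
    return [
        [
            [attn[r][c] * s + (1.0 - attn[r][c]) * d
             for c, (s, d) in enumerate(zip(s_row, d_row))]
            for r, (s_row, d_row) in enumerate(zip(s_ch, d_ch))
        ]
        for s_ch, d_ch in zip(shallow, deep)
    ]


def attention_mse(student: Map2D, teacher: Map2D) -> float:
    """Mean squared error between student and teacher attention maps."""
    diffs = [(s - t) ** 2
             for s_row, t_row in zip(student, teacher)
             for s, t in zip(s_row, t_row)]
    return sum(diffs) / len(diffs)
```

In a training loop built on this sketch, `attention_mse(student_attn, teacher_attn)` would be added to the detection loss, with the teacher map computed by the frozen larger model. The abstract gives neither the loss weighting nor the exact form of the supervision, so both remain open choices here.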