Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (24): 177-187. DOI: 10.3778/j.issn.1002-8331.2406-0424

• Graphics and Image Processing •


LOL-YOLO: Low-Light Object Detection Incorporating Multiple Attention Mechanisms

JIANG Changjiang, HE Xuying, XIANG Jie   

  1. School of Automation/Industrial Internet of Things, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. Key Laboratory of Industrial Internet of Things and Networked Control, Ministry of Education, Chongqing 400065, China
  • Online: 2024-12-15  Published: 2024-12-12


Abstract: To address the challenges faced by object detection in low-illumination images, such as blurry night scenes, indistinct object boundaries, and large differences between bright and dark regions, this paper proposes LOL-YOLO (low-light YOLO), a detection method based on dynamic feature fusion. A self-correcting illumination module is introduced to improve the quality of low-light images and counter the problem of indistinct targets under low illumination. A dynamic feature extraction module is proposed that uses an attention mechanism combining large convolutional kernels with deformable convolutions to capture contextual information broadly and flexibly. A dynamic detection head is designed to strengthen perception across different scales, spatial positions, and tasks, further improving detection accuracy and robustness. Experiments on the ExDark, DarkFace, and NPD (nighttime pedestrian detection) datasets show that the proposed method achieves a clear improvement in detection accuracy over mainstream algorithms, confirming its effectiveness.
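
The dynamic feature extraction module described above pairs large convolutional kernels with deformable convolutions inside an attention mechanism. The PyTorch sketch below shows one plausible way such a block could be wired; the module name LargeKernelDeformAttention, the 7×7 and 3×3 kernel sizes, and the sigmoid gating are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# Illustrative sketch only: approximates the idea of combining a large
# depthwise kernel with a deformable convolution to build an attention map;
# it is not the LOL-YOLO implementation.
class LargeKernelDeformAttention(nn.Module):
    def __init__(self, channels: int, large_kernel: int = 7, deform_kernel: int = 3):
        super().__init__()
        # Large depthwise kernel: wide receptive field at low parameter cost.
        self.dw_large = nn.Conv2d(channels, channels, large_kernel,
                                  padding=large_kernel // 2, groups=channels)
        # Offsets are predicted from the features so the deformable convolution
        # can sample context at learned, content-dependent positions.
        self.offset = nn.Conv2d(channels, 2 * deform_kernel * deform_kernel,
                                kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, deform_kernel,
                                   padding=deform_kernel // 2)
        # Pointwise mixing produces the final attention map.
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ctx = self.dw_large(x)                    # broad context
        ctx = self.deform(ctx, self.offset(ctx))  # flexible, shape-adaptive context
        attn = torch.sigmoid(self.pw(ctx))        # attention weights in [0, 1]
        return x * attn                           # reweight the input features

if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)             # e.g. a backbone feature map
    out = LargeKernelDeformAttention(64)(feat)
    print(out.shape)                              # torch.Size([1, 64, 80, 80])

In a YOLO-style detector, such a block would typically sit between backbone stages or in the neck so that the reweighted features feed the detection head; that placement is an assumption rather than a detail stated in the abstract.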

Key words: low-light, image enhancement, large convolutional kernels, deformable convolutions, multiple attention mechanisms