Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (11): 224-232. DOI: 10.3778/j.issn.1002-8331.2302-0051

• Graphics and Image Processing •

Low-Light Object Detection Based on Feature Interaction Structure

MAI Jinwen, LI Hao, KANG Yan   

  1. School of Information Science & Engineering, Yunnan University, Kunming 650504, China
  2. Yunnan Engineering Research Center of Smart Tourism, Kunming 650504, China
  3. School of Software, Yunnan University, Kunming 650504, China
  • Online: 2024-06-01 Published: 2024-05-31

Abstract: Current mainstream and state-of-the-art object detectors achieve low accuracy in low-light scenes. This paper argues that low-light images weaken the local-correlation inductive bias on which conventional convolutional neural networks rely, and therefore introduces Swin Transformer stages, which model global features well, to provide global attention and enrich the feature information. Global attention runs in parallel with local convolution to extract features from low-light images, and a feature interaction structure (FIS) is proposed whose carefully designed secondary interaction scheme parses, exploits, and combines local and global information. Stacking FIS blocks yields FISNet, an interactive parallel dual-stream backbone that deeply fuses the two feature types and provides the hierarchical feature structure that dense prediction tasks depend on. On the low-light dataset ExDark, FISNet reaches 40.6 AP, an improvement of 0.5 to 2.9 AP over baseline models such as EfficientNet, which makes it well suited to low-light object detection.

Key words: low-light images, object detection, global features, feature interaction structure
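
The parallel local/global design described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify how FIS works internally, so the function names (`local_branch`, `global_branch`, `fis_block`, `fisnet_sketch`), the mixing weight `alpha`, and the single cross-stream exchange per block are all assumptions made for illustration. A fixed 3-tap 1-D convolution stands in for the convolutional stream, and an unwindowed, weight-free self-attention stands in for a Swin Transformer stage.

```python
import math


def local_branch(x, kernel=(0.25, 0.5, 0.25)):
    """Local stream: 1-D convolution with zero padding (stand-in for a conv layer)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # positions outside the signal count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def global_branch(x):
    """Global stream: every position attends to every other position.

    Stand-in for a Swin Transformer stage; no windows and no learned weights.
    """
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def fis_block(local_feat, global_feat, alpha=0.5):
    """Hypothetical interaction block: run both streams, then cross-inject.

    `alpha` is an assumed mixing weight, not a value from the paper.
    """
    l = local_branch(local_feat)
    g = global_branch(global_feat)
    l_out = [a + alpha * b for a, b in zip(l, g)]  # local stream sees global context
    g_out = [b + alpha * a for a, b in zip(l, g)]  # global stream sees local detail
    return l_out, g_out


def fisnet_sketch(x, depth=2):
    """Stack `depth` interaction blocks, then average the two streams."""
    l, g = list(x), list(x)
    for _ in range(depth):
        l, g = fis_block(l, g)
    return [(a + b) / 2 for a, b in zip(l, g)]


if __name__ == "__main__":
    print(fisnet_sketch([0.1, 0.9, 0.2, 0.8], depth=2))
```

The sketch keeps the two streams separate across blocks and merges them only at the end, mirroring the "parallel dual-stream" wording. It makes no attempt to reproduce the paper's secondary interaction scheme or its hierarchical (multi-resolution) outputs.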