Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (10): 247-257. DOI: 10.3778/j.issn.1002-8331.2401-0262

• Graphics and Image Processing •

ResFPN: Multi-Scale Object Detection Algorithm for Expanding Actual Receptive Field and Improving FPN

YANG Yang, TANG Xiaofen

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  2. Ningxia Key Laboratory of Artificial Intelligence and Information Security for Channeling Computing Resources from the East to the West, Yinchuan 750021, China
  • Online: 2025-05-15 Published: 2025-05-15

Abstract: Multi-scale object detection suffers from three problems that degrade model performance: the actual receptive field of the backbone network is far smaller than its theoretical receptive field, the receptive field is sparsely distributed, and the feature pyramid network (FPN) loses channel information when it unifies the number of channels in its lateral connections. To address these problems, ResFPN is proposed, a multi-scale object detection algorithm that expands the actual receptive field and improves FPN through multi-feature fusion. To narrow the gap between the actual and theoretical receptive fields of the backbone, a multi-branch dilated convolutional (MBD) module and a multi-branch pooling (MBP) module are designed, which expand the receptive field by learning to fuse spatial features at different scales. To counter the sparse receptive field distribution, a lightweight channel interactive fusion (CIF) module is proposed; it uses a two-branch structure in which each branch stacks a different number of depthwise separable convolutions to learn the dependencies between pixels and thereby enhance the feature representation. To counter the channel information that FPN loses when 1×1 convolutions unify the number of channels, SubPixel convolution is used to extract the output features of the C5 layer, which retains the original rich semantic information and leads out additional bidirectional paths that supplement the FPN channel information. Because this may introduce redundant information, a global context (GC) module is added after the additional bidirectional paths, and its bottleneck transform module further fuses the feature information and reduces the redundancy. Experiments show that the proposed ResFPN effectively addresses the sparse receptive field distribution and doubles the receptive field of the backbone network, and that the proposed remedy for the FPN channel loss also performs well in multi-scale object detection. Compared with the typical network Faster R-CNN, the average precision for large, medium, and small objects on the challenging MS COCO dataset improves by 2.2, 1.6, and 2.0 percentage points, respectively, and detection performance also improves relative to other detectors.

Key words: object detection, convolutional neural network, multi-scale object detection, receptive field, feature pyramid network (FPN)
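
The abstract contrasts the theoretical receptive field of the backbone with its much smaller actual receptive field and uses dilated convolutions to enlarge it. As a rough illustration of the theoretical side only, the sketch below applies the standard recurrence for the receptive field of stacked convolutions. The function name, the layer stacks, and the dilation rates (1, 2, 3) are hypothetical choices for illustration; the abstract does not specify the MBD configuration, and this calculation says nothing about the actual (effective) receptive field that the paper measures.

```python
def theoretical_receptive_field(layers):
    """Theoretical receptive field of a sequential stack of conv layers.

    layers: iterable of (kernel_size, stride, dilation) tuples, ordered
    from the input towards the output. Hypothetical helper, not taken
    from the paper.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance, in input pixels, between adjacent feature positions
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions versus three 3x3 convolutions with
# assumed dilation rates 1, 2, 3 (illustrative values only).
plain_stack = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated_stack = [(3, 1, 1), (3, 1, 2), (3, 1, 3)]

print(theoretical_receptive_field(plain_stack))    # 7
print(theoretical_receptive_field(dilated_stack))  # 13
```

With the same number of 3×3 kernels and the same parameter count, the dilated stack covers a 13×13 input window instead of 7×7, which is the general reason dilation is a common tool for enlarging receptive fields.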