Parallel High-Resolution Network Design for Small Object Detection

doi:10.3778/j.issn.1002-8331.2203-0577

Abstract

Current object detection algorithms tend to lose feature information when detecting small objects. Processing high-resolution feature maps alleviates this loss, but brings insufficient semantic information and a heavy computational burden. To address these shortcomings, this paper proposes a feature extraction network that effectively handles high-resolution feature maps by connecting subnetworks of different depths in parallel. An input image pyramid is constructed and fed to parallel branch subnetworks of different depths: shallow branches process the high-resolution levels of the pyramid, and deep branches process the low-resolution levels. The branches run simultaneously and their features are fused twice at intermediate positions, combining high-resolution feature information with low-resolution semantic information. A fusion factor is used to build a multi-scale feature fusion structure targeted at small objects, which strengthens small object detection, and an attention mechanism further improves feature extraction. Experiments on the public AI-TOD dataset show that the proposed feature extraction network detects small objects better than other commonly used feature extraction networks. When it replaces the original backbone of the classic two-stage model Faster-RCNN, the classic one-stage models SSD and YOLOv3, and the classic anchor-free model CenterNet, the mean average precision (mAP) improves by 2.7, 3.4, 3.3, and 1.7 percentage points respectively, which demonstrates the applicability and effectiveness of the proposed network structure.

Key words: small object detection, feature extraction, multi-scale, contextual information, attention mechanism
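The abstract names two building blocks that are easy to illustrate in isolation: an input image pyramid, and a multi-scale fusion step weighted by a fusion factor. The sketch below is only an illustration under stated assumptions, not the authors' network. It assumes 2x average-pool downsampling for the pyramid, nearest-neighbour upsampling, and the common top-down form P_i = C_i + alpha * upsample(P_{i+1}) for the fusion factor. The function names, the value of alpha, and the use of raw pyramid levels in place of branch subnetwork outputs are all hypothetical; the branch subnetworks and the attention mechanism are omitted.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2D map stored as nested lists


def downsample2x(img: Grid) -> Grid:
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(img: Grid, levels: int) -> List[Grid]:
    """Return [full-res, 1/2-res, 1/4-res, ...] with `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


def upsample2x(feat: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: repeat every value in a 2x2 block."""
    out: Grid = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_top_down(features: List[Grid], alpha: float) -> List[Grid]:
    """Top-down fusion with a fusion factor (assumed form, see lead-in).

    `features` is ordered from highest to lowest resolution. The coarsest
    level is passed through unchanged; each finer level receives the
    upsampled fused map from the level below it, scaled by `alpha`.
    """
    fused = [features[-1]]
    for lateral in reversed(features[:-1]):
        up = upsample2x(fused[0])
        fused.insert(0, [
            [c + alpha * u for c, u in zip(c_row, u_row)]
            for c_row, u_row in zip(lateral, up)
        ])
    return fused


# Usage: a 4x4 "image" yields a 3-level pyramid (4x4, 2x2, 1x1).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image, levels=3)
fused_maps = fuse_top_down(pyramid, alpha=0.5)
```

With alpha = 0 the high-resolution level is left untouched, and with alpha = 1 the step reduces to a plain FPN-style addition. Intermediate values limit how much low-resolution information flows into the high-resolution map.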

NIU Run, QU Yi, ZHENG Lehui, WEI Jianguo. Parallel High-Resolution Network Design for Small Object Detection[J]. Computer Engineering and Applications, 2022, 58(18): 172-179.
