Research on Feature Misalignment Between Tasks in Anchor-Free Models

doi:10.3778/j.issn.1002-8331.2202-0260

Abstract

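Abstract (translated from the Chinese): A typical object detection model comprises a classification task and a regression task. Because the two tasks are driven by different objectives, the corresponding branches of the detection head respond differently to features drawn from the same input image and the same instance. As a result, classification quality and regression quality can differ greatly for the feature at a single location, which is the task-feature misalignment problem. Standard post-processing nevertheless ranks candidates in non-maximum suppression (NMS) by classification score alone, which yields many detections with high confidence but poor regression quality. This paper analyzes the misalignment problem in modern anchor-free networks and decomposes it into misalignment across scale levels and misalignment across spatial positions. It proposes a solution with minimal parameter overhead: a deformable convolution module fine-tunes the receptive field of the detection head, and a label assignment mechanism that accounts for how well each sample point is aligned mines well-aligned sample points. Together these address the two sub-problems. Detailed experiments and comparative analysis demonstrate the effectiveness and practicality of the work, as well as its robustness across different feature-extraction backbones.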

Keywords (translated from the Chinese): object detection, deep learning, anchor-free detectors, label assignment mechanism
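The abstract's point about NMS can be made concrete with a small sketch. The toy boxes, scores, and the "classification score times localization quality" ranking below are illustrative assumptions, not the paper's method; the sketch only shows how ranking by classification score alone can keep the worse-localized of two overlapping candidates.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


# Hypothetical toy data: one ground-truth object, two overlapping candidates.
gt: Box = (10.0, 10.0, 50.0, 50.0)
boxes: List[Box] = [
    (16.0, 16.0, 56.0, 56.0),  # candidate 0: confident but shifted
    (11.0, 11.0, 50.0, 50.0),  # candidate 1: less confident, well localized
]
cls_scores = [0.90, 0.70]

# Ranking by classification score alone keeps the shifted candidate.
keep_cls = nms(boxes, cls_scores)

# Illustrative alternative (an assumption, not the paper's method): rank by
# classification score times localization quality. Quality is measured against
# the ground truth here purely for demonstration; a real detector would have
# to predict it.
quality = [iou(b, gt) for b in boxes]
joint_scores = [c * q for c, q in zip(cls_scores, quality)]
keep_joint = nms(boxes, joint_scores)

print(keep_cls, keep_joint)
```

In this toy case the two rankings disagree about which candidate survives suppression, which is the symptom of misalignment the abstract describes.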

Abstract: General object detection models consist of a classification branch and a regression branch. Because the two tasks are driven by different objectives, the branches differ in their sensitivity to features from the same instance. This causes a large gap between classification quality and regression quality at the same location, the so-called task-feature misalignment problem. Standard post-processing assumes that a candidate with high classification confidence also has high regression quality, and therefore uses only the classification score as the ranking criterion in non-maximum suppression (NMS). This yields many predictions with high classification scores but poor regression quality. This paper studies the misalignment problem in modern anchor-free detection models and decomposes it into scale misalignment and spatial misalignment. It proposes to resolve the problem at minimal cost: a minor modification of the head network that adjusts the receptive fields of the two tasks individually, and a new label assignment method that mines the most aligned feature samples. Experiments show that, compared with the baseline FCOS, a one-stage anchor-free object detection model, the proposed model consistently improves AP by about 3 points across different backbones, demonstrating the method's simplicity and efficiency.

Key words: object detection, deep learning, anchor-free models, label assignment scheme
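As a rough illustration of what "mining the most aligned feature samples" could look like, the sketch below ranks candidate points for one ground-truth box by a joint metric and keeps the top few as positives. The metric (classification score raised to `alpha` times IoU raised to `beta`) is borrowed from task-aligned assignment schemes such as TOOD and is an assumption here; the abstract does not state the paper's actual criterion. The function names, default exponents, and toy values are likewise hypothetical.

```python
from typing import List, Sequence


def alignment_metric(cls_score: float, iou: float,
                     alpha: float = 1.0, beta: float = 6.0) -> float:
    """Assumed joint metric: high only when both the classification score
    and the IoU of the predicted box are high."""
    return (cls_score ** alpha) * (iou ** beta)


def select_aligned_positives(cls_scores: Sequence[float],
                             ious: Sequence[float],
                             in_box: Sequence[bool],
                             top_k: int = 2) -> List[int]:
    """Return indices of the top_k candidate points, among those lying
    inside the ground-truth box, ranked by the alignment metric."""
    candidates = [i for i, inside in enumerate(in_box) if inside]
    ranked = sorted(
        candidates,
        key=lambda i: alignment_metric(cls_scores[i], ious[i]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy values for four candidate points of one object.
cls_scores = [0.90, 0.60, 0.80, 0.95]   # predicted classification scores
ious = [0.50, 0.90, 0.85, 0.90]         # IoU of each predicted box with the GT
in_box = [True, True, True, False]      # whether the point lies inside the GT

positives = select_aligned_positives(cls_scores, ious, in_box, top_k=2)
print(positives)
```

Point 0 has the highest classification score among the in-box candidates but is not selected, because its predicted box overlaps the ground truth poorly. A selection rule of this kind favors points where the two tasks agree.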

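Hao Shuaizheng (郝帅征), Liu Hongzhe (刘宏哲). Research on Feature Misalignment Between Tasks in Anchor-Free Models [J]. Computer Engineering and Applications (计算机工程与应用), 2023, 59(11): 151-159.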

HAO Shuaizheng, LIU Hongzhe. Research on Feature Misalignment Between Tasks in Anchor-Free Models[J]. Computer Engineering and Applications, 2023, 59(11): 151-159.
