ACFEM-RetinaNet Algorithm for Remote Sensing Image Target Detection

doi:10.3778/j.issn.1002-8331.2208-0240

Abstract

RetinaNet has difficulty detecting multi-scale targets and densely distributed small targets in remote sensing target detection. To address this, the ACFEM-RetinaNet remote sensing target detection algorithm is proposed. Because the original backbone network does not extract features sufficiently, the algorithm adopts Swin Transformer as the backbone to strengthen feature extraction and improve detection accuracy. To handle the multi-scale problem in remote sensing images, an adaptive context feature extraction module (ACFEM) is proposed, in which SK attention guides deformable convolutions with different dilation rates to adaptively adjust the receptive field and extract context features, improving the detection of multi-scale targets. To handle dense small targets in remote sensing images, the FreeAnchor module is introduced, which designs and optimizes the anchor matching strategy from the perspective of maximum likelihood estimation (MLE) to further improve detection accuracy. Experimental results show that ACFEM-RetinaNet achieves a detection accuracy of 91.1% on RSOD, a public remote sensing image target detection dataset, which is 4.6 percentage points higher than the original RetinaNet. ACFEM-RetinaNet is therefore better suited to remote sensing image target detection.
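The abstract states only that SK attention guides deformable convolutions with different dilation rates; it does not give the module's exact layout. As a rough, hypothetical illustration of the attention step alone, the pure-Python sketch below applies the split-fuse-select weighting of Selective Kernel Networks (Li et al., CVPR 2019) to branch outputs that are assumed to be precomputed, for example by convolutions with different dilation rates. It does not implement deformable convolution, it omits batch normalization, and the names `sk_fuse`, `w_reduce` and `w_select` are illustrative assumptions, not the authors' code.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][col]
Matrix = List[List[float]]


def global_avg_pool(x: Tensor3) -> List[float]:
    """Average each channel over all spatial positions (the SK squeeze)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sk_fuse(branches: List[Tensor3], w_reduce: Matrix, w_select: List[Matrix]):
    """Selective-kernel fusion of M same-shaped branch outputs (hypothetical sketch).

    branches: M feature maps, assumed to come from branches with different
              receptive fields (e.g. different dilation rates); computing
              those branches, deformable or not, is outside this sketch.
    w_reduce: d x C matrix that produces the compact descriptor z.
    w_select: M matrices of shape C x d, one per branch.
    Returns (fused feature map, attention weights indexed [branch][channel]).
    """
    num_c = len(branches[0])
    # Fuse: element-wise sum of all branches.
    u = [[[sum(b[c][i][j] for b in branches)
           for j in range(len(branches[0][c][i]))]
          for i in range(len(branches[0][c]))]
         for c in range(num_c)]
    s = global_avg_pool(u)
    # Compact descriptor z = ReLU(W s); batch normalization is omitted here.
    z = [max(0.0, v) for v in matvec(w_reduce, s)]
    logits = [matvec(w_m, z) for w_m in w_select]  # M lists of C logits
    # Select: softmax across branches, computed separately for each channel.
    attn = [[0.0] * num_c for _ in branches]
    for c in range(num_c):
        peak = max(lg[c] for lg in logits)
        exps = [math.exp(lg[c] - peak) for lg in logits]
        total = sum(exps)
        for m, e in enumerate(exps):
            attn[m][c] = e / total
    # Output: channel-wise weighted sum of the branch feature maps.
    fused = [[[sum(attn[m][c] * branches[m][c][i][j] for m in range(len(branches)))
               for j in range(len(branches[0][c][i]))]
              for i in range(len(branches[0][c]))]
             for c in range(num_c)]
    return fused, attn


if __name__ == "__main__":
    # Two hypothetical branches (e.g. dilation 1 and 3): 2 channels, 2x2 maps.
    b1 = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    b2 = [[[3.0, 3.0], [3.0, 3.0]], [[2.0, 2.0], [2.0, 2.0]]]
    # Identical selection weights for both branches give uniform attention.
    fused, attn = sk_fuse([b1, b2], [[0.5, 0.5]], [[[1.0], [1.0]], [[1.0], [1.0]]])
    print(attn)   # [[0.5, 0.5], [0.5, 0.5]]
    print(fused)  # [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
```

Because the softmax runs across branches rather than applying an independent sigmoid to each, the weights for every channel sum to one, so the module trades the receptive fields off against each other instead of scaling each one separately.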

Key words: deep learning, RetinaNet, remote sensing target detection, Swin Transformer

LIN Wenlong, Alifu·Kuerban, CHEN Yixiao, YUAN Xu. ACFEM-RetinaNet Algorithm for Remote Sensing Image Target Detection[J]. Computer Engineering and Applications, 2024, 60(1): 245-253.