Multi-Head Attention Detection of Small Targets in Remote Sensing at Multiple Scales

doi:10.3778/j.issn.1002-8331.2210-0366

Abstract

Targets in complex geospatial remote sensing images vary in scale and shape, and small targets offer few discriminative features, which leads to low detection and recognition accuracy. To address these problems, this paper proposes YOLO-StrVB, a multi-scale detection algorithm for small remote sensing targets based on multi-head attention. First, the network structure is reconstructed into a multi-scale model with an additional detection layer, which improves the model's ability to detect small remote sensing targets at the different scales of the feature extraction network. Second, a bidirectional feature pyramid network (Bi-FPN) is added for multi-scale feature fusion, providing bidirectional cross-scale connections and weighted feature fusion. Third, the multi-head attention block of Swin Transformer is integrated at the end of the YOLOv5 network to improve how the receptive field adapts to multi-scale fusion for the recognition task and to optimize the backbone network. Finally, the network is trained with Varifocal Loss to improve the objectness confidence and localization accuracy for densely distributed small targets, and CIoU is used as the bounding box regression loss to improve the regression accuracy associated with the IoU-aware classification score (IACS). Experiments on the remote sensing dataset NWPU VHR-10 show that mAP increases by 3.05 percentage points over the original YOLOv5 model, indicating that the method effectively improves small-target detection accuracy and is robust for small target detection in geospatial remote sensing images.
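The three building blocks named in the abstract (weighted feature fusion, Varifocal Loss, and CIoU) can be illustrated with a minimal scalar sketch. This is not the authors' implementation: the formulas follow the published EfficientDet, VarifocalNet, and Distance-IoU definitions, while the function names, the pure-Python list representation, and the default values `alpha=0.75`, `gamma=2.0`, and the `eps` constants are assumptions chosen for illustration.

```python
import math


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Bi-FPN-style weighted fusion of equally sized feature vectors.

    Assumption: features are plain lists of floats; a real network
    would fuse tensors with learnable weights.
    """
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps each weight non-negative
    total = sum(w) + eps                  # eps avoids division by zero
    size = len(features[0])
    return [sum(wi * f[k] for wi, f in zip(w, features)) / total
            for k in range(size)]


def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Varifocal Loss for one predicted score p and one target score q.

    q is the IoU with the ground-truth box for a positive sample
    and 0 for a negative sample.
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)     # clamp to keep log() finite
    if q > 0:
        # Positive sample: binary cross-entropy weighted by the target q
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: down-weighted by alpha * p^gamma
    return -alpha * (p ** gamma) * math.log(1.0 - p)


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    iou = inter / (wa * ha + wb * hb - inter + eps)

    # Squared diagonal of the smallest box enclosing both boxes
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centres
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0

    # Aspect-ratio consistency term and its trade-off weight
    v = (4.0 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    trade_off = v / (1.0 - iou + v + eps)

    return iou - rho2 / c2 - trade_off * v


def ciou_loss(box_pred, box_true):
    """Bounding box regression loss: 1 - CIoU."""
    return 1.0 - ciou(box_pred, box_true)
```

In this sketch a positive sample is penalized less when its predicted score is close to its IoU target, and `ciou_loss` stays informative for non-overlapping boxes because the centre-distance term remains non-zero when the IoU is zero.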

Key words: YOLOv5, remote sensing, small object detection, Swin Transformer, multi-scale feature fusion

ZHANG Zhaoyang, ZHANG Shang, WANG Hengtao, RAN Xiukang. Multi-Head Attention Detection of Small Targets in Remote Sensing at Multiple Scales[J]. Computer Engineering and Applications, 2023, 59(8): 227-238.
