Aerial Image Object Detection with Feature Enhancement Using Hybrid Attention

doi:10.3778/j.issn.1002-8331.2209-0206

Abstract: Aerial images are characterized by complex backgrounds, densely distributed targets, and large variations in scale. To address these characteristics, this paper proposes a new object detection framework, the hybrid attention network (HA-Net). First, a Transformer structure that combines local and global attention is designed in the backbone network; it uses attention to suppress background noise and sharpen the boundaries of dense targets, improving feature extraction for dense targets. Second, before feature fusion, a spatial pyramid pooling block that applies successive average pooling and max pooling is adopted to enrich feature information and strengthen the representation of targets at different scales. Third, during feature fusion, a feature reconstruction module that mixes cross-scale spatial attention with non-local channel attention readjusts the features of the feature pyramid network, reducing interference from unnecessary information and improving the detection of multi-scale targets. HA-Net is evaluated on the large-scale remote sensing dataset DOTA, where it reaches an mAP of 77.04% in single-scale testing and 78.28% in multi-scale testing, exceeding the baseline model by 2.38 and 3.62 percentage points respectively. On HRSC2016, it reaches an mAP of 89.95%. These improvements demonstrate the effectiveness of HA-Net for object detection in aerial images.

Keywords: aerial images, rotated object detection, Transformer, attention mechanism
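
The spatial pyramid pooling block mentioned in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not specify the block's layout, so everything here is an assumption made for illustration: each stage applies stride-1 average pooling followed by stride-1 max pooling to the previous stage's output, and the outputs of all stages are kept as parallel branches. The names `pool2d`, `mean`, and `spp_avg_max`, the kernel size, and the number of stages are hypothetical and are not taken from the paper.

```python
def pool2d(fmap, k, reduce_fn):
    """Stride-1 pooling that keeps the map size; windows are clipped at the border."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    return [
        [
            reduce_fn([
                fmap[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ])
            for j in range(w)
        ]
        for i in range(h)
    ]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def spp_avg_max(fmap, k=3, stages=3):
    """Return the input map plus `stages` successively pooled maps."""
    branches = [fmap]
    current = fmap
    for _ in range(stages):
        # Assumed stage layout: average pooling, then max pooling.
        current = pool2d(pool2d(current, k, mean), k, max)
        branches.append(current)
    # A real block would concatenate these branches along the channel axis.
    return branches


# Toy single-channel feature map with one strong response in the centre.
fmap = [[0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0],
        [0.0, 0.0, 0.0]]
branches = spp_avg_max(fmap)
```

Because each stage pools the output of the previous one, later branches summarize progressively larger neighbourhoods of the input, which is one plausible way such a block could expose context at several scales.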

摘要： 针对航空图像背景复杂、目标分布密集、尺度差异大等特点，提出一种新的航空图像检测网络，称为混合注意力网络（hybrid attention network, HA-Net）。在主干网络中设计同时兼顾局部注意力和全局注意力的Transformer结构，利用注意力消除背景噪音，使密集目标边界更加清晰，提升密集目标特征提取能力；在特征融合前，提出使用连续平均池化和最大池化的空间金字塔池化模块来丰富图像特征信息，增强不同尺度目标的表示能力；在特征融合时设计特征重构模块重新调整特征金字塔的特征信息，此模块混合了跨尺度空间注意力和非局部通道注意力，可以减少不必要信息的干扰，提升多尺度目标的检出率。在DOTA航空数据集上对HA-Net进行评估，在单尺度和多尺度测试上评估指标 mAP分别达到77.04%和78.28%，较基准网络，mAP分别提升了2.38个百分点和3.62个百分点。在HRSC2016数据集上mAP达到89.95%。实验结果的提升证明了HA-Net在航空图像目标检测中的有效性。

关键词: 航空图像, 旋转目标检测, Transformer, 注意力机制
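
The mAP figures of 77.04% and 78.28% reported above, together with the stated gains of 2.38 and 3.62 percentage points, can be checked against each other. The short sketch below assumes that both test settings are compared with the same baseline score, which the abstract does not state explicitly; `implied_baseline` is a hypothetical helper name.

```python
from decimal import Decimal

# (reported mAP in %, reported gain over the baseline in percentage points)
reported = {
    "single-scale": (Decimal("77.04"), Decimal("2.38")),
    "multi-scale": (Decimal("78.28"), Decimal("3.62")),
}


def implied_baseline(map_pct, gain_pp):
    """Baseline mAP implied by a reported mAP and its gain over the baseline."""
    return map_pct - gain_pp


baselines = {
    name: implied_baseline(map_pct, gain_pp)
    for name, (map_pct, gain_pp) in reported.items()
}

for name, value in baselines.items():
    print(f"{name}: implied baseline mAP = {value}%")
```

Both settings yield the same implied baseline of 74.66%, so the figures are mutually consistent under this assumption.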

GUAN Wenqing, ZHOU Shibin, ZHANG Guopeng. Aerial Image Object Detection with Feature Enhancement Using Hybrid Attention[J]. Computer Engineering and Applications, 2024, 60(4): 249-257.

管文青, 周世斌, 张国鹏. 混合注意力特征增强的航空图像目标检测[J]. 计算机工程与应用, 2024, 60(4): 249-257.
