Computer Engineering and Applications, 2024, Vol. 60, Issue 4: 249-257. DOI: 10.3778/j.issn.1002-8331.2209-0206

• Graphics and Image Processing •

Aerial Image Object Detection with Feature Enhancement Using Hybrid Attention

GUAN Wenqing, ZHOU Shibin, ZHANG Guopeng   

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Online: 2024-02-15  Published: 2024-02-15

Abstract: Aerial images are characterized by complex backgrounds, densely distributed objects, and large variations in object scale. To address these characteristics, this paper proposes a novel detection network named the hybrid attention network (HA-Net). First, a Transformer structure that combines local and global attention is designed in the backbone; its attention suppresses background noise and sharpens the boundaries between densely packed objects, strengthening feature extraction for dense targets. Second, before feature fusion, a spatial pyramid pooling module built from successive average-pooling and max-pooling operations enriches the feature information and improves the representation of objects at different scales. Third, during feature fusion, a feature reconstruction module that mixes cross-scale spatial attention with non-local channel attention readjusts the features of the feature pyramid network, reducing interference from irrelevant information and raising the detection rate of multi-scale objects. On the DOTA aerial dataset, HA-Net reaches an mAP of 77.04% in single-scale testing and 78.28% in multi-scale testing, improvements of 2.38 and 3.62 percentage points over the baseline network, respectively. On HRSC2016, it reaches an mAP of 89.95%. These results demonstrate the effectiveness of HA-Net for object detection in aerial images.
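The abstract does not specify how the successive pooling operations in the spatial pyramid pooling module are arranged. Below is a minimal NumPy sketch of one plausible reading. It assumes an SPPF-style layout (as in YOLOv5) with two parallel chains of stride-1 pooling, one using max pooling and one using average pooling. The intermediate outputs of both chains are concatenated with the input along the channel axis. The function names `pool2d_same` and `successive_pool_spp`, the kernel size `k=5`, and `steps=3` are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def pool2d_same(x: np.ndarray, k: int, mode: str) -> np.ndarray:
    """Stride-1 k x k pooling with 'same' padding over a (C, H, W) array."""
    if x.ndim != 3:
        raise ValueError("expected a (C, H, W) array")
    if k % 2 == 0:
        raise ValueError("an odd kernel keeps the spatial size unchanged")
    p = k // 2
    pad = ((0, 0), (p, p), (p, p))
    if mode == "max":
        # Pad with -inf so padded cells can never be selected by the max.
        xp = np.pad(x, pad, constant_values=-np.inf)
    elif mode == "avg":
        # Pad with zeros and average over all k*k cells
        # (the count_include_pad=True convention).
        xp = np.pad(x, pad, constant_values=0.0)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Shape (C, H, W, k, k): one k x k window per output position.
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    if mode == "max":
        return windows.max(axis=(-2, -1))
    return windows.mean(axis=(-2, -1))


def successive_pool_spp(x: np.ndarray, k: int = 5, steps: int = 3) -> np.ndarray:
    """Hypothetical SPP variant: chained max pools and chained avg pools, concatenated.

    Each extra step in a chain enlarges the effective receptive field,
    so the output stacks several scales of context for every position.
    """
    x = np.asarray(x, dtype=np.float64)
    outs = [x]          # keep the unpooled input as the finest scale
    m, a = x, x         # current outputs of the max chain and the avg chain
    for _ in range(steps):
        m = pool2d_same(m, k, "max")
        a = pool2d_same(a, k, "avg")
        outs.extend([m, a])
    # Channels grow from C to C * (1 + 2 * steps); H and W are unchanged.
    return np.concatenate(outs, axis=0)
```

For example, a (2, 8, 8) input yields a (14, 8, 8) output with the defaults. Chaining small stride-1 max pools is cheaper than running large kernels in parallel. Two successive 5×5 max pools also cover the same window as a single 9×9 max pool, so the chain still provides multiple receptive-field sizes. In a real network, 1×1 convolutions would typically reduce the channel count before and after the concatenation; they are omitted here.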

Keywords: aerial images, rotated object detection, Transformer, attention mechanism