Computer Engineering and Applications, 2023, Vol. 59, Issue (8): 227-238. DOI: 10.3778/j.issn.1002-8331.2210-0366

• Graphics and Image Processing •

Multi-Head Attention Detection of Small Targets in Remote Sensing at Multiple Scales

ZHANG Zhaoyang, ZHANG Shang, WANG Hengtao, RAN Xiukang   

  1. College of Electrical Engineering & New Energy, China Three Gorges University, Yichang, Hubei 443002, China
  2. Hubei Provincial Engineering Technology Research Center for Building Quality Inspection Equipment, China Three Gorges University, Yichang, Hubei 443002, China
  3. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China
  • Online: 2023-04-15 Published: 2023-04-15




Abstract: Targets in complex geospatial remote sensing images exhibit multi-scale characteristics and variable morphology, and small targets offer too few discriminative features, all of which lower detection and recognition accuracy. To address this, this paper proposes YOLO-StrVB, a multi-scale, multi-head attention detection algorithm for small targets in remote sensing images. First, the network structure is reconstructed into a multi-scale model with an additional detection layer, improving the model's ability to detect small remote sensing targets at different scales of the feature extraction network. Second, a bidirectional feature pyramid network (Bi-FPN) is added for multi-scale feature fusion, providing bidirectional cross-scale connections and weighted feature fusion. Third, the multi-head attention block of Swin Transformer is integrated at the end of the YOLOv5 network, so that the receptive field better captures the multi-scale fusion relationships needed for target recognition, and the backbone network is optimized. Finally, the network is trained with Varifocal Loss to improve the objectness confidence and localization accuracy for densely distributed small targets, and CIoU is chosen as the bounding-box regression loss to improve the box regression accuracy associated with the IoU-aware classification score (IACS). Experiments on the NWPU VHR-10 remote sensing dataset show that mAP increases by 3.05 percentage points over the original YOLOv5 model, indicating that the method effectively improves small-target detection accuracy and is robust for small targets in geospatial remote sensing images.

Key words: YOLOv5, remote sensing, small object detection, Swin Transformer, multi-scale feature fusion
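
For readers unfamiliar with the two loss terms named in the abstract, the following is a minimal sketch of their textbook formulations for a single box and a single score. It is not the authors' implementation: the function names, the `(x1, y1, x2, y2)` box layout, and the defaults `alpha=0.75`, `gamma=2.0` are assumptions chosen for illustration.

```python
import math


def iou_xyxy(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss: 1 - IoU + center-distance penalty + aspect-ratio penalty."""
    iou = iou_xyxy(pred, gt)
    # Squared distance between the two box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (pcx - gcx) ** 2 + (pcy - gcy) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and its trade-off weight.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    weight = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + weight * v


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-9):
    """Varifocal Loss for one predicted score p and one target score q.

    q is the IoU-aware classification score (IACS) target: the IoU with the
    ground-truth box for a positive sample, and 0 for a negative sample.
    alpha and gamma are assumed defaults, not values taken from this paper.
    """
    p = min(max(p, eps), 1 - eps)  # clamp to keep the logarithms finite
    if q > 0:
        # Positives: binary cross-entropy weighted by the target score q.
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # Negatives: down-weighted by alpha * p**gamma, as in focal loss.
    return -alpha * (p ** gamma) * math.log(1 - p)
```

In this form, only negative samples are down-weighted by the focal factor, while positives are weighted by their target score, so well-localized positives contribute more to training. The CIoU term stays informative for non-overlapping boxes because the center-distance penalty remains nonzero even when IoU is 0.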
