Fast Remote Sensing Image Object Detection Algorithm Based on Attention Feature Fusion

doi:10.3778/j.issn.1002-8331.2303-0375

Abstract

Abstract: Remote sensing images have complex backgrounds, contain many small targets, and make feature extraction difficult. To address these challenges, this paper proposes YOLO-Aff, a fast remote sensing image object detection algorithm based on attention feature fusion. The algorithm introduces a backbone network module with channel attention (ECALAN) and a blur pooling (BP) module to reduce the information loss caused by downsampling. In addition, a feature pyramid network without strided convolution (SPD-FPN) is combined with a SimAM attention feature fusion module (CBSA) to improve cross-scale feature fusion. Finally, Wise-IoU is used as the network's coordinate loss to mitigate sample imbalance. Experimental results show that YOLO-Aff achieves an mAP of 96% on the NWPU VHR-10 dataset, 2.9 percentage points higher than the original algorithm, providing a new solution for fast, high-precision object detection in remote sensing images.
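To illustrate why a pyramid without strided convolution can limit downsampling loss, the following is a minimal sketch of space-to-depth rearrangement, assuming that "SPD" refers to this operation. The function name `space_to_depth`, the single-channel nested-list input, and the scale of 2 are illustrative assumptions and are not taken from the paper's implementation.

```python
def space_to_depth(feature_map, scale=2):
    """Split an H x W map into scale*scale sub-maps of size (H/scale) x (W/scale).

    Illustrative assumption: a single-channel map stored as nested lists.
    Every input value is kept; resolution drops while the channel count grows.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    channels = []
    for row_offset in range(scale):
        for col_offset in range(scale):
            # Take every `scale`-th pixel, starting at this (row, col) offset.
            sub_map = [
                row[col_offset::scale]
                for row in feature_map[row_offset::scale]
            ]
            channels.append(sub_map)
    return channels


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
channels = space_to_depth(feature_map)
for index, sub_map in enumerate(channels):
    print(index, sub_map)
```

In contrast to a stride-2 convolution or pooling step, which discards or averages neighbouring pixels, this rearrangement moves all 16 values into four 2 x 2 sub-maps, so a following non-strided convolution still sees the fine detail that small targets depend on.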

Key words: remote sensing image, object detection, YOLO, attention mechanism, feature pyramid

WU Jiancheng, GUO Rongzuo, CHENG Jiawei, ZHANG Hao. Fast Remote Sensing Image Object Detection Algorithm Based on Attention Feature Fusion[J]. Computer Engineering and Applications, 2024, 60(1): 207-216.
