Target Detection Algorithm of UAV Aerial Image Based on Improved YOLOv5

doi:10.3778/j.issn.1002-8331.2307-0171

Abstract

Abstract: UAV aerial images contain targets at widely varying scales, many similar-looking targets, and densely clustered targets, all of which cause missed and false detections. To address these problems, DA-YOLO, a UAV aerial image target detection algorithm based on improved YOLOv5, is proposed. First, a multi-scale dynamic feature-weighted fusion network is proposed, composed of a feature map attention generator and a dynamic weight learning module. The feature map attention generator fuses the features that matter most for targets at different scales, and the weight learning module adaptively adjusts how features of targets at different scales are learned; together they improve the network's discrimination when target scales vary and thereby reduce missed detections. Second, a parallel selective attention mechanism (PSAM) is designed and added to the feature extraction network. By dynamically fusing spatial and channel information, it strengthens feature representation, yields higher-quality feature maps, and improves the network's ability to distinguish similar targets, reducing false detections. Finally, Soft-NMS replaces the non-maximum suppression (NMS) used in YOLOv5 to reduce missed and false detections in scenes with clustered targets. Experimental results show that the improved algorithm reaches a detection accuracy of 37.79% on the VisDrone dataset, 5.59 percentage points higher than YOLOv5s, indicating that it is better suited to target detection in UAV aerial images.

Key words: UAV aerial image processing, feature map attention generator, dynamic feature-weighted fusion, attention mechanism, non-maximum suppression
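
The replacement of NMS with Soft-NMS described in the abstract can be illustrated with a minimal sketch. It assumes the Gaussian score-decay variant from the original Soft-NMS paper by Bodla et al.; the abstract does not say which variant or which hyperparameters DA-YOLO uses, so the function names (`iou`, `soft_nms`, `hard_nms`) and the values of `sigma`, `score_thresh`, and `iou_thresh` are illustrative assumptions, not the authors' implementation.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Classic NMS: discard any box overlapping a kept box by more than iou_thresh."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed variant; sigma and score_thresh are illustrative).

    Returns (original_index, rescored_confidence) pairs in selection order.
    """
    candidates = [(i, float(s)) for i, s in enumerate(scores)]
    kept = []
    while candidates:
        # Select the remaining box with the highest (possibly decayed) score.
        best = max(range(len(candidates)), key=lambda k: candidates[k][1])
        best_idx, best_score = candidates.pop(best)
        kept.append((best_idx, best_score))
        # Decay the scores of overlapping neighbours instead of deleting them.
        decayed = []
        for idx, score in candidates:
            overlap = iou(boxes[best_idx], boxes[idx])
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:
                decayed.append((idx, score))
        candidates = decayed
    return kept
```

For two heavily overlapping boxes such as `(0, 0, 10, 10)` and `(2, 0, 12, 10)` with scores 0.9 and 0.8, `hard_nms` drops the second box outright, whereas `soft_nms` keeps it with its score reduced to about 0.33. Retaining such boxes at lower confidence is why Soft-NMS tends to help when real targets are clustered together.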

LI Xiaolin, LIU Dadong, LIU Xinman, CHEN Ze. Target Detection Algorithm of UAV Aerial Image Based on Improved YOLOv5[J]. Computer Engineering and Applications, 2024, 60(11): 204-214.
