Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (11): 204-214. DOI: 10.3778/j.issn.1002-8331.2307-0171

• Graphics and Image Processing •

Target Detection Algorithm of UAV Aerial Image Based on Improved YOLOv5

LI Xiaolin, LIU Dadong, LIU Xinman, CHEN Ze

  1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. Research Center of Digital Intelligence Technology Application, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Online: 2024-06-01 Published: 2024-05-31

Abstract: UAV aerial images contain targets of widely varying scale, many similar-looking targets, and densely clustered targets, all of which cause missed and false detections. To address these problems, DA-YOLO, a UAV aerial image target detection algorithm based on an improved YOLOv5, is proposed. First, a multi-scale dynamic feature-weighted fusion network is proposed, consisting of a feature map attention generator and a dynamic weight learning module. The feature map attention generator fuses the features that matter most for targets at different scales, and the weight learning module adaptively adjusts how features of targets at different scales are learned; together they improve the network's discrimination under diverse target scales and reduce missed detections. Second, a parallel selective attention mechanism (PSAM) is designed and added to the feature extraction network. By dynamically fusing spatial and channel information, it strengthens feature expression, produces higher-quality feature maps, and improves the network's ability to distinguish similar targets, which reduces false detections. Finally, Soft-NMS replaces the non-maximum suppression (NMS) used in YOLOv5 to reduce missed and false detections in scenes with clustered targets. Experimental results show that the improved algorithm achieves a detection accuracy of 37.79% on the VisDrone dataset, 5.59 percentage points higher than YOLOv5s, making it better suited to target detection in UAV aerial images.

Key words: UAV aerial image processing, feature map attention generator, dynamic feature-weighted fusion, attention mechanism, non-maximum suppression
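
The replacement of NMS by Soft-NMS mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which Soft-NMS variant or parameters DA-YOLO uses, so the Gaussian score decay, the values of `sigma` and `score_threshold`, and the function names `iou` and `soft_nms` below are all assumptions for illustration, not the authors' implementation. The idea shown is that a box overlapping a higher-scoring box has its score reduced rather than being discarded, which can help when targets are clustered.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS (assumed variant; sigma and threshold are illustrative).

    Returns a list of (original_index, rescored_confidence) in selection order.
    """
    remaining = [(tuple(b), float(s), i)
                 for i, (b, s) in enumerate(zip(boxes, scores))]
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_box, best_score, best_idx = remaining.pop(best_pos)
        kept.append((best_idx, best_score))

        # Decay the scores of the other boxes according to their overlap
        # with the selected box, instead of deleting them outright.
        rescored = []
        for box, score, idx in remaining:
            overlap = iou(best_box, box)
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((box, score, idx))
        remaining = rescored
    return kept


if __name__ == "__main__":
    # Two heavily overlapping boxes and one isolated box (made-up values).
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    for idx, score in soft_nms(boxes, scores):
        print(idx, round(score, 3))
```

In this toy input, the second box overlaps the first with an IoU of about 0.68. Hard NMS with a typical IoU threshold of 0.5 would drop it, whereas the sketch keeps it with a lower score, so a later confidence threshold decides whether it survives.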