Computer Engineering and Applications, 2025, Vol. 61, Issue (1): 109-120. DOI: 10.3778/j.issn.1002-8331.2405-0030

Special Topic: YOLOv8 Improvements and Applications

Optimized and Improved YOLOv8 Target Detection Algorithm from UAV Perspective

SUN Jiayu, XU Minjun, ZHANG Junpeng, YAN Mengxue, CAO Wen, HOU Alin   

  1. College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  3. Jilin Province New Generation Artificial Intelligence Smart Health Joint Innovation Laboratory, Changchun 130012, China
  Published: 2025-01-01   Online: 2024-12-31

Abstract: Targets seen from an unmanned aerial vehicle (UAV) perspective span multiple scales, are often small or occluded, and appear against complex backgrounds. To address these problems, an improved YOLOv8 algorithm, BDAD-YOLO, based on dynamic-sample attention scale sequences is proposed. Firstly, the backbone of the original model is restructured following the idea of BiFormer, which increases the model's attention to key information and better retains fine-grained target details. Because targets vary in size and position, standard convolution cannot handle them well. Therefore, a C2_DCf module based on the idea of the deformable convolutional network (DCN) is designed to strengthen feature extraction for small targets and thereby further improve feature fusion in the small-target layer of the neck network. Secondly, an attention-scale sequence fusion framework based on dynamic samples (AFD) is proposed. It uses lightweight dynamic point sampling and fuses feature maps of different scales to enhance the network's ability to extract multi-scale information. Finally, the WIoU loss function is adopted to reduce the adverse effect of low-quality small-target data on the gradient, thereby accelerating network convergence. Experimental results show that the mean average precision (mAP@0.5) increases by 4.6 and 3.7 percentage points on the val and test sets of the VisDrone dataset, respectively, and by 2.4 percentage points on the DOTA dataset, demonstrating the effectiveness and generality of the improved algorithm.
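The abstract says only that C2_DCf is built on the idea of the deformable convolutional network. It does not give the module's internal layout. The Python sketch below illustrates that underlying idea and nothing more. It computes one output value of a single-channel 3×3 deformable convolution, y(p0) = Σ_k w_k · m_k · x(p0 + p_k + Δp_k), where each fractional sample position is read by bilinear interpolation.

Several parts of the sketch are assumptions:

- The names `bilinear` and `deform_conv_at` are hypothetical.
- The offsets Δp_k and the DCNv2-style modulation scalars m_k are supplied by hand. In a real DCN layer, a parallel convolution predicts them from the input.
- Positions outside the feature map are zero-padded.

```python
import math
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]  # single-channel feature map indexed as x[row][col]

# Regular 3x3 sampling grid p_k as (row, col) offsets, in row-major order.
KERNEL_GRID = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def bilinear(x: Grid, r: float, c: float) -> float:
    """Bilinearly sample x at a fractional (row, col); outside the map counts as 0."""
    h, w = len(x), len(x[0])
    r0, c0 = math.floor(r), math.floor(c)
    total = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < h and 0 <= cc < w:
                # Each integer neighbour is weighted by its closeness to (r, c).
                total += (1 - abs(r - rr)) * (1 - abs(c - cc)) * x[rr][cc]
    return total


def deform_conv_at(
    x: Grid,
    p0: Tuple[int, int],
    weights: Sequence[float],
    offsets: Sequence[Tuple[float, float]],
    masks: Optional[Sequence[float]] = None,
) -> float:
    """One output of a single-channel 3x3 deformable convolution at position p0.

    y(p0) = sum_k w_k * m_k * x(p0 + p_k + dp_k)
    offsets: the shifts dp_k (hand-supplied here; a real layer predicts them).
    masks: DCNv2-style modulation scalars m_k (all 1 by default, i.e. DCNv1).
    """
    if masks is None:
        masks = [1.0] * len(KERNEL_GRID)
    y = 0.0
    for (dr, dc), w_k, (off_r, off_c), m_k in zip(KERNEL_GRID, weights, offsets, masks):
        y += w_k * m_k * bilinear(x, p0[0] + dr + off_r, p0[1] + dc + off_c)
    return y


# Usage: a 5x5 ramp x[r][c] = 5r + c and an all-ones 3x3 kernel.
ramp = [[float(5 * r + c) for c in range(5)] for r in range(5)]
ones = [1.0] * 9
no_shift = [(0.0, 0.0)] * 9
half_right = [(0.0, 0.5)] * 9
print(deform_conv_at(ramp, (2, 2), ones, no_shift))    # 108.0, plain 3x3 convolution
print(deform_conv_at(ramp, (2, 2), ones, half_right))  # 112.5, every tap moved half a pixel right
```

With all offsets zero and all masks equal to 1, the function reduces to an ordinary 3×3 convolution. This is the sense in which DCN generalizes the fixed sampling grid: learned offsets let the kernel's taps follow a target's actual shape and position.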

Key words: target detection, unmanned aerial vehicle perspective, YOLOv8, BiFormer, feature fusion, loss function
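The abstract names the WIoU loss but does not say which version is used. The sketch below assumes the definitions from the Wise-IoU paper.

- **WIoU v1** multiplies the IoU loss L_IoU = 1 - IoU by a distance-attention term, R_WIoU = exp(((x - x_gt)^2 + (y - y_gt)^2) / (W_g^2 + H_g^2)). Here (x, y) and (x_gt, y_gt) are the box centres, and W_g, H_g are the width and height of the smallest enclosing box.
- **WIoU v3** further scales v1 by a non-monotonic focusing factor, r = β / (δ · α^(β - δ)). The outlier degree is β = L_IoU / mean(L_IoU). This factor is the part intended to damp gradients from low-quality samples.

Several parts of the sketch are assumptions:

- Boxes are (x1, y1, x2, y2) tuples.
- The function names are hypothetical.
- α = 1.9 and δ = 3 are example values.
- A trainer would maintain a running mean of L_IoU. Here the caller passes that mean in instead.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(pred: Box, gt: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    return inter / union if union > 0 else 0.0


def wiou_v1(pred: Box, gt: Box) -> float:
    """WIoU v1 = R_WIoU * (1 - IoU); R_WIoU >= 1 grows with centre distance."""
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Size of the smallest box enclosing both (detached in the original formulation).
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp(((px - gx) ** 2 + (py - gy) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * (1.0 - iou(pred, gt))


def wiou_v3(pred: Box, gt: Box, mean_l_iou: float,
            alpha: float = 1.9, delta: float = 3.0) -> float:
    """WIoU v3 = r * WIoU v1 with the non-monotonic focusing factor r."""
    beta = (1.0 - iou(pred, gt)) / mean_l_iou  # outlier degree of this sample
    r = beta / (delta * alpha ** (beta - delta))  # r == 1 when beta == delta
    return r * wiou_v1(pred, gt)


gt = (0.0, 0.0, 10.0, 10.0)
shifted = (2.0, 0.0, 12.0, 10.0)  # IoU = 80 / 120 = 2/3, centres 2 px apart
print(round(wiou_v1(gt, gt), 4))       # 0.0 for a perfect box
print(round(wiou_v1(shifted, gt), 4))  # 0.3388 = exp(4 / 244) * (1/3)
```

Plain floats carry no gradient. The original formulation detaches W_g^2 + H_g^2 and β from the gradient, but that only matters when porting this sketch to an autograd framework.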