Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (9): 142-150. DOI: 10.3778/j.issn.1002-8331.2312-0043

• Special Topic on YOLOv8 Improvement and Applications •

Improved YOLOv8s Model for Small Object Detection from the Perspective of Drones

PAN Wei, WEI Chao, QIAN Chunyu, YANG Zhe   

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Online: 2024-05-01    Published: 2024-04-29


Abstract: Object detection from the perspective of drones faces small, densely distributed targets and uneven class distribution, while the limited onboard hardware constrains model size, leading to low detection accuracy. An improved YOLOv8s model that fuses multiple attention mechanisms is proposed. Receptive field attention convolution and the CBAM (convolutional block attention module) are introduced into the backbone to improve its convolution modules: they resolve the problem of attention weight parameters being shared across receptive field features and add attention weights in the channel and spatial dimensions, strengthening feature extraction. Large separable kernel attention is introduced into the spatial pyramid pooling layer to increase information fusion between features at different levels. The neck is restructured by adding a feature layer rich in small-target semantic information. The inner-IoU idea is used to improve the MPDIoU (minimum point distance based IoU) function, and the resulting inner-MPDIoU replaces the original loss to strengthen learning on difficult samples. Experimental results show that the improved YOLOv8s model raises mAP, P, and R on the VisDrone dataset by 16.1%, 9.3%, and 14.9% respectively, surpasses YOLOv8m in performance, and can be effectively applied to object detection tasks on UAV platforms.
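
The abstract names the inner-MPDIoU loss without giving its formulation. As a minimal sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) format and a shrink ratio of 0.75, an inner-MPDIoU-style loss can be assembled in PyTorch as follows; the function name inner_mpdiou_loss and its defaults are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of an inner-MPDIoU loss (not the paper's exact code).
import torch

def inner_mpdiou_loss(pred, target, img_w, img_h, ratio=0.75, eps=1e-7):
    """pred, target: (N, 4) boxes in (x1, y1, x2, y2) format.

    Combines the inner-IoU idea (IoU computed on auxiliary boxes scaled by
    `ratio` around each box centre) with the MPDIoU corner-distance penalty.
    """
    def inner_box(b):
        # Centre and half-size of the scaled (inner) auxiliary box
        cx, cy = (b[:, 0] + b[:, 2]) / 2, (b[:, 1] + b[:, 3]) / 2
        hw, hh = (b[:, 2] - b[:, 0]) * ratio / 2, (b[:, 3] - b[:, 1]) * ratio / 2
        return cx - hw, cy - hh, cx + hw, cy + hh

    px1, py1, px2, py2 = inner_box(pred)
    tx1, ty1, tx2, ty2 = inner_box(target)

    # Inner-IoU computed on the auxiliary boxes
    iw = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    ih = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter + eps
    inner_iou = inter / union

    # MPDIoU penalty: squared distances between matching corners of the
    # original boxes, normalised by the squared image diagonal
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    diag2 = img_w ** 2 + img_h ** 2

    inner_mpdiou = inner_iou - d1 / diag2 - d2 / diag2
    return (1.0 - inner_mpdiou).mean()

In a YOLOv8-style trainer such a term would stand in for the CIoU component of the box-regression loss; a ratio below 1 tightens the auxiliary boxes, which the inner-IoU formulation argues speeds up learning on hard, low-overlap samples.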

Key words: unmanned aerial vehicle (UAV), small object detection, YOLOv8s, receptive field attention, large separable kernel attention
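
For the "large separable kernel attention" keyword, the module below is a minimal PyTorch sketch of how such an attention block can be built and, for example, placed after the SPPF stage of a YOLOv8 backbone; the kernel size k=23, the dilation of 3, and the class name LSKA are illustrative assumptions rather than the paper's exact configuration.

import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Minimal large-separable-kernel-attention block (sketch).

    A large k x k depth-wise convolution is approximated by cascaded 1-D
    depth-wise convolutions (horizontal then vertical), followed by a dilated
    1-D pair and a 1x1 point-wise convolution; the result is used as an
    attention map that re-weights the input features.
    """
    def __init__(self, dim, k=23, dilation=3):
        super().__init__()
        d = dilation
        k_local = 2 * d - 1              # non-dilated local kernel size
        k_dilated = k // d               # dilated kernel size
        self.h0 = nn.Conv2d(dim, dim, (1, k_local),
                            padding=(0, k_local // 2), groups=dim)
        self.v0 = nn.Conv2d(dim, dim, (k_local, 1),
                            padding=(k_local // 2, 0), groups=dim)
        self.h1 = nn.Conv2d(dim, dim, (1, k_dilated),
                            padding=(0, (k_dilated // 2) * d),
                            dilation=(1, d), groups=dim)
        self.v1 = nn.Conv2d(dim, dim, (k_dilated, 1),
                            padding=((k_dilated // 2) * d, 0),
                            dilation=(d, 1), groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)  # point-wise channel mixing

    def forward(self, x):
        attn = self.pw(self.v1(self.h1(self.v0(self.h0(x)))))
        return x * attn                   # re-weight input by the attention map

Splitting the large k x k depth-wise kernel into cascaded 1-D horizontal and vertical convolutions keeps the receptive field of the full kernel while the parameter count grows roughly linearly, rather than quadratically, with k, which suits the limited hardware budget of a drone platform.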