Computer Engineering and Applications, 2024, Vol. 60, Issue (14): 250-256. DOI: 10.3778/j.issn.1002-8331.2305-0442

• Graphics and Image Processing •

Improved YOLOv7 Object Detection Algorithm for Fisheye Images

WU Zhaodong (吴兆东), XU Cheng (徐成), LIU Hongzhe (刘宏哲), FU Ying (付莹), JIAN Muwei (蹇木伟)

  1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  3. School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
  • Online: 2024-07-15 Published: 2024-07-15

Abstract: Images captured by fisheye cameras have a wide field of view, geometric distortion, and large scale variation, which pose great challenges for object detectors built on standard convolutional networks. Existing object detection algorithms can be further improved in network structure design and feature learning to suit the detection of distorted objects in fisheye images. To mitigate the effect of radial distortion in fisheye images, a multi-head attention module with a multi-branch stacking structure is introduced into the YOLOv7 backbone to capture global contextual information and improve detection accuracy. Meanwhile, on the Neck side of YOLOv7, a simple and efficient layer aggregation structure incorporating deformable convolutions is used to achieve effective multi-scale feature fusion and to strengthen the model's ability to extract features of distorted objects. The proposed detection model runs directly on fisheye images and requires neither specified prior information nor calibration. Experiments on the public comprehensive fisheye image dataset VOC_360 show that the improved YOLOv7 fisheye image object detector effectively improves detection accuracy, reaching 84.3% mAP50 and 70.4% mAP50:95, which are 3.1 and 6.4 percentage points higher, respectively, than the baseline model YOLOv7.
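The abstract reports mAP50 and mAP50:95 without defining them. As a minimal sketch, assuming the common COCO-style convention (a detection counts as correct at a given threshold when its intersection over union with the ground-truth box reaches that threshold, and mAP50:95 averages over thresholds 0.50 to 0.95 in steps of 0.05), the following shows how a single predicted box would be judged. The function names and the example boxes are hypothetical and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style threshold set for mAP50:95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Return the IoU thresholds at which the prediction counts as a match."""
    score = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if score >= t]


# Hypothetical boxes: the prediction is shifted 8 pixels from the ground truth.
pred = (10.0, 10.0, 50.0, 50.0)
gt = (18.0, 10.0, 58.0, 50.0)
overlap = iou(pred, gt)
passed = thresholds_passed(pred, gt)
print(f"IoU = {overlap:.3f}, matched at thresholds {passed}")
```

Under this assumed convention, the example prediction would count toward mAP50 but would stop counting at the stricter thresholds, which is why mAP50:95 is typically lower than mAP50, as in the reported 70.4% versus 84.3%.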

Key words: object detection, fisheye image, multi-head attention, deformable convolution, YOLO algorithm