Improved YOLOv7 Object Detection Algorithm for Fisheye Images

doi:10.3778/j.issn.1002-8331.2305-0442

Abstract

Abstract: Images captured by fisheye cameras are characterized by a wide field of view, geometric distortion, and large scale variation, which pose great challenges to object detectors based on standard convolutional networks. Existing object detection algorithms can be further improved in network structure design and feature learning to suit the task of detecting distorted objects in fisheye images. To mitigate the effect of radial distortion in fisheye images, a multi-head attention module with a multi-branch stacking structure is introduced into the YOLOv7 backbone to capture global contextual information and improve detection accuracy. Meanwhile, on the Neck side of YOLOv7, a simple and efficient layer aggregation structure incorporating deformable convolutions is used to achieve effective multi-scale feature fusion and to improve the model's ability to extract features of distorted objects. The proposed detection model runs directly on fisheye images and requires neither prior information nor calibration. Experiments on the public comprehensive fisheye image dataset VOC_360 show that the improved YOLOv7 fisheye image object detector effectively improves detection accuracy, reaching 84.3% mAP50 and 70.4% mAP50:95, which are 3.1 and 6.4 percentage points higher, respectively, than the baseline model YOLOv7.

Key words: object detection, fisheye image, multi-head attention, deformable convolution, YOLO algorithm
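
The abstract relies on deformable convolutions to handle distorted objects. As a rough illustration of the underlying idea only, the sketch below computes one output value of a 3x3 deformable convolution in plain Python: each kernel tap samples the input at its regular grid position plus a fractional offset, using bilinear interpolation. The function names, the toy image, and the hand-set offsets are assumptions made for this example; in a trained network the offsets are predicted by an extra convolution layer, and this sketch does not reproduce the paper's layer aggregation structure.

```python
import math

def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2-D list at fractional (y, x); pixels outside count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance to each neighbouring pixel.
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value

def deformable_conv_at(img, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets[i][j] is a hypothetical (dy, dx) shift added to the regular
    grid position of kernel tap (i, j); all-zero offsets give an ordinary
    3x3 convolution.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear_sample(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

# Toy input: a linear ramp, so sampled values are easy to check by hand.
img = [[4 * r + c for c in range(4)] for r in range(4)]
mean_kernel = [[1 / 9] * 3 for _ in range(3)]
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
shifted_offsets = [[(0.5, 0.5)] * 3 for _ in range(3)]

regular = deformable_conv_at(img, mean_kernel, zero_offsets, 1, 1)
deformed = deformable_conv_at(img, mean_kernel, shifted_offsets, 1, 1)
```

On this ramp image the regular grid averages to 5.0, while shifting every tap by half a pixel down and to the right averages to 7.5 (up to floating-point rounding), which shows how offsets move the receptive field without changing the kernel weights.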

WU Zhaodong, XU Cheng, LIU Hongzhe, FU Ying, JIAN Muwei. Improved YOLOv7 Object Detection Algorithm for Fisheye Images[J]. Computer Engineering and Applications, 2024, 60(14): 250-256.
