Improved Complex Road Scene Object Detection Algorithm of YOLOv7

doi:10.3778/j.issn.1002-8331.2306-0021

Abstract

Although deep-learning-based object detection algorithms have achieved good results in road scenes, detection accuracy remains low for dense targets and distant small-scale targets in complex road scenes, where missed and false detections occur easily. To address this, an improved YOLOv7 object detection algorithm for complex road scenes is proposed. A small-target detection layer is added to strengthen feature learning for small targets. K-means++ is used to re-cluster the prior (anchor) boxes so that they fit the targets better, which improves localization accuracy. The WIoU (Wise-IoU) loss function is adopted to increase the network's attention to ordinary-quality anchor boxes and improve its ability to localize targets. Coordinate convolution (CoordConv) is introduced into the neck and detection head so that the network can better perceive position information in the feature maps. A P-ELAN structure is proposed to make the backbone network lightweight, reducing the number of parameters and the amount of computation. Experimental results show that the improved algorithm reaches an mAP of 64.8% on the Huawei SODA10M dataset, 2.6 percentage points higher than the original algorithm, while the number of model parameters and the amount of computation are reduced by 12% and 7% respectively, achieving a balance between detection accuracy and detection speed.

Keywords: YOLOv7, road object detection, CoordConv, K-means++, lightweight
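
The anchor re-clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure the clustering uses; the code below assumes the 1 - IoU distance on box width and height that is common in YOLO anchor clustering. The function names, parameters, and sample box sizes are hypothetical and are not taken from the paper.

```python
import random


def wh_iou(box, anchor):
    """IoU of two boxes given only (width, height), both placed at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors. Distance is ASSUMED to be 1 - IoU."""
    rng = random.Random(seed)
    # K-means++ seeding: the first centre is drawn uniformly; each later centre
    # is drawn with probability proportional to the squared distance to the
    # nearest centre chosen so far.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(1.0 - wh_iou(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box coincides with a centre already
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    # Lloyd iterations: assign each box to the centre with the highest IoU,
    # then move each centre to the mean width and height of its group.
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            groups[best].append(b)
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    # Return anchors ordered from smallest to largest area.
    return sorted(centers, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    # Hypothetical ground-truth box sizes in pixels (width, height).
    sample = [(12, 9), (10, 11), (48, 40), (52, 44), (190, 150), (210, 160)]
    print(kmeanspp_anchors(sample, k=3))
```

In a real pipeline the input would be the width and height of every labeled box in the training set, and the number of clusters would match the number of anchors the detection heads expect.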

DU Juan, CUI Shaohua, JIN Meijuan, RU Chen. Improved Complex Road Scene Object Detection Algorithm of YOLOv7[J]. Computer Engineering and Applications, 2024, 60(1): 96-103.

