Traffic Sign Recognition Based on Omni-Dimensional Dynamic Convolution

doi:10.3778/j.issn.1002-8331.2306-0223

Abstract

To address the low accuracy and slow speed of existing traffic sign recognition algorithms on small and occluded targets, a traffic sign recognition algorithm based on omni-dimensional dynamic convolution (ODConv) is designed by improving the YOLOv5 network. First, some of the convolutions in the backbone network are replaced with ODConv, so that richer information is captured during feature extraction and the network becomes more sensitive to small targets. Second, to reduce the information lost during upsampling, a sub-pixel convolution module replaces the original nearest-neighbor interpolation upsampling module in the feature fusion network, and an efficient layer aggregation module replaces the original CSPNet module, which improves feature fusion efficiency, lengthens the shortest gradient path, and improves small-target detection. Finally, the SIoU function is used to compute the regression loss, which addresses the direction mismatch between the ground-truth box and the predicted box and further improves the detection accuracy for road traffic signs. On the TT100K dataset, the model reaches a mean average precision (mAP@0.5) of 93.85% and a recall of 90.73%, improvements of 3.90% and 5.69%, respectively, over the baseline network YOLOv5n, with a frame processing speed of 89.29.

Keywords: traffic sign recognition, YOLOv5, omni-dimensional dynamic convolution, sub-pixel convolution module, efficient layer aggregation module
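
The difference between the two upsampling schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pixel_shuffle` and `nearest_upsample` are hypothetical, feature maps are plain nested lists of shape (channels, height, width), and the learned convolution that would normally expand the channels to C·r² before the rearrangement is omitted. The channel ordering is assumed to follow the common pixel-shuffle convention.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    This is only the rearrangement step of a sub-pixel convolution;
    the preceding learned convolution is assumed to have run already.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Each output pixel is read from a different input channel,
                # selected by its position inside the r x r output cell.
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out


def nearest_upsample(x, r):
    """Nearest-neighbour upsampling of a (C, H, W) nested list by factor r.

    Every input pixel is simply copied into an r x r output cell.
    """
    return [
        [[row[j // r] for j in range(len(row) * r)]
         for row in ch for _ in range(r)]
        for ch in x
    ]


if __name__ == "__main__":
    # Four 1x1 channels become one 2x2 channel with four distinct values.
    print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))
    # One 1x2 channel becomes one 2x4 channel with repeated values.
    print(nearest_upsample([[[1, 2]]], 2))
```

Nearest-neighbor interpolation can only repeat existing values, whereas the sub-pixel rearrangement fills each output cell from separate channels, so a preceding convolution can learn what each upsampled pixel should contain.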

LI Wenju, YU Jie, SHA Liye, CUI Liu, YANG Hongzhe. Traffic Sign Recognition Based on Omni-Dimensional Dynamic Convolution[J]. Computer Engineering and Applications, 2024, 60(18): 316-323.

