Autonomous Driving Multi-Task Perception Algorithm Based on Receptive-Field Attention Convolution

doi:10.3778/j.issn.1002-8331.2312-0110

Abstract

Abstract: Drivable area segmentation, lane detection, and traffic object detection are key components of autonomous driving perception, and running them in parallel places heavy computational demands on intelligent vehicles. Multi-task perception algorithms offer a trade-off between accuracy and speed in practical applications. To address the difficulties that multi-task perception faces, such as complex road conditions and occluded targets, this paper improves the YOLOP network and proposes a multi-task perception algorithm based on receptive-field attention convolution (RFAConv). First, some convolutions in the backbone network are replaced with RFAConv, which dynamically assigns convolution kernel weights according to the importance of the image features within each receptive field, strengthening the network's feature extraction capability. Second, the feature pyramid network is rebuilt: the original cross-stage hierarchical module is replaced with an efficient cross-scale fusion module so that effective information is retained during feature fusion, and a content-aware feature reassembly module is used for upsampling to reduce the information lost at that step. Finally, the MPDIoU function is used to compute the regression loss, which handles the case where the ground-truth and predicted boxes have the same aspect ratio but different sizes and further improves traffic object detection. Test results on the BDD100K dataset show that the model achieves higher accuracy than other multi-task models, and even single-task models, on drivable area segmentation, lane detection, and traffic object detection, while maintaining real-time inference.

Keywords: multi-task perception, autonomous driving, object detection, semantic segmentation, receptive-field attention convolution (RFAConv)
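
The abstract names MPDIoU as the regression loss but does not state its formula. The sketch below assumes the commonly cited formulation, in which the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalized by the squared image diagonal, are subtracted from the IoU. The function names, the `(x1, y1, x2, y2)` box layout, and the 100 x 100 image size are illustrative assumptions, not taken from the paper. It is meant only to show why such a loss can separate two predictions that share the ground truth's center and aspect ratio, a case where plain IoU gives the same score.

```python
def iou(box_a, box_b):
    """Plain IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mpdiou(pred, target, img_w, img_h):
    """Assumed MPDIoU: IoU minus normalized squared corner distances."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2               # squared image diagonal
    return iou(pred, target) - d1_sq / norm - d2_sq / norm


def mpdiou_loss(pred, target, img_w, img_h):
    """Regression loss as 1 - MPDIoU; zero for a perfect match."""
    return 1.0 - mpdiou(pred, target, img_w, img_h)


# Hypothetical boxes: same center and 2:1 aspect ratio as the ground truth.
gt = (40.0, 45.0, 60.0, 55.0)
small = (45.0, 47.5, 55.0, 52.5)   # half the ground-truth size
large = (30.0, 40.0, 70.0, 60.0)   # twice the ground-truth size

print(iou(small, gt), iou(large, gt))
print(mpdiou_loss(small, gt, 100, 100), mpdiou_loss(large, gt, 100, 100))
```

Both hypothetical predictions have the same IoU with the ground truth, and since their centers and aspect ratios also match it, center-distance and aspect-ratio penalties would not separate them either. The corner-distance terms do, assigning the larger box the higher loss in this example.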

LIU Yunxiang, MA Haili, ZHU Jianlin, ZHANG Qing, JIN Qi. Autonomous Driving Multi-Task Perception Algorithm Based on Receptive-Field Attention Convolution[J]. Computer Engineering and Applications, 2024, 60(20): 133-141.

刘云翔, 马海力, 朱建林, 张晴, 金婍. 基于感受野注意力卷积的自动驾驶多任务感知算法[J]. 计算机工程与应用, 2024, 60(20): 133-141.
