Computer Engineering and Applications, 2024, Vol. 60, Issue (20): 133-141. DOI: 10.3778/j.issn.1002-8331.2312-0110

• Pattern Recognition and Artificial Intelligence •

Autonomous Driving Multi-Task Perception Algorithm Based on Receptive-Field Attention Convolution

LIU Yunxiang, MA Haili, ZHU Jianlin, ZHANG Qing, JIN Qi   

  1. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
  • Online: 2024-10-15  Published: 2024-10-15

Abstract: Drivable-area segmentation, lane detection, and traffic object detection are key components of autonomous-driving perception, and running them in parallel places heavy computational demands on intelligent vehicles. Multi-task perception algorithms offer a practical trade-off between accuracy and speed. To address difficulties these algorithms face, such as complex road conditions and occluded targets, this paper improves the YOLOP network and proposes a multi-task perception algorithm based on receptive-field attention convolution (RFAConv). First, some convolutions in the backbone are replaced with RFAConv, which dynamically assigns convolution-kernel weights according to the importance of the image features within each receptive field, strengthening the network's feature extraction. Second, the feature pyramid network is rebuilt: the original cross-stage hierarchical modules are replaced with an efficient cross-scale fusion module so that useful information is fully retained during feature fusion, and a content-aware feature recombination module is adopted as the upsampling operator to reduce the information lost when features are upsampled for fusion. Finally, the MPDIoU function is used to compute the bounding-box regression loss, addressing the problem of predicted and ground-truth boxes that share the same aspect ratio but differ in size, which further improves traffic object detection. Results on the BDD100K dataset show that the model achieves higher accuracy in drivable-area segmentation, lane detection, and traffic object detection than other multi-task models and even single-task models, while retaining real-time inference performance.

Key words: multi-task perception, autonomous driving, object detection, semantic segmentation, receptive-field attention convolution (RFAConv)
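The abstract says RFAConv dynamically assigns convolution-kernel weights according to how important each feature is within its receptive field. The pure-Python sketch below illustrates that idea for a single channel with stride 1 and no padding; it is not the authors' code. Simplifications (assumptions): the receptive-field features are the raw unfolded pixels, and the attention logits come from each window's mean passed through hypothetical per-position scalars `att_w` and `att_b`, standing in for the average pooling and grouped 1×1 convolution that, as we understand the original RFAConv design, produce the attention map. Because the softmax weights change from window to window, the effective weight applied at each position, `kernel[t] * att[t]`, is no longer shared across the whole image as it is in a standard convolution.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def rfa_conv2d(img: Matrix, kernel: List[float], att_w: List[float],
               att_b: List[float], k: int = 3) -> Matrix:
    """Single-channel, stride-1, 'valid' receptive-field attention conv (sketch).

    kernel, att_w and att_b each hold k*k values, row-major over the window.
    For every window: unfold the k*k receptive-field features, derive k*k
    attention logits from the window mean (hypothetical stand-in for
    AvgPool + grouped 1x1 conv), softmax them, re-weight the features,
    then apply the shared kernel.
    """
    n = k * k
    assert len(kernel) == len(att_w) == len(att_b) == n
    h, w = len(img), len(img[0])
    out: Matrix = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            patch = [img[i + di][j + dj] for di in range(k) for dj in range(k)]
            pooled = sum(patch) / n
            att = softmax([att_w[t] * pooled + att_b[t] for t in range(n)])
            # The effective weight kernel[t] * att[t] now differs per window.
            row.append(sum(kernel[t] * att[t] * patch[t] for t in range(n)))
        out.append(row)
    return out


# Usage: with all-zero attention parameters the softmax is uniform, so the
# layer reduces to a standard convolution scaled by 1 / (k * k).
img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(rfa_conv2d(img, kernel=[1.0] * 9, att_w=[0.0] * 9, att_b=[0.0] * 9))
# ≈ [[5.0, 6.0], [9.0, 10.0]]
```

In a trained network `att_w` and `att_b` would be learned, letting each window emphasize different positions; here a large bias on one position simply makes the layer attend to that pixel.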
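The content-aware feature recombination module mentioned in the abstract is presumably CARAFE (Content-Aware ReAssembly of FEatures); the abstract does not name it, so treat that identification as an assumption. The pure-Python sketch below shows only the reassembly step for one channel: each output pixel maps back to a source pixel, gathers that pixel's `k_up × k_up` neighborhood, and mixes it with a softmax-normalized kernel predicted from the content. The `predict` callback is a hypothetical stand-in for CARAFE's learned kernel-prediction branch (channel compressor plus content encoder), and clamping indices at the borders is our own simplification.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
# Hypothetical kernel predictor:
# (neighborhood values, sub-pixel row, sub-pixel col) -> k_up * k_up logits
Predictor = Callable[[List[float], int, int], List[float]]


def _softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def carafe_like_upsample(feat: Matrix, scale: int, k_up: int,
                         predict: Predictor) -> Matrix:
    """Single-channel sketch of content-aware reassembly.

    Output pixel (oi, oj) maps to source pixel (oi // scale, oj // scale),
    gathers its k_up x k_up neighborhood (borders clamped, an assumption),
    and returns the neighborhood weighted by a softmax-normalized kernel.
    """
    h, w = len(feat), len(feat[0])
    r = k_up // 2
    out: Matrix = []
    for oi in range(h * scale):
        row = []
        for oj in range(w * scale):
            si, sj = oi // scale, oj // scale
            neigh = [feat[min(max(si + di, 0), h - 1)][min(max(sj + dj, 0), w - 1)]
                     for di in range(-r, r + 1) for dj in range(-r, r + 1)]
            # Each of the scale * scale sub-positions may get its own kernel.
            weights = _softmax(predict(neigh, oi % scale, oj % scale))
            row.append(sum(wt * v for wt, v in zip(weights, neigh)))
        out.append(row)
    return out


def center_only(neigh: List[float], sub_i: int, sub_j: int) -> List[float]:
    """Toy predictor: put almost all weight on the neighborhood center."""
    return [50.0 if t == len(neigh) // 2 else 0.0 for t in range(len(neigh))]


# Usage: concentrating the kernel on the center reproduces nearest-neighbor upsampling.
up = carafe_like_upsample([[1.0, 2.0], [3.0, 4.0]], scale=2, k_up=3, predict=center_only)
# ≈ [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Uniform logits would instead give a box-filtered upsample; a learned predictor sits between these extremes and adapts the kernel to the local content, which is what distinguishes this family of upsamplers from fixed nearest-neighbor or bilinear interpolation.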
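MPDIoU, as we understand its original definition, subtracts from IoU the squared distance d1² between the top-left corners and d2² between the bottom-right corners of the predicted and ground-truth boxes, each normalized by w² + h², where w and h are the input image width and height: MPDIoU = IoU − d1²/(w² + h²) − d2²/(w² + h²), and the regression loss is 1 − MPDIoU. The Python sketch below is a minimal, framework-free illustration assuming an (x1, y1, x2, y2) box format; the function names are our own, and this is not the authors' training code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left, bottom-right


def iou(pred: Box, gt: Box) -> float:
    """Plain intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_p + area_g - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """Loss = 1 - MPDIoU, with corner distances normalized by img_w^2 + img_h^2."""
    norm = img_w ** 2 + img_h ** 2
    d1_sq = (gt[0] - pred[0]) ** 2 + (gt[1] - pred[1]) ** 2  # top-left corners
    d2_sq = (gt[2] - pred[2]) ** 2 + (gt[3] - pred[3]) ** 2  # bottom-right corners
    return 1.0 - (iou(pred, gt) - d1_sq / norm - d2_sq / norm)


# Usage on an 8x8 image: both candidates share the ground truth's center and
# aspect ratio and have the same IoU, yet MPDIoU separates them.
gt = (2.0, 2.0, 6.0, 6.0)
inner = (3.0, 3.0, 5.0, 5.0)  # half the size
outer = (0.0, 0.0, 8.0, 8.0)  # twice the size
print(iou(inner, gt), iou(outer, gt))                              # 0.25 0.25
print(mpdiou_loss(inner, gt, 8, 8), mpdiou_loss(outer, gt, 8, 8))  # 0.78125 0.875
```

For these two boxes the CIoU penalty terms (center distance and aspect-ratio consistency) are both zero, so CIoU reduces to the identical IoU of 0.25 and cannot distinguish them, whereas the corner-distance terms of MPDIoU still assign them different losses.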