Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (7): 278-287. DOI: 10.3778/j.issn.1002-8331.2311-0280

• Graphics and Image Processing •

3D Object Detection Method for Cooperative Vehicle Sensing Under Communication Delay

LU Min, HU Zhenyu   

  1. School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  2. Key Laboratory of Smart Airport Theory and System, Tianjin 300300, China
  • Online: 2025-04-01 Published: 2025-04-01

Abstract: Cooperative vehicle perception for 3D object detection loses accuracy under communication delay. To address this problem, a 3D object detection method for cooperative vehicle perception under communication delay is proposed. First, a spatio-temporal prediction module extracts spatio-temporal features from the sequence of historical perception features of a communication-delayed vehicle and uses them to predict that vehicle's perception features at the current moment. Then, a perception fusion module built on the predicted features fuses the perception features dynamically with an attention mechanism, which reduces the impact of prediction error and improves detection accuracy. The method is evaluated on the OPV2V, V2XSet, and V2V4Real datasets against mainstream cooperative perception methods such as Where2Comm and V2VNet. Among the compared methods, Where2Comm achieves the best average precision for 3D object detection across the tested delays; at a delay of 400 ms, the proposed method improves average precision over Where2Comm by 5.9, 3.9, and 1.5 percentage points on the three datasets, respectively.

Key words: cooperative perception, 3D object detection, communication delay, spatio-temporal sequence prediction, attention mechanism, feature fusion
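
The predict-then-fuse idea in the abstract can be illustrated with a minimal sketch for a single feature cell. This is not the authors' implementation: the names `extrapolate` and `attention_fuse` are hypothetical, linear extrapolation is only a stand-in for the learned spatio-temporal prediction module, and plain scaled dot-product attention is assumed for the fusion step because the abstract does not specify the attention form.

```python
import math


def extrapolate(history, steps_ahead=1):
    """Predict the current feature vector from a delayed vehicle's history.

    Assumption: linear extrapolation from the last two frames stands in
    for the learned spatio-temporal prediction module.
    """
    prev, last = history[-2], history[-1]
    return [l + steps_ahead * (l - p) for p, l in zip(prev, last)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(ego, candidates):
    """Fuse one cell's features with scaled dot-product attention.

    The ego feature is the query; the ego feature plus every predicted
    collaborator feature serve as keys and values (an assumed design).
    """
    keys = [ego] + candidates
    scale = math.sqrt(len(ego))
    scores = [sum(q * k for q, k in zip(ego, key)) / scale for key in keys]
    weights = softmax(scores)
    fused = [
        sum(w * key[i] for w, key in zip(weights, keys))
        for i in range(len(ego))
    ]
    return fused, weights


# Three delayed frames from a collaborator, one feature cell of size 2.
history = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
predicted = extrapolate(history)
ego = [1.0, 1.0]
fused, weights = attention_fuse(ego, [predicted])
```

In this sketch a predicted feature that disagrees with the ego feature gets a lower attention score and therefore a smaller weight, which is one simple way attention can limit the influence of prediction error.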