Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (13): 217-226. DOI: 10.3778/j.issn.1002-8331.2404-0090

• Pattern Recognition and Artificial Intelligence •

Multimodal Trajectory Prediction for Intelligent Vehicles Based on Spatio-Temporal Feature Fusion and Candidate Strategies

YANG Zhiyong, YANG Jun, XU Qinxin   

  1. School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
  2. School of Big Data and Internet of Things, Chongqing Vocational Institute of Engineering, Chongqing 402260, China
  • Online: 2025-07-01  Published: 2025-06-30

Abstract: To address the limitations of existing trajectory prediction models in capturing complex spatio-temporal dynamics, as well as the problem that some predicted trajectories violate real-world scene constraints, this paper proposes a multimodal trajectory prediction model for intelligent vehicles based on spatio-temporal feature fusion and candidate strategies. In the scene encoding and feature fusion stage, an asymmetric bidirectional gated recurrent unit is designed to capture bidirectional dependencies within historical trajectory sequences. A hybrid feature attention method based on cross-attention is then introduced to model the implicit interactions between lanes and traffic participants, deeply fusing lane spatial features with trajectory temporal features in the lane graph nodes. Finally, a candidate strategy that directly exploits the lane topology is introduced before the decoder; it guides the prediction process with prior knowledge and covers the possible future trajectories of the target vehicle, ensuring that the decoder outputs reliable multimodal trajectories. The model is validated on the public nuScenes dataset. Experimental results show that, when predicting 5 and 10 trajectories, minADE and MR improve over the best comparison model by 7.5% and 11.5%, and by 5.5% and 21.4%, respectively. Visualization results further demonstrate the model's robustness and interpretability.
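The abstract does not give implementation details of the asymmetric bidirectional gated recurrent unit. As a rough illustration only, the sketch below encodes a historical trajectory with separate forward and backward GRUs of different hidden widths; this particular notion of "asymmetry", and all names and dimensions, are assumptions for illustration rather than the authors' design.

```python
import torch
import torch.nn as nn

class AsymmetricBiGRUEncoder(nn.Module):
    """Sketch of a bidirectional GRU trajectory encoder.

    The forward/backward hidden sizes differ only to illustrate one possible
    kind of asymmetry; the paper's actual module may be designed differently.
    """
    def __init__(self, in_dim=2, fwd_hidden=64, bwd_hidden=32):
        super().__init__()
        self.fwd_gru = nn.GRU(in_dim, fwd_hidden, batch_first=True)
        self.bwd_gru = nn.GRU(in_dim, bwd_hidden, batch_first=True)

    def forward(self, traj):
        # traj: (batch, T, 2) past (x, y) positions of one agent
        _, h_fwd = self.fwd_gru(traj)                         # forward pass over time
        _, h_bwd = self.bwd_gru(torch.flip(traj, dims=[1]))   # backward pass over reversed time
        # concatenate the final hidden states of both directions -> (batch, fwd_hidden + bwd_hidden)
        return torch.cat([h_fwd[-1], h_bwd[-1]], dim=-1)

if __name__ == "__main__":
    enc = AsymmetricBiGRUEncoder()
    feat = enc(torch.randn(8, 20, 2))   # 8 agents, 20 past time steps
    print(feat.shape)                   # torch.Size([8, 96])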
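The cross-attention based hybrid feature attention is likewise only described at a high level. A minimal sketch of the general idea follows, assuming lane-graph node features act as queries and agent temporal features as keys/values; the class name, projections, and dimensions are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class LaneAgentCrossAttention(nn.Module):
    """Fuse agent temporal features into lane-graph node features via cross-attention.

    Queries come from lane nodes and keys/values from agent trajectory encodings,
    so each lane node attends to the traffic participants relevant to it.
    """
    def __init__(self, lane_dim=64, agent_dim=96, embed_dim=64, num_heads=4):
        super().__init__()
        self.q_proj = nn.Linear(lane_dim, embed_dim)
        self.kv_proj = nn.Linear(agent_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, lane_nodes, agent_feats):
        # lane_nodes: (batch, N_lane, lane_dim); agent_feats: (batch, N_agent, agent_dim)
        q = self.q_proj(lane_nodes)
        kv = self.kv_proj(agent_feats)
        fused, _ = self.attn(q, kv, kv)   # lane nodes attend over the agents
        return self.norm(q + fused)       # residual + norm keeps the lane spatial features

if __name__ == "__main__":
    layer = LaneAgentCrossAttention()
    lanes = torch.randn(2, 50, 64)    # 50 lane-graph nodes per scene
    agents = torch.randn(2, 10, 96)   # 10 agent temporal encodings per scene
    print(layer(lanes, agents).shape)  # torch.Size([2, 50, 64])
```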
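For reference, minADE_k and miss rate (MR) are standard multimodal metrics: the displacement error of the best of k predicted trajectories, and the fraction of cases where even the best prediction still deviates from the ground truth beyond a threshold. The sketch below assumes a common nuScenes-style convention of a 2 m threshold on the maximum pointwise error; the exact definition used in the paper may differ.

```python
import numpy as np

def min_ade_and_miss(preds, gt, miss_threshold=2.0):
    """Compute minADE_k and a miss flag for one prediction case.

    preds: (k, T, 2) candidate future trajectories
    gt:    (T, 2) ground-truth future trajectory
    Returns (min_ade, missed), where missed is True if even the best candidate's
    maximum pointwise error exceeds miss_threshold (assumed 2 m here).
    """
    errors = np.linalg.norm(preds - gt[None], axis=-1)   # (k, T) pointwise L2 errors
    min_ade = errors.mean(axis=1).min()                  # best average displacement error
    missed = errors.max(axis=1).min() > miss_threshold   # does the best candidate still miss?
    return min_ade, missed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = np.cumsum(rng.normal(size=(12, 2)), axis=0)             # 12 future steps
    preds = gt[None] + rng.normal(scale=0.5, size=(10, 12, 2))   # 10 candidate trajectories
    ade, miss = min_ade_and_miss(preds, gt)
    print(f"minADE = {ade:.2f} m, missed = {miss}")
```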

Key words: intelligent driving, trajectory prediction, spatio-temporal feature fusion, attention mechanisms, multimodal prediction