计算机工程与应用 (Computer Engineering and Applications) ›› 2025, Vol. 61 ›› Issue (21): 214-224. DOI: 10.3778/j.issn.1002-8331.2407-0554

• Pattern Recognition and Artificial Intelligence •

Pedestrian Crossing Intention Prediction Based on Multi-Feature Interaction Fusion

YANG Zhiyong, GUO Jieru, GUO Zihang, XU Qinxin   

  1. School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
    2. School of Big Data and Internet of Things, Chongqing Vocational Institute of Engineering, Chongqing 402260, China
  • Online: 2025-11-01  Published: 2025-10-31

Abstract: Pedestrian intention prediction is crucial for developing safe advanced driver assistance systems. Traditional methods mainly use graph convolutional networks and recurrent architectures to process human pose data; these methods are limited in feature fusion and depend on complete extraction of pedestrian pose information, so their accuracy drops when pedestrians are partially occluded. To address these problems, a pedestrian crossing intention prediction model based on multi-feature interaction fusion (PEPR-Net) is proposed. The model uses head pose and introduces skeleton heatmap information, which improves prediction accuracy when pedestrians are occluded and bridges the complementarity gap between non-Euclidean skeletal keypoint information and other features. On this basis, a multi-feature interaction hybrid fusion module is proposed, in which a cascaded cross-attention fusion method processes pixel information and a cascaded hybrid fusion structure processes non-pixel information, forming a more comprehensive feature representation. Finally, a new asymmetric bidirectional gated recurrent unit module (UBA-GRU) is introduced for feature fusion, and the optimal fusion strategy is adopted to achieve the best prediction performance in terms of F1 score and accuracy (ACC). Extensive ablation experiments are conducted on the PIE dataset, and performance analysis shows that PEPR-Net achieves 91% accuracy. These results are expected to provide more accurate pedestrian intention prediction for autonomous driving systems.
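
The abstract gives no implementation details, so the following is a minimal, hypothetical PyTorch sketch of the two fusion ideas it names: a cascaded cross-attention stage for pixel-level features (e.g., image crops and skeleton heatmaps) and an asymmetric bidirectional GRU for temporal fusion. All class names, feature dimensions, and the reading of "asymmetric" as different hidden sizes per direction are assumptions made for illustration, not the authors' implementation.

# Hypothetical sketch of the fusion stages described in the abstract.
# Module names, dimensions and the asymmetric-GRU interpretation are assumptions.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """One stage of cascaded cross-attention: stream A attends to stream B."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, a, b):
        # a, b: (batch, time, dim); a queries b, result is added back to a
        fused, _ = self.attn(query=a, key=b, value=b)
        return self.norm(a + fused)


class AsymmetricBiGRU(nn.Module):
    """Bidirectional GRU with different hidden sizes per direction (one possible reading of 'UBA-GRU')."""
    def __init__(self, in_dim, fwd_hidden=128, bwd_hidden=64):
        super().__init__()
        self.fwd = nn.GRU(in_dim, fwd_hidden, batch_first=True)
        self.bwd = nn.GRU(in_dim, bwd_hidden, batch_first=True)

    def forward(self, x):
        # x: (batch, time, in_dim)
        h_f, _ = self.fwd(x)                        # forward pass in time order
        h_b, _ = self.bwd(torch.flip(x, dims=[1]))  # backward pass on reversed sequence
        h_b = torch.flip(h_b, dims=[1])
        return torch.cat([h_f, h_b], dim=-1)        # (batch, time, fwd_hidden + bwd_hidden)


class IntentionHead(nn.Module):
    """Pixel streams fused by cascaded cross-attention, then temporal fusion and a binary head."""
    def __init__(self, dim=256):
        super().__init__()
        self.stage1 = CrossAttentionFusion(dim)   # e.g. image-crop features <- skeleton heatmap features
        self.stage2 = CrossAttentionFusion(dim)   # cascade: fused result <- further pixel context
        self.temporal = AsymmetricBiGRU(dim)
        self.cls = nn.Linear(128 + 64, 1)         # crossing / not-crossing logit

    def forward(self, img_feat, heatmap_feat, ctx_feat):
        fused = self.stage1(img_feat, heatmap_feat)
        fused = self.stage2(fused, ctx_feat)
        seq = self.temporal(fused)
        return self.cls(seq[:, -1])               # predict from the last time step


if __name__ == "__main__":
    b, t, d = 2, 16, 256
    model = IntentionHead(dim=d)
    logit = model(torch.randn(b, t, d), torch.randn(b, t, d), torch.randn(b, t, d))
    print(logit.shape)  # torch.Size([2, 1])

In this sketch the asymmetry simply means the forward and backward GRUs use different hidden sizes; the paper's UBA-GRU may define the asymmetry differently.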

Key words: autonomous driving assistance system, multi-feature interaction fusion, intention prediction, attention mechanism, skeleton heatmap