Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (17): 107-116. DOI: 10.3778/j.issn.1002-8331.2306-0159

• Pattern Recognition and Artificial Intelligence •

Pedestrian Intent Semantic VSLAM in Automatic Driving Scenarios

LUO Zhaoyang, ZHANG Rongfen, LIU Yuhong, LI Jin, FAN Runze

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  • Published: 2024-09-01 Online: 2024-08-30

Abstract: Visual simultaneous localization and mapping (VSLAM) is widely used in autonomous driving, but conventional algorithms lack semantic information and cannot infer or predict the behavior or intentions of pedestrians in a scene. This paper proposes a semantic VSLAM method that uses a semantic segmentation algorithm based on the dense prediction transformer (DPT) to obtain segmentation masks of potentially dynamic targets and remove dynamic features. Because the vast majority of dynamic objects in autonomous driving scenarios are pedestrians and vehicles, geometric constraints are combined with pedestrian intention prediction to jointly optimize the camera pose, so that static points on potentially dynamic targets are re-added and dynamic objects are re-detected. To predict accurately whether a pedestrian will cross the road, a two-stream, spatiotemporal adaptive graph convolutional neural network is built on human skeleton information to predict pedestrian crossing intention. Results on the KITTI dataset show that the proposed method reduces the absolute trajectory error to some extent relative to ORB-SLAM3 and is more accurate than comparable algorithms. The method is expected to provide autonomous driving systems with richer semantic information and help them better accomplish autonomous driving tasks.

Key words: autonomous driving, semantic segmentation, camera pose optimization, pedestrian intention prediction
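
The abstract describes removing features that fall on segmentation masks of potentially dynamic targets and then re-adding the static ones through geometric constraints. The abstract does not say which constraint the authors use, so the sketch below substitutes a common stand-in: the distance of a matched point to its epipolar line. The function names, the fundamental matrix `F`, and the 1-pixel threshold are illustrative assumptions, not the paper's implementation, and the pedestrian intention term is omitted.

```python
import numpy as np

def split_by_mask(keypoints, dynamic_mask):
    """Return boolean arrays (off_mask, on_mask) for (N, 2) keypoints given as (x, y)."""
    h, w = dynamic_mask.shape
    cols = np.clip(np.round(keypoints[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(keypoints[:, 1]).astype(int), 0, h - 1)
    on_mask = dynamic_mask[rows, cols].astype(bool)
    return ~on_mask, on_mask

def epipolar_distance(F, pts_prev, pts_curr):
    """Pixel distance from each current point to the epipolar line F @ x_prev.

    Assumes non-degenerate lines (first two line coefficients not both zero).
    """
    ones = np.ones((len(pts_prev), 1))
    x_prev = np.hstack([pts_prev, ones])   # homogeneous points, previous frame
    x_curr = np.hstack([pts_curr, ones])   # homogeneous points, current frame
    lines = x_prev @ F.T                   # each row is the line l = F @ x_prev
    numerator = np.abs(np.sum(lines * x_curr, axis=1))
    denominator = np.hypot(lines[:, 0], lines[:, 1])
    return numerator / denominator

def select_static(pts_prev, pts_curr, dynamic_mask, F, thresh_px=1.0):
    """Keep points off the mask, plus masked points that satisfy the epipolar check."""
    off_mask, on_mask = split_by_mask(pts_curr, dynamic_mask)
    dist = epipolar_distance(F, pts_prev, pts_curr)
    readded = on_mask & (dist < thresh_px)  # e.g. points on a parked car
    return off_mask | readded
```

In a real pipeline `F` would typically be estimated from matches outside the mask, so that the check on masked points does not depend on the objects being tested.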