Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (23): 126-134. DOI: 10.3778/j.issn.1002-8331.2408-0410

• Pattern Recognition and Artificial Intelligence •

Driver Behavior Recognition Method Using Dual-Sequence Pose Integration

TAN Dayi (谭大艺), TIAN Wei (田炜), XIONG Lu (熊璐)

  1. School of Automotive Studies, Tongji University, Shanghai 201804, China
  • Online: 2025-12-01 Published: 2025-12-01

Abstract: Identifying dangerous driving behavior patterns can improve driving safety and is an important research topic in autonomous driving. Existing image-based driver behavior recognition methods suffer from high computational cost and redundant information. To address these issues, this paper proposes SimPoseConv3D, a driver behavior recognition method that fuses dual-sequence pose information. First, the SimCC human pose estimation module extracts a sequence of driver pose heatmaps from video. The heatmaps are stacked along the temporal dimension, cropped, and sampled to form a heatmap volume. The volume is then fused in the forward and reverse temporal directions and fed into a 3D CNN, which extracts spatiotemporal action features for behavior recognition. The method is trained and tested on the Drive&Act dataset, and ablation experiments are conducted. It achieves recognition accuracies of 70.25% and 79.04% on the Task-level (overall behavior) and Mid-level (fine-grained behavior) test sets, respectively, improvements of 6.07 and 4.13 percentage points over the best published methods. In addition, using SimCC as the pose estimator improves computational efficiency by 18.51% compared with a conventional pose estimator.
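The temporal sampling and forward/reverse fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. Several details here are assumptions, not taken from the paper: the volume layout (keypoints, frames, height, width), the 17-keypoint and 56×56 sizes, the helper names `uniform_sample` and `fuse_forward_backward`, and above all the fusion operator itself. The abstract does not say how the two directions are combined, so channel-wise concatenation is used purely for illustration, and random values stand in for real SimCC output.

```python
import numpy as np


def uniform_sample(volume: np.ndarray, num_frames: int) -> np.ndarray:
    """Pick num_frames frames evenly spread over the time axis (axis 1)."""
    total = volume.shape[1]
    idx = np.linspace(0, total - 1, num_frames).round().astype(int)
    return volume[:, idx]


def fuse_forward_backward(volume: np.ndarray) -> np.ndarray:
    """Join a clip with its time-reversed copy along the keypoint axis.

    ASSUMPTION: concatenation is a placeholder; the paper's actual
    fusion operator is not specified in the abstract.
    """
    reversed_volume = volume[:, ::-1]  # same frames, reverse temporal order
    return np.concatenate([volume, reversed_volume], axis=0)


# Hypothetical heatmap volume: 17 keypoints, 60 frames, 56x56 heatmaps.
rng = np.random.default_rng(0)
heatmaps = rng.random((17, 60, 56, 56), dtype=np.float32)

clip = uniform_sample(heatmaps, num_frames=32)  # shape (17, 32, 56, 56)
fused = fuse_forward_backward(clip)             # shape (34, 32, 56, 56)
print(clip.shape, fused.shape)
```

Under these assumptions, a 3D CNN would receive `fused` with twice as many input channels as a single-direction clip. Other fusion choices, such as element-wise averaging of the two directions, would keep the channel count unchanged.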

Key words: driver behavior recognition, human pose estimation, bi-directional pose heatmap sequences
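The abstract reports its gains in percentage points, which are absolute differences rather than relative improvements. The short calculation below makes that distinction concrete. The baseline accuracies are not reported in the abstract; they are derived here by subtracting the stated gains from the stated accuracies, and the relative figures are likewise derived for illustration only.

```python
# Accuracies and percentage-point gains as stated in the abstract.
reported = {
    "Task-level": {"accuracy": 70.25, "gain_pp": 6.07},
    "Mid-level": {"accuracy": 79.04, "gain_pp": 4.13},
}

derived = {}
for level, r in reported.items():
    # Derived, not reported: the accuracy of the previous best method.
    baseline = round(r["accuracy"] - r["gain_pp"], 2)
    # Relative improvement over that baseline, in percent.
    relative = 100.0 * r["gain_pp"] / baseline
    derived[level] = {"baseline": baseline, "relative_pct": relative}
    print(f"{level}: baseline {baseline:.2f}%, "
          f"+{r['gain_pp']:.2f} points, {relative:.2f}% relative")
```

The implied baselines are 64.18% (Task-level) and 74.91% (Mid-level), so the relative improvements are larger than the percentage-point figures.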