Computer Engineering and Applications, 2022, Vol. 58, Issue (8): 136-146. DOI: 10.3778/j.issn.1002-8331.2109-0489

• Pattern Recognition and Artificial Intelligence •

Multi-Scale Transformer Lidar Point Cloud 3D Object Detection

SUN Liujie, ZHAO Jin, WANG Wenju, ZHANG Yusen

  1. College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Online: 2022-04-15 Published: 2022-04-15

Abstract: LiDAR point cloud 3D object detection has low accuracy on small objects such as pedestrians and bicycles, which are prone to missed and false detections. To improve detection accuracy, this paper proposes MSPT-RCNN (multi-scale point transformer-RCNN), a 3D object detection method based on a multi-scale point cloud Transformer. The method has two stages: a first stage (RPN) and a second stage (RCNN).

In the RPN stage, point cloud features are extracted by a multi-scale Transformer network made up of a multi-scale neighborhood embedding module and a skip-connection offset-attention module. Together these modules capture multi-scale neighborhood geometric information and global semantic information at different levels, and the network uses them to generate high-quality initial 3D bounding boxes. In the RCNN stage, the multi-scale neighborhood geometric information of the points inside each bounding box is used to refine the box's position, size, orientation, and confidence.

Experimental results show that MSPT-RCNN achieves high detection accuracy, with larger gains on distant and small objects. Because it learns multi-scale geometric information from point cloud data and extracts effective semantic information at different levels, MSPT-RCNN improves 3D object detection accuracy.

Key words: Transformer, multi-scale, offset-attention, point cloud, 3D object detection
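The abstract does not spell out how the multi-scale neighborhood embedding module works. The Python sketch below shows the general idea under several assumptions:

- Neighborhoods are k-nearest-neighbor sets taken at a few hypothetical scales.
- Each neighbor is described by its offset from the center point.
- A parameter-free, channel-wise max pooling stands in for the learned layers a real network would use.

The names `knn_indices`, `local_feature`, and `multi_scale_embedding`, and the default scales, are illustrative rather than taken from the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(points: Sequence[Point], center: int, k: int) -> List[int]:
    """Indices of the k points nearest to points[center] (the center itself included)."""
    c = points[center]
    order = sorted(range(len(points)), key=lambda j: math.dist(points[j], c))
    return order[:k]


def local_feature(points: Sequence[Point], center: int, k: int) -> List[float]:
    """Describe one k-NN neighborhood by its channel-wise max of relative offsets.

    Assumption: a learned model would pass each offset through a shared MLP
    before pooling; here the raw offsets are pooled directly.
    """
    c = points[center]
    offsets = [[p - q for p, q in zip(points[j], c)]
               for j in knn_indices(points, center, k)]
    # Max over neighbors is permutation-invariant, so neighbor order does not matter.
    return [max(channel) for channel in zip(*offsets)]


def multi_scale_embedding(points: Sequence[Point],
                          scales: Sequence[int] = (2, 4, 8)) -> List[List[float]]:
    """Concatenate the per-scale local features of every point (hypothetical scales)."""
    feats = []
    for i in range(len(points)):
        f: List[float] = []
        for k in scales:
            # Clamp k so small clouds still work.
            f.extend(local_feature(points, i, min(k, len(points))))
        feats.append(f)
    return feats


pts = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
print(multi_scale_embedding(pts, scales=(1, 2))[0])
```

Concatenating the per-scale features gives each point both fine context (small k) and coarse context (large k). This is the kind of multi-scale geometric information the abstract describes, although the paper's actual neighborhood definition may differ.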
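Offset attention is best known from the Point Cloud Transformer (PCT). There, a self-attention output is subtracted from the input features, the difference passes through a Linear-BatchNorm-ReLU block, and the result is added back to the input. The Python sketch below assumes that MSPT-RCNN's module follows this pattern. It also makes two simplifications:

- It omits BatchNorm and uses plain-Python matrices.
- It models the "skip connection" as concatenating the outputs of stacked layers. This is one plausible reading of "different levels of global semantic information", not a confirmed detail of the paper.

All function names and weight shapes are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def offset_attention(f: Matrix, wq: Matrix, wk: Matrix, wv: Matrix, wo: Matrix) -> Matrix:
    """One PCT-style offset-attention layer: F_out = ReLU((F - SA(F)) W_o) + F.

    Assumes wv and wo are d x d so that SA(F) and the output keep F's width.
    """
    q, k, v = matmul(f, wq), matmul(f, wk), matmul(f, wv)
    energy = matmul(q, transpose(k))  # N x N point-to-point similarity
    a = softmax_rows(energy)          # softmax over keys
    col_sums = [sum(col) for col in zip(*a)]
    # Re-normalize each column to sum to 1 (the extra l1 step used in PCT).
    a = [[x / (1e-9 + s) for x, s in zip(row, col_sums)] for row in a]
    # SA(F)[j] = sum_i A[i][j] * V[i]
    sa = matmul(transpose(a), v)
    offset = [[x - y for x, y in zip(fr, sr)] for fr, sr in zip(f, sa)]
    # Linear + ReLU; BatchNorm is omitted in this sketch.
    lbr = [[max(0.0, x) for x in row] for row in matmul(offset, wo)]
    return [[x + y for x, y in zip(lr, fr)] for lr, fr in zip(lbr, f)]  # residual add


def multi_level_features(f: Matrix,
                         layers: Sequence[Tuple[Matrix, Matrix, Matrix, Matrix]]) -> Matrix:
    """Stack offset-attention layers and concatenate every layer's output per point."""
    outs, x = [], f
    for wq, wk, wv, wo in layers:
        x = offset_attention(x, wq, wk, wv, wo)
        outs.append(x)
    return [sum((o[i] for o in outs), []) for i in range(len(f))]
```

Subtracting the attention output acts somewhat like a discrete Laplacian on the point graph. That is PCT's motivation for the design; whether MSPT-RCNN relies on the same argument is not stated in the abstract.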
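In the RCNN stage, features are gathered from the points that fall inside each proposal box. A common way to select those points is shown in the Python sketch below. It makes the following assumptions:

- A box is parameterized by its center (cx, cy, cz), its size (l, w, h), and a yaw angle about the vertical axis.
- Each point is rotated into the box's local frame before an axis-aligned check.
- The optional `margin`, which enlarges the box slightly, is an assumption rather than a detail from the paper.

The function name `points_in_box` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box layout: (cx, cy, cz, l, w, h, yaw), with z as the box center height.
Box = Tuple[float, float, float, float, float, float, float]
Point3 = Tuple[float, float, float]


def points_in_box(points: Sequence[Point3], box: Box, margin: float = 0.0) -> List[Point3]:
    """Return the points that lie inside a yaw-rotated 3D box (optionally enlarged by margin)."""
    cx, cy, cz, l, w, h, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    inside = []
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        # Rotate the offset by -yaw so the box becomes axis-aligned in its local frame.
        lx = dx * c + dy * s
        ly = -dx * s + dy * c
        if (abs(lx) <= l / 2 + margin
                and abs(ly) <= w / 2 + margin
                and abs(dz) <= h / 2 + margin):
            inside.append((x, y, z))
    return inside
```

With a yaw of pi/2, the box's length axis points along world y. So with a 4 x 2 x 2 box, (0, 1.5, 0) is kept while (1.5, 0, 0) is rejected. The pooled points would then feed the multi-scale neighborhood features used to refine each box.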