Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (23): 297-304. DOI: 10.3778/j.issn.1002-8331.2409-0082

• Engineering and Applications •

Object Detection Within 3D Point Cloud via 2D Convolution Neural Network

LI Xiaoli, WANG Le, DU Zhenlong, CHEN Dong   

  1. School of Computer and Information Engineering, Nanjing Tech University, Nanjing 211816, China
  • Online: 2025-12-01 Published: 2025-12-01

Abstract: Lidar has seen initial deployment in autonomous driving and industrial automation, producing large volumes of point cloud data for scenes and objects. These data are high-dimensional and irregular, and existing deep learning models process them with computationally expensive 3D convolutions, whose high time and space complexity prevents online use. To address this limitation, this paper proposes a 3D point cloud object recognition method based on 2D convolutional neural networks. The method bins the irregular point cloud into pillars, applies convolution and pooling to extract features from pillar clusters, and encodes the 3D point cloud as 2D image-like feature data. A 2D convolutional neural network with an attention mechanism then extracts multi-scale latent features over multiple receptive fields, and a decoder network identifies objects from their location, orientation, and class. Experiments run on an Ascend Atlas 200DK edge device, where a single inference takes 291 ms. Compared with traditional point cloud object detection networks, the proposed method improves performance over VoxelNet, F-PointNet, and SECOND by factors of 14.7, 13.2, and 3.4, respectively. On the KITTI dataset, in an accuracy comparison against 14 point cloud object detection algorithms, including ContFuse, its average precision exceeds that of the second-best algorithm by more than 2.3%. Ablation experiments on the 2D convolution and attention modules show that the two modules yield improvements of 50.9% in model size and 5.37% in inference accuracy, respectively. The results indicate that the proposed method detects objects in 3D point cloud data efficiently, robustly, and accurately.

Key words: 3D point cloud, point cloud object recognition, deep learning, pillar, pseudo image
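
The pillar encoding summarized in the abstract (binning an irregular point cloud into vertical columns and pooling each column into one cell of a 2D image-like grid) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pillarize`, the grid parameters, and the hand-picked per-pillar features (maximum height, maximum intensity, point count) are assumptions made here for illustration, whereas the paper extracts pillar features with learned convolution and pooling layers.

```python
from collections import defaultdict


def pillarize(points, x_range, y_range, cell):
    """Bin (x, y, z, intensity) points into pillars on an x-y grid.

    Hypothetical helper for illustration only. Returns a nested list
    grid[row][col] = [max z, max intensity, point count]; empty pillars
    hold zeros. The hand-picked features stand in for learned ones.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = int(round((x_max - x_min) / cell))
    n_rows = int(round((y_max - y_min) / cell))

    # Group points by the grid cell (pillar) their x-y position falls into.
    pillars = defaultdict(list)
    for x, y, z, intensity in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # discard points outside the region of interest
        # Clamp to guard against floating-point rounding at the upper edge.
        col = min(int((x - x_min) / cell), n_cols - 1)
        row = min(int((y - y_min) / cell), n_rows - 1)
        pillars[(row, col)].append((z, intensity))

    # Pool each pillar into a fixed-length feature: the 2D image-like grid.
    grid = [[[0.0, 0.0, 0.0] for _ in range(n_cols)] for _ in range(n_rows)]
    for (row, col), pts in pillars.items():
        grid[row][col] = [
            max(p[0] for p in pts),   # max pooling over height
            max(p[1] for p in pts),   # max pooling over intensity
            float(len(pts)),          # number of points in the pillar
        ]
    return grid
```

Because the output is a dense 2D grid with a fixed number of channels per cell, it can be passed to ordinary 2D convolution layers, which is the property that lets the method avoid 3D convolutions.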