Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (13): 113-123. DOI: 10.3778/j.issn.1002-8331.2309-0217

• Pattern Recognition and Artificial Intelligence •

Research on 3D Object Detection Method Based on Multi-Modal Fusion

TIAN Feng, ZONG Neili, LIU Fang, LU Yuanyuan, LIU Chao, JIANG Wenwen, ZHAO Ling, HAN Yuxiang   

  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318, China
  • Online: 2024-07-01    Published: 2024-07-01

Abstract: Detection algorithms based on pure point clouds are prone to missed and false detections of distant small targets because point clouds are sparse and unordered. To address this problem, a multi-modal 3D object detection algorithm that fuses image features with point-cloud voxel features is proposed. In the image feature extraction stage, a lightweight deep residual network is proposed to reduce the number of image feature channels so that they match the point-cloud voxel features, improving the ability to fuse point-cloud and image features. In the fusion stage, a double feature fusion network is proposed that fuses image features with voxel features while retaining the structural information of the original voxel features, enriching the point cloud with semantic information and improving the detection accuracy of distant small targets. Experimental results on the KITTI dataset show that, compared with the baseline model, the 3D average detection accuracy for cars, cyclists, and pedestrians is improved by 0.76, 2.30, and 3.43 percentage points, respectively. These results verify the effectiveness of the proposed method in reducing false and missed detections of distant small targets.

Key words: 3D object detection, deep residual network, voxel features, image features, feature fusion, double feature fusion network
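The abstract names two components but, being an abstract, gives no implementation details. The PyTorch sketch below is only one illustrative reading of those two ideas: a lightweight residual block that reduces the image feature channels to the voxel feature width, and a two-stage ("double") fusion that first mixes image and voxel features and then re-injects the original voxel features. All module names, channel widths (256 image channels, 64 voxel channels), and the concatenation-based fusion operator are assumptions, not the authors' implementation.

```python
# Minimal sketch of the two ideas described in the abstract.
# Channel sizes and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn


class LightweightResidualBlock(nn.Module):
    """Residual block that projects high-channel image features down to the
    voxel feature width so the two modalities can be fused directly."""

    def __init__(self, in_channels: int = 256, voxel_channels: int = 64):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, voxel_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(voxel_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(voxel_channels, voxel_channels, kernel_size=3,
                      padding=1, bias=False),
            nn.BatchNorm2d(voxel_channels),
        )
        # 1x1 shortcut so the residual addition matches the reduced width.
        self.shortcut = nn.Conv2d(in_channels, voxel_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.reduce(x) + self.shortcut(x))


class DoubleFeatureFusion(nn.Module):
    """Two-step fusion: first mix per-voxel image features into the voxel
    features, then fuse the result with the original voxel features again
    so their structural information is retained."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.first_fuse = nn.Linear(2 * channels, channels)
        self.second_fuse = nn.Linear(2 * channels, channels)

    def forward(self, voxel_feats: torch.Tensor,
                image_feats: torch.Tensor) -> torch.Tensor:
        # voxel_feats, image_feats: (num_voxels, channels); image_feats are
        # assumed to be already sampled at the voxels' projected locations.
        fused = torch.relu(
            self.first_fuse(torch.cat([voxel_feats, image_feats], dim=-1)))
        # Second pass re-injects the raw voxel features.
        return torch.relu(
            self.second_fuse(torch.cat([fused, voxel_feats], dim=-1)))


if __name__ == "__main__":
    img = torch.randn(2, 256, 96, 320)          # backbone image feature map
    reduced = LightweightResidualBlock()(img)   # -> (2, 64, 96, 320)
    voxel_feats = torch.randn(1000, 64)         # per-voxel point-cloud features
    sampled_img = torch.randn(1000, 64)         # image features sampled per voxel
    fused = DoubleFeatureFusion()(voxel_feats, sampled_img)
    print(reduced.shape, fused.shape)
```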
