Research on 3D Object Detection Method Based on Multi-Modal Fusion

doi:10.3778/j.issn.1002-8331.2309-0217

Abstract

Abstract: Detection algorithms that rely on point clouds alone are prone to missed and false detections of distant, small targets because point clouds are sparse and unordered. To address this problem, a multi-modal 3D object detection algorithm that fuses image features with point cloud voxel features is proposed. In the image feature extraction stage, a lightweight deep residual network reduces the number of image feature channels so that it matches that of the point cloud voxel features, which makes the two kinds of features easier to fuse. In the stage where voxel features and image features are fused, a double feature fusion network combines the two while retaining the structural information of the original voxel features, giving the point cloud rich semantic information and improving detection accuracy for distant, small targets. Experimental results on the KITTI dataset show that, compared with the baseline model, the 3D average detection accuracy for cars, cyclists, and pedestrians improves by 0.76, 2.30, and 3.43 percentage points, respectively. These results verify the effectiveness of the proposed method in reducing false and missed detections of distant, small targets.

Key words: 3D object detection, deep residual network, voxel features, image features, feature fusion, double feature fusion network
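
The two ideas in the abstract (matching the image feature channels to the voxel feature channels, then fusing while keeping the original voxel features) can be illustrated with a small sketch. This is not the authors' network. The pinhole projection, the function names (`linear`, `project_to_pixel`, `fuse_voxels_with_image`), the hand-set weight matrices, and the two-step "mix, then add back to the original voxel feature" reading of double fusion are all assumptions made for illustration; in a real model the weights would be learned.

```python
def linear(weights, vec):
    """Apply a weight matrix (one row per output channel) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def project_to_pixel(point, focal, cx, cy):
    """Pinhole projection of a camera-frame point (x, y, z), z > 0, to a pixel (u, v)."""
    x, y, z = point
    return int(round(focal * x / z + cx)), int(round(focal * y / z + cy))


def fuse_voxels_with_image(centers, voxel_feats, image_feats,
                           reduce_w, fuse_w, focal, cx, cy):
    """Hypothetical fusion: voxels that project into the image get image semantics."""
    height, width = len(image_feats), len(image_feats[0])
    fused = []
    for center, v_feat in zip(centers, voxel_feats):
        # Voxels behind the camera keep their original features.
        if center[2] <= 0:
            fused.append(list(v_feat))
            continue
        u, v = project_to_pixel(center, focal, cx, cy)
        # Voxels outside the image also keep their original features.
        if not (0 <= u < width and 0 <= v < height):
            fused.append(list(v_feat))
            continue
        # Channel reduction: bring the image feature down to the voxel channel count.
        img_feat = linear(reduce_w, image_feats[v][u])
        # First step (assumed): mix the concatenated voxel and image features.
        mixed = linear(fuse_w, list(v_feat) + img_feat)
        # Second step (assumed): add the result to the original voxel feature,
        # so the original voxel information is retained.
        fused.append([a + b for a, b in zip(v_feat, mixed)])
    return fused


# Toy data: a 4x4 image with 8 channels per pixel and 4-channel voxel features.
image_feats = [[[float(u + v + c) for c in range(8)] for u in range(4)]
               for v in range(4)]
# Placeholder weights: average channel pairs (8 -> 4), then pass the image part through.
reduce_w = [[0.5 if c in (2 * i, 2 * i + 1) else 0.0 for c in range(8)]
            for i in range(4)]
fuse_w = [[1.0 if c == 4 + i else 0.0 for c in range(8)] for i in range(4)]

centers = [(0.0, 0.0, 10.0), (1.0, -1.0, 5.0), (50.0, 0.0, 5.0)]
voxel_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

fused = fuse_voxels_with_image(centers, voxel_feats, image_feats,
                               reduce_w, fuse_w, focal=2.0, cx=2.0, cy=2.0)
for center, feat in zip(centers, fused):
    print(center, feat)
```

In this toy setup the third voxel projects outside the image, so its features pass through unchanged, while the first two voxels gain a channel-matched image component on top of their original values.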

摘要： 针对点云稀疏性与无序性导致基于纯点云的检测算法容易出现远小目标漏检和误检的问题，提出一种融合图像特征与点云体素特征的多模态三维目标检测算法。在图像特征提取阶段，提出一种轻量级深度残差网络，减少图像特征通道数，使其与点云体素特征相一致，提高点云和图像特征的融合能力；在体素特征与图像特征融合阶段，提出一种双次特征融合网络，在保留原始体素特征结构信息的基础上将图像特征和体素特征进行融合，使点云具备丰富的语义信息，提高远小目标检测精度。在KITTI数据集上实验结果显示，与基线模型相比，对小汽车、骑行者与行人的3D平均检测精度分别提高了0.76个百分点、2.30个百分点、3.43个百分点。实验结果验证了所提方法对于解决远小目标误检和漏检问题的有效性。

关键词: 三维目标检测, 深度残差网络, 体素特征, 图像特征, 特征融合, 双次特征融合网络

TIAN Feng, ZONG Neili, LIU Fang, LU Yuanyuan, LIU Chao, JIANG Wenwen, ZHAO Ling, HAN Yuxiang. Research on 3D Object Detection Method Based on Multi-Modal Fusion[J]. Computer Engineering and Applications, 2024, 60(13): 113-123.

田枫, 宗内丽, 刘芳, 卢圆圆, 刘超, 姜文文, 赵玲, 韩玉祥. 多模态融合的三维目标检测方法研究[J]. 计算机工程与应用, 2024, 60(13): 113-123.
