Point Cloud 3D Object Detection Method with Global Shape Relation Constraints

doi:10.3778/j.issn.1002-8331.2406-0146

Abstract

Abstract: Voting-based methods have shown great potential in indoor 3D object detection, where the votes directly determine the quality of the detection results. However, seed points located where objects overlap in space are prone to erroneous voting, that is, they are mapped near the center of the wrong target object. Because these seed points are usually continuous on the geometric surface, shape relations are introduced to mitigate the problem. Specifically, a shape relation extraction module is proposed: it constructs a 2D manifold, represents shape relations by Euclidean distance on that manifold, and then applies the shape relations to the point cloud as a constraint through matrix multiplication. To obtain geometric surface continuity information, a binary tree Transformer module is designed. After the shape relation constraint is applied, an optimized Transformer network captures global context over the point cloud and thereby learns the surface structure of objects. Comparative experiments on the ScanNet and SUN RGB-D datasets show that the proposed algorithm achieves mAP@0.25 of 65.1% and 62.7%, respectively, an improvement of 6.5 and 5 percentage points over the baseline method and of 0.6 and 1.1 percentage points over the current state-of-the-art methods.

Key words: 3D object detection, point cloud, manifold learning, Transformer, shape relation
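
The abstract describes representing shape relations by Euclidean distance on a 2D manifold and applying them to the point cloud through matrix multiplication. The following is only a minimal illustrative sketch of that general idea, not the authors' module. Several choices here are assumptions not stated in the abstract: the 2D embedding is a plain PCA projection standing in for the manifold construction, the relation weights are a row-wise softmax over negative distances, and the names `shape_relation_constrain` and `temperature` are hypothetical.

```python
import numpy as np


def shape_relation_constrain(points, features, temperature=1.0):
    """Re-weight per-point features with a distance-based relation matrix.

    points:   (N, 3) array of seed point coordinates.
    features: (N, C) array of per-point features.
    Returns the constrained features (N, C) and the relation matrix (N, N).
    """
    # Assumption: a PCA projection onto the two leading principal axes
    # stands in for the paper's 2D manifold construction.
    centered = points - points.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    embedding = centered @ vt[:2].T  # (N, 2)

    # Pairwise Euclidean distances in the 2D embedding.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (N, N)

    # Assumption: closer points get larger weights via a row-wise softmax.
    logits = -dist / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # The constraint itself: one matrix multiplication.
    return weights @ features, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(16, 3))
    feats = rng.normal(size=(16, 8))
    out, rel = shape_relation_constrain(pts, feats)
    print(out.shape, rel.shape)
```

In this sketch each output feature is a convex combination of all input features, so points that lie close together in the embedding pull each other's features together; the actual module may weight and normalize the relations differently.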

XIAN Shiyang, LI Zongmin, GONG Xuchao, XU Chang, ZHANG Peng, WANG Wenchao, BAI Yun, RONG Guangcai. Point Cloud 3D Object Detection Method with Global Shape Relation Constraints[J]. Computer Engineering and Applications, 2025, 61(18): 132-141.
