Large-Scale Point Cloud Segmentation by Learnable Dynamic Grouping Convolutional Neural Network

doi:10.3778/j.issn.1002-8331.2301-0230

Abstract

Existing large-scale point cloud semantic segmentation algorithms retain too much redundant, interfering information when extracting features, which degrades the segmentation performance of the network. To address this problem, a learnable dynamic grouping convolutional neural network architecture is proposed for efficient and accurate large-scale point cloud segmentation. First, local geometric features are extracted from the input point cloud group by group, and redundant feature channels are dynamically filtered and pruned. This reduces the interference of useless feature information with the network's feature recognition and further improves semantic segmentation accuracy. Second, a positional encoding module maps the position features of the point cloud into a high-dimensional frequency-domain space, so that the network can fully mine frequency-domain feature information and obtain richer features. Finally, the extracted local geometric features are fused with the global per-point position features, and a learnable dynamic grouping convolutional neural network decodes the fused features into the final segmentation result. Experimental results show that the algorithm achieves an mIoU of 69.6% on S3DIS and 58.3% on SemanticKITTI, two large-scale point cloud segmentation datasets. Compared with existing point cloud semantic segmentation methods, the proposed network achieves higher segmentation accuracy with fewer parameters.

Key words: large-scale point cloud, semantic segmentation, learnable dynamic grouping convolution, positional encoding
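The abstract says that redundant feature channels are dynamically filtered and pruned within groups, but it does not say how channels are scored. As a minimal sketch, assuming a squeeze-and-excitation style gate (a swapped-in technique, not confirmed as the authors' method), one grouped filtering step could look like the following. The function dynamic_group_prune, its parameters, and the fixed gate weights are hypothetical; the weights stand in for parameters that would be learned in a real network.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are points, columns are feature channels


def sigmoid(x: float) -> float:
    """Logistic function used as a soft channel gate."""
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_group_prune(features: Matrix, num_groups: int,
                        gate_weights: List[float], gate_bias: List[float],
                        keep_per_group: int) -> Matrix:
    """Hypothetical grouped channel pruning (assumed SE-style gating).

    1. Squeeze: average each channel over all points.
    2. Gate: per-channel affine map + sigmoid (stand-in for learned weights).
    3. Prune: inside each channel group keep the keep_per_group channels
       with the highest gate, rescale them by that gate, zero the rest.
    """
    num_points = len(features)
    num_channels = len(features[0])
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    group_size = num_channels // num_groups
    if not 0 < keep_per_group <= group_size:
        raise ValueError("keep_per_group must lie in 1..group_size")

    # Squeeze: one summary value per channel.
    means = [sum(row[c] for row in features) / num_points
             for c in range(num_channels)]
    # Gate: importance score in (0, 1) per channel.
    gates = [sigmoid(gate_weights[c] * means[c] + gate_bias[c])
             for c in range(num_channels)]

    # Prune: hard top-k selection inside every group.
    mask = [0.0] * num_channels
    for g in range(num_groups):
        group = range(g * group_size, (g + 1) * group_size)
        ranked = sorted(group, key=lambda c: gates[c], reverse=True)
        for c in ranked[:keep_per_group]:
            mask[c] = gates[c]  # surviving channels are rescaled by their gate

    return [[row[c] * mask[c] for c in range(num_channels)] for row in features]


if __name__ == "__main__":
    pts = [[1.0, 0.1, 2.0, 0.0],
           [3.0, 0.3, 0.0, 0.2]]
    out = dynamic_group_prune(pts, num_groups=2, gate_weights=[1.0] * 4,
                              gate_bias=[0.0] * 4, keep_per_group=1)
    # Channels 1 and 3 score lowest in their groups and are zeroed.
    for row in out:
        print([round(v, 3) for v in row])
```

Hard top-k selection is not differentiable, so a trainable version would typically rely on the soft gate values during training or on a straight-through estimator; this sketch only illustrates the forward pass.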
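The abstract states that point positions are mapped into a high-dimensional frequency-domain space, but it gives no formula. A common choice, assumed here and not confirmed as the paper's encoding, is the sinusoidal mapping that appends sin(2^k πp) and cos(2^k πp) for k = 0, ..., L-1 to each coordinate p. The function frequency_encode and its parameters num_bands (L) and include_input are hypothetical names.

```python
import math
from typing import List, Sequence


def frequency_encode(point: Sequence[float], num_bands: int = 4,
                     include_input: bool = True) -> List[float]:
    """Map a position to a higher-dimensional sinusoidal feature vector.

    Assumed scheme (not confirmed as the paper's): for every band k in
    [0, num_bands) and every coordinate p, emit sin(2^k * pi * p) and
    cos(2^k * pi * p). Optionally keep the raw coordinates in front.
    """
    features: List[float] = list(point) if include_input else []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi  # frequency doubles with each band
        for p in point:
            features.append(math.sin(freq * p))
            features.append(math.cos(freq * p))
    return features


if __name__ == "__main__":
    xyz = (0.5, -0.25, 1.0)
    enc = frequency_encode(xyz, num_bands=4)
    # 3 raw coordinates + 3 coordinates * 4 bands * 2 (sin, cos) = 27 values
    print(len(enc))  # prints 27
```

Higher bands oscillate faster, so small differences in position produce large differences in some encoded channels; this is the usual motivation for such encodings, and the number of bands trades feature richness against dimensionality.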
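The reported results (69.6% on S3DIS, 58.3% on SemanticKITTI) use mean intersection over union. For each class c, IoU_c = TP / (TP + FP + FN), and mIoU is the average of IoU_c over classes. A minimal sketch of this computation from a confusion matrix follows; the function names are illustrative, and skipping classes that appear in neither labels nor predictions is an assumption, since official benchmark scripts handle ignored labels and absent classes in their own ways.

```python
from typing import List, Sequence


def confusion_matrix(labels: Sequence[int], preds: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[t][p] counts points whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        cm[t][p] += 1
    return cm


def mean_iou(labels: Sequence[int], preds: Sequence[int], num_classes: int) -> float:
    """Average of per-class TP / (TP + FP + FN)."""
    cm = confusion_matrix(labels, preds, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                 # class c predicted as another class
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # other classes predicted as c
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: 1/3, 2/3, 1/2
    print(round(mean_iou(y_true, y_pred, 3), 4))  # prints 0.5
```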

KANG Yue, YANG Jun. Large-Scale Point Cloud Segmentation by Learnable Dynamic Grouping Convolutional Neural Network[J]. Computer Engineering and Applications, 2024, 60(10): 217-226.
