DCFNet: Dual-Channel Feature Fusion of Real Scene for Point Cloud Semantic Segmentation

doi:10.3778/j.issn.1002-8331.2305-0290

Abstract

Point clouds of real scenes carry not only spatial geometric information but also the color information of 3D objects. Existing networks cannot effectively exploit the local features and the spatial geometric features of real scenes. This paper therefore proposes DCFNet (dual-channel feature fusion of real scene for point cloud semantic segmentation), a method that can be applied to semantic segmentation of both indoor and outdoor scenes. Specifically, to address the insufficient extraction of color information from real-scene point clouds, the method uses two input channels, an upper and a lower one, both of which adopt the same feature extraction network structure. The upper channel takes the complete RGB color and point cloud coordinate information as input and focuses mainly on the scene features of complex objects. The lower channel takes only the point cloud coordinate information as input and focuses mainly on the spatial geometric features of the point cloud. Within each channel, an inter-layer fusion module and a Transformer channel feature expansion module are introduced to better extract local and global information and to improve network performance. In addition, existing 3D point cloud semantic segmentation methods pay too little attention to the relationship between local and global features, which leads to poor segmentation of complex scenes. To address this, the features extracted by the upper and lower channels are fused by the DCFFS (dual-channel feature fusion segmentation) module, and semantic segmentation of the real scene is then performed. Experiments were conducted on point cloud segmentation benchmarks for complex indoor scenes and for large-scale indoor and outdoor scenes. The results show that the mean intersection over union (MIOU) of the proposed DCFNet reaches 71.18% on the S3DIS Area5 indoor scene dataset and 48.87% on the STPLS3D outdoor scene dataset, while the mean accuracy (MACC) and overall accuracy (OACC) reach 77.01% and 86.91% respectively, achieving high-precision point cloud semantic segmentation of real scenes.
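As a reading aid for the three reported metrics, the following is a minimal sketch of how MIOU, MACC and OACC are commonly computed from a per-class confusion matrix. It is not taken from the paper: the function name `segmentation_metrics`, the toy counts, and the choice to skip classes that never occur are all assumptions made for illustration.

```python
def segmentation_metrics(confusion):
    """Compute MIOU, MACC and OACC from a square confusion matrix.

    confusion[t][p] is the number of points whose true class is t and
    whose predicted class is p (hypothetical layout for this sketch).
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))

    ious, accs = [], []
    for c in range(num_classes):
        tp = confusion[c][c]                                         # true positives
        fn = sum(confusion[c]) - tp                                  # missed points of class c
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp   # wrongly labeled as c
        # Assumption: classes absent from both labels and predictions are skipped.
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:
            accs.append(tp / (tp + fn))

    return {
        "MIOU": sum(ious) / len(ious),   # mean of per-class IoU
        "MACC": sum(accs) / len(accs),   # mean of per-class accuracy (recall)
        "OACC": correct / total,         # fraction of all points labeled correctly
    }


# Toy 3-class example with made-up counts, not data from the paper.
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 10],
]
metrics = segmentation_metrics(toy)
print(metrics)
```

MIOU and MACC weight every class equally, whereas OACC is dominated by frequent classes, which is one reason the three values can differ noticeably on the same predictions.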

Key words: deep learning, dual-channel feature fusion, point cloud semantic segmentation, attention mechanism


SUN Liujie, ZHU Yaoda, WANG Wenju. DCFNet: Dual-Channel Feature Fusion of Real Scene for Point Cloud Semantic Segmentation[J]. Computer Engineering and Applications, 2024, 60(12): 160-169.

孙刘杰, 朱耀达, 王文举. 双通道特征融合的真实场景点云语义分割方法[J]. 计算机工程与应用, 2024, 60(12): 160-169.
