Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (12): 160-169. DOI: 10.3778/j.issn.1002-8331.2305-0290

• Pattern Recognition and Artificial Intelligence •

DCFNet: Dual-Channel Feature Fusion of Real Scene for Point Cloud Semantic Segmentation

SUN Liujie, ZHU Yaoda, WANG Wenju   

  1. College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Online: 2024-06-15  Published: 2024-06-14

Abstract: A real-scene point cloud carries not only the spatial geometric information of the points but also the color information of the 3D objects, and existing networks cannot effectively exploit the local features and spatial geometric features of real scenes. This paper therefore proposes DCFNet (dual-channel feature fusion of real scene for point cloud semantic segmentation), which is applicable to semantic segmentation of both indoor and outdoor scenes. Specifically, to address the problem that the color information of real-scene point clouds is not fully extracted, the method uses two input channels with identical feature extraction structures: the upper channel takes the complete RGB color together with the point coordinates and mainly focuses on the features of complex object scenes, while the lower channel takes only the point coordinates and mainly focuses on the spatial geometric features of the point cloud. Within each channel, an inter-layer fusion module and a Transformer channel feature expansion module are introduced to better extract local and global information and improve network performance. Furthermore, because existing 3D point cloud semantic segmentation methods pay little attention to the relationship between local and global features, which leads to poor segmentation of complex scenes, the features extracted by the two channels are fused by the DCFFS (dual-channel feature fusion segmentation) module to produce the semantic segmentation of the real scene. Experiments on indoor complex scenes and large-scale indoor and outdoor point cloud segmentation benchmarks show that the proposed DCFNet reaches a mean intersection over union (MIOU) of 71.18% on the S3DIS Area5 indoor scene dataset and 48.87% on the STPLS3D outdoor scene dataset, with a mean accuracy (MACC) of 77.01% and an overall accuracy (OACC) of 86.91%, achieving high-precision point cloud semantic segmentation of real scenes.
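The dual-channel idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the shared per-channel encoder (with its inter-layer fusion and Transformer channel feature expansion modules) is replaced by a single hypothetical linear layer, and the DCFFS module is replaced by plain feature concatenation; only the input split (XYZ+RGB in the upper channel, XYZ alone in the lower channel) and the fuse-then-classify flow follow the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_features(x, w):
    # Placeholder per-point feature extractor (one linear layer + ReLU).
    # Stands in for DCFNet's per-channel encoder; the inter-layer fusion
    # and Transformer channel-expansion modules are not reproduced here.
    return np.maximum(x @ w, 0.0)

def dcfnet_sketch(points_xyzrgb, num_classes=13, feat_dim=32):
    """Dual-channel sketch: the upper channel sees XYZ+RGB, the lower
    channel sees XYZ only; features are fused by concatenation (a crude
    stand-in for the DCFFS module) and classified per point."""
    w_upper = rng.standard_normal((6, feat_dim))  # hypothetical weights
    w_lower = rng.standard_normal((3, feat_dim))
    w_head = rng.standard_normal((2 * feat_dim, num_classes))

    upper = channel_features(points_xyzrgb, w_upper)         # color + geometry
    lower = channel_features(points_xyzrgb[:, :3], w_lower)  # geometry only
    fused = np.concatenate([upper, lower], axis=1)           # (n, 2*feat_dim)
    logits = fused @ w_head                                  # per-point scores
    return logits.argmax(axis=1)                             # (n,) class labels

points = rng.random((1024, 6))  # 1024 points: x, y, z, r, g, b
labels = dcfnet_sketch(points)
print(labels.shape)  # (1024,)
```

In the real network each channel is a deep encoder and the fusion is learned; the point of the sketch is only that geometry-only features and color-aware features are computed separately and combined before the segmentation head.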

Key words: deep learning, dual-channel feature fusion, point cloud semantic segmentation, attention mechanism