Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (15): 189-198. DOI: 10.3778/j.issn.1002-8331.2404-0319

• Pattern Recognition and Artificial Intelligence •

Research on Point Cloud Segmentation Combining Dual Normalization and Sparse Self-Attention

ZHANG Rui, WU Yichao, HUANG Guanlong, JIN Wei

  1. School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
  • Publication date: 2025-08-01 Release date: 2025-07-31

Abstract: LiDAR point clouds are widely used in smart city planning, autonomous navigation and driving, and simultaneous localization and high-precision mapping, so point cloud feature extraction, the basis of scene perception and interpretation, has attracted considerable attention. However, LiDAR point clouds are sparse, unordered, and unstructured, and ground objects occlude one another and are sampled densely near the sensor but sparsely far from it, which makes feature extraction highly challenging. When capturing geometric information, current deep learning models struggle to extract fine-grained features from massive unstructured data; when capturing semantic information, they model local spatial relationships inaccurately under sparse data. To address these limitations, this paper proposes a new point cloud semantic segmentation model, PointDTNet. To handle the irregularity of grouped and sampled points in the feature encoding stage, a point cloud dual normalization module, DualNorm, is designed; through point normalization and inverse point normalization, it adaptively optimizes the density distribution of the grouped and sampled points and effectively prevents irregular data features from propagating. To strengthen the modeling of spatial relationships, a Transformer sparse self-attention module acting on local point cloud neighborhoods is designed. To balance computational efficiency and model complexity, the module allows each point to interact only with the points in its neighborhood, and it encodes the relative positions of points within the local neighborhood through positional embedding so that the feature weight of each point is assigned adaptively. Experimental results show that PointDTNet reaches an OA, mAcc, and mIoU of 84.68%, 65.11%, and 54.98%, respectively, on "Area 5" of the S3DIS dataset; under the same experimental conditions, these are 1.23, 7.72, and 2.07 percentage points higher, respectively, than those of the baseline model PointNet++.
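For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how OA, mAcc, and mIoU are commonly computed from a confusion matrix. It is not the authors' evaluation code. The function name `segmentation_metrics` and the toy matrix are illustrative assumptions, and averaging conventions (for example, how classes absent from the ground truth are handled) may differ from those used in the paper. The final lines only subtract the reported gains from the reported scores; the resulting baseline values are derived here, not quoted from the paper.

```python
import numpy as np

def segmentation_metrics(conf):
    """Return (OA, mAcc, mIoU) from a confusion matrix.

    conf[i, j] counts points whose true class is i and predicted class is j.
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)              # correctly labeled points per class
    gt = conf.sum(axis=1)           # points per true class
    pred = conf.sum(axis=0)         # points per predicted class

    oa = tp.sum() / conf.sum()      # overall accuracy

    present = gt > 0                # assumption: skip classes with no ground truth
    macc = np.mean(tp[present] / gt[present])

    union = gt + pred - tp
    valid = union > 0               # assumption: skip classes with empty union
    miou = np.mean(tp[valid] / union[valid])
    return oa, macc, miou

# Toy 3-class confusion matrix (made-up numbers, NOT S3DIS results).
toy = np.array([[50, 2, 3],
                [4, 30, 6],
                [1, 5, 20]])
oa, macc, miou = segmentation_metrics(toy)

# Scores and gains as stated in the abstract (percent / percentage points).
reported = {"OA": 84.68, "mAcc": 65.11, "mIoU": 54.98}
gains = {"OA": 1.23, "mAcc": 7.72, "mIoU": 2.07}
# Derived by subtraction only; the abstract does not list these values.
implied_baseline = {k: round(reported[k] - gains[k], 2) for k in reported}
```

Note that the gains are percentage points, so they are subtracted directly from the scores rather than applied as relative changes.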

Key words: LiDAR point cloud, semantic segmentation, dual normalization, sparse self-attention, multi-feature fusion
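The sparse self-attention idea summarized in the abstract, in which each point interacts only with the points in its local neighborhood and relative positions enter through a positional embedding, can be illustrated with a small NumPy sketch. This is not PointDTNet's module: the abstract does not specify how neighborhoods are built or how the embedding is defined, so the k-nearest-neighbor grouping, the linear positional embedding `Wp`, the scaled dot-product scoring, and all names (`knn_indices`, `local_sparse_attention`, `Wq`, `Wk`, `Wv`) are assumptions chosen for illustration.

```python
import numpy as np

def knn_indices(xyz, k):
    """Indices of the k nearest points (including the point itself) for each point."""
    d2 = np.sum((xyz[:, None, :] - xyz[None, :, :]) ** 2, axis=-1)  # (N, N)
    return np.argsort(d2, axis=1)[:, :k]                            # (N, k)

def local_sparse_attention(xyz, feats, k, params):
    """Self-attention restricted to each point's k-nearest-neighbor set."""
    Wq, Wk, Wv, Wp = params["Wq"], params["Wk"], params["Wv"], params["Wp"]
    idx = knn_indices(xyz, k)                    # (N, k) neighbor indices
    q = feats @ Wq                               # (N, d) queries
    key = (feats @ Wk)[idx]                      # (N, k, d) neighbor keys
    val = (feats @ Wv)[idx]                      # (N, k, d) neighbor values
    rel = xyz[idx] - xyz[:, None, :]             # (N, k, 3) relative positions
    pos = rel @ Wp                               # (N, k, d) assumed linear embedding

    # Scaled dot-product scores over the k neighbors only (N*k pairs, not N*N).
    logits = np.einsum("nd,nkd->nk", q, key + pos) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # softmax over each neighborhood

    out = np.einsum("nk,nkd->nd", w, val + pos)  # (N, d) aggregated features
    return out, w, idx

# Usage on random data (shapes and sizes are arbitrary).
rng = np.random.default_rng(0)
N, c, d, k = 64, 8, 16, 8
xyz = rng.normal(size=(N, 3))
feats = rng.normal(size=(N, c))
params = {
    "Wq": rng.normal(size=(c, d)) * 0.1,
    "Wk": rng.normal(size=(c, d)) * 0.1,
    "Wv": rng.normal(size=(c, d)) * 0.1,
    "Wp": rng.normal(size=(3, d)) * 0.1,
}
out, weights, idx = local_sparse_attention(xyz, feats, k, params)
```

The attention step touches N·k point pairs instead of N², which is the efficiency trade-off the abstract refers to. The brute-force neighbor search in this sketch is itself quadratic and would typically be replaced by a spatial index in practice.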