Multilayer 3D Point Cloud Classification Method Based on Group Self-Attention Mechanism

doi:10.3778/j.issn.1002-8331.2305-0381

Abstract

To address the difficulty of training point cloud classification models and the low classification accuracy caused by the large volume and high noise of point cloud data in large-scale urban scenes, this paper proposes a multi-level point cloud classification model based on a grouped self-attention mechanism. In the data sampling stage, an adaptive random sampling algorithm is designed to overcome the difficulty of loading very large point clouds into the model. The sampled input point cloud is then processed at three levels, which expands the coverage of the point cloud features. The three levels divide the point cloud into 16, 9, and 4 groups, respectively, and this grouping reduces the computational complexity of the self-attention mechanism. Finally, a skip connection module reuses the low-dimensional feature information that would otherwise be lost, which further improves classification accuracy. Experiments on the SensatUrban dataset show that the proposed sampling algorithm improves mIoU by 0.43 percentage points over the farthest point sampling algorithm, and that the proposed model improves mIoU by 3.12 and 8.17 percentage points over the PCT model, which also uses a self-attention mechanism, and the classic PointNet++ model, respectively.
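The abstract's claim that grouping reduces self-attention cost can be illustrated with a rough count of query-key score entries. The sketch below is not the authors' implementation. It assumes, hypothetically, that the points are split into equal-sized groups and that each point attends only to points in its own group. The function name `attention_pairs` and the sample size `N` are illustrative choices, not values from the paper.

```python
def attention_pairs(num_points: int, num_groups: int = 1) -> int:
    """Count query-key score entries when attention is restricted to groups.

    Assumption (not from the paper): points are split into `num_groups`
    equal-sized groups and each point attends only within its own group.
    """
    if num_points % num_groups != 0:
        raise ValueError("this sketch requires num_points divisible by num_groups")
    group_size = num_points // num_groups
    # Each group has its own group_size x group_size score matrix.
    return num_groups * group_size * group_size


# Hypothetical sample size, chosen only because 16, 9, and 4 all divide it.
N = 14400
full = attention_pairs(N)  # ungrouped attention: N * N entries
costs = {g: attention_pairs(N, g) for g in (16, 9, 4)}

for g, cost in costs.items():
    print(f"{g:>2} groups: {cost} score entries, {full // cost}x fewer than ungrouped")
```

Under these assumptions, splitting N points into G groups shrinks the score computation from N² to N²/G entries, so the 16-group level is the cheapest of the three and the 4-group level the most expensive.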

Key words: computer vision, three-dimensional image processing, point cloud segmentation, random sampling, self-attention mechanism

摘要： 针对大规模城市场景点云数据体量大、噪声多等导致的点云分类模型难以训练、模型分类准确率低等问题，设计了一种多层级分组自注意力机制的点云分类模型。该模型在数据采样阶段，设计了一种自适应随机采样算法，可以有效解决模型因点云数据量庞大而加载困难的问题；将采样输入的点云数据划分为三个层级，分层级可以扩大点云特征数据的覆盖范围，三个层级分别将点云数据分为16、9、4组，分组可以减少自注意力机制的计算复杂度；进入一个跳跃连接模块，将丢失的低维度特征信息重新利用，从而更好地提高模型分类精度。在SensatUrban数据集上进行实验，结果表明，采样算法相较于最远点采样算法在mIoU指标上提升了0.43个百分点，该模型比同样采用自注意力机制的PCT模型以及经典的PointNet++模型在mIoU指标上分别提升了3.12、8.17个百分点。

关键词: 计算机视觉, 三维图像处理, 点云分割, 随机采样, 自注意力机制

HE Chunxiu, JING Xianwen, HE Yongning. Multilayer 3D Point Cloud Classification Method Based on Group Self-Attention Mechanism[J]. Computer Engineering and Applications, 2023, 59(24): 259-267.

何春秀, 荆现文, 何永宁. 分组自注意力机制的多层级三维点云分类方法[J]. 计算机工程与应用, 2023, 59(24): 259-267.
