Research on Attention Mechanism for Point Cloud Data

doi:10.3778/j.issn.1002-8331.2106-0397

Abstract

Abstract: As a plug-and-play means of improving a network's feature-extraction performance, the attention mechanism is widely used in natural language processing and image recognition. However, because point cloud data are irregular and unordered, attention mechanisms cannot be applied directly to point clouds. This paper proposes an attention mechanism suited to point clouds, with PointNet-type networks as the backbone for point cloud feature extraction. The point cloud data are pooled from multiple angles, a shared-weight MLP (multi-layer perceptron) produces adaptive attention weights, and these weights are multiplied with the original features to refine the input features, which improves network performance and brings the attention mechanism to the point cloud domain. On the ModelNet40 classification task, the proposed mechanism raises the OA (overall accuracy) of PointNet (vanilla) and PointNet by 0.89 and 0.40 percentage points, respectively. On the ShapeNet part segmentation task, it raises the mIoU (mean Intersection over Union) of PointNet by 1.38 percentage points. On the KITTI 3D detection task, it yields a considerable improvement in the AP (average precision) of the frustum-based fusion detector Frustum-PointNet on two small-object classes, pedestrians and cyclists. The experimental results show that the designed attention mechanism is both effective and lightweight across multiple point cloud processing tasks.

Key words: deep learning, three-dimensional point cloud, attention mechanism, classification, segmentation, detection
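
The pipeline summarized in the abstract (pooling over the point cloud, a shared-weight MLP, and multiplication with the original features) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which operators "multi-angle pooling" uses, so the sketch assumes max pooling and average pooling over the point dimension. The sigmoid gate, the reduction ratio of 4, the random weights, and the names `make_shared_mlp`, `mlp_forward`, and `point_attention` are likewise assumptions made for illustration. Only the Python standard library is used.

```python
import math
import random


def make_shared_mlp(channels, reduction=4, seed=0):
    """Create weights for a two-layer MLP (C -> C/r -> C); sizes are assumptions."""
    rng = random.Random(seed)
    hidden = max(1, channels // reduction)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(channels)] for _ in range(hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(channels)]
    return w1, w2


def mlp_forward(vec, weights):
    """Apply the MLP to one pooled descriptor: linear, ReLU, linear."""
    w1, w2 = weights
    hidden = [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def point_attention(features, weights):
    """Reweight an N x C feature matrix with per-channel attention weights."""
    n_points = len(features)
    columns = list(zip(*features))  # C tuples, each holding N values
    max_desc = [max(col) for col in columns]             # assumed pooling 1: max over points
    avg_desc = [sum(col) / n_points for col in columns]  # assumed pooling 2: mean over points
    # The same MLP weights process both descriptors (shared weights).
    logits = [a + b for a, b in zip(mlp_forward(max_desc, weights),
                                    mlp_forward(avg_desc, weights))]
    attention = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid gate in (0, 1)
    # Multiply every point's features by the channel weights.
    refined = [[x * a for x, a in zip(point, attention)] for point in features]
    return refined, attention


rng = random.Random(1)
features = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]  # 16 points, 8 channels
weights = make_shared_mlp(channels=8)
refined, attention = point_attention(features, weights)

# Pooling over points is symmetric, so reordering the points should leave
# the attention weights unchanged up to floating-point rounding.
shuffled = features[:]
rng.shuffle(shuffled)
_, attention_shuffled = point_attention(shuffled, weights)
```

Because both pooling operators are symmetric functions of the point set, the attention weights in this sketch do not depend on point order, which is the property that lets such a module sit on top of unordered point cloud features. The added parameters are limited to the two small MLP weight matrices, in line with the lightweight character claimed in the abstract.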

SUN Yijun, HU Hui, LI Ziyue, CHEN Yang, WU Shaoyi. Research on Attention Mechanism for Point Cloud Data[J]. Computer Engineering and Applications, 2022, 58(23): 254-260.
