3D Object Detection in Substation Scene Based on Voxelization

doi:10.3778/j.issn.1002-8331.2302-0331

Abstract

To address the low detection accuracy caused by insufficient extraction of target features in 3D substation scenes, this paper proposes AugSecond, a voxelization-based 3D object detection model for substation scenes built on the SECOND network structure. First, a triple attention mechanism is introduced in the voxel feature encoding stage; by attending across multiple dimensions, it enhances key target information and suppresses interference from irrelevant features. Second, an asymmetric sparse convolutional network is designed, which uses asymmetric convolution to strengthen the representational capacity of the convolution kernels and fuses multi-scale features to enrich the geometric information of targets. Third, the position regression loss is optimized: CIoU Loss is adopted to take the geometric correlation between bounding boxes into account and thereby speed up network convergence. Experiments on self-built power-scene datasets and public datasets show that, compared with the baseline model, AugSecond significantly improves recognition accuracy while retaining real-time inference speed, which demonstrates the effectiveness of the proposed model.

Keywords: voxelization, 3D object detection, triple attention, asymmetric sparse convolution
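
The abstract states that CIoU Loss is used for position regression because it accounts for the geometric relationship between bounding boxes. As a rough illustration of what that means, the following is a minimal sketch of the standard CIoU loss for two axis-aligned 2D boxes. It is not the authors' implementation: the abstract does not say how the loss is adapted to 3D boxes, and the function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made here for illustration only.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two axis-aligned 2D boxes given as (x1, y1, x2, y2).

    Illustrative 2D sketch only; the box layout and eps are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: plain intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Distance term: squared center distance divided by the squared
    # diagonal of the smallest box enclosing both inputs.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term: penalizes a mismatch in width/height ratio.
    v = (4 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    box = (0.0, 0.0, 1.0, 1.0)
    print(ciou_loss(box, (2.0, 2.0, 3.0, 3.0)))
    print(ciou_loss(box, (5.0, 5.0, 6.0, 6.0)))
```

For the two non-overlapping pairs in the usage lines, a plain IoU loss would return the same value, whereas the center-distance term makes the farther box cost more. This kind of additional signal is what the abstract credits with faster convergence.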

WANG Dawei, HU Fan, ZHANG Na, YANG Gang, LU Jiyuan, ZHANG Xingzhong. 3D Object Detection in Substation Scene Based on Voxelization[J]. Computer Engineering and Applications, 2024, 60(11): 328-335.

王大伟, 胡帆, 张娜, 杨罡, 鲁霁原, 张兴忠. 基于体素化的变电站场景三维目标检测[J]. 计算机工程与应用, 2024, 60(11): 328-335.
