Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (11): 328-335. DOI: 10.3778/j.issn.1002-8331.2302-0331

• Engineering and Applications •

3D Object Detection in Substation Scene Based on Voxelization

WANG Dawei, HU Fan, ZHANG Na, YANG Gang, LU Jiyuan, ZHANG Xingzhong

  1. Electric Power Research Institute, State Grid Shanxi Electric Power Company, Taiyuan 030002, China
  2. Shanxi Hongshuntong Technology Co., Ltd., Taiyuan 030024, China
  • Publication date: 2024-06-01 Release date: 2024-05-31

Abstract: To address the low detection accuracy caused by insufficient target feature extraction in 3D substation scenes, a voxelization-based 3D object detection model for substation scenes, AugSecond, is proposed. The model is built on the Second network structure. In the voxel feature encoding stage it introduces a triple attention mechanism, which attends over multiple dimensions to enhance key target information and reduce interference from irrelevant features. It designs an asymmetric sparse convolutional network that uses asymmetric convolution to improve the representational capacity of the convolution kernels and fuses multi-scale features to enrich target geometric information. In addition, the position regression loss is optimized: CIoU Loss is used to further account for the geometric correlation between bounding boxes and speed up network convergence. Experiments on a self-built power scene dataset and on public datasets show that, compared with the baseline model, AugSecond significantly improves recognition accuracy while achieving real-time inference speed, which demonstrates the effectiveness of the proposed model.

Key words: voxelization, 3D object detection, triple attention, asymmetric sparse convolution
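
The abstract names CIoU Loss as the position regression loss but does not state its form. As a minimal sketch, assuming the standard CIoU formulation for axis-aligned 2D boxes (an overlap term, a centre-distance term, and an aspect-ratio term), the loss could be computed as follows. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are illustrative assumptions; how AugSecond adapts the loss to 3D bounding boxes is not specified in the abstract.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss for two axis-aligned 2D boxes given as (x1, y1, x2, y2).

    Illustrative only: assumes the standard 2D formulation, not the
    paper's own 3D variant.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Overlap term: plain intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Distance term: squared centre distance divided by the squared
    # diagonal of the smallest box enclosing both inputs.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term: penalizes a mismatch in width/height ratio.
    v = (4 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, the distance term still yields a useful signal when the boxes do not overlap: the loss keeps growing as the predicted box moves away from the target, which is consistent with the faster convergence the abstract attributes to CIoU Loss.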