Computer Engineering and Applications, 2025, Vol. 61, Issue (22): 196-204. DOI: 10.3778/j.issn.1002-8331.2408-0335

• Pattern Recognition and Artificial Intelligence •

融合点云Transformer的多尺度抓取检测模型

陈鹏,白勇,陈旭,崔家琪   

  1. 河北工业大学 人工智能与数据科学学院,天津 300401
  • 出版日期:2025-11-15 发布日期:2025-11-14

Multiscale Grasping Detection Model Integrating Point Cloud Transformer

CHEN Peng, BAI Yong, CHEN Xu, CUI Jiaqi   

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
  • Online:2025-11-15 Published:2025-11-14

摘要: 抓取检测是实现机器人精准抓取的基础,然而现有的抓取检测网络对点云特征提取不充分,采样点的多尺度信息容易被忽略,数据集中也存在着样本分布不平衡的问题。为此,提出一种融合点云Transformer的机器人多尺度抓取检测模型。构建了融合点云Transformer的编解码网络,利用自注意力机制增强了局部特征向量间的信息交互。设计了多圆柱体尺度感知模块以捕获采样点不同感受野下的局部特征,使模型能够进行特征的自适应选择,以捕获采样点的多尺度信息。设计了尺度平衡学习损失函数,通过合理分配权重,以缓解模型训练过程中的尺度偏好现象。所提出的抓取检测模型在GraspNet-1Billion数据集上进行验证,显示其性能超越了现有模型。在真实场景中的抓取实验则证明了模型具备良好的泛化能力,有助于在实际应用场景中稳定、高效地实现精准抓取。

关键词: 抓取检测, 点云Transformer, 多尺度特征, 尺度感知

Abstract: Grasping detection is fundamental to precise robotic grasping. However, existing grasping detection networks extract point cloud features inadequately and tend to overlook the multiscale information of sampled points, and the datasets they are trained on suffer from imbalanced sample distributions. To address these issues, a multiscale robotic grasping detection model integrating a point cloud Transformer is proposed. First, an encoder-decoder network integrating the point cloud Transformer is constructed, in which a self-attention mechanism strengthens the information interaction between local feature vectors. Second, a multi-cylinder scale perception module is designed to capture the local features of each sampled point under different receptive fields, so that the model can adaptively select among these features and thereby capture the multiscale information of the sampled points. Third, a scale-balanced learning loss function is designed, which assigns weights so as to alleviate the scale preference that arises during training. The proposed model is validated on the GraspNet-1Billion dataset, where its performance surpasses that of existing models. Grasping experiments in real-world scenes further show that the model generalizes well, which supports stable, efficient, and precise grasping in practical applications.

Key words: grasping detection, point cloud Transformer, multiscale features, scale perception
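The abstract states that self-attention is used to strengthen the information interaction between local feature vectors. As a rough illustration of that idea only, the following is a minimal single-head scaled dot-product self-attention sketch in NumPy. It is not the authors' network: the function name `self_attention`, the feature dimensions, and the randomly initialised projection matrices are all assumptions made for this example.

```python
import numpy as np


def self_attention(features, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over N local feature vectors.

    features: (N, C) array, one row per sampled point's local feature.
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical; random in this sketch).
    Returns the (N, D) attended features and the (N, N) attention weights.
    """
    q = features @ w_q  # queries
    k = features @ w_k  # keys
    v = features @ w_v  # values
    # Pairwise similarity between every pair of local features, scaled by sqrt(D).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Row-wise softmax; subtracting the row maximum keeps exp() numerically stable.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of all value vectors.
    return weights @ v, weights


# Assumed toy sizes: 8 sampled points with 16-dimensional local features.
rng = np.random.default_rng(0)
num_points, in_dim, out_dim = 8, 16, 16
local_features = rng.normal(size=(num_points, in_dim))
w_q, w_k, w_v = (rng.normal(size=(in_dim, out_dim)) * 0.1 for _ in range(3))
attended, attn = self_attention(local_features, w_q, w_k, w_v)
```

Each row of `attn` sums to 1, so every output feature is a convex combination of the value vectors of all sampled points. This is the sense in which local features exchange information; a trained model would learn the projections rather than draw them at random.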