Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (19): 147-157.DOI: 10.3778/j.issn.1002-8331.2406-0381

• Pattern Recognition and Artificial Intelligence •

Fusion of Deep Learning and Geometric Analysis for Robotic Six-Degree-of-Freedom Grasp Pose Estimation

LI Jiacheng, LUO Haitao, ZENG Desheng, LIANG Xiao   

  1. School of Automation, Shenyang Aerospace University, Shenyang 110136, China
  2. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
  • Online: 2025-10-01  Published: 2025-09-30

Abstract: Achieving high-precision grasp poses is crucial for improving the success rate and efficiency of robotic grasping operations. To address the challenge of generating reliable grasp poses for unknown objects in unstructured environments, this study proposes a vision-guided grasp pose estimation method that integrates deep learning and geometric analysis. First, the method uses geometric information to identify points with high graspability, reducing the computational cost of point cloud processing and making better use of available resources. A six-degree-of-freedom grasp pose estimation model combining deep learning and geometric analysis is then constructed to generate diverse grasp poses from the point cloud data. The network is further optimized by incorporating SparseSE (sparse squeeze-and-excitation), SimAM (simple parameter-free attention module), and EnhancedMLP (enhanced multilayer perceptron). Evaluation on the large-scale benchmark dataset GraspNet-1Billion shows that the proposed method outperforms current state-of-the-art methods in average precision (AP) across all three object categories of the dataset. Additionally, real-world grasping experiments verify the method's practical applicability, demonstrating its robustness and accuracy when handling various objects in cluttered and overlapping scenes.

Key words: robotic grasping, pose estimation, deep learning, point cloud

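Of the modules named in the abstract, SimAM is a published parameter-free attention mechanism that weights each activation by a sigmoid of its inverse energy. A minimal NumPy sketch of its standard formulation follows; the feature layout `(C, N)` (channels by points) and the regularizer `e_lambda` are illustrative assumptions, not the authors' exact integration into the grasp network:

```python
import numpy as np


def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention over a (C, N) feature map.

    For each channel, activations farther from the channel mean get a
    lower energy, hence a higher attention weight; weights are applied
    multiplicatively, so no learnable parameters are introduced.
    """
    n = x.shape[-1] - 1                                   # points per channel minus one
    d = (x - x.mean(axis=-1, keepdims=True)) ** 2          # squared deviation from channel mean
    v = d.sum(axis=-1, keepdims=True) / n                  # channel variance estimate
    e_inv = d / (4.0 * (v + e_lambda)) + 0.5               # inverse energy (always >= 0.5)
    return x * (1.0 / (1.0 + np.exp(-e_inv)))              # sigmoid gating, weights in (0.5, 1)
```

Because every attention weight lies in (0.5, 1), the module rescales features without flipping their sign, which is why it can be dropped into an existing point-feature backbone without retraining from scratch.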