Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (10): 172-177. DOI: 10.3778/j.issn.1002-8331.2012-0023

• Pattern Recognition and Artificial Intelligence •

Grasp Pose Estimation Based on Multi-Scale Feature Fusion

XIAO Xianpeng, HU Li, ZHANG Jing, LI Shuchun, ZHANG Hua   

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  2. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
  • Online: 2022-05-15 Published: 2022-05-15




Abstract: The diversity of grasp targets and the randomness of their poses severely limit the task adaptability of robotic grasping. To improve the grasp success rate, a robot grasp pose estimation method based on multi-scale feature fusion is proposed. The method takes RGD information as input and uses a ResNet-50 backbone combined with FPN (feature pyramid networks) to obtain multi-scale features, which are fed to the grasp proposal network to generate grasp candidate boxes. The grasp direction coordinate is mapped to a classification task over grasp directions, and ROI Align is used to extract the regions of interest, evaluate the grasp candidate boxes, and obtain the optimal grasp pose of the target. To verify the effectiveness of the algorithm, grasp pose estimation experiments are carried out on the Cornell grasping dataset, where the pose estimation accuracy reaches 96.9%. A physical platform is built with an Intel RealSense D415 depth camera and a UR5 manipulator, and multiple grasping experiments are carried out in a real scene on diverse objects placed in random poses. The results show a grasp target detection success rate of 95.8% and a robot grasping success rate of 90.2%.

Key words: grasp pose estimation, RGD information, multi-scale features, grasp proposal network, ROI Align
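
Two steps named in the abstract, the RGD input and the treatment of grasp direction as a classification task, can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: that RGD is formed by replacing the blue channel of an RGB image with scaled depth, that grasp angles are equivalent modulo 180 degrees, and that 18 bins of 10 degrees are used. The function names `make_rgd`, `angle_to_class`, and `class_to_angle` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Assumed bin count; the abstract does not state how many direction classes are used.
NUM_ANGLE_CLASSES = 18


def make_rgd(rgb, depth):
    """Return a copy of an HxWx3 uint8 RGB image whose blue channel
    is replaced by the depth map rescaled to the range 0..255.
    (Assumed construction of the RGD input.)"""
    depth = depth.astype(np.float32)
    span = float(depth.max() - depth.min())
    if span > 0:
        scaled = (depth - depth.min()) / span * 255.0
    else:
        # A constant depth map carries no information; use zeros.
        scaled = np.zeros_like(depth)
    rgd = rgb.copy()
    rgd[..., 2] = scaled.astype(np.uint8)
    return rgd


def angle_to_class(theta_deg, num_classes=NUM_ANGLE_CLASSES):
    """Map a grasp angle in degrees to a direction class index.
    Angles are folded into [0, 180) because a parallel-jaw grasp
    is assumed to be symmetric under a 180 degree rotation."""
    bin_width = 180.0 / num_classes
    return int((theta_deg % 180.0) // bin_width)


def class_to_angle(index, num_classes=NUM_ANGLE_CLASSES):
    """Return the centre angle, in degrees, of a direction class."""
    bin_width = 180.0 / num_classes
    return (index + 0.5) * bin_width
```

Under these assumptions, a predicted class index is turned back into a gripper angle with `class_to_angle`, so the angular resolution is bounded by the bin width (10 degrees here); a finer discretisation trades resolution against the number of samples available per class.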