Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (4): 267-274. DOI: 10.3778/j.issn.1002-8331.2008-0436

• Engineering and Applications •

Grasping Detection Based on Key Point Estimation

GUAN Liwen, SUN Xinlei, YANG Pei

  1. Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China
  2. School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Online: 2022-02-15 Published: 2022-02-15

Abstract: Grasping is an essential capability for robots performing human-machine coordination in service and industrial settings, and an accurate grasp detection result is key to whether a manipulator can complete a grasping task. To improve the accuracy and real-time performance of grasp detection, a grasp detection algorithm based on key point estimation, adapted from CenterNet, is proposed. First, feature fusion is used in the feature extraction layers of the network to fuse different feature maps and reduce feature loss. Second, an angle prediction branch is added to predict the grasp angle. Finally, an improved Focal Loss is used to mitigate the drop in model accuracy caused by the imbalance between positive and negative samples. Unlike anchor-based grasp detection algorithms, which enumerate potential object locations and then regress from them, the key point estimation method directly predicts grasp key points and, from each key point, predicts the size, offset, and angle of the grasp rectangle. Experimental results show that, compared with anchor-based grasp detection, the proposed method is more efficient, more accurate, and simpler. On the Cornell grasp dataset, the model achieves 97.6% accuracy at a detection speed of 42 frame/s.

Key words: key point estimation, grasping detection, object detection, deep learning, Cornell grasp dataset
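
The decoding step described in the abstract (pick grasp key points, then read size, offset, and angle at each one) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `decode_grasps` and `grasp_corners`, the tensor layouts, the output stride of 4, the 3x3 local-maximum test, and the score threshold are all assumptions made for illustration.

```python
import numpy as np


def decode_grasps(heatmap, size, offset, angle, stride=4, top_k=1, threshold=0.3):
    """Turn keypoint-style outputs into grasp rectangles (illustrative only).

    Assumed layouts (hypothetical, not taken from the paper):
      heatmap: (H, W)    grasp-center scores in [0, 1]
      size:    (2, H, W) rectangle width and height in input-image pixels
      offset:  (2, H, W) sub-cell x and y offsets of the center
      angle:   (H, W)    grasp angle in radians
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape

    # Keep only cells that are the maximum of their 3x3 neighbourhood.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    is_peak = heatmap >= windows.max(axis=0)
    scores = np.where(is_peak, heatmap, 0.0).ravel()

    # Visit the top_k highest-scoring peaks and drop weak ones.
    grasps = []
    for idx in np.argsort(scores)[::-1][:top_k]:
        if scores[idx] < threshold:
            continue
        y, x = divmod(int(idx), w)
        grasps.append({
            # Map the cell index plus its offset back to input-image pixels.
            "cx": float((x + offset[0, y, x]) * stride),
            "cy": float((y + offset[1, y, x]) * stride),
            "w": float(size[0, y, x]),
            "h": float(size[1, y, x]),
            "theta": float(angle[y, x]),
            "score": float(scores[idx]),
        })
    return grasps


def grasp_corners(grasp):
    """Return the 4 corners (x, y) of a rotated grasp rectangle as a (4, 2) array."""
    c, s = np.cos(grasp["theta"]), np.sin(grasp["theta"])
    rot = np.array([[c, -s], [s, c]])
    half = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]]) * np.array(
        [grasp["w"] / 2, grasp["h"] / 2]
    )
    return half @ rot.T + np.array([grasp["cx"], grasp["cy"]])
```

Because every grasp comes from a single heatmap cell, no anchor boxes are enumerated; the cost of decoding depends only on the heatmap size and `top_k`.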