Grasping Detection Based on Key Point Estimation

doi:10.3778/j.issn.1002-8331.2008-0436

Abstract

Grasping is an important capability for robots that work alongside humans in both service and industrial settings, and an accurate grasp detection result is key to a manipulator completing a grasp task. To improve the accuracy and real-time performance of grasp detection, a grasp detection algorithm based on key point estimation, adapted from CenterNet, is proposed. First, the feature extraction layers of the network use feature fusion to combine different feature maps and reduce feature loss. Second, an angle prediction branch is added to predict the grasp angle. Finally, an improved Focal Loss is used to limit the drop in accuracy caused by the imbalance between positive and negative samples. Anchor-based grasp detection algorithms enumerate the potential locations of the object and then localize it by regression. In contrast, the key point estimation method predicts the grasp key points directly and then, from those key points, predicts the size, offset and angle of the grasp rectangle. Experimental results show that, compared with anchor-based grasp detection, the proposed method is more efficient, more accurate and simpler. The model achieves 97.6% accuracy and a detection speed of 42 frame/s on the Cornell grasp dataset.
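To make the key point pipeline described above more concrete, the following is a minimal sketch of how per-cell network outputs might be decoded into grasp rectangles. It is not the authors' implementation. The function names, the output stride of 4, the score threshold, the 3x3 peak test, and the conventions that size is given in input-image pixels and angle in radians are all assumptions made for illustration.

```python
import math


def decode_grasps(heatmap, size, offset, angle, stride=4, threshold=0.5):
    """Decode grasp rectangles (x, y, w, h, theta, score) from per-cell maps.

    All arguments are hypothetical network outputs stored as nested lists:
      heatmap[r][c] : key point confidence in [0, 1]
      size[r][c]    : (w, h) of the grasp rectangle, in input-image pixels
      offset[r][c]  : (dx, dy) sub-cell correction, in feature-map cells
      angle[r][c]   : grasp angle in radians
    stride is the assumed ratio between input and feature-map resolution.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    grasps = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Keep a cell only if no 3x3 neighbour scores higher (peak test).
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if any(n > score for n in neighbours):
                continue
            dx, dy = offset[r][c]
            w, h = size[r][c]
            # Map the refined cell position back to input-image coordinates.
            x, y = (c + dx) * stride, (r + dy) * stride
            grasps.append((x, y, w, h, angle[r][c], score))
    # Highest-confidence grasp first.
    return sorted(grasps, key=lambda g: g[-1], reverse=True)


def rectangle_corners(x, y, w, h, theta):
    """Return the four corners of a w-by-h rectangle centred at (x, y), rotated by theta."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        px, py = sx * w / 2, sy * h / 2
        corners.append((x + px * cos_t - py * sin_t, y + px * sin_t + py * cos_t))
    return corners


# Toy 3x3 example: one clear peak in the centre cell.
heatmap = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.6], [0.1, 0.2, 0.1]]
size = [[(40.0, 20.0)] * 3 for _ in range(3)]
offset = [[(0.5, 0.25)] * 3 for _ in range(3)]
angle = [[0.0] * 3 for _ in range(3)]

grasps = decode_grasps(heatmap, size, offset, angle)
best = grasps[0]
corners = rectangle_corners(*best[:5])
print(best, corners)
```

In this sketch no anchor boxes are enumerated: each surviving heatmap peak yields exactly one candidate, and the size, offset and angle values are simply read at that peak.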

Key words: key point estimation, grasping detection, object detection, deep learning, Cornell grasp dataset

GUAN Liwen, SUN Xinlei, YANG Pei. Grasping Detection Based on Key Point Estimation[J]. Computer Engineering and Applications, 2022, 58(4): 267-274.

关立文, 孙鑫磊, 杨佩. 基于关键点估计的抓取检测算法[J]. 计算机工程与应用, 2022, 58(4): 267-274.
