[1] MAHLER J, LIANG J, NIYAZ S, et al. Dex-Net 2.0: deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics[J]. arXiv:1703.09312, 2017.
[2] 喻群超, 尚伟伟, 张驰. 基于三级卷积神经网络的物体抓取检测[J]. 机器人, 2018, 40(5): 762-768.
YU C Q, SHANG W W, ZHANG C. Object grasp detecting based on three-level convolution neural network[J]. Robot, 2018, 40(5): 762-768.
[3] 夏晶, 钱堃, 马旭东, 等. 基于级联卷积神经网络的机器人平面抓取位姿快速检测[J]. 机器人, 2018, 40(6): 794-802.
XIA J, QIAN K, MA X D, et al. Fast planar grasp pose detection for robot based on cascaded deep convolutional neural networks[J]. Robot, 2018, 40(6): 794-802.
[4] MORALES E F, ZARAGOZA J H. An introduction to reinforcement learning[J]. IEEE, 2011, 11(4): 219-354.
[5] 王鹭. 基于深度强化学习的机械臂密集堆叠物体智能抓取研究[D]. 洛阳: 河南科技大学, 2022.
WANG L. Research on intelligent grasping of densely stacked objects by robotic arm based on deep reinforcement learning [D]. Luoyang: Henan University of Science and Technology, 2022.
[6] DOGAR M R, SRINIVASA S S. A planning framework for non-prehensile manipulation under clutter and uncertainty[J]. Autonomous Robots, 2012, 33: 217-236.
[7] ZENG A, SONG S, WELKER S, et al. Learning synergies between pushing and grasping with self-supervised deep reinforcement learning[C]//Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018: 4238-4245.
[8] YANG Z, SHANG H. Robotic pushing and grasping knowledge learning via attention deep Q-learning network[C]//Proceedings of the 13th International Conference on Knowledge Science, Engineering and Management, 2020: 223-234.
[9] EITEL A, HAUFF N, BURGARD W. Learning to singulate objects using a push proposal network[J]. arXiv:1707.08101, 2017.
[10] JAAKKOLA T, SINGH S, JORDAN M. Reinforcement learning algorithm for partially observable Markov decision problems[C]//Proceedings of the 7th International Conference on Neural Information Processing Systems, 1994: 345-352.
[11] 羊波, 王琨, 马祥祥, 等. 多智能体强化学习的机械臂运动控制决策研究[J]. 计算机工程与应用, 2023, 59(6): 318-325.
YANG B, WANG K, MA X X, et al. Research on motion control decision of manipulator based on multi-agent reinforcement learning[J]. Computer Engineering and Applications, 2023, 59(6): 318-325.
[12] 宁强, 刘元盛, 谢龙洋. 基于SAC的自动驾驶车辆控制方法应用[J]. 计算机工程与应用, 2023, 59(8): 306-314.
NING Q, LIU Y S, XIE L Y. Application of SAC-based autonomous vehicle control method[J]. Computer Engineering and Applications, 2023, 59(8): 306-314.
[13] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[J]. arXiv:1312.5602, 2013.
[14] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2016: 2094-2100.
[15] ROHMER E, SINGH S P N, FREESE M. V-REP: a versatile and scalable robot simulation framework[C]//Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013: 1321-1326.
[16] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 3431-3440.
[17] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. arXiv:1511.00561, 2015.
[18] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.