Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (23): 325-332. DOI: 10.3778/j.issn.1002-8331.2307-0326

• Engineering and Applications •

Deep Reinforcement Learning for Manipulator Multi-Object Grasping in Dense Scenes

LI Xin, SHEN Jie, CAO Kai, LI Tao   

  1. College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, China
  2. College of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Online: 2024-12-01    Published: 2024-11-29

Abstract: When grasping objects in dense, cluttered scenes, robots are prone to collisions and must rely on pushing to separate objects and create sufficient space for grasping. Existing push-grasp synergy methods suffer from low sample efficiency and low grasping success rates. To address these problems, a new deep reinforcement learning method based on DDQN (double deep Q network) is proposed that efficiently learns effective push-grasp cooperative strategies. The method incorporates a mask function that screens out ineffective actions, helping the manipulator explore samples that are useful for grasping and enabling efficient learning. In addition, the push reward function is designed from the difference between the average relative distances of all objects in the workspace before and after a push, which allows a more accurate assessment of how a candidate push affects object density. Comparative experiments with VPG (visual pushing grasping) show that the proposed method accelerates training while improving the grasping success rate, and verify that the system transfers fully to real-world scenes.
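The abstract does not give the exact formulas, so the following is a minimal Python sketch, assuming hypothetical names (mean_pairwise_distance, push_reward, masked_greedy_action) and an assumed scale factor, of how a push reward based on the change in average relative object distance and a mask over DDQN action values might be computed.

    import numpy as np

    def mean_pairwise_distance(positions):
        # Average relative distance between all object centroids in the
        # workspace; positions is an (N, 2) array of planar coordinates.
        n = len(positions)
        if n < 2:
            return 0.0
        dists = [np.linalg.norm(positions[i] - positions[j])
                 for i in range(n) for j in range(i + 1, n)]
        return float(np.mean(dists))

    def push_reward(positions_before, positions_after, scale=1.0):
        # Reward a push by how much it loosens the clutter: the increase in
        # average relative distance after the push (scale is an assumed factor).
        return scale * (mean_pairwise_distance(positions_after)
                        - mean_pairwise_distance(positions_before))

    def masked_greedy_action(q_values, valid_mask):
        # Greedy action choice restricted to effective actions: invalid
        # entries of the flattened Q map are suppressed before the argmax.
        masked_q = np.where(valid_mask, q_values, -np.inf)
        return int(np.argmax(masked_q))

In this sketch a push that spreads the objects apart yields a positive reward, while a push that leaves the clutter unchanged yields a reward near zero, matching the abstract's description of rewarding reductions in object density.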

Key words: deep reinforcement learning, manipulator, synergies between pushing and grasping, dense scenes
