Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (24): 273-280. DOI: 10.3778/j.issn.1002-8331.2503-0110

• Graphics and Image Processing •


Research on Multi-Scale Target Grasping Detection in Complex Environments

LUO Xin1, ZANG Qiang1, ZHANG Yonghong1,2+, DONG Tiantian3, BAI Zongchun4   

  1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
    2.School of Automation, Wuxi University, Wuxi, Jiangsu 214153, China
    3.School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
    4.Institute of Agricultural Facilities and Equipment, Jiangsu Academy of Agricultural Sciences, Nanjing 210044, China
  • Online:2025-12-15 Published:2025-12-15


Abstract: Object grasp pose detection in complex environments is a fundamental capability for intelligent robots to achieve autonomous operation. GSNet has demonstrated promising performance in six-degree-of-freedom grasp learning. However, its sampling and learning procedures overlook the impact of multi-scale features on grasp pose estimation, limiting grasping accuracy for objects of different sizes. To address this issue, a multi-scale feature fusion module is proposed, which uses three sampling strategies to obtain features of objects of various sizes and serves as a new seed point extraction method, improving the adaptability of the grasp detection network to objects of different sizes. Additionally, a multi-scale parallel convolutional structure is introduced to optimize the backbone network, improving its ability to perceive the geometric characteristics of objects. Grasp pose estimation accuracy is evaluated on the GraspNet-1Billion dataset, and the results show that the improved method outperforms baseline approaches. Further experiments in multi-scale object scenes demonstrate that the method performs well in grasp pose estimation for medium- and large-sized objects, achieving improvements of up to 16.83 percentage points over other mainstream methods. The effectiveness of the approach is also validated in real-world robotic experiments.
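The abstract does not specify which three sampling strategies the fusion module combines. Purely as an illustrative sketch (not the authors' design), a common way to make a seed-point set cover objects of different sizes in point-cloud pipelines is to take the union of farthest-point, uniform random, and voxel-grid sampling, since each favors a different scale of structure:

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: spreads seeds out, favoring large structures."""
    rng = np.random.default_rng(seed)
    dists = np.full(len(points), np.inf)
    chosen = [int(rng.integers(len(points)))]
    for _ in range(k - 1):
        dists = np.minimum(dists,
                           np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dists)))
    return np.asarray(chosen)

def random_sampling(points, k, seed=0):
    """Uniform sampling: density-proportional, favors large surfaces."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(points), size=k, replace=False)

def voxel_grid_sampling(points, voxel_size):
    """One point per occupied voxel: keeps small or thin objects represented."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return first_idx

def multi_scale_seeds(points, k_each=64, voxel_size=0.05):
    """Union of the three strategies as the candidate seed-point set."""
    idx = np.concatenate([
        farthest_point_sampling(points, k_each),
        random_sampling(points, k_each),
        voxel_grid_sampling(points, voxel_size),
    ])
    return np.unique(idx)
```

In a real grasp-detection network the seed indices would select per-point features from the backbone; here they simply index into the raw cloud.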

Key words: six-degree-of-freedom grasping, multi-scale features, deep learning, point cloud
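The abstract likewise does not detail the multi-scale parallel convolutional structure used in the backbone. A minimal sketch of the general pattern (Inception-style parallel branches with different kernel sizes whose outputs are concatenated, assumed here for illustration only) is:

```python
import numpy as np

def conv1d_same(x, w):
    """Naive 'same'-padded 1D convolution.
    x: (c_in, n) features along a point axis; w: (c_out, c_in, k)."""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.empty((c_out, x.shape[1]))
    for i in range(x.shape[1]):
        # Correlate every output filter with the k-wide window at position i.
        out[:, i] = np.tensordot(w, xp[:, i:i + k], axes=([1, 2], [0, 1]))
    return out

def multi_scale_parallel_block(x, branch_weights):
    """Run branches with different kernel sizes in parallel and
    concatenate their outputs along the channel axis."""
    return np.concatenate([conv1d_same(x, w) for w in branch_weights], axis=0)
```

Small kernels respond to fine local geometry while larger kernels aggregate wider context, which is the usual rationale for such parallel structures when objects span multiple sizes.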