Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (4): 347-354. DOI: 10.3778/j.issn.1002-8331.2207-0459

• Engineering and Applications •

End-to-End Robotic Arm Visual Servoing with a Bottleneck Attention Mechanism

LIU Bingkun, PI Jiatian, XU Jin   

  1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
    2. National Center for Applied Mathematics in Chongqing (Chongqing Normal University), Chongqing 401331, China
  • Online: 2024-02-15  Published: 2024-02-15

Abstract: To address the cumbersome feature extraction steps and poor real-time performance of traditional visual servoing algorithms, an end-to-end direct visual servoing algorithm based on a convolutional neural network is proposed. It performs servo positioning by directly predicting the instantaneous linear and angular velocities of a camera mounted at the end of the robotic arm, without requiring handcrafted feature labeling, camera intrinsics, or depth information. First, the image observed by the camera is fed into an improved lightweight GhostNet feature extraction network, into which a bottleneck attention module (BAM) is integrated to enhance the spatial and channel information of the target object. Then, fully connected layers serve as the velocity regression function, regressing the linear and angular velocities in a decoupled manner. Finally, a real-time capture method is used to build the training dataset in a real environment, with velocity labels generated by a position-based visual servoing control law; the trained network enables the robotic arm to perform efficient and accurate positioning and tracking from initial poses not seen during training. Extensive experiments in real scenes verify the effectiveness of the algorithm and show a degree of robustness to scene background information.
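The abstract states that the velocity labels are generated by a position-based visual servoing (PBVS) control law but does not spell the law out. Below is a minimal NumPy sketch of one standard PBVS law (exponential decrease of the pose error, split into a translation part and an axis-angle rotation part); the gain `lam`, the helper names, and the frame conventions are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def rotation_to_axis_angle(R):
    """Convert a 3x3 rotation matrix to an axis-angle vector theta*u."""
    # Angle from the trace, clipped for numerical safety.
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return np.zeros(3)
    # Axis from the skew-symmetric part of R.
    u = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * u

def pbvs_velocity_label(t_err, R_err, lam=0.5):
    """Hypothetical PBVS label for one training image: commands that
    decrease the pose error between current and desired camera frames
    exponentially.
      v     = -lam * t_err            (instantaneous linear velocity)
      omega = -lam * theta*u(R_err)   (instantaneous angular velocity)
    """
    v = -lam * t_err
    omega = -lam * rotation_to_axis_angle(R_err)
    return v, omega

# Usage: camera displaced 0.2 m along x and rotated 30 deg about z.
theta = np.deg2rad(30.0)
R_err = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
v, omega = pbvs_velocity_label(np.array([0.2, 0.0, 0.0]), R_err)
print(v)      # linear velocity driving the translation error toward zero
print(omega)  # angular velocity about z
```

Note that the linear and angular components are computed independently, which mirrors the decoupled regression of the two velocity vectors by the fully connected heads described above.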

Key words: visual servoing, deep learning, attention mechanism, robotic arm control