Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (4): 347-354.DOI: 10.3778/j.issn.1002-8331.2207-0459

• Engineering and Applications •

Research on End-to-End Robotic Arm Visual Servoing Combined with Bottleneck Attention Mechanism

LIU Bingkun, PI Jiatian, XU Jin   

  1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
  2. National Center for Applied Mathematics in Chongqing (Chongqing Normal University), Chongqing 401331, China
  • Online: 2024-02-15  Published: 2024-02-15


Abstract: To address the cumbersome feature extraction steps and poor real-time performance of traditional visual servoing algorithms, an end-to-end direct visual servoing algorithm based on a convolutional neural network is proposed. By directly predicting the instantaneous linear and angular velocity of the camera mounted on the end-effector of the robotic arm, servo positioning is performed without handcrafted feature labeling, camera intrinsics, or depth information. Firstly, the image observed by the camera is fed into an improved lightweight GhostNet feature extraction network, into which the bottleneck attention module (BAM) is integrated to enhance the spatial and channel information of the target object. Then, fully connected layers are used as the velocity regression function, and the linear velocity and angular velocity are regressed in a decoupled manner. Finally, a real-time capture method is used to build the training dataset in a real environment, with the velocity labels generated according to the position-based visual servoing (PBVS) control law. The trained network enables the robotic arm to accomplish efficient and accurate positioning and tracking tasks from initial poses not seen during training. Extensive experiments in real scenes verify the effectiveness of the algorithm and show that it is also robust to scene background information.
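As a concrete illustration of the pipeline described above, the following PyTorch sketch combines a small stand-in convolutional backbone (in place of the paper's improved lightweight GhostNet), a standard BAM implementation, decoupled fully connected heads for the linear and angular velocity, and the classical PBVS control law as a velocity-label generator. All module names, layer sizes, and the gain lam are illustrative assumptions, not the authors' implementation.

# Minimal sketch, not the authors' code. Assumptions: a stand-in backbone
# replaces the improved lightweight GhostNet; BAM follows its original
# formulation with BatchNorm omitted for brevity; head sizes and gains are
# illustrative.
import numpy as np
import torch
import torch.nn as nn


class BAM(nn.Module):
    """Bottleneck Attention Module: channel and spatial branches are summed,
    passed through a sigmoid, and applied as a residual gate on the features."""
    def __init__(self, channels, reduction=16, dilation=4):
        super().__init__()
        # Channel branch: global average pooling followed by a bottleneck MLP.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Spatial branch: 1x1 reduction, two dilated 3x3 convs, 1x1 to one map.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.Conv2d(channels // reduction, channels // reduction, 3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels // reduction, 3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 1),
        )

    def forward(self, x):
        gate = torch.sigmoid(self.channel(x) + self.spatial(x))  # broadcasts to NCHW
        return x * (1.0 + gate)


class VisualServoNet(nn.Module):
    """Backbone features -> BAM -> global pooling -> two decoupled fully
    connected heads predicting the camera's linear and angular velocity."""
    def __init__(self, feat_channels=256):
        super().__init__()
        # Stand-in backbone; replace with an (improved) GhostNet if available.
        layers, c_in = [], 3
        for c_out in (32, 64, 128, feat_channels):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
            c_in = c_out
        self.backbone = nn.Sequential(*layers)
        self.bam = BAM(feat_channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.linear_head = nn.Sequential(nn.Linear(feat_channels, 256),
                                         nn.ReLU(inplace=True), nn.Linear(256, 3))
        self.angular_head = nn.Sequential(nn.Linear(feat_channels, 256),
                                          nn.ReLU(inplace=True), nn.Linear(256, 3))

    def forward(self, img):
        f = self.pool(self.bam(self.backbone(img))).flatten(1)
        return self.linear_head(f), self.angular_head(f)  # v, omega: each (N, 3)


def pbvs_velocity_label(R, t, lam=0.5):
    """Classical PBVS control law used as a label generator: given rotation R
    and translation t of the current camera frame expressed in the desired
    frame, return v = -lam * R^T t and omega = -lam * theta*u."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        thetau = np.zeros(3)
    else:
        thetau = theta / (2.0 * np.sin(theta)) * np.array(
            [R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return -lam * R.T @ t, -lam * thetau


if __name__ == "__main__":
    net = VisualServoNet()
    v, w = net(torch.randn(1, 3, 224, 224))
    print(v.shape, w.shape)  # torch.Size([1, 3]) torch.Size([1, 3])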

Key words: visual servoing, deep learning, attention mechanism, robotic arm control
