Robotic Actions and Strategy Demonstration Learning Method for Constructing Primitive Library Ideas

doi:10.3778/j.issn.1002-8331.2211-0261

Abstract

To address demonstration data optimization and the storage and retrieval of actions and task strategies in robot learning from demonstration, a demonstration learning method based on a primitive library is proposed. For action learning, an expert drags the manipulator through each action to collect demonstration data; a Gaussian mixture model and Gaussian mixture regression improve the data quality, and the dynamic motion primitive algorithm converts the resulting demonstration data into basis-function weights. For strategy learning, each task step is created as an action primitive, the learned weights are added to the primitive, a primitive business card containing the task execution strategy is built, and the primitives are assembled into a primitive library for storage. During task execution, primitives are called from the library in sequence. A YOLOv5 object detection network and an AlexNet image classification network detect target information, which is used to match actions and to generalize new actions that retain the characteristics of the original ones. The method thus learns actions and stores strategies from demonstrations, and combines suitable actions to complete tasks according to the actual targets. In a steel bar binding experiment, 5 action primitives were created and 10 basic actions were learned through expert teaching; using the action primitive library, the robot successfully completed the binding task at rebar intersections in the horizontal and vertical planes, which demonstrates the effectiveness of the method.
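The weight-learning and generalization steps described above can be illustrated with a minimal sketch. It assumes one common one-dimensional discrete dynamic motion primitive formulation; the gains `alpha`, `beta`, `alpha_x`, the basis count, the function names `fit_dmp` and `rollout`, and the minimum-jerk stand-in demonstration are all illustrative assumptions, not the settings used in the paper. The Gaussian mixture model and regression preprocessing is omitted, so the input is taken to be an already smoothed trajectory.

```python
import math

def _gradient(values, dt):
    """Finite-difference derivative (central inside, one-sided at the ends)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out

def fit_dmp(y, dt, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0):
    """Fit basis-function weights of a 1-D DMP to one demonstration (assumed gains)."""
    n = len(y)
    tau = (n - 1) * dt                      # movement duration
    y0, g = y[0], y[-1]                     # start and goal of the demonstration
    yd = _gradient(y, dt)
    ydd = _gradient(yd, dt)
    # Basis centers follow the canonical system x(t) = exp(-alpha_x * t / tau).
    centers = [math.exp(-alpha_x * k / (n_basis - 1)) for k in range(n_basis)]
    widths = [1.0 / (centers[k + 1] - centers[k]) ** 2 for k in range(n_basis - 1)]
    widths.append(widths[-1])
    xs = [math.exp(-alpha_x * i * dt / tau) for i in range(n)]
    # Forcing term that would reproduce the demonstration exactly.
    f_target = [tau ** 2 * a - alpha * (beta * (g - p) - tau * v)
                for p, v, a in zip(y, yd, ydd)]
    scale = g - y0
    weights = []
    for c, h in zip(centers, widths):       # locally weighted regression per basis
        num = den = 0.0
        for x, f in zip(xs, f_target):
            psi = math.exp(-h * (x - c) ** 2)
            s = x * scale
            num += psi * s * f
            den += psi * s * s
        weights.append(num / den if den > 1e-12 else 0.0)
    return {"weights": weights, "centers": centers, "widths": widths,
            "tau": tau, "alpha": alpha, "beta": beta, "alpha_x": alpha_x}

def rollout(model, y0, g, dt):
    """Integrate the DMP with Euler steps toward a (possibly new) goal g."""
    tau, alpha, beta = model["tau"], model["alpha"], model["beta"]
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(round(tau / dt)):
        psi = [math.exp(-h * (x - c) ** 2)
               for c, h in zip(model["centers"], model["widths"])]
        mix = sum(p * w for p, w in zip(psi, model["weights"])) / (sum(psi) + 1e-12)
        f = mix * x * (g - y0)              # forcing term scales with the goal offset
        z += dt * (alpha * (beta * (g - y) - z) + f) / tau
        y += dt * z / tau
        x += dt * (-model["alpha_x"] * x / tau)
        traj.append(y)
    return traj

# Stand-in demonstration: a minimum-jerk move from 0 to 1 over 1 s.
dt = 0.01
demo = [10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5 for s in (i / 100 for i in range(101))]
model = fit_dmp(demo, dt)
reproduced = rollout(model, y0=0.0, g=1.0, dt=dt)    # same goal as the demonstration
generalized = rollout(model, y0=0.0, g=2.0, dt=dt)   # new goal, same motion shape
```

In this sketch only the `weights` list (plus the fixed gains) would need to be stored in a primitive; changing `g` at execution time, for example from a detected target position, yields a new trajectory that keeps the shape of the demonstrated one.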

Key words: demonstration learning, trajectory imitation learning, task strategy learning, dynamic motion primitives, motion primitive library

LI Tiejun, LIU Jiaqi, LIU Jinyue, JIA Xiaohui. Robotic Actions and Strategy Demonstration Learning Method for Constructing Primitive Library Ideas[J]. Computer Engineering and Applications, 2024, 60(8): 90-98.

李铁军, 刘家奇, 刘今越, 贾晓辉. 基元库构建思想的机器人动作与策略演示学习方法[J]. 计算机工程与应用, 2024, 60(8): 90-98.
