Optimized Deep Deterministic Policy Gradient Algorithm

Computer Engineering and Applications, 2019, Vol. 55, Issue (7): 151-156. DOI: 10.3778/j.issn.1002-8331.1712-0297

KE Fengkai, ZHOU Weiti, ZHAO Daxing

School of Mechanical Engineering, Hubei University of Technology, Wuhan 430068, China

Online: 2019-04-01 Published: 2019-04-15

Abstract

Abstract: Deep reinforcement learning is well suited to optimization problems in control. In continuous-action control, the precision required means that the number of discrete actions grows exponentially with the action dimension, so continuous actions are difficult to represent with a discrete action set. The Deep Deterministic Policy Gradient (DDPG) algorithm, built on the Actor-Critic framework, handles continuous-action control, but problems remain: its sampling scheme lacks theoretical guidance, and when the action dimension is high, the gap between the optimal action and non-optimal actions is neglected. To address these problems, this paper proposes an improved algorithm based on DDPG that uses optimized sampling and more precise evaluation by the critic. The algorithm is applied in a simulation environment of a Selective Compliance Assembly Robot Arm (SCARA). Compared with the original DDPG algorithm, it achieves good results and enables fast automatic positioning of the SCARA robot.

Key words: reinforcement learning, deep learning, continuous action control, robot arm
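
The abstract's point about exponential growth can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the paper: the helper names `discretize` and `joint_actions`, the action range [-1, 1], and the choice of 11 levels per axis are all assumptions. It counts how many joint actions result when every axis of a continuous action vector is split into a fixed number of levels.

```python
from itertools import product


def discretize(low, high, bins):
    """Evenly spaced levels on one action axis (hypothetical helper; bins >= 2)."""
    step = (high - low) / (bins - 1)
    return [low + i * step for i in range(bins)]


def joint_actions(low, high, bins, action_dim):
    """Enumerate every combination of per-axis levels as one joint action."""
    levels = discretize(low, high, bins)
    return list(product(levels, repeat=action_dim))


def discrete_action_count(bins, action_dim):
    """Size of the joint action set: bins ** action_dim."""
    return bins ** action_dim


if __name__ == "__main__":
    # Assumed setting: 11 levels per axis on the range [-1, 1].
    for dim in (1, 2, 4, 6):
        print(dim, discrete_action_count(11, dim))
```

With 11 levels per axis, a 4-dimensional action already yields 14641 joint actions and a 6-dimensional one yields 1771561, which is why methods that output a continuous action directly, such as DDPG, are attractive for multi-joint arms.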

Cite this article: KE Fengkai, ZHOU Weiti, ZHAO Daxing. Optimized Deep Deterministic Policy Gradient Algorithm[J]. Computer Engineering and Applications, 2019, 55(7): 151-156.

