Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (11): 115-119.

• Database, Data Mining, Machine Learning •

Reinforcement learning method combining demonstration data and evolutionary optimization

SONG Shuan, YU Yang   

  1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Online: 2014-06-01    Published: 2015-04-08

Abstract: Reinforcement learning aims at learning, from interactions with the environment, an optimal policy that maximizes the long-term reward. Since the environment's feedback is commonly delayed over a sequence of actions, reinforcement learning has to search a huge policy space, and effective search is the key to a successful approach. Previous studies have explored policy search from several angles: one effective direction employs evolutionary algorithms for direct policy search, which has been shown to outperform traditional methods; another introduces user-provided demonstration data to guide the search. The combination of these two effective directions, however, has rarely been studied. This work investigates that combination and proposes the iNEAT+Q approach, which uses the demonstration data both to pre-train the neural network and to shape the fitness function that guides the evolutionary optimization. A preliminary empirical study shows that iNEAT+Q clearly outperforms NEAT+Q, a classical evolutionary reinforcement learning approach that does not use demonstration data.

Key words: reinforcement learning, evolutionary algorithm, learning from demonstrations, neural network
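
The abstract describes two ways iNEAT+Q injects demonstration data into evolutionary reinforcement learning: pre-training the policy network on the demonstrations, and folding agreement with the demonstrations into the fitness function. Below is a minimal sketch of that pattern; the linear-softmax policy, the mixing weight alpha, and the helper names (pretrain, demo_agreement, fitness) are all illustrative assumptions, since the actual approach evolves NEAT network topologies and the abstract does not give the exact fitness form.

```python
import numpy as np

class SoftmaxPolicy:
    """Linear policy: action scores are W @ state; act() takes the argmax."""
    def __init__(self, n_state, n_action, rng):
        self.W = 0.01 * rng.standard_normal((n_action, n_state))

    def act(self, state):
        return int(np.argmax(self.W @ state))

    def pretrain_step(self, state, action, lr=0.1):
        """One softmax cross-entropy gradient step toward the demonstrated action."""
        z = self.W @ state
        p = np.exp(z - z.max())
        p /= p.sum()
        grad = np.outer(p, state)   # dL/dW = (p - onehot(action)) state^T
        grad[action] -= state
        self.W -= lr * grad

def pretrain(policy, demos, epochs=20):
    """Supervised pre-training on (state, action) demonstration pairs."""
    for _ in range(epochs):
        for s, a in demos:
            policy.pretrain_step(s, a)

def demo_agreement(policy, demos):
    """Fraction of demonstration states on which the policy picks the demo action."""
    return float(np.mean([policy.act(s) == a for s, a in demos]))

def fitness(policy, env_return, demos, alpha=0.5):
    """Blend environment return with demonstration agreement (assumed linear form).
    env_return is assumed normalized to a scale comparable to the [0, 1] agreement."""
    return (1.0 - alpha) * env_return + alpha * demo_agreement(policy, demos)

# Hypothetical usage: demos is a list of (state vector, action index) pairs.
rng = np.random.default_rng(0)
demos = [(rng.standard_normal(4), i % 2) for i in range(10)]
policy = SoftmaxPolicy(n_state=4, n_action=2, rng=rng)
pretrain(policy, demos)
print(fitness(policy, env_return=1.0, demos=demos))
```

One natural use of the mixing weight is to start alpha high, so early generations are selected mainly for imitating the demonstrations, and then decrease it so the environment return dominates once the population is competent; whether the paper uses such a schedule is not stated in the abstract.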