Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (12): 14-27. DOI: 10.3778/j.issn.1002-8331.2209-0186

• Research Hotspots and Reviews •

Survey of Fully Cooperative Multi-Agent Deep Reinforcement Learning

ZHAO Liyang, CHANG Tianqing, CHU Kaixuan, GUO Libin, ZHANG Lei   

  1. Department of Weaponry and Control, Army Academy of Armored Forces, Beijing 100072, China
  • Online: 2023-06-15 Published: 2023-06-15

完全合作类多智能体深度强化学习综述

赵立阳,常天庆,褚凯轩,郭理彬,张雷   

  1. 陆军装甲兵学院 兵器与控制系,北京 100072

Abstract: Fully cooperative multi-agent deep reinforcement learning is an important branch of machine learning and artificial intelligence. It combines, in a general way, the representation and decision-making capabilities of deep reinforcement learning with the distributed cooperation capabilities of multi-agent systems, providing an end-to-end solution to model-free sequential decision-making problems in fully cooperative multi-agent systems. First, the basic principles of deep reinforcement learning are described, and the development of single-agent deep reinforcement learning is summarized along three main directions: value-function-based, policy-gradient-based, and actor-critic-based methods. Second, the main challenges and training frameworks of multi-agent deep reinforcement learning are analyzed. Then, according to how the maximum joint team reward is achieved, fully cooperative multi-agent deep reinforcement learning methods are divided into four categories, each of which is summarized and analyzed: independent learning, communication learning, collaborative learning, and reward function shaping. Finally, from the perspective of solving practical problems, future development directions for fully cooperative multi-agent deep reinforcement learning algorithms are discussed.

Key words: deep reinforcement learning, multi-agent, full cooperation, artificial intelligence

摘要: 作为机器学习和人工智能领域的重要分支之一,完全合作类多智能体深度强化学习以一种通用的方式将深度强化学习的表达决策能力和多智能体系统的分布协作能力有效结合,为完全合作类多智能体系统中的无模型序贯决策问题提供了一种端对端的解决方案。对深度强化学习的基本原理进行阐述,并从基于值函数、基于策略梯度和基于演员-评论家三个主要方向对单智能体深度强化学习的发展进行了总结。分析了多智能体深度强化学习面临的主要挑战和主要的训练框架。依据实现最大团队联合奖励方式的不同,将完全合作类的多智能体深度强化学习划分为基于独立学习、基于通信学习、基于协作学习和基于奖励函数塑造四大类,并分别进行了总结分析。从解决实际问题的角度出发,对完全合作类多智能体深度强化学习算法的未来发展方向进行了展望。

关键词: 深度强化学习, 多智能体, 完全合作, 人工智能