Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (22): 320-328. DOI: 10.3778/j.issn.1002-8331.2408-0131

• Engineering and Applications •


Multi-Agent Single-Goal Collaborative Exploration in Unknown Environments with Improved MADDPG Algorithm

HAN Huiyan, SHI Shuxi, KUANG Liqun, HAN Xie, XIONG Fengguang   

  1. School of Computer Science and Technology, North University of China, Taiyuan 030051, China
    2.Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China
    3.Shanxi Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China
  • Online: 2025-11-15  Published: 2025-11-14


Abstract: To address the low exploration efficiency of the multi-agent deep deterministic policy gradient (MADDPG) algorithm in unknown environments, a multi-agent deep reinforcement learning algorithm called RE-MADDPG-C is proposed. The algorithm uses residual networks (ResNet) to alleviate vanishing- and exploding-gradient problems and thereby speed up convergence. To tackle the convergence difficulty caused by sparse rewards in single-goal exploration of unknown environments, a multi-agent intrinsic curiosity module (ICM) is introduced; the curiosity reward serves as an intrinsic reward that gives the agents additional motivation to explore. With a suitably designed exploration reward function, the agents can complete single-goal exploration tasks in unknown environments. Simulation results show that the proposed algorithm improves its reward faster during training and completes exploration tasks quickly; compared with MADDPG and other algorithms, it shortens training time and achieves a higher global average reward.
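The abstract names two mechanisms: residual connections in the actor/critic networks and an ICM-style curiosity bonus added to the environment reward. The following is a minimal PyTorch sketch of these two ideas only, not the paper's implementation; the network sizes, module layout, and the curiosity weight eta are assumptions made for illustration.

```python
# Minimal sketch (not the paper's implementation) of a residual MLP block and an
# ICM-style intrinsic ("curiosity") reward added to the extrinsic reward.
# Dimensions, layer counts, and the weight `eta` are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Fully connected block with an identity shortcut to ease gradient flow."""

    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.fc1(x))
        out = self.fc2(out)
        return F.relu(out + x)  # skip connection


class ICM(nn.Module):
    """Intrinsic curiosity module: the forward-model prediction error on the
    next observation's features is used as the intrinsic reward."""

    def __init__(self, obs_dim: int, act_dim: int, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU(),
                                     ResidualBlock(feat_dim))
        self.forward_model = nn.Sequential(nn.Linear(feat_dim + act_dim, feat_dim),
                                           nn.ReLU(), nn.Linear(feat_dim, feat_dim))

    def intrinsic_reward(self, obs, act, next_obs) -> torch.Tensor:
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        phi_pred = self.forward_model(torch.cat([phi, act], dim=-1))
        # Per-sample squared prediction error: transitions the model predicts
        # poorly (novel states) receive a larger curiosity bonus.
        return 0.5 * (phi_pred - phi_next.detach()).pow(2).mean(dim=-1)


if __name__ == "__main__":
    obs_dim, act_dim, batch = 10, 2, 32
    icm = ICM(obs_dim, act_dim)
    obs, next_obs = torch.randn(batch, obs_dim), torch.randn(batch, obs_dim)
    act = torch.randn(batch, act_dim)
    extrinsic_r = torch.randn(batch)   # stand-in for the exploration task reward
    eta = 0.1                          # assumed curiosity weight
    total_r = extrinsic_r + eta * icm.intrinsic_reward(obs, act, next_obs)
    print(total_r.shape)               # torch.Size([32])
```

In a MADDPG-style training loop, `total_r` would replace the environment reward stored in the replay buffer, so the critics are trained against the combined extrinsic-plus-curiosity signal; how the paper weights or anneals the curiosity term is not specified in the abstract.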

Key words: deep reinforcement learning, RE-MADDPG-C, residual network, intrinsic curiosity module (ICM), sparse rewards