Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (10): 301-310. DOI: 10.3778/j.issn.1002-8331.2304-0194

• Big Data and Cloud Computing •

Multi-Device Edge Computing Offload Method in Hybrid Action Space

ZHANG Ji, QI Guoliang, DUO Chunhong, GONG Wenwen   

  1. Department of Computer Science, North China Electric Power University, Baoding, Hebei 071000, China
    2. Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding, Hebei 071000, China
  • Online: 2024-05-15   Published: 2024-05-15

Abstract: To reduce the total device-level cost in multi-device, multi-edge-server scenarios, and to overcome the limitation that existing deep reinforcement learning (DRL) algorithms support only a single type of action space, a hybrid-based multi-agent deep deterministic policy gradient (H-MADDPG) method is proposed. Firstly, an MEC system model is established that accounts for several complex environmental conditions: the computing power of IoT devices and servers varying dynamically with load, time-varying wireless channel gains, unpredictable energy harvesting, and uncertain task sizes. Then, a problem model is formulated with the objective of minimizing the total cost, combining delay and energy consumption, over a sequence of consecutive time slots. Finally, the problem is cast as a Markov decision process (MDP) and delivered to H-MADDPG, which, with the assistance of a value network, trains two parallel policy networks that output a discrete server selection and a continuous task offload ratio for each device. Experimental results show that H-MADDPG converges well and runs stably. Observed from different perspectives, such as whether computing tasks are computation-intensive and whether they are delay-sensitive, the overall system return of H-MADDPG is better than that of Local, OffLoad, and DDPG, and it also maintains higher system throughput under computation-intensive task demands.
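To make the hybrid action space concrete: one plausible form of the optimization objective sketched above is $\min \sum_{t=1}^{T}\sum_{i=1}^{N}\big(\omega_1 D_i(t)+\omega_2 E_i(t)\big)$, where $D_i(t)$ and $E_i(t)$ denote the delay and energy consumption of device $i$ in slot $t$; the weighted-sum form and the weights $\omega_1, \omega_2$ are illustrative assumptions, not taken from the paper. Likewise, the sketch below, assuming a PyTorch implementation, shows one way to realize the two parallel policy heads the abstract describes: a shared trunk feeding a discrete head for server selection and a continuous head for the task offload ratio. All names, dimensions, and the sigmoid squashing are hypothetical, not the authors' implementation.

import torch
import torch.nn as nn

class HybridActor(nn.Module):
    """Actor for one IoT device: discrete server choice plus continuous offload ratio."""

    def __init__(self, state_dim: int, num_servers: int, hidden: int = 128):
        super().__init__()
        # Shared trunk over the device's local observation.
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.server_head = nn.Linear(hidden, num_servers)  # logits over candidate edge servers
        self.ratio_head = nn.Linear(hidden, 1)             # offload ratio, squashed into [0, 1]

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        return self.server_head(h), torch.sigmoid(self.ratio_head(h))

# Usage: one actor per device (multi-agent); during training, a centralized
# value network (the critic) would score the joint state and hybrid action.
actor = HybridActor(state_dim=16, num_servers=4)
server_logits, offload_ratio = actor(torch.randn(1, 16))
server = torch.argmax(server_logits, dim=-1)  # greedy discrete choice at evaluation time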

Key words: Internet of Things (IoT), mobile edge computing, multi-agent deep deterministic policy gradient (MADDPG), hybrid action space