Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (1): 259-268. DOI: 10.3778/j.issn.1002-8331.2203-0461

• Big Data and Cloud Computing •

A-DDPG: Research on Offloading of Multi-User Edge Computing System

CAO Shaohua, JIANG Jiajia, CHEN Shu, ZHAN Zijun, ZHANG Weishan

  1. School of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
  • Online: 2023-01-01 Published: 2023-01-01

Abstract: To reduce the total user cost in multi-user systems with multiple edge servers, this paper proposes A-DDPG, a deep reinforcement learning offloading algorithm that combines deep deterministic policy gradient (DDPG), long short-term memory (LSTM), and an attention mechanism. The algorithm adopts a binary offloading strategy and accounts for the latency sensitivity of tasks, the limited load capacity of servers, and task migration, offloading tasks adaptively to minimize the total loss caused by timeouts of latency-sensitive tasks. Two metrics, latency and energy consumption, are considered and assigned different weights to address the unfairness arising from different user types. The task offloading problem is formulated to minimize the total cost of completion latency and energy consumption over all tasks, with the selection of the target server and the amount of data offloaded as the learning objectives. Experimental results show that A-DDPG has good stability and convergence. Compared with the DDPG algorithm and the twin delayed deep deterministic policy gradient (TD3) algorithm, A-DDPG reduces the total user cost by 27% and 26.66%, respectively, and reaches the optimal task failure rate 57.14% and 40% earlier on average, respectively. It achieves better results in terms of reward, total cost, and task failure rate.

Key words: mobile edge computing, computational offloading, deep deterministic policy gradient (DDPG), resource allocation
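
The abstract names a weighted cost over latency and energy and a task failure rate, but does not state either formula. The following is a minimal sketch of how such quantities could be computed, not the paper's definition. It assumes a per-task linear weighting in which the energy weight is one minus the latency weight, and it counts a task as failed when its latency exceeds its deadline. The `Task` fields and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    # All fields are illustrative assumptions, not taken from the paper.
    latency: float         # completion latency of the task
    energy: float          # energy consumed to complete the task
    deadline: float        # latest acceptable completion latency
    latency_weight: float  # in [0, 1]; energy weight is 1 - latency_weight


def task_cost(task: Task) -> float:
    """Weighted sum of latency and energy for a single task (assumed form)."""
    energy_weight = 1.0 - task.latency_weight
    return task.latency_weight * task.latency + energy_weight * task.energy


def total_cost(tasks: list[Task]) -> float:
    """Total cost over all tasks, the quantity an offloading policy would minimize."""
    return sum(task_cost(t) for t in tasks)


def failure_rate(tasks: list[Task]) -> float:
    """Fraction of tasks whose latency exceeds their deadline (assumed definition)."""
    if not tasks:
        return 0.0
    return sum(t.latency > t.deadline for t in tasks) / len(tasks)
```

Giving a latency-sensitive user a larger `latency_weight` makes delays dominate that user's cost, which is one way the per-user weighting described in the abstract could be realized.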