Computer Engineering and Applications ›› 2021, Vol. 57 ›› Issue (4): 68-76. DOI: 10.3778/j.issn.1002-8331.2002-0068

• Network, Communication and Security •

Multi-dimensional Resource Optimization of Service Function Chain Based on Deep Reinforcement Learning

WANG Xiao, TANG Lun, HE Xiaoyu, CHEN Qianbin   

  1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Online: 2021-02-15 Published: 2021-02-06

Abstract:

In a Network Function Virtualization (NFV) environment, operators need to reduce resource consumption and operating costs while guaranteeing the quality of service of users' Service Function Chains (SFCs). This paper jointly considers SFC deployment and radio access network resource allocation, and proposes a multi-dimensional SFC resource allocation algorithm based on deep reinforcement learning. First, an environment-aware SFC resource allocation mechanism is constructed, and an SFC deployment cost minimization model is formulated subject to constraints on user delay, wireless rate, and resource capacity. Second, to account for the dynamics of the wireless environment, the optimization problem is cast as a model-free discrete-time Markov Decision Process (MDP). Because the MDP has a continuous state space and a high-dimensional action space, the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm is used to solve it, yielding a resource allocation policy that minimizes the deployment cost. Simulation results show that the algorithm effectively reduces the SFC deployment cost and end-to-end transmission delay while satisfying the performance requirements and resource capacity constraints.

Key words: network function virtualization, service function chain deployment, radio resource allocation, reinforcement learning, deep deterministic policy gradient
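
The abstract describes casting cost minimization as an MDP and solving it with DDPG, but gives no formulas. As a rough illustration only, the Python sketch below shows three generic DDPG ingredients that such a formulation would likely need: a reward derived from the deployment cost, the critic's temporal-difference target, and the soft update of target-network weights. The function names, the fixed-penalty treatment of constraint violations, and the default values of `penalty`, `gamma`, and `tau` are all assumptions made for this example and are not taken from the paper.

```python
from typing import List


def reward_from_cost(cost: float, delay: float, max_delay: float,
                     rate: float, min_rate: float,
                     penalty: float = 10.0) -> float:
    """Hypothetical reward: negative deployment cost, minus a fixed penalty
    for each violated constraint (delay bound, minimum wireless rate)."""
    violations = int(delay > max_delay) + int(rate < min_rate)
    return -cost - penalty * violations


def td_target(reward: float, next_q: float,
              gamma: float = 0.99, done: bool = False) -> float:
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).
    next_q stands for the target critic's value at the target actor's action."""
    return reward if done else reward + gamma * next_q


def soft_update(target: List[float], online: List[float],
                tau: float = 0.01) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]
```

Minimizing cost is expressed here as maximizing a negative-cost reward, which is a common convention rather than something the abstract states. In a full DDPG agent the weight lists would be neural-network parameters and the target would be computed over mini-batches sampled from a replay buffer.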