Computer Engineering and Applications, 2021, Vol. 57, Issue (12): 93-98. DOI: 10.3778/j.issn.1002-8331.2003-0423

SDN Routing Optimization Algorithm Based on Reinforcement Learning

CHE Xiangbei, KANG Wenqian, OUYANG Yuhong, YANG Kehan, LI Jian   

  1. Shenzhen Power Supply Bureau Co., Ltd., Shenzhen, Guangdong 510800, China
  2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Online: 2021-06-15 Published: 2021-06-10

基于强化学习的SDN路由优化算法

车向北,康文倩,欧阳宇宏,杨柯涵,李剑   

  1. 深圳供电局有限公司,广东 深圳 510800
  2. 北京邮电大学 计算机学院,北京 100876

Abstract:

To optimize network routing in the SDN controller, this paper designs a routing optimization algorithm based on the PPO (Proximal Policy Optimization) model in reinforcement learning. The algorithm updates the routing policy dynamically, and its reward function can be adjusted for different optimization goals. It does not depend on any specific network state, which gives it strong generalization performance. Because it uses a policy-based method in reinforcement learning, it controls the routing policy at a finer granularity than the various Q-learning-based algorithms. The algorithm's performance is evaluated experimentally with the OMNeT++ simulation software. Compared with the traditional shortest-path routing algorithm, the proposed algorithm reduces the average delay and the maximum end-to-end delay on the Sprint network topology by 29.3% and 17.4%, respectively, and increases throughput by 31.77%. The experimental results show that the PPO-based SDN routing control algorithm not only converges well but also offers better performance and stability than the static shortest-path routing algorithm and the Q-learning-based QAR routing algorithm.

Key words: software-defined network, reinforcement learning, SDN routing optimization
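
The abstract names two ingredients without giving their form: a PPO policy update and a reward function that can be retuned per optimization goal. The sketch below illustrates both under stated assumptions. The clipped surrogate is the standard per-sample PPO objective; the clipping range `eps=0.2` is a common default, not a value taken from the paper. The function `routing_reward` and its weights `w_delay`, `w_max`, and `w_tp` are hypothetical; the paper's actual reward function may differ.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample (to be maximized).

    new_logp / old_logp are log-probabilities of the chosen action under the
    current and the previous policy. eps is an assumed clipping range.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    # Taking the minimum removes the incentive to move the policy far
    # from the previous one in a single update.
    return min(ratio * advantage, clipped_ratio * advantage)


def routing_reward(avg_delay, max_delay, throughput,
                   w_delay=1.0, w_max=0.0, w_tp=0.0):
    """Hypothetical reward: the weights select the optimization goal.

    Delays are penalized and throughput is rewarded. This form is an
    illustrative assumption, not the reward defined in the paper.
    """
    return -w_delay * avg_delay - w_max * max_delay + w_tp * throughput


if __name__ == "__main__":
    # Goal 1: minimize average delay only (default weights).
    print(routing_reward(avg_delay=10.0, max_delay=20.0, throughput=5.0))
    # Goal 2: trade average delay against throughput.
    print(routing_reward(10.0, 20.0, 5.0, w_delay=0.5, w_tp=1.0))
    # A policy ratio of 2 with positive advantage is clipped at 1 + eps.
    print(clipped_surrogate(math.log(2.0), 0.0, 1.0))
```

Changing only the weights switches the optimization target while the policy update stays the same, which is one plausible reading of how the reward can be adjusted for different goals.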

摘要:

针对SDN控制器中网络路由的优化问题,基于强化学习中的PPO模型设计了一种路由优化算法。该算法可以针对不同的优化目标调整奖励函数来动态更新路由策略,并且不依赖于任何特定的网络状态,具有较强的泛化性能。由于采用了强化学习中策略方法,该算法对路由策略的控制相比各类基于Q-learning的算法更为精细。基于Omnet++仿真软件通过实验评估了该算法的性能,相比传统最短路径路由算法,路由优化算法在Sprint结构网络上的平均延迟和端到端最大延迟分别降低了29.3%和17.4%,吞吐率提高了31.77%,实验结果说明了基于PPO的SDN路由控制算法不仅具有良好的收敛性,而且相比静态最短路径路由算法与基于Q-learning的QAR路由算法具有更好的性能和稳定性。

关键词: 软件定义网络, 强化学习, SDN路由优化