Routing Optimization Method Based on DDPG and Programmable Data Plane

doi:10.3778/j.issn.1002-8331.2008-0179

Abstract

Data center networks suffer from uneven traffic distribution, and when a reinforcement learning model is deployed in a software-defined network (SDN) with fixed-function switches, inaccurate measurement of the network state biases its routing decisions. To address both problems, this paper proposes a routing optimization method that combines the deep deterministic policy gradient (DDPG) reinforcement learning model with an SDN that has a programmable data plane. Custom packet-processing logic on the programmable data plane collects fine-grained, high-precision network state parameters. On the control plane, the DDPG model uses these parameters to determine the link weights of multiple candidate paths, the path with the maximum comprehensive residual load capacity is selected for each data flow, and the flow table entries are issued by source routing. Experimental results show that, under relatively high bandwidth demand, the proposed method improves network throughput and link utilization and reduces end-to-end transmission delay and southbound communication overhead.

Key words: programmable data plane, deep reinforcement learning, network measurement, routing optimization
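The path-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for a path's residual load capacity, so the bottleneck score used here (the minimum over a path's links of link weight times residual bandwidth) is an assumption, as are all function names, the toy topology, and the port map. In the described method the link weights would come from the DDPG model; here they are fixed numbers.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def path_links(path: List[str]) -> List[Link]:
    """Return the consecutive (node, next_node) links along a path."""
    return list(zip(path, path[1:]))


def path_score(path: List[str],
               weights: Dict[Link, float],
               residual: Dict[Link, float]) -> float:
    # Assumption: a path is scored by its bottleneck link, i.e. the
    # smallest weighted residual bandwidth along the path.
    return min(weights[l] * residual[l] for l in path_links(path))


def select_path(candidates: List[List[str]],
                weights: Dict[Link, float],
                residual: Dict[Link, float]) -> List[str]:
    """Pick the candidate path with the highest score."""
    return max(candidates, key=lambda p: path_score(p, weights, residual))


def source_route(path: List[str], ports: Dict[Link, int]) -> List[int]:
    # Hypothetical source-routing encoding: one egress port per hop,
    # which the ingress switch would push onto the packet.
    return [ports[l] for l in path_links(path)]


# Toy example: two candidate paths from s1 to s4.
candidates = [["s1", "s2", "s4"], ["s1", "s3", "s4"]]
residual = {("s1", "s2"): 4.0, ("s2", "s4"): 9.0,
            ("s1", "s3"): 6.0, ("s3", "s4"): 7.0}
# Stand-in for the link weights a trained DDPG actor would output.
weights = {("s1", "s2"): 0.9, ("s2", "s4"): 0.8,
           ("s1", "s3"): 0.5, ("s3", "s4"): 1.0}
ports = {("s1", "s2"): 1, ("s2", "s4"): 2,
         ("s1", "s3"): 2, ("s3", "s4"): 1}

best = select_path(candidates, weights, residual)
hops = source_route(best, ports)
```

With these numbers the path through s2 wins because its weighted bottleneck is larger than that of the path through s3, even though the s3 path has more raw residual bandwidth on its first link. This shows how learned link weights can override a purely bandwidth-based choice.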

XU Bo, ZHOU Jianguo, WU Jing, LUO Wei. Routing Optimization Method Based on DDPG and Programmable Data Plane[J]. Computer Engineering and Applications, 2022, 58(3): 143-150.

