Computer Engineering and Applications, 2022, Vol. 58, Issue 3: 143-150. DOI: 10.3778/j.issn.1002-8331.2008-0179

• Network, Communication and Security •

Routing Optimization Method Based on DDPG and Programmable Data Plane

XU Bo, ZHOU Jianguo, WU Jing, LUO Wei   

  1. School of Electronic Information, Wuhan University, Wuhan 430072, China
  2. China Ship Development and Design Center, Wuhan 430064, China
  • Online: 2022-02-01  Published: 2022-01-28

可编程数据平面下基于DDPG的路由优化方法 (DDPG-Based Routing Optimization Method under a Programmable Data Plane)

徐博,周建国,吴静,罗威   

  1. 武汉大学 电子信息学院,武汉 430072
  2. 中国舰船研究设计中心,武汉 430064

Abstract: Data center networks suffer from uneven traffic distribution, and reinforcement learning models deployed in software-defined networks (SDN) with fixed-function switches make biased routing decisions because the network state is measured inaccurately. To address both problems, a routing optimization method is proposed that combines the deep deterministic policy gradient (DDPG) reinforcement learning model with an SDN that has a programmable data plane. Customizing the packet processing logic on the programmable data plane yields fine-grained, high-precision network state parameters. On the control plane, the DDPG model uses these parameters to determine the link weights of multiple alternative paths. The routing path with the maximum residual load capacity is then selected for each data flow, and the flow table is issued by source routing. Experimental results show that the proposed method improves network throughput and link utilization and reduces end-to-end transmission delay and southbound communication overhead.

Key words: programmable data plane, deep reinforcement learning, network measurement, routing optimization

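Abstract (translated from the Chinese): To address the unbalanced traffic distribution in data center networks, and the routing decision bias that arises when reinforcement learning models deployed in software-defined networks with fixed-function switches cannot sense the network state accurately, a routing optimization method based on the deep deterministic policy gradient (DDPG) reinforcement learning model is designed for software-defined networks with a programmable data plane. Packet processing logic is customized on the programmable data plane to obtain fine-grained, high-precision network state parameters. On the control plane, the DDPG model then uses these parameters to determine the link weights of multiple candidate paths, and the routing path with the maximum comprehensive residual load capacity is selected for each data flow. Finally, the flow table is issued by source routing. Experimental results show that, under high bandwidth demand, the method improves network throughput and link utilization and reduces end-to-end transmission delay and southbound communication overhead.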

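Key words (translated from the Chinese): programmable data plane, deep reinforcement learning, network measurement, routing optimization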
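
The path-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The DDPG model itself is not reproduced: the `weights` dictionary stands in for the link weights that the model would output on the control plane, and `residual_bw` stands in for the network state measured on the data plane. The scoring rule (a path is scored by its bottleneck link, the minimum of weight times residual bandwidth) is an assumption, since the abstract does not define residual load capacity. All function names, switch names, port numbers, and values are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # directed link (from_switch, to_switch)


def path_links(path: List[str]) -> List[Link]:
    """Split a switch-level path into its consecutive directed links."""
    return list(zip(path, path[1:]))


def residual_capacity(path: List[str],
                      weights: Dict[Link, float],
                      residual_bw: Dict[Link, float]) -> float:
    """ASSUMPTION: score a path by its bottleneck link, i.e. the minimum
    of (link weight * residual bandwidth) over the links of the path."""
    return min(weights[link] * residual_bw[link] for link in path_links(path))


def select_path(candidates: List[List[str]],
                weights: Dict[Link, float],
                residual_bw: Dict[Link, float]) -> List[str]:
    """Pick the candidate path with the largest residual-capacity score."""
    return max(candidates,
               key=lambda p: residual_capacity(p, weights, residual_bw))


def source_route(path: List[str], out_port: Dict[Link, int]) -> List[int]:
    """Encode the path as the list of egress ports, one per hop, that a
    source-routed packet would carry (hypothetical encoding)."""
    return [out_port[link] for link in path_links(path)]


# Hypothetical two-path topology between s1 and s4.
candidates = [["s1", "s2", "s4"], ["s1", "s3", "s4"]]
residual_bw = {("s1", "s2"): 400.0, ("s2", "s4"): 900.0,   # Mbit/s
               ("s1", "s3"): 700.0, ("s3", "s4"): 600.0}
weights = {("s1", "s2"): 0.9, ("s2", "s4"): 0.5,           # stand-in for
           ("s1", "s3"): 0.8, ("s3", "s4"): 0.7}           # DDPG output
out_port = {("s1", "s2"): 1, ("s2", "s4"): 1,
            ("s1", "s3"): 2, ("s3", "s4"): 1}

best = select_path(candidates, weights, residual_bw)
print(best)                          # ['s1', 's3', 's4']
print(source_route(best, out_port))  # [2, 1]
```

In this example the path through s3 wins because its bottleneck score (about 420) exceeds that of the path through s2 (about 360). Carrying the egress-port list in the packet means that only the ingress switch needs a rule from the controller, which is one plausible reading of how source routing lowers southbound overhead.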