Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (23): 81-88. DOI: 10.3778/j.issn.1002-8331.1708-0274


Research and design on SDN multi-controller fault tolerance

XIANG Bo, YU Liyang   

  1. School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China
  • Online: 2018-12-01 Published: 2018-11-30

SDN多控制器容错机制的研究与设计

向  波,俞黎阳   

  1. 华东师范大学 计算机科学与软件工程学院,上海 200062

Abstract: Software Defined Network (SDN) decouples the control plane from the forwarding plane and makes the control plane programmable to meet new network requirements. Centralized controllers bring many conveniences but also new challenges; in particular, the controller as a single point of failure is a problem that cannot be ignored. This paper studies multi-controller fault tolerance and fast failure recovery under in-band control. Traditional multi-controller fault-tolerance schemes rely mainly on hierarchical architectures, which to some extent wastes controller resources. Instead, this paper proposes a flat controller architecture in which the controllers are connected end to end to form a ring, and adjacent controllers monitor each other to detect controller failures. When a failure occurs, a switch cluster partition algorithm and a switch re-hosting algorithm transfer the switches in the faulty domain to the remaining controllers that are still working normally, so that the network recovers quickly. In the extreme case where the total remaining capacity of the working controllers is smaller than the number of switches in the faulty domain, the paper proposes adding controllers dynamically using a predefined script.

Key words: Software Defined Network (SDN), in-band control, multi-controller fault tolerance, switch cluster partitioning, switch re-hosting
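
The ring monitoring and capacity check described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's switch cluster partition or switch re-hosting algorithm: the greedy "most spare capacity first" placement, the `Controller` class, and the function names `ring_neighbors` and `rehost` are all assumptions made for illustration, capacity is counted simply as a number of switches, and in-band control paths are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller model: capacity is the max number of hosted switches."""
    name: str
    capacity: int
    switches: list = field(default_factory=list)

    @property
    def spare(self):
        # Remaining capacity, in switches.
        return self.capacity - len(self.switches)


def ring_neighbors(names, i):
    """Return the (predecessor, successor) of controller i in the ring.

    In a ring, these two adjacent controllers are the ones that monitor it.
    """
    n = len(names)
    return names[(i - 1) % n], names[(i + 1) % n]


def rehost(controllers, failed_name):
    """Greedy stand-in for re-hosting (an assumption, not the paper's algorithm).

    Moves each switch of the failed controller to the surviving controller
    with the most spare capacity. Returns (assignment, unplaced); a non-empty
    `unplaced` list corresponds to the case where the remaining capacity is
    too small and an extra controller would have to be added.
    """
    failed = controllers[failed_name]
    survivors = [c for name, c in controllers.items() if name != failed_name]
    assignment, unplaced = {}, []
    for sw in failed.switches:
        target = max(survivors, key=lambda c: c.spare) if survivors else None
        if target is None or target.spare <= 0:
            unplaced.append(sw)
            continue
        target.switches.append(sw)
        assignment[sw] = target.name
    failed.switches = []
    return assignment, unplaced
```

When `rehost` returns a non-empty `unplaced` list, the surviving controllers cannot absorb the faulty domain; this is the situation in which the paper adds a controller dynamically.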

摘要: SDN提出将控制平面与转发平面解耦,并提供了对控制平面的可编程性以适应新的网络需求。集中化的控制器在带来诸多便利的同时,也伴随着许多新的挑战。其中,控制器单点故障就是一个不容忽视的问题。研究带内通信场景下多控制器容错与故障恢复。传统多控制器容错方案主要采用主从机制,某种程度上是对控制器资源的极大浪费。采用平面式的控制器架构,将多个控制器相连形成环状结构,相邻控制器间相互监听检测控制器故障。故障发生后,采用交换机簇划分和交换机重托管算法将故障域内的交换机托管到其余正常工作的控制器下,以完成网络的快速恢复。为应对网络中其余正常工作控制器总剩余容量小于故障域内交换机数量的极端情况,使用预定义的脚本动态添加控制器。

关键词: 软件定义网络, 带内通信, 多控制器容错, 交换机簇划分, 交换机重托管