Computer Engineering and Applications (计算机工程与应用) ›› 2025, Vol. 61 ›› Issue (14): 163-175. DOI: 10.3778/j.issn.1002-8331.2409-0250

• Theory, Research and Development •

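High-Performance Route-Resolving Algorithm and Hardware Accelerator for Benes Network

QIN Mengyuan (秦梦远), LIU Hongwei (刘宏伟), HAO Qinfen (郝沁汾)

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100191, China
  2. University of Chinese Academy of Sciences, Beijing 101408, China
  • Publication date: 2025-07-15 Release date: 2025-07-15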

High-Performance Route-Resolving Algorithm and Hardware Accelerator for Benes Network

QIN Mengyuan, LIU Hongwei, HAO Qinfen   

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100191, China
  2. University of Chinese Academy of Sciences, Beijing 101408, China
  • Online: 2025-07-15 Published: 2025-07-15

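Abstract: Optical interconnect networks implement optical switching with optical cross-connect switch arrays. Large-scale fast optical switch arrays are mostly built on the compact Benes network, which avoids the high physical link loss caused by cascading too many switch points. However, solving the routing of a Benes network introduces an overhead of hundreds of nanoseconds to several milliseconds, which creates a switching performance bottleneck. To reduce this overhead and remove the bottleneck, this paper proposes a full-rearrangement solving algorithm for Benes networks that is suited to high-performance hardware implementation. It changes the solving order of the traditional Benes network solving algorithm and increases parallelism. A hardware accelerator based on the algorithm is also proposed. It has good frequency characteristics, and its FPGA version completes one reconfiguration solve of a 16×16 Benes network in a fixed 26 ns. Pipeline optimization raises the continuous solving throughput to 700 MOPs. Compared with existing FPGA implementations of route-solving algorithms of the same kind, the solving speed is 9.85 times higher and the continuous solving throughput is 2.8 times higher. If the accelerator chip is built with ASIC technology instead of an FPGA, the solving time is expected to drop to a level comparable to the reconfiguration time of the switch array, eliminating the performance bottleneck entirely.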

Keywords: Benes network, rearrangement solving algorithm, parallelization, hardware accelerator, FPGA

Abstract: Optical interconnect networks use optical cross-connects (OXCs) to perform optical switching. Large-scale fast OXCs tend to adopt the compact Benes network to avoid the high link loss caused by cascading too many optical switching points. However, a Benes network also incurs a heavy latency overhead: computing the states of its internal switch points that satisfy a given set of route requests usually takes hundreds of nanoseconds to microseconds, which causes a performance bottleneck. This paper proposes a route-solving algorithm for Benes networks that reduces this overhead. The algorithm is designed for high-performance hardware implementation. It reorders the solving procedure to achieve a higher degree of parallelism, and its simple design permits a higher operating frequency. The FPGA implementation of the algorithm takes only 26 ns to solve a route-resolving problem for a 16×16 Benes network. With an optimized pipeline, its route-solving throughput rises to 700 MOPs. The route-resolving speed is 9.85 times faster than that of other FPGA implementations based on the same kind of algorithm, and the route-resolving throughput is 2.8 times higher. If ASIC technology is used instead of an FPGA to build the accelerator chip, the route-resolving time is expected to match the reconfiguration latency of a high-speed OXC, thus eliminating the performance bottleneck.

Key words: Benes network, permutation algorithm, parallelization, hardware accelerator, FPGA
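
To make the route-resolving problem in the abstract concrete, the sketch below implements the classic sequential "looping" algorithm for a Benes network. It is not the reordered, parallel algorithm proposed in the paper, whose details the abstract does not give. The function names `route_benes` and `apply_benes`, the switch encoding (0 = bar, 1 = cross), and the wiring convention (switch k of the outer stages connects to port k of the upper and lower half-size subnetworks) are assumptions made for illustration.

```python
def route_benes(perm):
    """Classic looping algorithm (illustrative, sequential).

    perm[i] is the output port requested by input port i.
    Returns one list of switch states per stage (0 = bar, 1 = cross).
    """
    n = len(perm)
    if n < 2 or n & (n - 1) or sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of power-of-two size >= 2")
    if n == 2:
        return [[0 if perm[0] == 0 else 1]]

    inv = [0] * n
    for src, dst in enumerate(perm):
        inv[dst] = src

    # side[i] = subnetwork used by input i (0 = upper, 1 = lower).
    # The two inputs of an input switch, and the two outputs of an
    # output switch, must use different subnetworks.
    side = [None] * n
    for start in range(0, n, 2):
        i = start
        while side[i] is None:
            side[i] = 0              # route input i through the upper half
            j = inv[perm[i] ^ 1]     # source of the neighbouring output
            side[j] = 1              # so it must use the lower half
            i = j ^ 1                # its switch partner goes upper again

    half = n // 2
    first = [side[2 * k] for k in range(half)]
    last = [side[inv[2 * k]] for k in range(half)]

    # Build the two half-size permutations and solve them recursively.
    sub = [[0] * half, [0] * half]
    for i in range(n):
        sub[side[i]][i // 2] = perm[i] // 2
    upper, lower = route_benes(sub[0]), route_benes(sub[1])
    middle = [u + l for u, l in zip(upper, lower)]
    return [first] + middle + [last]


def apply_benes(stages, n):
    """Simulate the network and return the permutation it realises."""
    if n == 2:
        return [0, 1] if stages[0][0] == 0 else [1, 0]
    half = n // 2
    first, last = stages[0], stages[-1]
    up = apply_benes([col[:half // 2] for col in stages[1:-1]], half)
    lo = apply_benes([col[half // 2:] for col in stages[1:-1]], half)
    perm = [0] * n
    for i in range(n):
        s = first[i // 2] ^ (i & 1)          # subnetwork taken by input i
        m = (up if s == 0 else lo)[i // 2]   # output switch that is reached
        perm[i] = 2 * m + (s ^ last[m])
    return perm


request = [5, 12, 0, 9, 14, 3, 7, 1, 10, 15, 2, 8, 6, 13, 4, 11]
settings = route_benes(request)
print(len(settings), "stages of", len(settings[0]), "switches")
print(apply_benes(settings, 16) == request)
```

For a 16×16 network the result has 2·log2(16) − 1 = 7 stages of 8 switches each, 56 switch states in total. In this classic form each loop is followed one port at a time and the inner stages are solved only after the outer ones, which illustrates the kind of sequential dependency that a reordered solving procedure would aim to reduce.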