Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (22): 323-334. DOI: 10.3778/j.issn.1002-8331.2307-0257

• Engineering and Applications •


Winograd Neural Network Accelerator Using Dynamic Hardware Reconfiguration on FPGA Platform

MEI Bingxiao, TENG Wenbin, ZHANG Chi, WANG Wenhao, LI Fuqiang, YUAN Fuli   

  1. Electric Power Research Institute, State Grid Zhejiang Electric Power Co., Ltd., Hangzhou 310014, China
    2.School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
    3.Suzhou Institution for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
    4.State Grid Zhejiang Electric Power Co., Ltd., Hangzhou 310014, China
    5.Ningbo Power Supply Company, State Grid Zhejiang Electric Power Co., Ltd., Ningbo, Zhejiang 315000, China
  • Online: 2024-11-15  Published: 2024-11-14


Abstract: To address the low resource utilization and resource constraints encountered when accelerating convolutional neural networks (CNNs) on FPGA platforms, this paper proposes a CNN accelerator based on the FPGA dynamic partial reconfiguration technique and Winograd fast convolution. The accelerator time-multiplexes on-chip FPGA resources through runtime hardware reconfiguration, dynamically loading the successive computation pipeline stages onto the FPGA in a pipelined fashion; the convolution computation core of each pipeline stage is custom-optimized with the Winograd algorithm, so that the resource limitation is resolved while the utilization efficiency of computing resources is maximized. For this accelerator architecture, the paper further builds a combinatorial optimization model that searches for the optimal parallel strategy for deploying a given network model on a given FPGA platform, and uses a genetic algorithm to explore the design space. The VGG-16 network model is deployed and analyzed on the Xilinx VC709 FPGA platform. Comprehensive simulation results show that the proposed design method can adaptively implement large-scale neural network models on resource-limited FPGAs; the overall performance of the accelerator reaches 1 078.3 GOPS, improving performance and computing-resource utilization efficiency over previous accelerators by factors of 2.2 and 3.62, respectively.
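
To make the Winograd fast convolution named above concrete, the short NumPy sketch below works through the 1-D F(2,3) case and checks it against a direct sliding-window convolution. The transform matrices B_T, G and A_T are the standard minimal-filtering ones from the Winograd (Lavin-Gray) formulation, not the paper's hardware-customized 2-D fixed-point variant; the function and variable names are illustrative assumptions only.

# Minimal sketch of Winograd F(2,3): two outputs of a 3-tap 1-D convolution
# computed with 4 multiplications instead of 6, the building block behind the
# F(2x2, 3x3) tiles commonly used in CNN accelerators.
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)   # input transform
G   = np.array([[1.0,  0.0, 0.0],
                [0.5,  0.5, 0.5],
                [0.5, -0.5, 0.5],
                [0.0,  0.0, 1.0]])               # filter transform
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)    # output (inverse) transform

def winograd_f23(d, g):
    """Two outputs of the valid correlation of a 4-sample tile d with a 3-tap filter g."""
    return A_T @ ((G @ g) * (B_T @ d))           # element-wise product in the transform domain

d = np.array([1.0, 2.0, 3.0, 4.0])               # example input tile
g = np.array([1.0, 1.0, 1.0])                    # example filter
direct = np.array([d[i:i + 3] @ g for i in range(2)])  # sliding-window reference
print(winograd_f23(d, g), direct)                # both print [6. 9.]

In the 2-D F(2×2, 3×3) case typically used in FPGA designs, the same idea reduces each output tile from 36 to 16 multiplications, which is the main source of the arithmetic savings exploited by Winograd-based accelerators.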

Key words: convolutional neural network, dynamic partial hardware reconfiguration, field programmable gate array (FPGA), hardware accelerator, Winograd fast convolution