Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (15): 178-188. DOI: 10.3778/j.issn.1002-8331.2410-0212

• Theory and R&D •

Sparse Convolution Accelerator with Dynamic Allocation of Computing Power

QIN Xueyi, CHEN Guilin, WEI Xianglin, YU Long, FAN Jianhua, LIU Heng

  1. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
  • Online: 2025-08-01  Published: 2025-07-31

Sparse Convolution Accelerator with Dynamic Allocation of Computing Power

QIN Xueyi, CHEN Guilin, WEI Xianglin, YU Long, FAN Jianhua, LIU Heng   

  1. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
  • Online: 2025-08-01  Published: 2025-07-31

Abstract: Sparse convolution is an important means of reducing the computational complexity of convolutional neural networks. However, currently designed sparse convolution accelerators still face two problems: first, the variable positions of non-zero elements make the indexing logic complex and the index computation time long; second, simply skipping zero-valued elements leaves computing resources idle and wasted. To solve these two problems, a sparse convolution accelerator based on dynamic allocation of computing power is designed. A dynamic non-zero-value index is designed, which reduces the computation time and memory requirements of indexing. A dynamic computing power allocation algorithm is proposed, which distributes the zero-skipped data of multiple convolution channels to one group of multipliers, reducing the difficulty of pairing non-zero data and avoiding idle resources. Simulation results on the Xilinx XC7V2000 platform show that, for sparse convolution, the designed accelerator achieves a performance of 438.3 GOPs and a DSP efficiency of 0.43 GOPs/DSP, improving DSP efficiency by 1.26 to 2.86 times compared with six existing convolution accelerators.

Key words: sparse convolution, neural network, hardware acceleration, multi-channel

Abstract: Sparse convolution is an important way to reduce the computational complexity of convolutional neural networks. However, current sparse convolution accelerators still face two challenges. On the one hand, the variable positions of non-zero elements complicate the indexing logic and lengthen index computation. On the other hand, simply skipping zero elements leaves computing resources idle. To solve these problems, this paper proposes a sparse convolution accelerator with dynamic computing power allocation. A dynamic non-zero indexing method is designed to reduce the computation time and memory required for indexing. A dynamic computing power allocation algorithm is further proposed, which distributes the zero-skipped data from multiple channels to a shared group of multipliers, simplifying the pairing of non-zero data and preventing resources from idling. Simulation results on the Xilinx XC7V2000 platform show that the proposed accelerator achieves a performance of 438.3 GOPs and a DSP efficiency of 0.43 GOPs/DSP on sparse convolution, improving DSP efficiency by 1.26 to 2.86 times compared with six existing convolution accelerators.
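To make the allocation idea concrete, the following is a minimal behavioral sketch in Python, not the paper's RTL design: it models how zero-skipped (position, value) pairs from several convolution channels could be merged and fed to one shared multiplier group. The channel data, the group size of four multipliers, and the simple fill-in-order dispatch policy are assumptions for illustration only.

```python
# Behavioral sketch (illustrative only, not the accelerator's actual hardware):
# zero-skipped data from several channels share one fixed multiplier group,
# so multipliers stay busy even when individual channels are very sparse.
from collections import deque

def nonzero_index(channel):
    """Dynamic non-zero index: keep only (position, value) pairs for a channel."""
    return deque((i, v) for i, v in enumerate(channel) if v != 0)

def dispatch(channels, weights, num_multipliers=4):
    """Drain the per-channel non-zero queues through a fixed multiplier group."""
    queues = [nonzero_index(ch) for ch in channels]
    partial_sums = [0] * len(channels)
    rounds = 0
    while any(queues):
        issued = 0
        # Fill the multiplier group from whichever channels still have work.
        for ch_id, q in enumerate(queues):
            while q and issued < num_multipliers:
                pos, act = q.popleft()
                partial_sums[ch_id] += act * weights[ch_id][pos]
                issued += 1
        rounds += 1
    return partial_sums, rounds

# Two sparse channels share four multipliers instead of idling separately.
acts = [[0, 3, 0, 0, 5, 0], [1, 0, 0, 2, 0, 4]]
wts = [[2] * 6, [1] * 6]
print(dispatch(acts, wts))  # -> ([16, 7], 2): 5 non-zero products in 2 rounds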
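```

A real accelerator would perform this allocation in hardware with index decoders and a routing network; the sketch only shows why dynamically sharing a multiplier group across channels avoids the idle cycles that per-channel zero-skipping alone would leave under uneven sparsity.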

Key words: sparse convolution, neural network, hardware acceleration, multiple channels