Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (15): 178-188. DOI: 10.3778/j.issn.1002-8331.2410-0212

• Theory and R&D •

Sparse Convolution Accelerator with Dynamic Allocation of Computing Power

QIN Xueyi, CHEN Guilin, WEI Xianglin, YU Long, FAN Jianhua, LIU Heng

  1. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
  • Online: 2025-08-01  Published: 2025-07-31

Sparse Convolution Accelerator with Dynamic Allocation of Computing Power

QIN Xueyi, CHEN Guilin, WEI Xianglin, YU Long, FAN Jianhua, LIU Heng   

  1. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
  • Online: 2025-08-01  Published: 2025-07-31

Abstract: Sparse convolution is an important means of reducing the computational complexity of convolutional neural networks. However, currently designed sparse convolution accelerators still face two problems: first, the variable positions of non-zero elements make the indexing logic complex and the index computation time long; second, simply skipping zero-valued elements leaves computing resources idle and wasted. To solve these two problems, a sparse convolution accelerator based on dynamic allocation of computing power is designed. A dynamic non-zero-value index is designed, which reduces the computation time and memory requirements of indexing. A dynamic computing power allocation algorithm is proposed, which distributes the zero-skipped data of multiple convolution channels to one group of multipliers, reducing the difficulty of pairing non-zero data and avoiding idle resources. Simulation results on the Xilinx XC7V2000 platform show that, for sparse convolution, the designed accelerator achieves a performance of 438.3 GOPs and a DSP efficiency of 0.43 GOPs/DSP, improving DSP efficiency by 1.26 to 2.86 times compared with six existing convolution accelerators.

Key words: sparse convolution, neural network, hardware acceleration, multi-channel

Abstract: Sparse convolution is an important way to reduce the computational complexity of convolutional neural networks. However, current sparse convolution accelerators still face two challenges. On the one hand, the variable positions of non-zero elements complicate the indexing logic and lengthen index computation. On the other hand, simply skipping zero elements leaves computing resources idle. To solve these problems, this paper proposes a sparse convolution accelerator with dynamic computing power allocation. A dynamic non-zero indexing method is designed to reduce the computation time and memory required for indexing. A dynamic computing power allocation algorithm is further proposed, which distributes the zero-skipped data from multiple channels to a shared group of multipliers, simplifying the pairing of non-zero data and preventing resources from idling. Simulation results on the Xilinx XC7V2000 platform show that the proposed accelerator achieves a performance of 438.3 GOPs and a DSP efficiency of 0.43 GOPs/DSP on sparse convolution, improving DSP efficiency by 1.26 to 2.86 times compared with six existing convolution accelerators.
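To make the allocation idea concrete, the following is a minimal behavioral sketch in Python, not the paper's RTL design: it models how zero-skipped (position, value) pairs from several convolution channels could be merged and fed to one shared multiplier group. The channel data, the group size of four multipliers, and the simple fill-in-order dispatch policy are assumptions for illustration only.

```python
# Behavioral sketch (illustrative only, not the accelerator's actual hardware):
# zero-skipped data from several channels share one fixed multiplier group,
# so multipliers stay busy even when individual channels are very sparse.
from collections import deque

def nonzero_index(channel):
    """Dynamic non-zero index: keep only (position, value) pairs for a channel."""
    return deque((i, v) for i, v in enumerate(channel) if v != 0)

def dispatch(channels, weights, num_multipliers=4):
    """Drain the per-channel non-zero queues through a fixed multiplier group."""
    queues = [nonzero_index(ch) for ch in channels]
    partial_sums = [0] * len(channels)
    rounds = 0
    while any(queues):
        issued = 0
        # Fill the multiplier group from whichever channels still have work.
        for ch_id, q in enumerate(queues):
            while q and issued < num_multipliers:
                pos, act = q.popleft()
                partial_sums[ch_id] += act * weights[ch_id][pos]
                issued += 1
        rounds += 1
    return partial_sums, rounds

# Two sparse channels share four multipliers instead of idling separately.
acts = [[0, 3, 0, 0, 5, 0], [1, 0, 0, 2, 0, 4]]
wts = [[2] * 6, [1] * 6]
print(dispatch(acts, wts))  # -> ([16, 7], 2): 5 non-zero products in 2 rounds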
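```

A real accelerator would perform this allocation in hardware with index decoders and a routing network; the sketch only shows why dynamically sharing a multiplier group across channels avoids the idle cycles that per-channel zero-skipping alone would leave under uneven sparsity.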

Key words: sparse convolution, neural network, hardware acceleration, multiple channels