Computer Engineering and Applications, 2021, Vol. 57, Issue (4): 252-257. DOI: 10.3778/j.issn.1002-8331.1912-0099


Design of Hardware Accelerator for Embedded Convolutional Neural Network

TANG Rui, JIAO Jiye, XU Huahao   

  1. School of Computer Science & Technology, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
  Online: 2021-02-15. Published: 2021-02-06.



In recent years, neural network models have become increasingly complex. Convolutional neural network (CNN) inference requires a large amount of memory, which limits deployment on embedded devices. To address this problem, a dynamic multi-precision fixed-point data quantization hardware structure is proposed. During inference after training, it performs convolution operations on fixed-point data instead of floating-point data. The results show that, compared with a static quantization strategy, a hardware architecture combining 16-bit dynamic fixed-point quantization with parallel convolution operations achieves a data accuracy of up to 97.96%. The hardware unit occupies only 13,740 gates, and memory footprint and bandwidth requirements are both halved. In addition, compared with a Cortex-M4 performing convolution on floating-point data, the embedded SoC with this hardware acceleration unit improves performance by more than 90%.
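The abstract contrasts dynamic fixed-point quantization with a static strategy. As a minimal software sketch only (the paper's actual selection rule and hardware datapath are not given here), assume signed 16-bit words, round-to-nearest with saturation, and a fractional length chosen per tensor from its largest magnitude. The function names, the `STATIC_FRAC` format and the weight values below are hypothetical illustrations. The `word_bits` parameter hints at the "multi-precision" idea: the same routine could target 8-bit words.

```python
import math
import struct

def choose_frac_bits(values, word_bits=16):
    """Dynamic fixed point (assumed rule): pick the fractional length from the
    tensor's largest magnitude. One bit is the sign bit. For small-valued
    tensors the result can exceed word_bits - 1, giving extra precision."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return word_bits - 1
    # frexp returns (m, e) with max_abs = m * 2**e and 0.5 <= m < 1,
    # so max_abs < 2**e: e integer bits suffice to hold it.
    _, int_bits = math.frexp(max_abs)
    return (word_bits - 1) - int_bits

def quantize(values, frac_bits, word_bits=16):
    """Scale by 2**frac_bits, round to nearest, saturate to the signed range."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]

def dequantize(codes, frac_bits):
    """Map integer codes back to real values."""
    return [c / 2.0 ** frac_bits for c in codes]

def max_error(values, frac_bits):
    """Largest absolute round-trip error for a given fractional length."""
    restored = dequantize(quantize(values, frac_bits), frac_bits)
    return max(abs(a - b) for a, b in zip(values, restored))

# Hypothetical small-magnitude weights of one layer
weights = [0.0123, -0.0456, 0.0311, -0.0078]

STATIC_FRAC = 8                            # hypothetical Q7.8 format shared by all layers
dynamic_frac = choose_frac_bits(weights)   # chosen for this tensor's range

print("dynamic fractional bits:", dynamic_frac)  # -> 19
print("static max error: ", max_error(weights, STATIC_FRAC))
print("dynamic max error:", max_error(weights, dynamic_frac))

# 16-bit codes occupy half the bytes of 32-bit floats
codes = quantize(weights, dynamic_frac)
float_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
fixed_bytes = len(struct.pack(f"<{len(codes)}h", *codes))
print("bytes:", float_bytes, "->", fixed_bytes)  # -> bytes: 16 -> 8
```

Under these assumptions, a single static format wastes most of its 16 bits on small-valued layers, while a per-tensor fractional length keeps the rounding error near half a least-significant bit. The byte count illustrates why a 16-bit representation halves memory footprint and bandwidth relative to 32-bit floats.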

Key words: convolutional neural network, embedded devices, dynamic multi-precision fixed-point data quantization, parallel convolutional operation hardware architecture
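Once activations and weights are stored as 16-bit codes, a convolution reduces to integer multiply-accumulate followed by a shift back to the output format. The following is a hedged software sketch, not the paper's hardware design. It assumes per-tensor fractional lengths (`X_FRAC`, `W_FRAC`, `OUT_FRAC` and all helper names are hypothetical), a wide accumulator (Python integers never overflow, whereas hardware needs an explicitly sized accumulator register), and round-half-up requantization with saturation. The comment in the inner loop marks the independent products that a parallel convolution unit could distribute across multipliers.

```python
def to_fixed(values2d, frac_bits, word_bits=16):
    """Quantize a 2-D list of floats to signed fixed-point codes with saturation."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return [[min(hi, max(lo, round(v * 2 ** frac_bits))) for v in row] for row in values2d]

def requantize(acc, acc_frac, out_frac, word_bits=16):
    """Rescale an accumulator with acc_frac fractional bits to out_frac bits
    (round half up, then saturate to the signed word range)."""
    shift = acc_frac - out_frac
    if shift > 0:
        acc = (acc + (1 << (shift - 1))) >> shift  # >> on Python ints is arithmetic
    elif shift < 0:
        acc <<= -shift
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return min(hi, max(lo, acc))

def conv2d_fixed(x, w, x_frac, w_frac, out_frac):
    """'Valid' 2-D convolution (cross-correlation, as in CNN layers) on integer codes."""
    kh, kw = len(w), len(w[0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            # The kh*kw products are independent; a parallel hardware unit could
            # compute them on separate multipliers. Here they run sequentially.
            acc = sum(x[i + a][j + b] * w[a][b] for a in range(kh) for b in range(kw))
            # Each product carries x_frac + w_frac fractional bits.
            row.append(requantize(acc, x_frac + w_frac, out_frac))
        out.append(row)
    return out

def conv2d_float(x, w):
    """Floating-point reference for comparison."""
    kh, kw = len(w), len(w[0])
    return [[sum(x[i + a][j + b] * w[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

# Hypothetical input feature map and 3x3 kernel
x = [[0.5, -0.25, 0.75, 0.1],
     [0.2, 0.3, -0.6, 0.4],
     [-0.1, 0.9, 0.05, -0.35],
     [0.6, -0.45, 0.15, 0.25]]
w = [[0.1, -0.2, 0.05],
     [0.0, 0.3, -0.1],
     [0.25, -0.05, 0.15]]

X_FRAC, W_FRAC, OUT_FRAC = 14, 15, 13  # hypothetical per-tensor fractional lengths
y_codes = conv2d_fixed(to_fixed(x, X_FRAC), to_fixed(w, W_FRAC), X_FRAC, W_FRAC, OUT_FRAC)
y_fixed = [[c / 2 ** OUT_FRAC for c in row] for row in y_codes]
y_float = conv2d_float(x, w)
for rf, rr in zip(y_fixed, y_float):
    print([f"{a:.5f} (fixed) vs {b:.5f} (float)" for a, b in zip(rf, rr)])
```

In this sketch only the final requantization step rounds the accumulated sum; keeping the full-width sum until then is one common way to limit error growth in fixed-point convolution, though the paper may organize its datapath differently.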
