Optimized Design and FPGA Implementation of High-Performance Face Recognition Accelerator

doi:10.3778/j.issn.1002-8331.1912-0384

Abstract

The rapid development of computer vision places ever-higher demands on the system performance of embedded products. On traditional Field Programmable Gate Array (FPGA) platforms, computational throughput is poorly matched to memory bandwidth, and general-purpose processors implement Convolutional Neural Networks (CNNs) inefficiently and fail to meet performance requirements. To address these design bottlenecks, a high-performance face recognition neural network accelerator based on the classic LeNet-5 model is designed on the Xilinx ZC706 embedded development platform. Using High Level Synthesis (HLS) tools, the neural network model is improved through storage optimization, fixed-point quantization, and computational optimization, yielding a 7-layer CNN accelerator. Experimental results show that the accelerator operates at 200 MHz, achieves a 126-fold speedup over the CPU, runs more than ten times faster than the GPU, and consumes only 2.62 W.
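Fixed-point quantization, one of the optimizations listed above, replaces floating-point weights and activations with scaled integers so that each layer's multiply-accumulate operations can run on the FPGA's integer multiplier resources. The abstract does not state the word lengths used, so the following Python sketch assumes a signed 16-bit format with 8 fractional bits, round-to-nearest, and saturation on overflow. The constants and function names (`TOTAL_BITS`, `FRAC_BITS`, `quantize`, `fixed_dot`, and so on) are hypothetical; this is an illustration of the general technique, not the authors' implementation.

```python
import math

# Assumed format (not stated in the paper): signed 16-bit words, 8 fractional bits.
TOTAL_BITS = 16
FRAC_BITS = 8


def saturate(q: int, total_bits: int = TOTAL_BITS) -> int:
    """Clamp an integer code to the signed range of a total_bits-wide word."""
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    return max(q_min, min(q_max, q))


def quantize(x: float, total_bits: int = TOTAL_BITS, frac_bits: int = FRAC_BITS) -> int:
    """Real value -> fixed-point integer code (round to nearest, ties toward +inf)."""
    return saturate(math.floor(x * (1 << frac_bits) + 0.5), total_bits)


def dequantize(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Fixed-point integer code -> real value."""
    return q / (1 << frac_bits)


def fixed_dot(xs, ws, total_bits: int = TOTAL_BITS, frac_bits: int = FRAC_BITS) -> int:
    """Dot product (e.g. one convolution window) computed with integer MACs only."""
    acc = 0  # Python ints are unbounded; hardware would use a wide accumulator
    for x, w in zip(xs, ws):
        # Each product carries 2 * frac_bits fractional bits.
        acc += quantize(x, total_bits, frac_bits) * quantize(w, total_bits, frac_bits)
    # Drop frac_bits fractional bits (arithmetic shift = floor), then saturate.
    return saturate(acc >> frac_bits, total_bits)


if __name__ == "__main__":
    q = quantize(0.3)
    print(q, dequantize(q))  # 77 0.30078125
    print(quantize(200.0))   # 32767 (saturated)
    r = fixed_dot([0.5, -0.25, 1.0], [0.5, 0.5, 0.25])
    print(r, dequantize(r))  # 96 0.375
```

Because the product of two values with 8 fractional bits carries 16 fractional bits, the accumulator is shifted right by `frac_bits` before saturating back to the output width. In an HLS flow the same choices (word length, rounding versus truncation, saturation versus wrap-around) are usually expressed through arbitrary-precision fixed-point types rather than hand-written shifts; the specific choices made here are assumptions.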

Key words: CNN accelerator, Field Programmable Gate Array (FPGA), High Level Synthesis (HLS), storage optimization, fixed-point quantization

WU Jin, ZHANG Weihua, XI Meng, DAI Wei. Optimized Design and FPGA Implementation of High-Performance Face Recognition Accelerator[J]. Computer Engineering and Applications, 2020, 56(22): 48-54.
