Optimized Design and FPGA Implementation of High-Performance Face Recognition Accelerator

Computer Engineering and Applications, 2020, Vol. 56, Issue 22: 48-54. DOI: 10.3778/j.issn.1002-8331.1912-0384

WU Jin, ZHANG Weihua, XI Meng, DAI Wei

School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China

Online: 2020-11-15 Published: 2020-11-13

Abstract:

The rapid development of computer vision places ever higher demands on the system performance of embedded products. On traditional Field Programmable Gate Array (FPGA) platforms, computational throughput is not well matched to memory bandwidth, and general-purpose processors implement Convolutional Neural Networks (CNNs) inefficiently, so performance requirements are not met. To address these design bottlenecks, a high-performance face recognition neural network accelerator based on the classic LeNet-5 model is designed on the Xilinx ZC706 embedded development platform. Built with High Level Synthesis (HLS) tools, the design optimizes the network model through storage optimization, fixed-point quantization, and computational optimization, yielding a 7-layer CNN accelerator. Experimental results show that the accelerator operates at 200 MHz, runs 126 times faster than the CPU and more than 10 times faster than the GPU, and consumes only 2.62 W.

Key words: CNN accelerator, Field Programmable Gate Array (FPGA), High Level Synthesis (HLS), storage optimization, fixed-point quantization
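
The abstract names fixed-point quantization as one of the optimizations but does not state the number format used. As a rough illustration only, the sketch below assumes a signed 16-bit format with 8 fractional bits; this format and the helper names `quantize`, `dequantize`, and `fixed_dot` are hypothetical and are not taken from the paper. It shows how weights and inputs can be mapped to integers so that a multiply-accumulate runs on integer arithmetic, with a single rescale at the end.

```python
def quantize(value, frac_bits=8, total_bits=16):
    """Round a float to a signed fixed-point integer, saturating on overflow.

    The 16-bit / 8-fractional-bit default is an assumed format for illustration.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = int(round(value * scale))
    return max(lo, min(hi, q))


def dequantize(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(weights, inputs, frac_bits=8):
    """Dot product computed on quantized integers.

    Each product of two values with frac_bits fractional bits carries
    2 * frac_bits fractional bits, so the accumulator is rescaled once.
    """
    acc = 0
    for w, x in zip(weights, inputs):
        acc += quantize(w, frac_bits) * quantize(x, frac_bits)
    return acc / (1 << (2 * frac_bits))
```

Values outside the representable range saturate rather than wrap, which is a common choice for neural network weights; whether the accelerator in the paper saturates or wraps is not stated in the abstract.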

WU Jin, ZHANG Weihua, XI Meng, DAI Wei. Optimized Design and FPGA Implementation of High-Performance Face Recognition Accelerator[J]. Computer Engineering and Applications, 2020, 56(22): 48-54.

