[1] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015: 1026-1034.
[2] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[3] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[4] NANE R, SIMA V M, PILATO C, et al. A survey and evaluation of FPGA high-level synthesis tools[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015, 35(10): 1591-1604.
[5] NURVITADHI E, VENKATESH G, SIM J, et al. Can FPGAs beat GPUs in accelerating next-generation deep neural networks?[C]//Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017: 5-14.
[6] ZHANG C, LI P, SUN G, et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks[C]//Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015: 161-170.
[7] ZHANG C, SUN G, FANG Z, et al. Caffeine: toward uniformed representation and acceleration for deep convolutional neural networks[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 38(11): 2072-2085.
[8] MA Y, CAO Y, VRUDHULA S, et al. Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks[C]//Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017: 45-54.
[9] CHEN T, DU Z, SUN N, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[J]. ACM SIGARCH Computer Architecture News, 2014, 42(1): 269-284.
[10] LIU S, DU Z, TAO J, et al. Cambricon: an instruction set architecture for neural networks[J]. ACM SIGARCH Computer Architecture News, 2016, 44(3): 393-405.
[11] XIAO Q, LIANG Y, LU L, et al. Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs[C]//Proceedings of the 54th Annual Design Automation Conference, 2017: 1-6.
[12] GONG L, WANG C, LI X, et al. MALOC: a fully pipelined FPGA accelerator for convolutional neural networks with all layers mapped on chip[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(11): 2601-2612.
[13] GONG L, WANG C, LI X, et al. Work-in-progress: a power-efficient and high-performance FPGA accelerator for convolutional neural networks[C]//Proceedings of the 2017 International Conference on Hardware/Software Codesign and System Synthesis, 2017: 1-2.
[14] VIPIN K, FAHMY S A. FPGA dynamic and partial reconfiguration: a survey of architectures, methods, and applications[J]. ACM Computing Surveys, 2018, 51(4): 1-39.
[15] ANSARI A, GUNNAM K, OGUNFUNMI T. An efficient reconfigurable hardware accelerator for convolutional neural networks[C]//Proceedings of the 2017 51st Asilomar Conference on Signals, Systems, and Computers, 2017: 1337-1341.
[16] VENIERIS S I, BOUGANIS C S. fpgaConvNet: a framework for mapping convolutional neural networks on FPGAs[C]//Proceedings of the 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines, 2016: 40-47.
[17] LAVIN A, GRAY S. Fast algorithms for convolutional neural networks[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 4013-4021.
[18] GONG L, WANG C, LI X, et al. Improving HW/SW adaptability for accelerating CNNs on FPGAs through a dynamic/static co-reconfiguration approach[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 32(7): 1854-1865.
[19] YUAN F L, GONG L, LOU W Q, et al. Performance cost modeling in dynamic reconfiguration hardware acceleration[J]. Computer Engineering and Applications, 2022, 58(6): 69-79.