[1] UMUROGLU Y, RASNAYAKE L, SJÄLANDER M. BISMO: a scalable bit-serial matrix multiplication overlay for reconfigurable computing[C]//2018 28th International Conference on Field Programmable Logic and Applications (FPL), 2018.
[2] RYU S, KIM H, YI W, et al. BitBlade: area and energy-efficient precision-scalable neural network accelerator with bitwise summation[C]//Proceedings of the 56th Annual Design Automation Conference, 2019: 1-6.
[3] YANG Q, LI H. BitSystolic: a 26.7 TOPS/W 2b~8b NPU with configurable data flows for edge devices[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2021, 68(3): 1134-1145.
[4] SHARMA H, PARK J, SUDA N, et al. Bit fusion: bit-level dynamically composable architecture for accelerating deep neural networks[C]//2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018: 764-775.
[5] JUDD P, ALBERICIO J, HETHERINGTON T, et al. Stripes: bit-serial deep neural network computing[C]//2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016: 1-12.
[6] GHOLAMI A, KIM S, DONG Z, et al. A survey of quantization methods for efficient neural network inference[J]. arXiv:2103.13630, 2021.
[7] PARASHAR A, RAINA P, SHAO Y S, et al. Timeloop: a systematic approach to DNN accelerator evaluation[C]//2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2019: 304-315.
[8] KWON H, CHATARASI P, PELLAUER M, et al. Understanding reuse, performance, and hardware cost of DNN dataflow: a data-centric approach[C]//Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019: 754-768.
[9] KWON H, CHATARASI P, SARKAR V, et al. MAESTRO: a data-centric approach to understand reuse, performance, and hardware cost of DNN mappings[J]. IEEE Micro, 2020, 40(3): 20-29.
[10] LU L, GUAN N, WANG Y, et al. TENET: a framework for modeling tensor dataflow based on relation-centric notation[C]//2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021: 720-733.
[11] IBRAHIM E M, MEI L, VERHELST M. Taxonomy and benchmarking of precision-scalable MAC arrays under enhanced DNN dataflow representation[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(5): 2013-2024.
[12] CHEN Y, LUO T, LIU S, et al. DaDianNao: a machine-learning supercomputer[C]//2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014: 609-622.
[13] LI S, CHEN K, AHN J H, et al. CACTI-P: architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques[C]//2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2011: 694-701.
[14] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[15] NETZER Y, WANG T, COATES A, et al. Reading digits in natural images with unsupervised feature learning[C]//NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[16] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[17] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[18] KRIZHEVSKY A, HINTON G. Learning multiple layers of features from tiny images[R]. Toronto: University of Toronto, 2009.
[19] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[20] LIPTON Z C, BERKOWITZ J, ELKAN C. A critical review of recurrent neural networks for sequence learning[J]. arXiv:1506.00019, 2015.
[21] HUBARA I, COURBARIAUX M, SOUDRY D, et al. Quantized neural networks: training neural networks with low precision weights and activations[J]. The Journal of Machine Learning Research, 2017, 18(1): 6869-6898.
[22] MISHRA A, NURVITADHI E, COOK J J, et al. WRPN: wide reduced-precision networks[J]. arXiv:1709.01134, 2017.
[23] GHODRATI S, SHARMA H, YOUNG C, et al. Bit-parallel vector composability for neural acceleration[C]//2020 57th ACM/IEEE Design Automation Conference (DAC), 2020: 1-6.