[1] ANANDKUMAR A, GE R, HSU D, et al. Tensor decompositions for learning latent variable models[J]. Journal of Machine Learning Research, 2014, 15: 2773-2832.
[2] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[3] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000-6010.
[4] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[5] JOUPPI N P, YOUNG C, PATIL N, et al. In-datacenter performance analysis of a tensor processing unit[C]//Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017: 1-12.
[6] LIU S, DU Z, TAO J, et al. Cambricon: an instruction set architecture for neural networks[J]. ACM SIGARCH Computer Architecture News, 2016, 44(3): 393-405.
[7] ZHAO Y, DU Z, GUO Q, et al. Cambricon-F: machine learning computers with fractal von Neumann architecture[C]//Proceedings of the 46th International Symposium on Computer Architecture, 2019: 788-801.
[8] CHEN Y H, KRISHNA T, EMER J S, et al. Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks[J]. IEEE Journal of Solid-State Circuits, 2017, 52(1): 127-138.
[9] LIAO H, TU J, XIA J, et al. Ascend: a scalable and unified architecture for ubiquitous deep neural network computing: industry track paper[C]//Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture, 2021: 789-801.
[10] VENKATESAN R, SHAO Y S, WANG M, et al. MAGNet: a modular accelerator generator for neural networks[C]//Proceedings of the 2019 IEEE/ACM International Conference on Computer-Aided Design, 2019: 1-8.
[11] LAI Y H, RONG H, ZHENG S, et al. SuSy: a programming model for productive construction of high-performance systolic arrays on FPGAs[C]//Proceedings of the 39th International Conference on Computer-Aided Design, 2020: 1-9.
[12] GENC H, KIM S, AMID A, et al. Gemmini: enabling systematic deep-learning architecture evaluation via full-stack integration[C]//Proceedings of the 2021 58th ACM/IEEE Design Automation Conference, 2021: 769-774.
[13] WANG J, GUO L, CONG J. AutoSA: a polyhedral compiler for high-performance systolic arrays on FPGA[C]//Proceedings of the 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2021: 93-104.
[14] SAMAJDAR A, JOSEPH J M, ZHU Y, et al. A systematic methodology for characterizing scalability of DNN accelerators using SCALE-Sim[C]//Proceedings of the 2020 IEEE International Symposium on Performance Analysis of Systems and Software, 2020: 58-68.
[15] MUNOZ-MARTINEZ F, ABELLAN J L, ACACIO M E, et al. STONNE: enabling cycle-level microarchitectural simulation for DNN inference accelerators[C]//Proceedings of the 2021 IEEE International Symposium on Workload Characterization, 2021: 201-213.
[16] PARASHAR A, RAINA P, SHAO Y S, et al. Timeloop: a systematic approach to DNN accelerator evaluation[C]//Proceedings of the 2019 IEEE International Symposium on Performance Analysis of Systems and Software, 2019: 304-315.
[17] YANG X, GAO M, LIU Q, et al. Interstellar: using Halide's scheduling language to analyze DNN accelerators[C]//Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems, 2020: 369-383.
[18] HUANG Q, KANG M, DINH G, et al. CoSA: scheduling by constrained optimization for spatial accelerators[C]//Proceedings of the 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture, 2021: 554-566.
[19] KWON H, CHATARASI P, PELLAUER M, et al. Understanding reuse, performance, and hardware cost of DNN dataflow: a data-centric approach[C]//Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019: 754-768.
[20] LU L, GUAN N, WANG Y, et al. TENET: a framework for modeling tensor dataflow based on relation-centric notation[C]//Proceedings of the 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture, 2021: 720-733.
[21] 钱佳明, 娄文启, 宫磊, 等. 面向3D-CNN的算法压缩-硬件设计协同优化[J]. 计算机工程与应用, 2023, 59(18): 74-83.
QIAN J M, LOU W Q, GONG L, et al. Algorithm compression and hardware design co-optimization for 3D-CNN[J]. Computer Engineering and Applications, 2023, 59(18): 74-83.
[22] 陈云霁, 李玲, 李威, 等. 智能计算系统[M]. 北京: 机械工业出版社, 2020: 234-238.
CHEN Y J, LI L, LI W, et al. AI computing systems[M]. Beijing: China Machine Press, 2020: 234-238.
[23] MARCHISIO A, HANIF M A, SHAFIQUE M. CapsAcc: an efficient hardware accelerator for CapsuleNets with data reuse[C]//Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition, 2019: 964-967.
[24] MARCHISIO A, HANIF M A, TEIMOORI M T, et al. CapStore: energy-efficient design and management of the on-chip memory for CapsuleNet inference accelerators[J]. arXiv:1902.01151, 2019.
[25] MARCHISIO A, MRAZEK V, HANIF M A, et al. DESCNet: developing efficient scratchpad memories for capsule network hardware[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 40(9): 1768-1781.
[26] MOONS B, VERHELST M. A 0.3-2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets[C]//Proceedings of the 2016 IEEE Symposium on VLSI Circuits, 2016: 1-2.
[27] YIN S, OUYANG P, TANG S, et al. A high energy efficient reconfigurable hybrid neural network processor for deep learning applications[J]. IEEE Journal of Solid-State Circuits, 2017, 53(4): 968-982.
[28] GENC H, HAJ-ALI A, IYER V, et al. Gemmini: an agile systolic array generator enabling systematic evaluations of deep-learning architectures[J]. arXiv:1911.09925, 2019.
[29] PARASHAR A, RHU M, MUKKARA A, et al. SCNN: an accelerator for compressed-sparse convolutional neural networks[J]. ACM SIGARCH Computer Architecture News, 2017, 45(2): 27-40.
[30] DU Z, FASTHUBER R, CHEN T, et al. ShiDianNao: shifting vision processing closer to the sensor[C]//Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015: 92-104.
[31] SIJSTERMANS F. The NVIDIA deep learning accelerator[J]. Hot Chips, 2018, 30: 19-21.
[32] WANG C, GONG L, YU Q, et al. DLAU: a scalable deep learning accelerator unit on FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016, 36(3): 513-517.
[33] GONG L, WANG C, LI X, et al. MALOC: a fully pipelined FPGA accelerator for convolutional neural networks with all layers mapped on chip[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(11): 2601-2612.