Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (18): 147-157. DOI: 10.3778/j.issn.1002-8331.2312-0136

• Theory, Research and Development •

Dataflow Optimization Method for DNNs on Bit-Level Composable Architectures

高汉源,宫磊,王腾   

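  1. School of Data Science, University of Science and Technology of China, Hefei 230026
  2. Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123
  3. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026
  • Publication date: 2024-09-15 Release date: 2024-09-13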

Optimize Dataflow of DNN on Bit-Level Composable Architecture

GAO Hanyuan, GONG Lei, WANG Teng   

  1. School of Data Science, University of Science and Technology of China, Hefei 230026, China
  2. Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
  3. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
  • Online: 2024-09-15 Published: 2024-09-13

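Abstract: Bit-level composable architectures support neural network computation with multiple data bit widths. Their hardware structures come in many variants, and each neural network model requires an additional, separately designed program schedule. This process is time-consuming, hinders rapid iteration and deployment of software and hardware, and makes the results hard to evaluate. Existing dataflow modeling work lacks bit-level computation descriptions and automated methods. To address these problems, an adaptive data scheduling optimization method for bit-level composable architectures, based on dataflow modeling, is proposed. Bit-level dataflow modeling is introduced, using several loop primitives and a tensor-index relation matrix to describe the characteristics of bit-level composable hardware structures and the data scheduling process of applications. Data access information is extracted from the model representation, and data reuse is counted for fast evaluation. A design space exploration framework is built to adaptively optimize the data schedule for different applications and hardware design constraints. Index matching and loop transformation are used to sample designs, and greedy rules are added for pruning to improve exploration efficiency. Experiments are conducted on multiple applications under various hardware structure constraints. The results show better performance than state-of-the-art manually designed accelerators and data schedules.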

Keywords: neural network accelerator, variable bit width, dataflow, design space exploration
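
To make the tensor-index relation matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 1D convolution `O[k][x] += W[k][c][r] * I[c][x + r]` with the hypothetical loop order `(k, c, x, r)` and small, made-up loop bounds. Each tensor gets a matrix whose rows are tensor dimensions and whose columns are loop indices; a loop whose column is all zeros does not affect the tensor's address, so the tensor is reused across that loop. The reuse factor is counted by brute-force enumeration purely for illustration.

```python
from itertools import product

# Hypothetical loop nest for a 1D convolution: O[k][x] += W[k][c][r] * I[c][x + r]
LOOPS = ("k", "c", "x", "r")
BOUNDS = {"k": 2, "c": 3, "x": 4, "r": 3}  # illustrative sizes, not from the paper

# Tensor-index relation matrices: one row per tensor dimension,
# one column per loop index, entry = coefficient of that index.
RELATION = {
    "W": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]],  # W[k][c][r]
    "I": [[0, 1, 0, 0], [0, 0, 1, 1]],                # I[c][x + r]
    "O": [[1, 0, 0, 0], [0, 0, 1, 0]],                # O[k][x]
}


def reused_loops(matrix):
    """Return loop indices that never appear in the tensor's address."""
    return [
        LOOPS[j]
        for j in range(len(LOOPS))
        if all(row[j] == 0 for row in matrix)
    ]


def reuse_factor(matrix):
    """Average number of accesses per distinct tensor element."""
    touched = set()
    total = 0
    for point in product(*(range(BOUNDS[name]) for name in LOOPS)):
        # Address = relation matrix times the loop-index vector.
        coord = tuple(sum(a * i for a, i in zip(row, point)) for row in matrix)
        touched.add(coord)
        total += 1
    return total / len(touched)


if __name__ == "__main__":
    for name, matrix in RELATION.items():
        print(name, reused_loops(matrix), reuse_factor(matrix))
```

A real evaluator would derive these counts analytically from the matrix and loop bounds rather than enumerating every iteration; enumeration is used here only because it is easy to check by hand.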

Abstract: Bit-level composable architectures support neural networks that use multiple data precisions. Their hardware structures vary widely, and different applications require different data schedules. Designing these schedules by hand is time-consuming and labor-intensive, which hinders rapid software and hardware iteration, and the resulting designs are difficult to evaluate. Related work lacks both bit-level modeling and automation. To solve these problems, a schedule optimization method for bit-level composable architectures based on dataflow modeling is proposed. The dataflow model, which combines several kinds of loop statements with a tensor-index matrix, describes the hardware structure and the scheduling process. Data access information and the amount of data reuse are evaluated quickly from the dataflow representation. Based on the model, a design space exploration method automatically designs the schedule for different applications and hardware constraints. Pruning strategies reduce the design space and improve exploration efficiency. Experimental results show that, across different applications and hardware constraints, the method achieves better performance than other accelerators and schedules.

Key words: deep neural network accelerator, precision scalable, dataflow, design space exploration
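
As background for the term "bit-level composable", the sketch below shows one common way such hardware is understood to work: a wide multiplication is composed from narrow partial products that are shifted and summed. It is an illustration under stated assumptions, not the architecture evaluated in the paper. The 2-bit slice width, the unsigned operands, and the function names `split_bits`, `composed_multiply`, and `bricks_needed` are all hypothetical.

```python
def split_bits(value, width, brick=2):
    """Split an unsigned `width`-bit value into `brick`-bit slices, LSB first."""
    mask = (1 << brick) - 1
    return [(value >> shift) & mask for shift in range(0, width, brick)]


def composed_multiply(a, b, a_width, b_width, brick=2):
    """Multiply two unsigned values using only brick x brick partial products.

    Assumes 0 <= a < 2**a_width and 0 <= b < 2**b_width.
    """
    total = 0
    for i, a_slice in enumerate(split_bits(a, a_width, brick)):
        for j, b_slice in enumerate(split_bits(b, b_width, brick)):
            # Each narrow product is weighted by the positions of its slices.
            total += (a_slice * b_slice) << (brick * (i + j))
    return total


def bricks_needed(a_width, b_width, brick=2):
    """Number of narrow multipliers one product occupies (ceiling division)."""
    return -(-a_width // brick) * -(-b_width // brick)


if __name__ == "__main__":
    print(composed_multiply(173, 94, 8, 8), 173 * 94)
    print(bricks_needed(8, 8), bricks_needed(4, 8), bricks_needed(2, 2))
```

Under these assumptions an 8-bit by 8-bit product occupies 16 narrow multipliers while a 2-bit by 2-bit product occupies one, which is why the same array offers different parallelism at different precisions and why the data schedule has to adapt to the bit width.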