Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (17): 74-88. DOI: 10.3778/j.issn.1002-8331.2311-0443

• Theory, Research and Development •

Modeling and Evaluation Framework for Constrained Dataflow in Spatial Accelerators

HE Yuxing (贺裕兴), WANG Teng (王腾), TENG Wenbin (滕文彬), GONG Lei (宫磊)

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  2. Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
  • Published: 2024-09-01 Online: 2024-08-30

Abstract: Deploying tensor computation tasks on spatial accelerators has been shown to improve their execution speed and efficiency. To run tensor computations efficiently on spatial accelerators, researchers have proposed a series of dataflow modeling and evaluation frameworks, which evaluate dataflows quickly and thereby enable efficient design space exploration. However, these frameworks lack a fine-grained description of the hardware structure, so they cannot effectively model the constraints that the hardware imposes on the dataflow, and therefore cannot effectively explore the dataflow design space as restricted by the hardware structure of real accelerators. To address this problem, this paper first models the hardware structure at a fine granularity, using a multi-level spatial accelerator hardware structure as a template. Each level comprises three parts: an array structure, a storage structure, and an interconnect network structure, which describe the limits the hardware places on the dataflow's spatial unfolding, storage capacity, and data transfer patterns, respectively. The paper then proposes a modeling method for tensor computation tasks and dataflows that can effectively derive a dataflow's hardware resource requirements. On this basis, it proposes a dataflow evaluation framework consisting of three parts: requirement analysis, constraint analysis, and performance analysis. Requirement analysis derives the hardware resources that a computation task and its dataflow require; constraint analysis checks whether the dataflow violates the hardware structure constraints; performance analysis evaluates performance metrics of the dataflow such as latency, data reuse, and resource utilization. Experimental results show that, compared with the previous state-of-the-art evaluation framework, the proposed framework reduces the error in latency evaluation and effectively supports exploration of the constrained dataflow design space.

Key words: tensor computation, spatial accelerator, dataflow, modeling and evaluation, design space exploration
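
The three-stage flow described in the abstract (requirement analysis, constraint analysis, performance analysis over a hardware level made of array, storage, and interconnect parts) can be illustrated with a minimal Python sketch. This is not the paper's model: the class names `HardwareLevel` and `Dataflow`, every field, and the cost formula (one operation per processing element per cycle, no stalls, a single hardware level) are assumptions made here purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import ceil, prod


@dataclass
class HardwareLevel:
    """One level of a hypothetical accelerator template (illustrative only)."""
    array_shape: tuple[int, int]     # array structure: PE rows x PE columns
    buffer_bytes: int                # storage structure: buffer capacity
    transfer_modes: frozenset[str]   # interconnect: supported transfer patterns


@dataclass
class Dataflow:
    """A hypothetical mapping of a loop nest onto one hardware level."""
    loop_bounds: dict[str, int]      # full iteration space, e.g. M, N, K of a GEMM
    row_unroll: tuple[str, int]      # (loop name, factor) unfolded across PE rows
    col_unroll: tuple[str, int]      # (loop name, factor) unfolded across PE columns
    tile_bytes: int                  # bytes the mapping keeps resident in the buffer
    transfer_mode: str               # how operands reach the array


def requirement_analysis(df: Dataflow) -> dict:
    """Collect the hardware resources the dataflow asks for."""
    return {
        "pe_rows": df.row_unroll[1],
        "pe_cols": df.col_unroll[1],
        "buffer_bytes": df.tile_bytes,
        "transfer_mode": df.transfer_mode,
    }


def constraint_analysis(req: dict, hw: HardwareLevel) -> list[str]:
    """Return the violated constraints; an empty list means the dataflow is legal."""
    violations = []
    rows, cols = hw.array_shape
    if req["pe_rows"] > rows or req["pe_cols"] > cols:
        violations.append("spatial unfolding exceeds the PE array")
    if req["buffer_bytes"] > hw.buffer_bytes:
        violations.append("tile does not fit in the buffer")
    if req["transfer_mode"] not in hw.transfer_modes:
        violations.append("interconnect does not support the transfer mode")
    return violations


def performance_analysis(df: Dataflow, hw: HardwareLevel) -> dict:
    """Estimate latency and utilization under the simplifying assumptions above."""
    unroll = dict([df.row_unroll, df.col_unroll])  # assumes two distinct loop names
    # Each loop contributes ceil(bound / spatial factor) temporal steps.
    steps = prod(ceil(bound / unroll.get(name, 1))
                 for name, bound in df.loop_bounds.items())
    total_ops = prod(df.loop_bounds.values())
    pes = hw.array_shape[0] * hw.array_shape[1]
    return {"latency_cycles": steps, "utilization": total_ops / (steps * pes)}


if __name__ == "__main__":
    hw = HardwareLevel((16, 16), 64 * 1024, frozenset({"unicast", "multicast"}))
    df = Dataflow({"M": 64, "N": 64, "K": 32}, ("M", 16), ("N", 16),
                  32 * 1024, "multicast")
    req = requirement_analysis(df)
    print(constraint_analysis(req, hw))
    print(performance_analysis(df, hw))
```

Running constraint analysis before performance analysis mirrors the ordering in the abstract: a dataflow that violates the array, storage, or interconnect limits is rejected before any cost is estimated, which is what keeps a design space search confined to mappings the hardware can actually execute.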