Computer Engineering and Applications ›› 2017, Vol. 53 ›› Issue (2): 47-52. DOI: 10.3778/j.issn.1002-8331.1503-0333

• Theory, Research and Development •

Performance evaluation and optimization of cache on fused CPU-GPU architecture

SUN Chuanwei, AN Hong, SUN Sun, CHEN Junshi

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • Online: 2017-01-15 Published: 2017-05-11

Abstract: The development of CPUs and GPUs has reached a new bottleneck, and integrating the two on a single chip has become a new architectural trend. This heterogeneous architecture puts pressure on the management of on-chip shared resources, and the management of the shared last-level cache (LLC) is critical to performance. Because CPU and GPU applications have different characteristics, managing an LLC shared between CPUs and GPUs brings new challenges. This paper analyzes the memory access characteristics of GPU applications and, drawing on earlier cache management schemes, proposes equal (half-to-half) static partitioning and optimal static partitioning of the LLC on the fused CPU-GPU system. Experimental results show that cache partitioning effectively avoids interference between CPU and GPU applications. Compared with the conventional LRU policy, equal static partitioning and optimal static partitioning improve overall system performance by 7.68% and 11.68% (11.62% in the Chinese abstract), respectively.

Key words: heterogeneous architecture, fusion, shared last-level cache, static cache partition
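
The interference that static partitioning is meant to avoid can be illustrated with a toy model. The sketch below is not the paper's simulator, benchmarks, or partitioning implementation: the class names, the single 8-way cache set, and the synthetic workload (a CPU loop that reuses four blocks, interleaved with a GPU stream that never reuses a block) are all assumptions chosen only to make the effect visible.

```python
from collections import OrderedDict


class SharedLRUCache:
    """Toy LLC in which CPU and GPU blocks compete for all ways under LRU."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, owner, block):
        """Return True on a hit, False on a miss (the block is then filled)."""
        lines = self.sets[block % self.num_sets]
        key = (owner, block)  # assumption: CPU and GPU address spaces are disjoint
        if key in lines:
            lines.move_to_end(key)  # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[key] = True
        return False


class StaticPartitionedCache:
    """Toy LLC in which each owner has a fixed number of ways per set."""

    def __init__(self, num_sets, quota):
        self.num_sets = num_sets
        self.quota = dict(quota)  # e.g. {"cpu": 4, "gpu": 4}
        self.sets = [{o: OrderedDict() for o in quota} for _ in range(num_sets)]

    def access(self, owner, block):
        lines = self.sets[block % self.num_sets][owner]
        if block in lines:
            lines.move_to_end(block)
            return True
        if len(lines) >= self.quota[owner]:
            lines.popitem(last=False)  # eviction stays inside the owner's ways
        lines[block] = True
        return False


def run(cache, cpu_accesses=1000, gpu_per_cpu=2, cpu_working_set=4):
    """Interleave a reusing CPU loop with a streaming GPU; count hits per owner."""
    hits = {"cpu": 0, "gpu": 0}
    gpu_block = 0
    for i in range(cpu_accesses):
        hits["cpu"] += int(cache.access("cpu", i % cpu_working_set))
        for _ in range(gpu_per_cpu):
            hits["gpu"] += int(cache.access("gpu", gpu_block))
            gpu_block += 1  # streaming: every GPU block is new
    return hits


shared_hits = run(SharedLRUCache(num_sets=1, ways=8))
equal_hits = run(StaticPartitionedCache(num_sets=1, quota={"cpu": 4, "gpu": 4}))
print("shared LRU:", shared_hits)
print("equal split:", equal_hits)
```

In this hypothetical workload the GPU stream pushes the CPU's blocks out of the shared LRU cache before they are reused, whereas the equal split keeps the CPU's four-block working set resident. Sweeping the `quota` values and keeping the best split would be one way to approximate an optimal static partition in the same toy setting, though the paper's own selection procedure is not described in the abstract.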