Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (19): 9-11.

• Doctoral Forum •

Comparison and Analysis of Several Matrix Multiplication Methods on CPU and GPU

刘进锋,郭 雷   

  1. School of Automation, Northwestern Polytechnical University, Xi’an 710129
  • Received: 1900-01-01 Revised: 1900-01-01 Published: 2011-07-01 Released: 2011-07-01

Comparison and analysis of matrix multiplications on GPU and CPU

LIU Jinfeng,GUO Lei   

  1. School of Automation, Northwestern Polytechnical University, Xi’an 710129, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-07-01 Published: 2011-07-01

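Abstract: This paper describes three implementations of matrix multiplication on the CPU and four CUDA-based implementations on the GPU, and analyzes why the high-performance methods are fast. Their common feature is that they organize the data sensibly and exploit that organization, which effectively reduces memory-access overhead and greatly increases the speed of the algorithm. The best CPU implementation is more than 200 times faster than the ordinary algorithm, and the best GPU implementation is in turn about 6 times faster than the best CPU implementation.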

Key words: matrix multiplication, Compute Unified Device Architecture (CUDA), graphics processing unit (GPU), storage pattern

Abstract: Three matrix multiplication implementations on the CPU and four CUDA-based implementations on the GPU are described, and the reasons for their high performance are analyzed. The common characteristic of the efficient algorithms is that data are properly organized and rationally utilized, so that the access cost is effectively reduced and the speed is greatly improved. The best optimized implementation on the CPU is more than 200 times faster than the common one, and the best optimized implementation on the GPU is about 6 times faster than the best one on the CPU.

Key words: matrix multiplication, Compute Unified Device Architecture (CUDA), Graphics Processing Unit (GPU), storage pattern
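
The abstract attributes the speedups to organizing data so that memory accesses become cheaper, but it does not say which implementations were compared. As a rough illustration only, the C sketch below contrasts a plain triple loop with a blocked (tiled) variant for square row-major matrices. The function names `matmul_naive` and `matmul_blocked` and the block-size parameter `bs` are hypothetical, and blocking is one assumed example of such data organization, not necessarily a method used in the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain i-j-k order: the inner loop reads B down a column (stride n),
   which uses the cache poorly when matrices are stored row-major. */
void matmul_naive(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Blocked (tiled) variant, an assumed illustration: work proceeds on
   bs x bs tiles so the data touched by the inner loops can stay in cache,
   and the innermost loop walks B and C along rows (unit stride). */
void matmul_blocked(const float *a, const float *b, float *c,
                    size_t n, size_t bs)
{
    assert(bs > 0);
    memset(c, 0, n * n * sizeof *c);   /* C accumulates partial sums */

    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = (ii + bs < n) ? ii + bs : n;
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = (kk + bs < n) ? kk + bs : n;
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = (jj + bs < n) ? jj + bs : n;
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        float aik = a[i * n + k];   /* reused across j */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product; only the order in which memory is visited differs. Any speed difference depends on the matrix size, the block size, and the cache hierarchy of the machine, so it would have to be measured rather than assumed.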