Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (19): 9-11.

• Doctoral Forum •

Comparison and analysis of matrix multiplications on GPU and CPU

LIU Jinfeng (刘进锋), GUO Lei (郭雷)

  1. School of Automation, Northwestern Polytechnical University, Xi’an 710129, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-07-01 Published:2011-07-01

Abstract: Three implementations of matrix multiplication on the CPU and four CUDA-based implementations on the GPU are described, and the reasons for the high performance of the faster methods are analyzed. The efficient algorithms share a common characteristic: they organize data properly and use it rationally, which effectively reduces memory-access cost and greatly improves speed. The best-optimized CPU implementation is more than 200 times faster than the common algorithm, and the best-optimized GPU implementation is about 6 times faster than the best CPU implementation.

Key words: matrix multiplication, Compute Unified Device Architecture (CUDA), Graphics Processing Unit (GPU), storage pattern
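
The abstract's central claim, that organizing data well reduces access cost, can be illustrated with a minimal C sketch. The abstract does not say which three CPU methods the paper compares, so the two functions below (`matmul_ijk` and `matmul_ikj`, both hypothetical names) are only an assumed example of the idea, not the authors' implementations. Both compute the same product of row-major n×n matrices; the second merely reorders the loops so that the innermost loop reads B and writes C with unit stride.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward i-j-k order. With row-major storage the inner loop
   reads B down a column, so consecutive reads are n elements apart. */
void matmul_ijk(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Reordered i-k-j variant (an illustrative assumption, not taken from
   the paper). The inner loop walks one row of B and one row of C, so
   every access in it is contiguous in memory. */
void matmul_ikj(size_t n, const float *A, const float *B, float *C)
{
    assert(C != A && C != B); /* output must not alias the inputs */

    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0f;

    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const float a = A[i * n + k]; /* reused across the whole row */
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

Both routines perform the same n³ multiply-adds; only the memory access pattern differs. How much that matters depends on matrix size, cache hierarchy, and compiler, so no speedup figure is implied here.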
