Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (27): 66-71.

• 学术探讨 • Previous Articles     Next Articles

Improvement and performance analysis of MPI_ALLGATHER algorithm

LI Zhan-sheng1,BI Hui-juan1,DU Zhi-hui2,JIAO Qing3   

  1. 1.Department of Software Platform,North China Institute of Computing Technology,Beijing 100083,China
    2.Department of Computer Science and Technology,Tsinghua University,Beijing 100084,China
    3.Library of Ludong University,Yantai,Shandong 264025,China
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-09-21 Published:2007-09-21
  • Contact: LI Zhan-sheng

MPI_ALLGATHER实现算法的改进与性能分析

李占胜1,毕会娟1,都志辉2,焦 青3   

  1. 1.华北计算技术研究所 软件平台研究室,北京 100083
    2.清华大学 计算机科学与技术系,北京 100084
    3.鲁东大学 图书馆,山东 烟台 264025
  • 通讯作者: 李占胜

Abstract: We discuss issues related to the high performance implementation of collective communication operations in MPICH for clusters in this paper.The Ring and Neighbor Exchange algorithms have the best communication locality property,and they have more communication steps than others.The Recursive Doubling and Bruck algorithms have less communication steps,and they have worse communication locality property than others.Based on the discussion,we present a new MPI_ALLGATHER algorithm that combines the Neighbor Exchange and Recursive Doubling algorithms.The new algorithm has less data transfer times than the Neighbor Exchange algorithm and better communication locality property than the Recursive Doubling algorithm.We evaluate the new algorithm in Myrinet cluster and find that the new algorithm performs better than the Neighbor Exchange algorithm and sometimes performs the best for medium-size messages among all the algorithms.The new algorithm also performs better for long-size messages than the Recursive Doubling and Bruck algorithms.

Key words: parallel programming, Message Passing Interface(MPI), collective communication, MPI_ALLGATHER

摘要: 首先分析了影响MPI组通信性能的各方面因素,提出了一种衡量算法性能的模型。基于这种分析及模型,提出了一种将邻居交换和递归倍增两种算法结合的新的MPI_ALLGATHER实现算法。新的算法比邻居交换算法通信次数少,比递归倍增算法具有较好的通信局部性。通过在高性能机群系统中的测试,发现新算法在多种情况下比邻居交换算法具有更优的性能,在中等长度消息通信时具有最优的性能,在长消息通信时性能比递归倍增算法和Bruck算法的性能更优,且在长消息通信时多数情况下性能最优。

关键词: 并行编程, MPI, 组通信, MPI_ALLGATHER算法