计算机工程与应用 ›› 2020, Vol. 56 ›› Issue (3): 74-79.DOI: 10.3778/j.issn.1002-8331.1810-0413

• 大数据与云计算 • 上一篇    下一篇

广义共轭余差法的通信避免算法

金之雁,杨磊,林隽民,王哲   

  1. 1.中国气象科学研究院,北京 100081
    2.英特尔中国有限公司,北京 100081
  • 出版日期:2020-02-01 发布日期:2020-01-20

Communication Avoiding Algorithm of Generalized Conjugate Residual Method

JIN Zhiyan, YANG Lei, LIN Junmin, WANG Zhe   

  1. 1.Chinese Academy of Meteorological Sciences, Beijing 100081, China
    2.Intel China Ltd, Beijing 100081, China
  • Online:2020-02-01 Published:2020-01-20

摘要: 广义共轭余差法是一种用于求解非对称线性方程组的有效算法。为减少算法中的全局通信,首创性地提出了“通信避免的广义共轭余差法”,避免了迭代过程中的全局通信,使算法中的全局通信总次数降低了一个数量级,同时减少了约50%的计算量(计算量的具体减少比例与计算规模相关)。大规模测试中(最大16?384进程),新算法最高达到了原算法3倍的运算速率。进一步分析表明,新算法在各种并行规模下的运算速率和可扩展性都优于原算法。在较小并行规模下,新算法的优势主要来源于计算量的减少。在较大并行规模下,新算法的优势主要来源于全局通信量的减少。

关键词: 通信避免算法, 广义共轭余差法, 并行计算, 全球区域一体化数值预报模式, 曙光-派计算集群

Abstract: Generalized Conjugate Residual(GCR) method is an effective algorithm to solve asymmetrical linear systems. To reduce global communication of the algorithm, this paper puts forward the “communication avoiding generalized conjugate residual method”, avoids the global communication in the iterative process, global communication in the algorithm is reduced by an order of magnitude, at the same time reduces about 50% of the computation(the reduction of computation and communication related to scale).In the massively experiment(the maximum process is 16 384), the new algorithm is up to three times as fast as the original algorithm. Further analysis shows that:the new algorithm is faster and more scalable than the original algorithm in various parallel scales. In the case of small parallel scale, the advantages of the new algorithm mainly come from the reduction of computing. In the case of large parallel scale, the advantage of the new algorithm mainly comes from the reduction of global communication.

Key words: communication avoiding algorithm, generalized conjugate residual method, parallel computing, global/regional assimilation and prediction system, Sugon_&pi, cluster