Computer Engineering and Applications, 2020, Vol. 56, Issue (15): 37-42. DOI: 10.3778/j.issn.1002-8331.1907-0035

• Theory, Research and Development •

Vectorization-Friendly Tile Size Selection Algorithm

CHAI Xiaofei, LIU Song, QU Bin, WANG Qian, WU Weiguo   

  1. School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
  • Online: 2020-08-01 Published: 2020-07-30

Abstract:

When loop tiling is applied to a nested loop with a pathological problem size, the effect of the tile sizes on vectorization is often ignored, which leads to unaligned data accesses and degrades the performance of the tiled code. This paper proposes VEC-TSS, a VECtorization-friendly Tile Size Selection algorithm, to address this problem. For the vectorizable loop level, the algorithm determines the tile size through vectorization profit analysis; for the other loop levels, it determines the tile sizes from locality benefit and parallel granularity. Experimental results show that, on loops with pathological problem sizes, VEC-TSS achieves better speedup than two other tile size selection algorithms and also scales well.

Keywords: vectorization, loop tiling, tile size selection, cache optimization
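
To make the idea in the abstract concrete, the following is a minimal sketch of why the tile size of the vectorizable loop matters for alignment, and of how the tile size of an outer loop might be bounded by cache capacity and thread count. It is not the VEC-TSS algorithm itself, whose profit model is not given on this page. All function names, the 32-byte vector width, the 4-byte element size, the 32 KiB cache budget, and the assumption that each row starts at an aligned address are hypothetical choices made for illustration only.

```python
def aligned_tile_size(candidate: int, lanes: int) -> int:
    """Round a candidate tile size down to a multiple of the SIMD lane count.

    The result is never smaller than one full vector (assumed policy).
    """
    return max(lanes, (candidate // lanes) * lanes)


def misaligned_tile_starts(n: int, tile: int, lanes: int) -> int:
    """Count tiles of range(n) whose first index is not a multiple of `lanes`.

    Assumes index 0 sits on an aligned address, so a tile starts aligned
    exactly when its first index is a multiple of the lane count.
    """
    return sum(1 for start in range(0, n, tile) if start % lanes != 0)


def outer_tile_size(n_outer: int, inner_tile: int, elem_bytes: int,
                    cache_bytes: int, threads: int) -> int:
    """Pick an outer-loop tile size from two simple, assumed constraints.

    Locality: an (outer x inner_tile) block must fit in the cache budget.
    Parallel granularity: there must be at least one outer tile per thread.
    """
    by_cache = max(1, cache_bytes // (inner_tile * elem_bytes))
    by_parallelism = max(1, n_outer // threads)
    return min(by_cache, by_parallelism, n_outer)


if __name__ == "__main__":
    lanes = 32 // 4  # assumed: 32-byte vectors holding 4-byte floats -> 8 lanes
    naive = 100
    friendly = aligned_tile_size(naive, lanes)
    print(friendly)                                       # 96
    print(misaligned_tile_starts(1000, naive, lanes))     # 5
    print(misaligned_tile_starts(1000, friendly, lanes))  # 0
    print(outer_tile_size(1000, friendly, 4, 32 * 1024, 8))
```

With these assumed parameters, a tile size of 100 makes every second tile start at an unaligned index, while rounding it down to 96 makes every tile start on a vector boundary. A real selection algorithm would weigh this alignment gain against the locality and parallelism effects rather than simply rounding.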