Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (1): 180-186. DOI: 10.3778/j.issn.1002-8331.2106-0502

• Pattern Recognition and Artificial Intelligence •

Optimization of MEC Convolution Algorithm Based on TVM Platform

WANG Zhaowen, JIANG Lin, LI Yuancheng, ZHU Yun   

  1. School of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710600, China
    2. School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  • Online:2023-01-01 Published:2023-01-01

Abstract: The MEC (memory efficient convolution) algorithm suffers from a low cache hit rate and long memory access latency on conventional devices because the data addresses it accesses are not contiguous. To address this, an optimization method tailored to the memory access pattern of the MEC algorithm is proposed. The method consists of two parts: intermediate matrix transformation and matrix computation. For the intermediate matrix transformation, the data reading order is modified so that it matches the algorithm's memory access pattern. For the matrix computation, the convolution kernel matrix is rearranged into a memory data layout better suited to matrix multiplication, and the computation between the intermediate matrix and the convolution kernel matrix is redesigned using the compute functions encapsulated by the TVM (tensor virtual machine) platform. The platform's built-in parallel library is then used to accelerate the computation. Experimental results show that, compared with the traditional MEC algorithm, the proposed method effectively improves the cache hit rate and reduces memory access latency: it achieves an average speedup of 50% on a single convolution layer and a speedup of at least 57% on multi-layer neural networks; compared with the spatial pack algorithm, the speedup reaches up to 80%.
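To make the lowering-plus-GEMM structure described in the abstract concrete, the sketch below expresses an MEC-style convolution in Python with TVM's tensor expression API. It is a minimal illustration rather than the authors' implementation: the sizes, the single-channel stride-1 setting, the te.compute formulation of the per-output-row slicing, and the parallel schedule over output rows are assumptions chosen for clarity, and the sketch assumes a TVM release that still exposes the te schedule API.

```python
import numpy as np
import tvm
from tvm import te

# Illustrative sizes (assumptions): single channel, stride 1, no padding.
IH, IW = 32, 32                     # input height / width
KH, KW = 3, 3                       # kernel height / width
OH, OW = IH - KH + 1, IW - KW + 1   # output height / width

# MEC lowered (intermediate) matrix: one row per horizontal output position,
# each row is a flattened IH x KW vertical strip of the input image.
L = te.placeholder((OW, IH * KW), name="L", dtype="float32")
# Convolution kernel flattened row-major into a length KH*KW vector.
K = te.placeholder((KH * KW,), name="K", dtype="float32")

r = te.reduce_axis((0, KH * KW), name="r")
# Output row h reuses the overlapping slice L[:, h*KW : h*KW + KH*KW]:
#   O[h, w] = sum_r L[w, h*KW + r] * K[r]
O = te.compute(
    (OH, OW),
    lambda h, w: te.sum(L[w, h * KW + r] * K[r], axis=r),
    name="O",
)

s = te.create_schedule(O.op)
s[O].parallel(O.op.axis[0])         # parallelize over output rows
func = tvm.build(s, [L, K, O], target="llvm")

# Host-side check against a direct convolution in NumPy.
img = np.random.rand(IH, IW).astype("float32")
ker = np.random.rand(KH, KW).astype("float32")
# MEC lowering: strip w covers input columns [w, w + KW), flattened row-major,
# so each strip is read as a small contiguous block per image row.
lowered = np.stack([img[:, w:w + KW].reshape(-1) for w in range(OW)])

dev = tvm.cpu()
out = tvm.nd.empty((OH, OW), "float32", dev)
func(tvm.nd.array(lowered, dev), tvm.nd.array(ker.reshape(-1), dev), out)

ref = np.array([[np.sum(img[h:h + KH, w:w + KW] * ker) for w in range(OW)]
                for h in range(OH)], dtype="float32")
assert np.allclose(out.numpy(), ref, atol=1e-4)
```

The memory advantage of MEC is visible in the lowered matrix shape: it stores OW rows of length IH·KW instead of im2col's OH·OW rows of length KH·KW, and consecutive output rows multiply overlapping, contiguous column slices of the same buffer, which is the access pattern that the read-order and kernel-layout changes described above are meant to exploit.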

Key words: convolution computation, memory access pattern, cache technology, memory efficient convolution (MEC) algorithm