Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (1): 180-186. DOI: 10.3778/j.issn.1002-8331.2106-0502

• Pattern Recognition and Artificial Intelligence •

Optimization of MEC Convolution Algorithm Based on TVM Platform

WANG Zhaowen, JIANG Lin, LI Yuancheng, ZHU Yun   

  1. School of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710600, China
    2. School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  • Online:2023-01-01 Published:2023-01-01

Abstract: The MEC (memory efficient convolution) algorithm suffers from a low cache hit rate and long memory access latency on conventional devices because the data addresses it accesses are not contiguous. To address this, an optimization method tailored to the memory access pattern of the MEC algorithm is proposed. The method consists of two parts: intermediate matrix transformation and matrix computation. For the intermediate matrix transformation, the data reading order is modified so that it matches the algorithm's memory access pattern. For the matrix computation, the convolution kernel matrix is rearranged into a memory data layout better suited to matrix multiplication, and the computation between the intermediate matrix and the convolution kernel matrix is redesigned using the compute functions encapsulated by the TVM (tensor virtual machine) platform. The platform's built-in parallel library is then used to accelerate the computation. Experimental results show that, compared with the traditional MEC algorithm, the proposed method effectively improves the cache hit rate and reduces memory access latency: it achieves an average speedup of 50% on a single convolution layer and a speedup of at least 57% on multi-layer neural networks; compared with the spatial pack algorithm, the speedup reaches up to 80%.
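To make the lowering-plus-GEMM structure described in the abstract concrete, the sketch below expresses an MEC-style convolution in Python with TVM's tensor expression API. It is a minimal illustration rather than the authors' implementation: the sizes, the single-channel stride-1 setting, the te.compute formulation of the per-output-row slicing, and the parallel schedule over output rows are assumptions chosen for clarity, and the sketch assumes a TVM release that still exposes the te schedule API.

```python
import numpy as np
import tvm
from tvm import te

# Illustrative sizes (assumptions): single channel, stride 1, no padding.
IH, IW = 32, 32                     # input height / width
KH, KW = 3, 3                       # kernel height / width
OH, OW = IH - KH + 1, IW - KW + 1   # output height / width

# MEC lowered (intermediate) matrix: one row per horizontal output position,
# each row is a flattened IH x KW vertical strip of the input image.
L = te.placeholder((OW, IH * KW), name="L", dtype="float32")
# Convolution kernel flattened row-major into a length KH*KW vector.
K = te.placeholder((KH * KW,), name="K", dtype="float32")

r = te.reduce_axis((0, KH * KW), name="r")
# Output row h reuses the overlapping slice L[:, h*KW : h*KW + KH*KW]:
#   O[h, w] = sum_r L[w, h*KW + r] * K[r]
O = te.compute(
    (OH, OW),
    lambda h, w: te.sum(L[w, h * KW + r] * K[r], axis=r),
    name="O",
)

s = te.create_schedule(O.op)
s[O].parallel(O.op.axis[0])         # parallelize over output rows
func = tvm.build(s, [L, K, O], target="llvm")

# Host-side check against a direct convolution in NumPy.
img = np.random.rand(IH, IW).astype("float32")
ker = np.random.rand(KH, KW).astype("float32")
# MEC lowering: strip w covers input columns [w, w + KW), flattened row-major,
# so each strip is read as a small contiguous block per image row.
lowered = np.stack([img[:, w:w + KW].reshape(-1) for w in range(OW)])

dev = tvm.cpu()
out = tvm.nd.empty((OH, OW), "float32", dev)
func(tvm.nd.array(lowered, dev), tvm.nd.array(ker.reshape(-1), dev), out)

ref = np.array([[np.sum(img[h:h + KH, w:w + KW] * ker) for w in range(OW)]
                for h in range(OH)], dtype="float32")
assert np.allclose(out.numpy(), ref, atol=1e-4)
```

The memory advantage of MEC is visible in the lowered matrix shape: it stores OW rows of length IH·KW instead of im2col's OH·OW rows of length KH·KW, and consecutive output rows multiply overlapping, contiguous column slices of the same buffer, which is the access pattern that the read-order and kernel-layout changes described above are meant to exploit.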

Key words: convolution computation, memory access pattern, cache technology, memory efficient convolution (MEC) algorithm