Computer Engineering and Applications, 2011, Vol. 47, Issue (34): 81-85.

• Network, Communication and Security •

Research on GPU Implementation of the NTRU Encryption and Decryption Algorithm

ZHU Yao, YAN Chenghua, LI Qiang

  1. School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China
  • Online: 2011-12-01 Published: 2011-12-01

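Abstract: GPUs are highly parallel and programmable, and they are widely used for large-scale data-parallel computation. NTRU is a public-key cryptosystem that offers high security and is easy to parallelize. This paper studies a CUDA-based parallel implementation of NTRU. The convolution, which is the most time-consuming step, is split across multiple threads. The complete encryption and decryption process is carried out by a large number of independent, concurrently executing encryption and decryption thread blocks. The paper also describes the data encoding and storage structures and the thread organization. It further describes performance optimizations based on coalesced memory access and shared memory. Experimental results show that the CUDA implementation achieves hardware acceleration. Compared with a CPU implementation of NTRU, it reaches a throughput of 12.38 MB/s and a speedup of up to 95 times.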

Key words: Compute Unified Device Architecture (CUDA), Graphics Processing Unit (GPU), NTRU algorithm, parallelization
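
The convolution identified above as the bottleneck is the cyclic product in Z_q[x]/(x^N - 1): c_k = Σ_{i=0}^{N-1} a_i · b_{(k-i) mod N} (mod q), for k = 0, ..., N-1. Each of the N output coefficients depends only on the inputs, never on another output. That independence is what allows a one-thread-per-coefficient decomposition. The Python sketch below emulates such a mapping on the CPU. It is an illustration under stated assumptions, not the authors' CUDA kernel:

- the toy parameters N = 7 and q = 32 are hypothetical;
- the function names `conv_coeff` and `batch_conv` are illustrative;
- one public polynomial h is assumed to be shared by a batch of per-message polynomials r, as in the r * h product of NTRU encryption.

```python
# Toy, hypothetical parameters; real NTRU parameter sets use N in the hundreds.
N = 7
Q = 32


def conv_coeff(r, h, k, n=N, q=Q):
    """Work of one (emulated) GPU thread: coefficient k of r * h in Z_q[x]/(x^n - 1)."""
    acc = 0
    for i in range(n):
        acc += r[i] * h[(k - i) % n]  # index wraps around because x^n = 1
    return acc % q


def batch_conv(rs, h, n=N, q=Q):
    """CPU emulation of a hypothetical kernel grid.

    Outer loop ~ blockIdx.x: one independent block per message polynomial r.
    Inner loop ~ threadIdx.x: one thread per output coefficient k.
    Results are stored flat as out[m * n + k], so adjacent "threads" write
    adjacent addresses, the pattern that permits coalesced global-memory stores.
    """
    out = [0] * (len(rs) * n)
    for m, r in enumerate(rs):
        for k in range(n):
            out[m * n + k] = conv_coeff(r, h, k, n, q)
    return out


if __name__ == "__main__":
    h = [1, 2, 3, 4, 5, 6, 7]           # hypothetical shared polynomial
    rs = [[0, 1, 0, 0, 0, 0, 0],        # r = x: rotates h by one position
          [1, 1, 0, 0, 0, 0, 0]]        # r = 1 + x
    print(batch_conv(rs, h))  # [7, 1, 2, 3, 4, 5, 6, 8, 3, 5, 7, 9, 11, 13]
```

Mapping the outer loop to thread blocks and the inner loop to threads gives the "many independent blocks" structure described in the abstract. Every thread in a block reads all of h, so staging h in shared memory once per block is a plausible use of that optimization. This is an assumption for illustration, not a detail taken from the paper.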

Abstract: Graphic Processing Unit(GPU) has a advantage of high parallelism and programmable,which is applied widely to massive data parallel compute.NTRU is a public key cryptography algorithm which has a high security and is easy to be parallel.A high performance implementation of NTRU algorithm based on Compute Unified Device Architecture(CUDA) is presented.The most time-consuming convolution is divided into several parallel threads to compute and the whole CUDA implementation of NTRU is large amount of independent parallel thread blocks of encryption or decryption in the kernel side.The thread organization scheme and data encode and storage are also presented.Besides,coalesced access and shared memory based performance improvement method are also presented.The result shows that the implementation of NTRU based on CUDA is with high efficiency compared with the tradition NTRU algorithm implemented on CPU can get throughput of 12.38 MB/s and acceleration of 95 times at most.

Key words: Compute Unified Device Architecture(CUDA), Graphic Processing Unit(GPU), NTRU algorithm, parallelization