Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (2): 363-371. DOI: 10.3778/j.issn.1002-8331.2308-0434

• Engineering and Applications •

Design and Implementation of Specialized Instruction Accelerator for Hash Cryptography Algorithms

WANG Xuan (王轩), LIU Qinrang (刘勤让), CHEN Lei (陈磊), WEI Shuai (魏帅), FAN Wang (范旺), YANG Heng (杨恒)

  1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
  2. Institute of Information Technology, PLA Information Engineering University, Zhengzhou 450002, China
  3. Well Core Microelectronics Technology (Tianjin) Company Limited, Tianjin 300457, China
  • Online: 2025-01-15 Published: 2025-01-15

Abstract: The rapid growth of the Internet of Things places increasingly high demands on the system performance and data security of embedded devices. Traditional general-purpose embedded processors implement cryptographic algorithms inefficiently and cannot adequately meet these performance needs, and embedded devices must also operate under low-power constraints. To address these issues, a low-power specialized instruction accelerator for cryptographic hash algorithms is designed on the Xilinx ZYNQ ZC706 embedded development platform. The accelerator comprises an instruction fetch and decode unit, an execution unit, and a data memory-access unit, and it accelerates computation through multi-task data parallelism and specialized instructions. A token mechanism is designed to resolve data conflicts during instruction execution. Building on high-level synthesis (HLS) tools, storage optimization is used to improve the memory-access mechanism and raise bandwidth utilization. Experimental results show that the accelerator runs at 100 MHz, that the ARM+FPGA scheme achieves a speedup of more than three times over the ARM-only scheme, and that operating power consumption is only 2.23 W. The accelerator can also be customized and extended, giving it good flexibility.

Key words: embedded applications, accelerator design, specialized instructions, high-level synthesis, data parallelism