Netinfo Security ›› 2023, Vol. 23 ›› Issue (4): 72-79. doi: 10.3969/j.issn.1671-1122.2023.04.008

• Technical Research •

Hardware Design and Implementation of Number Theoretic Transform in Post-Quantum Cryptography

XIAO Hao, ZHAO Yanrui, HU Yue, LIU Xiaofan

  1. School of Microelectronics, Hefei University of Technology, Hefei 230601, China
  • Received: 2022-12-16 Online: 2023-04-10 Published: 2023-04-18
  • Contact: XIAO Hao E-mail: xiaohao@hfut.edu.cn
  • About the authors: XIAO Hao (born 1982), male, from Anhui, professor, Ph.D.; research interests: trusted computing chips, application-specific hardware accelerators, and multi-core system-on-chip design | ZHAO Yanrui (born 1998), male, from Shandong, master's student; research interests: information security and hardware acceleration | HU Yue (born 1998), male, from Anhui, master's student; research interests: elliptic curve cryptography | LIU Xiaofan (born 1997), female, from Hebei, master's student; research interests: trusted computing.
  • Supported by:
    National Natural Science Foundation of China (61974039)

Abstract:

The number theoretic transform (NTT) is a key component of post-quantum cryptography algorithms, and its computational performance is critical to the running speed of the system. Compared with the classical NTT algorithm, high-radix NTT algorithms can achieve better computational performance. To address the lengthy computation flow and complex control logic that arise in hardware implementations of high-radix NTT, this paper proposes a high-performance radix-4 NTT hardware architecture based on a pipeline structure. First, starting from the classical NTT algorithm, a hardware-friendly radix-4 recursive NTT is derived, which simplifies the computation flow of the high-radix algorithm. Second, a single-path delay feedback structure is proposed that partitions the computation flow into pipeline stages and reduces the complexity of the hardware architecture. Finally, the radix-4 butterfly unit is built by coupling two stages of butterfly operations, and modular reduction is optimized with shift and addition operations, which reduces hardware resource overhead. Taking the post-quantum cryptography scheme Falcon as an example, the proposed NTT hardware architecture is implemented on a Xilinx Artix-7 FPGA. Experimental results show that the proposed design outperforms related designs in computational performance and hardware overhead.

Key words: post-quantum cryptography, number theoretic transform, hardware acceleration, field programmable gate array
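
To make the ideas in the abstract concrete, the following is a minimal software sketch of a radix-4 recursive NTT whose butterfly is formed from two coupled stages of radix-2 butterflies, together with a shift-and-add modular reduction. It is not the authors' hardware design or derivation. It assumes Falcon's modulus q = 12289, a plain cyclic transform (Falcon itself works modulo x^n + 1), and a length that is a power of 4. The reduction relies on the identity 2^14 ≡ 2^12 - 1 (mod 12289), chosen here for illustration; the abstract does not state which reduction circuit the paper uses. All function names are hypothetical.

```python
Q = 12289  # Falcon's prime modulus; Q = 2**13 + 2**12 + 1


def reduce_shift_add(c):
    """Return c mod Q for c >= 0 using only shifts, adds and subtracts.

    Assumed identity (not taken from the paper): 2**14 = Q + 2**12 - 1,
    so hi * 2**14 + lo is congruent to hi * (2**12 - 1) + lo modulo Q.
    """
    while c >> 14:
        hi, lo = c >> 14, c & 0x3FFF
        c = (hi << 12) - hi + lo  # strictly smaller than the previous c
    return c - Q if c >= Q else c  # here c < 2**14 < 2 * Q


def mul(a, b):
    return reduce_shift_add(a * b)


def add(a, b):
    s = a + b
    return s - Q if s >= Q else s


def sub(a, b):
    d = a - b
    return d + Q if d < 0 else d


def find_root(n):
    """Search for a primitive n-th root of unity mod Q (n >= 2 a power of two dividing Q - 1)."""
    for x in range(2, Q):
        w = pow(x, (Q - 1) // n, Q)
        if pow(w, n // 2, Q) == Q - 1:  # w**(n/2) == -1 means the order is exactly n
            return w
    raise ValueError("no primitive root of that order")


def ntt_radix4(x, w):
    """Radix-4 decimation-in-time NTT of x, where w is a primitive len(x)-th root."""
    n = len(x)
    if n & (n - 1) or n.bit_length() % 2 == 0:
        raise ValueError("length must be a power of 4")
    if n == 1:
        return list(x)
    quarter = n // 4
    w4 = pow(w, 4, Q)
    a = [ntt_radix4(x[r::4], w4) for r in range(4)]  # four quarter-size NTTs
    i4 = pow(w, quarter, Q)  # primitive 4th root of unity, i4**2 == -1
    out = [0] * n
    wk = 1  # running twiddle factor w**k
    for k in range(quarter):
        wk2 = mul(wk, wk)
        wk3 = mul(wk2, wk)
        t0 = a[0][k]
        t1 = mul(wk, a[1][k])
        t2 = mul(wk2, a[2][k])
        t3 = mul(wk3, a[3][k])
        # Stage 1: two radix-2 butterflies, one output rotated by i4.
        u0, u1 = add(t0, t2), sub(t0, t2)
        u2, u3 = add(t1, t3), mul(i4, sub(t1, t3))
        # Stage 2: two more radix-2 butterflies give the four outputs.
        out[k] = add(u0, u2)
        out[k + quarter] = add(u1, u3)
        out[k + 2 * quarter] = sub(u0, u2)
        out[k + 3 * quarter] = sub(u1, u3)
        wk = mul(wk, w)
    return out


def ntt_naive(x, w):
    """O(n^2) reference transform used only to check the fast version."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, Q) for j in range(n)) % Q for k in range(n)]


n = 64
w = find_root(n)
x = [(7 * j * j + 3) % Q for j in range(n)]
assert ntt_radix4(x, w) == ntt_naive(x, w)
```

Each radix-4 level replaces two radix-2 levels, so a length-1024 transform needs 5 levels instead of 10. A length of 512 is not a power of 4 and would need one additional radix-2 stage, which this sketch does not cover.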

CLC Number: