Netinfo Security ›› 2024, Vol. 24 ›› Issue (6): 959-967. doi: 10.3969/j.issn.1671-1122.2024.06.013

• Special Topic on Cryptography •

An Area Efficient Dual-State Configurable NTT Hardware Accelerator

ZHU Min, XIAO Hao

  1. School of Microelectronics, Hefei University of Technology, Hefei 230601, China
  • Received: 2024-03-07 Online: 2024-06-10 Published: 2024-07-05
  • Corresponding author: XIAO Hao, xiaohao@hfut.edu.cn
  • About the authors: ZHU Min (born 1998), female, from Hubei, master's student; main research interests: information security and hardware acceleration | XIAO Hao (born 1982), male, from Anhui, professor, Ph.D.; main research interests: trusted computing chips, application-specific hardware accelerators, and multi-core system-on-chip design
  • Funding:
    National Natural Science Foundation of China (61974039)

Abstract:

Matrix-vector multiplication is the main computational bottleneck of lattice-based post-quantum cryptography (PQC) schemes. The number theoretic transform (NTT) reduces the computational complexity of this multiplication from O(N²) to O(N log₂ N), which in turn speeds up PQC schemes. This paper proposes an area-efficient dual-state configurable NTT hardware accelerator on a field programmable gate array (FPGA) that efficiently executes the NTT operations of both the Kyber and Dilithium algorithms. The modular multiplier in the proposed design first uses a look-up table (LUT) to compress the data bit width and lower the cost of modular reduction, and then reduces the result with the KRED algorithm. Combined with an optimized conflict-free NTT data flow, this allows the dual-state configurable accelerator to complete its computations efficiently. The accelerator is validated on the Xilinx Artix-7 platform. Compared with the designs in the cited literature, it performs better in both computational performance and hardware overhead while remaining general across the Kyber and Dilithium algorithms.
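To make the complexity claim concrete, the following is a minimal software sketch, not the hardware design of the paper, of multiplying two polynomials in Z_q[x]/(x^n + 1) once with the O(N²) schoolbook method and once through an NTT. The function names (`find_psi`, `ntt`, `polymul_ntt`, `polymul_schoolbook`) and the recursive radix-2 structure are illustrative assumptions. The sketch requires a power-of-two n with 2n dividing q - 1, which holds for the Dilithium modulus q = 8380417 with n = 256. For the Kyber modulus q = 3329 it holds only up to n = 128, so n = 128 is used here purely as a demonstration size.

```python
def find_psi(q, n):
    """Search for a primitive 2n-th root of unity mod prime q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0, "2n must divide q - 1"
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1 implies order exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root found")


def ntt(a, omega, q):
    """Recursive radix-2 cyclic NTT; omega is a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q              # butterfly: twiddle times odd half
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def polymul_ntt(a, b, q):
    """Negacyclic product a*b mod (x^n + 1, q) in O(n log n) multiplications."""
    n = len(a)
    psi = find_psi(q, n)
    omega = psi * psi % q
    # Pre-twist by powers of psi so a cyclic NTT yields a negacyclic product.
    fa = ntt([x * pow(psi, i, q) % q for i, x in enumerate(a)], omega, q)
    fb = ntt([x * pow(psi, i, q) % q for i, x in enumerate(b)], omega, q)
    fc = [x * y % q for x, y in zip(fa, fb)]  # pointwise multiplication
    c = ntt(fc, pow(omega, -1, q), q)         # inverse transform, unscaled
    n_inv = pow(n, -1, q)
    psi_inv = pow(psi, -1, q)
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def polymul_schoolbook(a, b, q):
    """Reference O(n^2) negacyclic product, using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c
```

Both functions return the same coefficients, but the NTT route needs on the order of n log₂ n modular multiplications rather than n². The modular inverses via `pow(x, -1, q)` require Python 3.8 or later.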

Key words: post-quantum cryptography, number theoretic transform, modular multiplication, hardware acceleration, field programmable gate array
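The modular multiplier in the abstract relies on KRED for the final reduction. The sketch below assumes this refers to the K-RED reduction commonly described for primes of the form q = k·2^m + 1, which covers both the Kyber modulus (3329 = 13·2^8 + 1) and the Dilithium modulus (8380417 = 1023·2^13 + 1). The abstract does not specify the paper's LUT-based bit-width compression, so it is not modeled here. The names `k_red` and `mulmod_kred` are hypothetical, and the final `% q` stands in for whatever correction logic a hardware design would use.

```python
def k_red(c, k, m):
    """K-RED step for q = k * 2**m + 1: returns a value congruent to k*c mod q.

    Write c = c0 + 2**m * c1. Since k * 2**m = -1 (mod q),
    k*c = k*c0 - c1 (mod q), which costs one small multiply and one subtract.
    """
    c0 = c & ((1 << m) - 1)  # low m bits
    c1 = c >> m              # remaining high bits
    return k * c0 - c1


def mulmod_kred(a, b, k, m):
    """Illustrative a*b mod q built on one K-RED step (assumed usage, not the paper's)."""
    q = k * (1 << m) + 1
    # K-RED introduces an extra factor k; cancel it by scaling one operand
    # with k^-1 ahead of time (in an NTT this operand could be a stored twiddle factor).
    b_scaled = b * pow(k, -1, q) % q
    r = k_red(a * b_scaled, k, m)
    return r % q  # placeholder for the final range correction


KYBER = (13, 8)          # q = 3329
DILITHIUM = (1023, 13)   # q = 8380417
```

Because the two moduli differ only in the constants k and m, a single datapath can in principle be switched between them by selecting these parameters, which is one plausible reading of "configurable" in this setting.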

CLC Number: