Netinfo Security ›› 2025, Vol. 25 ›› Issue (7): 1021-1031. doi: 10.3969/j.issn.1671-1122.2025.07.002

• Theoretical Research •

An FPGA-Based Heterogeneous Acceleration System for SM4 Algorithm

ZHANG Quanxin1, LI Ke1, SHAO Yujie1, TAN Yu’an2

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  2. School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2023-07-13 Online: 2025-07-10 Published: 2025-08-07
  • Contact: TAN Yu’an E-mail: tan2008@bit.edu.cn
  • About the authors: ZHANG Quanxin (born 1974), male, Beijing, associate professor, Ph.D.; research interests: deep learning and its adversarial techniques, computer vision security, information security | LI Ke (born 1999), male, Beijing, master's student; research interest: FPGA acceleration | SHAO Yujie (born 1999), female, Beijing, master's student; research interest: FPGA acceleration | TAN Yu’an (born 1972), male, Beijing, professor, Ph.D., CCF member; research interests: Android security, deep learning and adversarial techniques, IoT and embedded systems, data storage security
  • Funding:
    National Natural Science Foundation of China (U2336201); National Natural Science Foundation of China (U1936218)

Abstract:

The SM4 national cryptographic algorithm is widely used in the WAPI wireless network standard. Current research on SM4 encryption and decryption focuses mainly on optimizing hardware implementation structures to improve throughput and security. Meanwhile, the development of big data and 5G communication technology places higher demands on the bandwidth and real-time performance of data encryption and decryption. Against this background, this paper proposes an FPGA-based heterogeneous acceleration system for SM4 that implements the algorithm in hardware and optimizes its encryption and decryption performance. The system adopts a streaming high-speed data transmission architecture that supports multiple SM4 cores working in parallel and makes full use of the system bandwidth. A configurable interface connects the SM4 cores to the transmission architecture and provides sufficient flexibility. The system is implemented on a Xilinx XCVU9P FPGA and supports changing the SM4 load and mode at any time. In testing, SM4 reaches a maximum operating frequency of 462 MHz, and the system achieves a throughput of up to 92 Gbit/s with a latency of only 266 μs. The experimental results show that, compared with existing work, the system achieves a higher SM4 operating frequency and a higher system throughput, meeting the high-bandwidth and low-latency requirements of SM4 acceleration.

Key words: SM4 algorithm, FPGA, hardware acceleration, transmission architecture
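
To put the reported figures in context, the following is a minimal back-of-envelope sketch, not the authors' method. It assumes, hypothetically, that each SM4 core is fully pipelined and accepts one 128-bit block per clock cycle; the abstract does not state the core's internal structure. The function names are illustrative only. The 462 MHz and 92 Gbit/s values come from the abstract, and the 128-bit block size comes from the SM4 standard.

```python
import math

SM4_BLOCK_BITS = 128  # SM4 block size, fixed by the SM4 standard


def peak_core_throughput_gbps(clock_mhz: float, blocks_per_cycle: float = 1.0) -> float:
    """Peak throughput of one core in Gbit/s.

    ASSUMPTION: the core is fully pipelined, so blocks_per_cycle defaults
    to 1.0. An iterative (non-pipelined) core would have a smaller value.
    """
    return SM4_BLOCK_BITS * blocks_per_cycle * clock_mhz * 1e6 / 1e9


def min_cores_for(target_gbps: float, clock_mhz: float) -> int:
    """Smallest number of parallel cores whose combined peak covers the target."""
    return math.ceil(target_gbps / peak_core_throughput_gbps(clock_mhz))


if __name__ == "__main__":
    per_core = peak_core_throughput_gbps(462.0)  # 462 MHz is from the abstract
    print(f"peak per-core throughput: {per_core:.3f} Gbit/s")
    print(f"cores needed for 92 Gbit/s: {min_cores_for(92.0, 462.0)}")
```

Under this assumption a single core cannot cover the reported system throughput on its own, which is consistent with the abstract's emphasis on multiple SM4 cores working in parallel. The measured 92 Gbit/s is a system-level figure that also depends on the data transmission path, so it cannot be derived from the clock frequency alone.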

CLC Number: