Netinfo Security (信息网络安全) ›› 2023, Vol. 23 ›› Issue (5): 76-84. doi: 10.3969/j.issn.1671-1122.2023.05.008

• Technical Research •

面向分组密码的高速可重构模运算单元设计

张晓磊, 戴紫彬, 刘燕江, 曲彤洲

  1. 解放军信息工程大学密码工程学院,郑州 450001
  • 收稿日期:2022-12-16 出版日期:2023-05-10 发布日期:2023-05-15
  • 通讯作者: 戴紫彬 E-mail:daizb@126.com
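  • About the authors: ZHANG Xiaolei (born 1992), male, from Hebei, master's student, main research interest: security-specific chip design | DAI Zibin (born 1966), male, from Henan, professor, Ph.D., main research interests: reconfigurable computing and security-specific chips | LIU Yanjiang (born 1990), male, from Henan, lecturer, Ph.D., main research interests: chip security protection and hardware Trojans | QU Tongzhou (born 1994), male, from Liaoning, Ph.D. candidate, main research interest: coarse-grained reconfigurable cryptographic array design
  • Supported by:
    National Natural Science Foundation of China (61832018)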

Design of High Speed Reconfigurable Modulo Arithmetic Unit for Block Cipher

ZHANG Xiaolei, DAI Zibin, LIU Yanjiang, QU Tongzhou

  1. Department of Cryptogram Engineering, PLA Information Engineering University, Zhengzhou 450001, China
  • Received:2022-12-16 Online:2023-05-10 Published:2023-05-15
  • Contact: DAI Zibin E-mail:daizb@126.com

摘要:

模运算单元是粗粒度可重构密码阵列(Coarse Grain Reconfigurable Cryptographic Array,CGRCA)的关键部件,通过重构不同处理位宽和模数的算术类密码算子来覆盖更多类型的分组密码,然而现有的模运算单元的执行延迟高且功能覆盖率低,限制了CGRCA整体性能的提升。文章通过分析分组密码模运算特性,提出一种可重构模运算方法,统一了该类算子的数学表达方式,并设计了一种可重构模运算单元 (Reconfigurable Modulo Arithmetic Unit,RMAU),该单元支持5种模乘运算、3种模加运算和3种乘法累加运算。同时,通过舍弃部分积中的无用比特位、扩展Wallace树压缩求和过程、精简模修正电路执行路径,降低了该单元的关键路径延迟。基于CMOS 180 nm工艺测试了RMAU的功能与性能,实验结果表明,文章所提的RMAU具备高功能覆盖率,与模乘RCE单元、可扩展模乘结构和RNS乘法器相比,计算延迟分别降低了39%、44%和47%。

关键词: 可重构计算, 模乘运算, 分组密码, 模修正运算

Abstract:

The modulo arithmetic unit is a key component of the coarse-grained reconfigurable cryptographic array (CGRCA). By reconfiguring arithmetic cryptographic operators for different processing widths and moduli, it allows the array to cover more types of block ciphers. However, existing modulo arithmetic units have high execution latency and low functional coverage, which limits the overall performance of the CGRCA. By analyzing the characteristics of modulo arithmetic in block ciphers, this paper proposes a reconfigurable modulo arithmetic method that unifies the mathematical expression of these operators, and designs a reconfigurable modulo arithmetic unit (RMAU). The unit supports five modular multiplication operations, three modular addition operations, and three multiply-accumulate operations. The critical path delay of the unit is reduced by discarding useless bits in the partial products, extending the Wallace-tree compression and summation process, and simplifying the execution path of the modular correction circuit. The function and performance of the RMAU were tested in a 180 nm CMOS process. The experimental results show that the RMAU achieves high functional coverage and that, compared with the modular multiplication RCE unit, the scalable modular multiplication structure, and the RNS multiplier, its computation delay is reduced by 39%, 44%, and 47%, respectively.

Key words: reconfigurable computing, modular multiplication, block cipher, modulo correction operation

中图分类号: