Netinfo Security, 2020, Vol. 20, Issue 6: 36-43. doi: 10.3969/j.issn.1671-1122.2020.06.005
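LI Xiuying1,2, JI Chenhao1,2, DUAN Xiaoyi1, ZHOU Changchun1,2
Received:
2020-04-11
Online:
2020-06-10
Published:
2020-10-21
Contact:
DUAN Xiaoyi
E-mail:duanxiaoyi@besti.edu.cn
About the authors:
LI Xiuying (born 1975), female, from Hebei, associate professor, master's degree; main research interests: cryptography and information security, parallel computing, and on-device artificial intelligence | JI Chenhao (born 1996), male, from Hebei, master's student; main research interests: cryptography and information security, parallel computing | DUAN Xiaoyi (born 1979), male, from Guizhou, lecturer, Ph.D.; main research interests: cryptography and information security, electronic engineering | ZHOU Changchun (born 1963), male, from Jilin, professor-level senior engineer, bachelor's degree; main research interests: cryptography, communication engineering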
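Abstract:
The speed of a cryptographic algorithm scales with the available computing power, and some researchers have improved it by raising CPU speed or by using hardware encryption cards. With the wide adoption of graphics processing units (GPUs) in high-performance parallel computing, researchers in China and abroad have studied GPU-accelerated cryptographic operations, but this work is mostly based on internationally published algorithms such as DES and AES, and there is still little research on SM4, the Chinese commercial cryptographic algorithm. Building on an in-depth study of the GPU parallel computing mechanism, this paper examines how plaintext data block size, GPU memory type, and thread block size affect the speedup of SM4 encryption and, taking the characteristics of both CPU and GPU into account, proposes an optimal encryption and decryption scheme for parallel SM4 on a GPU. The results show that the speedup (Ep) is less than 1 when the plaintext data block is smaller than 8 KB, begins to increase sharply at 64 KB, and reaches its maximum at 256 KB. Using constant memory for intermediate data storage improves encryption speed, an improvement that matters for workloads that are large in volume and demand high throughput. The optimal thread block size is 128 to 512 threads and must be a multiple of 32. In the experimental environment, the optimal GPU encryption scheme implemented in this paper is 26 times as fast as an ordinary CPU encryption scheme.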
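The thread block constraint stated in the abstract (128 to 512 threads, a multiple of 32) can be illustrated with a small launch-size calculation. This is a minimal sketch, not the authors' implementation: the function name `launch_config` is hypothetical, and it assumes one GPU thread per 16-byte SM4 block with every block processed independently, which the abstract does not confirm.

```python
SM4_BLOCK_BYTES = 16   # SM4 operates on 128-bit (16-byte) blocks
WARP_SIZE = 32         # thread block sizes must be a multiple of 32


def launch_config(data_bytes, threads_per_block=256):
    """Return (grid_blocks, threads_per_block) for a plaintext buffer.

    Assumption (not from the paper): one thread encrypts one SM4 block,
    and the plaintext is already padded to a multiple of 16 bytes.
    """
    in_range = 128 <= threads_per_block <= 512
    warp_aligned = threads_per_block % WARP_SIZE == 0
    if not (in_range and warp_aligned):
        raise ValueError("thread block size must be 128..512 and a multiple of 32")
    if data_bytes <= 0 or data_bytes % SM4_BLOCK_BYTES != 0:
        raise ValueError("plaintext length must be a positive multiple of 16 bytes")

    sm4_blocks = data_bytes // SM4_BLOCK_BYTES
    # Ceiling division: the last thread block may be only partly used.
    grid_blocks = -(-sm4_blocks // threads_per_block)
    return grid_blocks, threads_per_block


if __name__ == "__main__":
    for size_kb in (8, 64, 256):
        print(size_kb, "KB ->", launch_config(size_kb * 1024))
```

Under these assumptions a 256 KB buffer holds 16384 SM4 blocks, so a thread block size of 256 gives a grid of 64 thread blocks.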
Cite this article: LI Xiuying, JI Chenhao, DUAN Xiaoyi, ZHOU Changchun. Parallel Implementation of SM4 Algorithm on GPU[J]. Netinfo Security, 2020, 20(6): 36-43.
Table 1
CK parameters
CK_i | CK_{i+1} | CK_{i+2} | CK_{i+3} |
---|---|---|---|
0x00070e15 | 0x1c232a31 | 0x383f464d | 0x545b6269 |
0x70777e85 | 0x8c939aa1 | 0xa8afb6bd | 0xc4cbd2d9 |
0xe0e7eef5 | 0xfc030a11 | 0x181f262d | 0x343b4249 |
0x50575e65 | 0x6c737a81 | 0x888f969d | 0xa4abb2b9 |
0xc0c7ced5 | 0xdce3eaf1 | 0xf8ff060d | 0x141b2229 |
0x30373e45 | 0x4c535a61 | 0x686f767d | 0x848b9299 |
0xa0a7aeb5 | 0xbcc3cad1 | 0xd8dfe6ed | 0xf4fb0209 |
0x10171e25 | 0x2c333a41 | 0x484f565d | 0x646b7279 |
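The constants in Table 1 need not be stored by hand. The SM4 standard defines byte j of CK_i as (4i + j) × 7 mod 256; that rule comes from the standard rather than from this page, and the function name below is chosen only for illustration. A short sketch that regenerates the table:

```python
def sm4_ck_constants():
    """Return the 32 SM4 key-schedule constants CK_0..CK_31 as 32-bit ints.

    Byte j (j = 0..3, most significant first) of CK_i is
    ((4 * i + j) * 7) mod 256, as specified in the SM4 standard.
    """
    constants = []
    for i in range(32):
        word = 0
        for j in range(4):
            byte = ((4 * i + j) * 7) % 256
            word = (word << 8) | byte
        constants.append(word)
    return constants


if __name__ == "__main__":
    ck = sm4_ck_constants()
    # Print four constants per row, matching the layout of Table 1.
    for row in range(0, 32, 4):
        print(" | ".join(f"0x{c:08x}" for c in ck[row:row + 4]))
```

The first row printed is `0x00070e15 | 0x1c232a31 | 0x383f464d | 0x545b6269`, which agrees with the first row of Table 1.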
Table 2
Encryption and decryption speed for different data block sizes
Data block | T1/ms | Tt/ms | Tp/ms | (Tt+Tp)/ms | Ep |
---|---|---|---|---|---|
2 KB | 0.536 | 0.457 | 1.602 | 2.059 | 0.260320544 |
4 KB | 1.0416 | 0.493 | 1.6035 | 2.0965 | 0.496828047 |
8 KB | 2.15 | 0.55 | 1.616 | 2.166 | 0.992613112 |
16 KB | 4.296 | 0.414 | 1.615 | 2.029 | 2.117299162 |
32 KB | 8.577 | 0.436 | 1.636 | 2.072 | 4.139478764 |
64 KB | 16.999 | 0.556 | 1.644 | 2.2 | 7.726818182 |
128 KB | 34.219 | 0.663 | 1.775 | 2.438 | 14.03568499 |
256 KB | 68.128 | 0.699 | 3.0505 | 3.7495 | 18.16988932 |
512 KB | 136.042 | 0.965 | 5.234 | 6.199 | 21.94579771 |
1 MB | 272.501 | 1.58 | 10.076 | 11.656 | 23.37860329 |
2 MB | 544.306 | 2.666 | 19.489 | 22.155 | 24.56808847 |
4 MB | 1098.776 | 4.939 | 38.039 | 42.978 | 25.56601052 |
8 MB | 2165.347 | 9.431 | 74.523 | 83.954 | 25.7920647 |
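The Ep column can be reproduced from the other columns as Ep = T1 / (Tt + Tp). The sketch below recomputes it and finds the smallest data block for which the GPU path is faster. This page does not define the column symbols, so reading T1 as the CPU time, Tt as the host-device transfer time, and Tp as the GPU computation time is an assumption; the formula itself is consistent with every row of the table.

```python
# (data block, T1/ms, Tt/ms, Tp/ms) copied from Table 2.
ROWS = [
    ("2 KB", 0.536, 0.457, 1.602),
    ("4 KB", 1.0416, 0.493, 1.6035),
    ("8 KB", 2.15, 0.55, 1.616),
    ("16 KB", 4.296, 0.414, 1.615),
    ("32 KB", 8.577, 0.436, 1.636),
    ("64 KB", 16.999, 0.556, 1.644),
    ("128 KB", 34.219, 0.663, 1.775),
    ("256 KB", 68.128, 0.699, 3.0505),
    ("512 KB", 136.042, 0.965, 5.234),
    ("1 MB", 272.501, 1.58, 10.076),
    ("2 MB", 544.306, 2.666, 19.489),
    ("4 MB", 1098.776, 4.939, 38.039),
    ("8 MB", 2165.347, 9.431, 74.523),
]


def speedup(t1, tt, tp):
    """Speedup Ep: T1 divided by the total GPU-side time Tt + Tp."""
    return t1 / (tt + tp)


def first_profitable_block(rows):
    """Return the first data block size whose speedup exceeds 1, or None."""
    for name, t1, tt, tp in rows:
        if speedup(t1, tt, tp) > 1.0:
            return name
    return None


if __name__ == "__main__":
    for name, t1, tt, tp in ROWS:
        print(f"{name:>7}  Ep = {speedup(t1, tt, tp):.3f}")
    print("GPU first wins at:", first_profitable_block(ROWS))
```

With the tabulated timings, the GPU path first overtakes the CPU at the 16 KB row; for the 8 KB row the ratio is still just below 1.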