Netinfo Security ›› 2020, Vol. 20 ›› Issue (6): 36-43. doi: 10.3969/j.issn.1671-1122.2020.06.005


Parallel Implementation of SM4 Algorithm on GPU

LI Xiuying1,2, JI Chenhao1,2, DUAN Xiaoyi1, ZHOU Changchun1,2

  1. Department of Electronic and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070, China
    2. State Key Laboratory of Cryptology, Beijing 100878, China
  • Received: 2020-04-11 Online: 2020-06-10 Published: 2020-10-21
  • Contact: DUAN Xiaoyi E-mail: duanxiaoyi@besti.edu.cn

Abstract:

The speed of a cryptographic algorithm is proportional to the available computing power. To speed up cryptographic algorithms, researchers have increased CPU speed, used hardware encryption cards, and adopted other solutions. With the wide application of GPUs in high-performance parallel computing, researchers have also studied GPU-accelerated cryptographic algorithms. Most of this work focuses on internationally published algorithms such as DES and AES; research on SM4, China's commercial cryptographic algorithm, is still rare. Based on an in-depth study of the GPU parallel computing architecture, this paper presents an optimized encryption and decryption scheme for running SM4 in parallel on a GPU. The scheme is obtained by studying the optimal plaintext block size, the GPU memory type, and the speed-up ratio for different thread block sizes, and by combining the characteristics of the CPU and the GPU. The experimental results are as follows. When the plaintext data block is smaller than 8 KB, the acceleration ratio (EP) is less than 1. At a plaintext block size of 64 KB, the acceleration ratio starts to increase significantly, and it reaches its maximum at 256 KB. Using constant memory to store intermediate data improves encryption speed, which is needed to meet the demands of large data volumes and high-speed operation. The optimal thread block size is 128 to 512 threads (it must be a multiple of 32). In the experimental environment given in this paper, the optimal GPU encryption scheme runs 26 times faster than the ordinary CPU encryption scheme.
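
The launch parameters discussed above can be illustrated with a small sketch. It is not the paper's implementation. It assumes a hypothetical kernel in which each GPU thread encrypts one 16-byte SM4 block (the abstract does not state the mapping), and the names `launch_config` and `acceleration_ratio` are invented for illustration. The acceleration ratio is taken here as CPU time divided by GPU time, so a value below 1 means the GPU path is slower.

```python
SM4_BLOCK_BYTES = 16  # SM4 encrypts 128-bit (16-byte) blocks
WARP_SIZE = 32        # threads per warp on NVIDIA GPUs


def launch_config(plaintext_bytes, threads_per_block=256):
    """Return (grid_size, threads_per_block, sm4_blocks).

    Assumes one thread per SM4 block (a hypothetical mapping).
    """
    if threads_per_block % WARP_SIZE != 0:
        raise ValueError("threads_per_block must be a multiple of 32")
    if plaintext_bytes % SM4_BLOCK_BYTES != 0:
        raise ValueError("plaintext must be padded to a multiple of 16 bytes")
    sm4_blocks = plaintext_bytes // SM4_BLOCK_BYTES
    # Ceiling division: enough thread blocks to cover every SM4 block.
    grid_size = -(-sm4_blocks // threads_per_block)
    return grid_size, threads_per_block, sm4_blocks


def acceleration_ratio(cpu_seconds, gpu_seconds):
    """CPU time divided by GPU time; both are measured by the caller."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # 256 KB of plaintext with 256 threads per thread block.
    print(launch_config(256 * 1024, 256))
    # 8 KB of plaintext with 128 threads per thread block.
    print(launch_config(8 * 1024, 128))
```

Under this assumed mapping, an 8 KB input yields only 512 SM4 blocks, which may not be enough parallel work to offset GPU launch and transfer overhead. That is one plausible reading of the reported ratio below 1 for small inputs, not a claim made in the abstract.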

Key words: GPU, parallel operation, CUDA, SM4
