Netinfo Security, 2022, Vol. 22, Issue (4): 30-39. doi: 10.3969/j.issn.1671-1122.2022.04.004

Log Compression Optimization Method Based on Parser Tree

LIU Jiqiang1, HE Jiahao1, ZHANG Jiancheng2,3, HUANG Xuezhen4

  1. Department of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  2. Shandong Computer Science Center, Jinan 250014, China
  3. Shandong Zhengzhong Information Technology Co., Ltd, Jinan 250014, China
  4. The First Research Institute of the Ministry of Public Security, Beijing 100048, China
  • Received: 2022-01-12  Online: 2022-04-10  Published: 2022-05-12
  • Contact: LIU Jiqiang  E-mail: jqliu@bjtu.edu.cn

Abstract:

Log data from information systems is essential for security analysis, but its volume grows daily, and efficient log storage and auditing has become a key issue in information system security. Log compression reduces the heavy overhead of log storage and has become an active research topic. Traditional compression tools and algorithms work well on small-scale text but are poorly suited to the large-scale log data that information systems generate. Existing log compression algorithms compress data by extracting the log structure, but they do not significantly improve the compression ratio or compression speed for the numerical variable part of the logs. This paper proposes a parser-tree-based log compression optimization method (TOLC). TOLC uses a parser to construct a parser tree, extracts the corresponding log templates and compresses them, and then encodes and compresses the remaining variable parts. TOLC is evaluated on five different types of large log datasets. Compared with other methods, it achieves the highest compression ratio on all datasets and good compression speed on large log datasets, giving the best overall performance.
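
The general idea of template extraction followed by numerical variable encoding can be illustrated with a minimal Python sketch. This is not the TOLC implementation: the abstract does not specify the tree layout or the encoding, so everything here is an assumption made for illustration. The two-level tree (token count, then first token), the rule that canonical decimal integers are the variables, the delta encoding against the previous line of the same template, and the names `ParserTree`, `compress`, and `decompress` are all hypothetical. The sketch also assumes tokens are separated by single spaces and that the literal token `<*>` never appears in a log line.

```python
import json
import re
import zlib
from collections import defaultdict

WILDCARD = "<*>"
# Assumed rule: only canonical non-negative integers count as variables,
# so that int() -> str() round-trips without losing leading zeros.
NUMERIC = re.compile(r"^(0|[1-9]\d*)$")


class ParserTree:
    """Toy two-level tree: token count -> first template token -> template ids."""

    def __init__(self):
        self.root = defaultdict(lambda: defaultdict(list))
        self.templates = []  # template id -> list of tokens

    def match(self, line):
        """Return (template id, numeric variables) for one log line."""
        tokens = line.split(" ")
        template = [WILDCARD if NUMERIC.match(t) else t for t in tokens]
        leaf = self.root[len(tokens)][template[0]]
        for tid in leaf:
            if self.templates[tid] == template:
                break
        else:  # no existing template matched: register a new one
            tid = len(self.templates)
            self.templates.append(template)
            leaf.append(tid)
        variables = [int(t) for t in tokens if NUMERIC.match(t)]
        return tid, variables


def compress(lines):
    """Store each template once; store per-line variables as deltas."""
    tree, last, records = ParserTree(), {}, []
    for line in lines:
        tid, variables = tree.match(line)
        prev = last.get(tid, [0] * len(variables))
        records.append([tid] + [v - p for v, p in zip(variables, prev)])
        last[tid] = variables
    payload = json.dumps({"templates": tree.templates, "records": records})
    return zlib.compress(payload.encode("utf-8"))


def decompress(blob):
    """Invert compress(): rebuild values from deltas and fill the templates."""
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    last, lines = {}, []
    for tid, *deltas in data["records"]:
        prev = last.get(tid, [0] * len(deltas))
        values = [p + d for p, d in zip(prev, deltas)]
        last[tid] = values
        it = iter(values)
        template = data["templates"][tid]
        lines.append(" ".join(str(next(it)) if t == WILDCARD else t for t in template))
    return lines
```

In this sketch, lines such as `conn 10 from host 5` and `conn 12 from host 5` share one stored template, and only small integer deltas are kept per line before the general-purpose compressor runs. Whether TOLC encodes numerical variables this way is not stated in the abstract.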

Key words: parser tree, log compression, template extraction, numerical code, compression ratio
