Netinfo Security ›› 2024, Vol. 24 ›› Issue (5): 767-777. doi: 10.3969/j.issn.1671-1122.2024.05.010

An Automatic Discovery Method for Heuristic Log Templates

ZHANG Shuya1,2,3, CHEN Liangguo1,2,3, CHEN Xingshu1,2,3

    1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
    2. Key Laboratory of Data Protection and Intelligent Management, Ministry of Education, Chengdu 610065, China
    3. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  • Received: 2024-03-01 Online: 2024-05-10 Published: 2024-06-24
  • Contact: CHEN Xingshu E-mail: chenxsh@scu.edu.cn

Abstract:

Logs are an important data source for security analytics. However, unstructured raw logs cannot be used directly for security analysis, so parsing logs into structured templates is a critical first step. Most existing log parsing methods assume that log messages belonging to the same template have the same length; when message length varies, messages from the same template are incorrectly extracted into different templates. To address this, this paper proposes KeyParse, an automatic log template discovery method. First, KeyParse computes the similarity between logs and templates with the longest common subsequence algorithm, which discounts the differences introduced by variables when matching logs to templates. Second, it groups log templates by their highest-frequency items, so that messages of the same event but different lengths are not split into different template groups; this reduces template redundancy and improves template matching efficiency. Finally, it uses the HeavyGuardian algorithm to count the highest-frequency items in streaming log messages, addressing the difficulty that traditional frequency-counting methods have in adapting to dynamic changes in word frequency in streaming logs. Experimental results show that KeyParse achieves higher accuracy across various types of log sets, with an average parsing accuracy of 0.968, and higher performance when parsing large log sets.
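
The matching step described in the abstract can be illustrated with a minimal sketch of similarity based on the longest common subsequence (LCS) over whitespace tokens. This is not the KeyParse implementation: the abstract does not give the similarity formula, so the ratio below (LCS length over the number of constant template tokens), the `<*>` wildcard marker, the 0.8 threshold, and the function names are all assumptions made for illustration. The sketch does not cover template grouping or the HeavyGuardian frequency counting.

```python
from typing import List, Optional

WILDCARD = "<*>"  # assumed placeholder for a variable field in a template


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(log_tokens: List[str], template_tokens: List[str]) -> float:
    """Assumed score: fraction of constant template tokens found, in order, in the log."""
    constants = [t for t in template_tokens if t != WILDCARD]
    if not constants:
        return 0.0
    return lcs_length(log_tokens, constants) / len(constants)


def best_template(log_line: str, templates: List[str],
                  threshold: float = 0.8) -> Optional[str]:
    """Return the best-scoring template at or above the threshold, else None."""
    tokens = log_line.split()
    best, best_score = None, threshold
    for tpl in templates:
        score = similarity(tokens, tpl.split())
        if score >= best_score:
            best, best_score = tpl, score
    return best
```

Because the score depends only on the ordered constant tokens, two messages of the same event with different token counts (for example, one with an extra trailing field) can still match the same template, which is the length-independence the abstract argues for.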

Key words: log parsing, template grouping, template auto-discovery

CLC Number: