Netinfo Security (信息网络安全), 2025, Vol. 25, Issue (4): 610-618. doi: 10.3969/j.issn.1671-1122.2025.04.009

• Special Topic Paper: Intelligent System Security

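Log Parsing Method Based on Semantic of Parameters

XING Hantao (1,2,3), RUAN Shuhua (1,2,3), CHEN Liangguo (1,2,3), ZENG Xuemei (2,3)

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Key Laboratory of Data Protection and Intelligent Management, Ministry of Education, Chengdu 610065, China
  3. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  • Received: 2024-12-30; Online: 2025-04-10; Published: 2025-04-25
  • Corresponding author: RUAN Shuhua, ruanshuhua@scu.edu.cn
  • About the authors:
    XING Hantao (born 2000), male, from Shanxi, master's student; main research interest: data security.
    RUAN Shuhua (born 1966), female, from Sichuan, associate professor, M.S.; main research interests: cloud computing and big data security, blockchain security.
    CHEN Liangguo (born 1993), male, from Guizhou, Ph.D. candidate; main research interests: big data and network security.
    ZENG Xuemei (born 1976), female, from Sichuan, engineer, Ph.D.; main research interests: network threat detection and network behavior analysis.
  • Funding:
    Fundamental Research Funds for the Central Universities (SCU2024D012); Sichuan University Science and Engineering Connotation Development Project (2020SCUNG129)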



Abstract:

As modern information systems grow in scale, analyzing multi-source logs with heterogeneous structures provides a fast way to understand system behavior. The semantics of log parameters characterize the entities in a system and are therefore crucial for the joint analysis of multi-source logs. However, existing parsing methods capture the semantic features of log parameters insufficiently: semantics are missing, semantic coverage is narrow, and semantic recognition accuracy is low. To address these problems, this paper proposes PS-Parser, a log parsing method based on parameter semantics. PS-Parser builds a BERT model to capture the contextual semantic features of logs and extract the semantics of log parameters, and it uses a feature library of conventional parameter semantics to complete parameter semantics at different levels. It then represents system entities through parameter semantics, enabling joint analysis of multi-source logs. Experiments on six real-world multi-source datasets show an average parameter parsing accuracy of 94.7%, an average semantic coverage of 81.7%, and an average semantic parsing F1 score of 0.991, a significant improvement over existing methods that validates the effectiveness of the proposed approach. In addition, a log analysis scenario in a big data system confirms that PS-Parser supports the joint analysis of multi-source logs.

Key words: log parsing, parameter semantics extraction, multi-source log analysis
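The abstract describes two steps after parameter extraction: completing parameter semantics with a library of conventional parameter semantic features, and using those semantics to link entities across log sources. The paper's actual library and linking procedure are not reproduced here, so the following Python sketch is only an illustration under stated assumptions: parameters are already extracted from each log line (the step PS-Parser performs with BERT), the regex rules and labels (`ip_port`, `hdfs_block_id`, and so on) are hypothetical, the sample records are invented, multi-level semantics are not modeled, and "joint analysis" is reduced to grouping records from different sources that mention the same labeled entity.

```python
import re
from collections import defaultdict

# Hypothetical semantic feature library: (label, pattern) rules tried in order.
# These rules are illustrative assumptions, not the rules used by PS-Parser.
SEMANTIC_RULES = [
    ("ip_port", re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}$")),
    ("ip_address", re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")),
    ("hdfs_block_id", re.compile(r"^blk_-?\d+$")),
    ("file_path", re.compile(r"^(?:/[\w.\-]+)+/?$")),
    ("number", re.compile(r"^-?\d+(?:\.\d+)?$")),
]


def label_parameter(token):
    """Return the first semantic label whose rule matches the token, else 'unknown'."""
    for label, pattern in SEMANTIC_RULES:
        if pattern.match(token):
            return label
    return "unknown"


def index_entities(logs):
    """Map each labeled (label, value) entity to the (source, message) records mentioning it.

    `logs` is an iterable of (source, message, parameters) tuples whose parameters
    are assumed to be already extracted by an upstream parser.
    """
    index = defaultdict(list)
    for source, message, params in logs:
        for token in params:
            label = label_parameter(token)
            if label != "unknown":
                index[(label, token)].append((source, message))
    return index


def cross_source_entities(index):
    """Keep only entities that appear in more than one log source (candidate join keys)."""
    return {
        entity: records
        for entity, records in index.items()
        if len({source for source, _ in records}) > 1
    }


if __name__ == "__main__":
    # Invented sample records from two hypothetical log sources.
    logs = [
        ("hdfs", "Receiving block blk_-1608 src: 10.0.0.1:50010",
         ["blk_-1608", "10.0.0.1:50010"]),
        ("yarn", "Reading blk_-1608 from /user/root/data.txt",
         ["blk_-1608", "/user/root/data.txt"]),
    ]
    for entity, records in cross_source_entities(index_entities(logs)).items():
        print(entity, [source for source, _ in records])
```

Running the script prints `('hdfs_block_id', 'blk_-1608') ['hdfs', 'yarn']`: the shared block ID is the only labeled entity that links the two sources. Rule order matters because the first matching rule wins, which is why the more specific `ip_port` rule is listed before `ip_address`. In a full pipeline, a learned extractor such as the BERT model the abstract describes would replace the assumption that parameters arrive pre-extracted.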
