Netinfo Security, 2022, Vol. 22, Issue (4): 30-39. doi: 10.3969/j.issn.1671-1122.2022.04.004
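LIU Jiqiang, HE Jiahao, ZHANG Jiancheng, HUANG Xuezhen
Received: 2022-01-12
Online: 2022-04-10
Published: 2022-05-12
Contact: LIU Jiqiang, E-mail: jqliu@bjtu.edu.cn
About the authors:
LIU Jiqiang (born 1973), male, from Shandong, professor, Ph.D. Main research interests: trusted computing, privacy protection, and cloud computing security.
HE Jiahao (born 1997), male, from Henan, master's student. Main research interests: blockchain and secure data storage.
ZHANG Jiancheng (born 1973), male, from Henan, associate research fellow, master's degree. Main research interests: cryptographic technology and Internet of Things security technology.
HUANG Xuezhen (born 1984), female, from Shanxi, engineer, Ph.D. Main research interests: privacy protection and data security.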
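Abstract:
Log data from information systems is essential for security analysis. As log volumes keep growing, storing and auditing log data efficiently has become one of the key problems in information system security. Log compression reduces the large storage overhead of log data and has become an active research topic. Traditional compression tools and algorithms work well on small-scale text but are poorly suited to the large-scale log data that information systems produce. Existing log compression algorithms compress data by extracting the log structure, but they bring little improvement in compression ratio or compression speed for the numeric variable part of the log data. This paper proposes a parser-tree-based log compression optimization method (TOLC): a parser is used to build a parser tree, the corresponding log templates are extracted and compressed, and the numeric variable part is then encoded and compressed. TOLC is evaluated on five large log datasets of different types and compared with other methods. The experimental results show that TOLC achieves the highest compression ratio on all datasets and also delivers good compression speed on large log datasets, giving the best overall performance.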
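The general idea of separating log templates from numeric variables can be illustrated with a minimal Python sketch. This is not the TOLC implementation: the abstract does not specify the parser tree or the numeric encoding, so the sketch assumes a plain regular expression in place of the parser tree, and delta encoding followed by `zlib` in place of the paper's encoder. The names `split_line`, `compress`, and `decompress` are hypothetical, and the sketch assumes no log line contains the literal placeholder `<*>`.

```python
import itertools
import json
import re
import zlib
from collections import defaultdict

# Standalone integers without leading zeros; anything else stays in the
# template, so converting matches to int and back is lossless.
NUM = re.compile(r"\b(?:0|[1-9]\d*)\b")
SLOT = "<*>"  # assumed placeholder; must not occur in the raw log lines


def split_line(line):
    """Split one log line into (template, list of numeric values)."""
    values = [int(m) for m in NUM.findall(line)]
    return NUM.sub(SLOT, line), values


def compress(lines):
    """Group lines by template, delta-encode each numeric column, then zlib."""
    templates = {}               # template text -> template id
    order = []                   # template id of every line, in input order
    columns = defaultdict(list)  # (template id, slot index) -> values
    for line in lines:
        template, values = split_line(line)
        tid = templates.setdefault(template, len(templates))
        order.append(tid)
        for slot, value in enumerate(values):
            columns[(tid, slot)].append(value)
    # Store the first value followed by successive differences.
    deltas = [
        [tid, slot, [col[0]] + [b - a for a, b in zip(col, col[1:])]]
        for (tid, slot), col in columns.items()
    ]
    payload = {"templates": list(templates), "order": order, "deltas": deltas}
    return zlib.compress(json.dumps(payload).encode("utf-8"), 9)


def decompress(blob):
    """Invert compress(): rebuild the original lines in their original order."""
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    columns = {
        (tid, slot): list(itertools.accumulate(col))  # undo delta encoding
        for tid, slot, col in data["deltas"]
    }
    seen = defaultdict(int)  # template id -> lines already rebuilt
    lines = []
    for tid in data["order"]:
        parts = data["templates"][tid].split(SLOT)
        row = seen[tid]
        seen[tid] += 1
        out = [parts[0]]
        for slot, tail in enumerate(parts[1:]):
            out.append(str(columns[(tid, slot)][row]))
            out.append(tail)
        lines.append("".join(out))
    return lines


if __name__ == "__main__":
    sample = [
        "Received block 42 of size 1024",
        "Received block 43 of size 1024",
        "Received block 44 of size 2048",
    ]
    assert decompress(compress(sample)) == sample
```

Storing numeric values column by column per template means that counters and repeated sizes turn into runs of small deltas, which a general-purpose compressor tends to handle well. Whether this matches the encoding used in the paper cannot be determined from the abstract.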
LIU Jiqiang, HE Jiahao, ZHANG Jiancheng, HUANG Xuezhen. Log Compression Optimization Method Based on Parser Tree[J]. Netinfo Security, 2022, 22(4): 30-39.