Netinfo Security (信息网络安全), 2023, Vol. 23, Issue (7): 74-85. doi: 10.3969/j.issn.1671-1122.2023.07.008

• Technical Research •

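Explainable Anomaly Traffic Detection Based on Sparse Autoencoders

LIU Yuxiao, CHEN Wei, ZHANG Tianyue, WU Lifa

  1. School of Computer Science (网络空间安全学院), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2022-12-20 Online: 2023-07-10 Published: 2023-07-14
  • Corresponding author: CHEN Wei chenwei@njupt.edu.cn
  • About the authors: LIU Yuxiao (born 1999), male, from Hunan, master's student, CCF member; main research interests: Web security and anomaly traffic detection | CHEN Wei (born 1979), male, from Jiangsu, professor, Ph.D., CCF member; main research interests: wireless network security and mobile Internet security | ZHANG Tianyue (born 1998), female, from Jiangsu, master's student; main research interests: machine learning, deep learning, and anomaly traffic detection | WU Lifa (born 1968), male, from Hubei, professor, Ph.D.; main research interests: software vulnerability discovery and intrusion detection
  • Funding:
    National Key Research and Development Program of China (2019YFB2101704)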

Abstract:

Many deep learning detection models now achieve good results on standard metrics, but security managers do not understand the basis for these models' decisions. As a result, they cannot trust a model's verdicts, nor can they readily diagnose and trace its errors, which greatly limits the practical application of deep learning models in this field. To address this problem, this paper proposes an explainable anomaly traffic detection model based on a sparse autoencoder (Sparse Autoencoder Based Anomaly Traffic Detection, SAE-ATD). The model uses a sparse autoencoder to learn the features of normal traffic and, on this basis, introduces threshold iteration to select the best threshold and improve the detection rate. After prediction, the samples flagged as anomalous are fed into an explainer, which iteratively updates a reference value and then returns, for each feature, the difference between the reference value and the anomalous value; these differences are analyzed together with the original data to explain the result. Experiments on the CICIDS2017 and CIRA-CIC-DoHBrw-2020 datasets show that SAE-ATD reaches 99% precision and recall for most attack types on both datasets while also providing explainability for the model.

Key words: anomaly traffic detection, autoencoder, deep learning, explainability
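
The two post-training steps named in the abstract, threshold iteration and per-feature explanation, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the abstract does not say which criterion defines the "best" threshold, so F1 on labeled validation errors is assumed here; the explainer's update rule is not given, so plain gradient descent on an anomaly score stands in for it; and the names `select_threshold`, `explain`, and `toy_score_grad`, together with all numbers, are hypothetical. No autoencoder is trained: the reconstruction errors are made-up values, and the anomaly score is a squared distance to an assumed normal-traffic mean.

```python
from typing import Callable, List, Sequence, Tuple


def f1_at_threshold(errors: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """F1 when samples with error > threshold are flagged anomalous (label 1)."""
    tp = sum(1 for e, y in zip(errors, labels) if e > threshold and y == 1)
    fp = sum(1 for e, y in zip(errors, labels) if e > threshold and y == 0)
    fn = sum(1 for e, y in zip(errors, labels) if e <= threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(errors: Sequence[float], labels: Sequence[int],
                     steps: int = 100) -> Tuple[float, float]:
    """Sweep evenly spaced candidate thresholds; keep the first with the best F1.

    Assumption: F1 is the selection criterion (the abstract does not name one).
    """
    lo, hi = min(errors), max(errors)
    best_t, best_f1 = lo, -1.0
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        score = f1_at_threshold(errors, labels, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


def explain(anomaly: Sequence[float],
            score_grad: Callable[[Sequence[float]], List[float]],
            lr: float = 0.1, iters: int = 200) -> List[float]:
    """Start a reference point at the anomaly, move it toward a lower anomaly
    score by gradient descent, and return (reference - anomaly) per feature.

    Assumption: gradient descent is a stand-in for the paper's update rule.
    """
    ref = list(anomaly)
    for _ in range(iters):
        grad = score_grad(ref)
        ref = [r - lr * g for r, g in zip(ref, grad)]
    return [r - a for r, a in zip(ref, anomaly)]


# Toy data (hypothetical): reconstruction errors with ground-truth labels.
errors = [0.1, 0.2, 0.15, 0.9, 1.2, 0.3]
labels = [0, 0, 0, 1, 1, 0]
threshold, f1 = select_threshold(errors, labels)

# Toy anomaly score: squared distance to an assumed normal-traffic mean.
normal_mean = [1.0, 2.0, 3.0]


def toy_score_grad(x: Sequence[float]) -> List[float]:
    return [2.0 * (xi - mi) for xi, mi in zip(x, normal_mean)]


diffs = explain([1.0, 2.0, 9.0], toy_score_grad)
# The feature with the largest absolute difference is the main suspect.
top_feature = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
```

In a real pipeline the errors would come from the trained sparse autoencoder and `score_grad` from automatic differentiation of its reconstruction loss; ranking features by the size of their difference is one plausible way to read the explainer's output alongside the raw flow record.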

CLC number: