Netinfo Security (信息网络安全), 2020, Vol. 20, Issue 2: 66-74. doi: 10.3969/j.issn.1671-1122.2020.02.009

Abnormal Traffic Detection Technology Based on Data Augmentation and Model Update

ZHANG Hao1,2, CHEN Long1,2, WEI Zhiqiang1,2

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
  2. Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350116, China
  • Received: 2019-10-10 Online: 2020-02-10 Published: 2020-05-11
  • About the authors:
    ZHANG Hao (born 1981), male, from Anhui, associate professor, Ph.D.; his main research interests are information security, data analysis, computational intelligence algorithms, and heuristic algorithms. CHEN Long (born 1997), male, from Hubei, master's student; his main research interests are network security and big data analysis. WEI Zhiqiang (born 1994), male, from Fujian, master's student; his main research interests are network security and machine learning.
  • Funding:
    Key Project of the Straits Joint Fund of the National Natural Science Foundation of China [U1705262]; National Natural Science Foundation of China [61672159]; Natural Science Foundation of Fujian Province [2016J01754]

Abstract:

Network attack methods emerge continually, so traffic samples keep changing and anomaly detection accuracy suffers. Traditional abnormal traffic detection relies on rule matching, a relatively simple approach that struggles to adapt to complex, dynamic, large-scale network environments. This paper therefore proposes an abnormal traffic detection technique based on data augmentation and model updating. To address data imbalance, the SMOTE algorithm oversamples the minority-class samples and the ENN algorithm removes noisy data. A random forest estimates the importance of each sample feature, and an improved KNN algorithm uses these importances in its distance metric to implement model updating. Finally, the CatBoost algorithm, which supports categorical features, classifies the network traffic data. The model detects abnormal traffic well throughout iterative model updates, and it improves on methods such as HCPTC-IDS in both detection accuracy and false positive rate. Experiments on the KDD 99 dataset show that the model reaches a multi-class detection accuracy of 96.52% with a false positive rate of only 0.92%.
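Two of the steps named in the abstract can be illustrated with a minimal sketch: SMOTE-style interpolation between minority samples, and a KNN vote whose distance is weighted by feature importance. This is not the authors' implementation. The abstract does not give the exact form of the improved KNN metric, so the weighted Euclidean distance is an assumption, and the toy data, the `importance` values, and all function names are hypothetical. ENN cleaning, the random forest that would supply the importances, and the CatBoost classifier are omitted.

```python
import math
import random
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared feature difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE step)."""
    rng = random.Random(seed)
    unit = [1.0] * len(minority[0])  # plain Euclidean distance for the neighbour search
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: weighted_distance(base, p, unit))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def knn_predict(train_x, train_y, query, weights, k=3):
    """Majority vote among the k training points closest under the weighted distance."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: weighted_distance(pair[0], query, weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
normal = [[0.0, 0.0], [0.1, 5.0], [0.2, 9.0], [0.1, 2.0]]
attack = [[1.0, 1.0], [1.1, 8.0], [0.9, 4.0]]  # minority class
train_x = normal + attack
train_y = ["normal"] * len(normal) + ["attack"] * len(attack)

# Hypothetical importances; in the described pipeline a random forest would supply them.
importance = [0.99, 0.01]
uniform = [1.0, 1.0]

query = [0.8, 5.1]
print(knn_predict(train_x, train_y, query, uniform, k=1))
print(knn_predict(train_x, train_y, query, importance, k=1))
print(len(smote_oversample(attack, n_new=4)))
```

On this toy data the unweighted 1-NN vote follows the noisy feature and returns `normal`, while the importance-weighted vote returns `attack`. Library implementations of SMOTE, ENN, random forests, and CatBoost would replace these hand-written pieces in practice.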

Key words: network abnormal traffic detection, data imbalance, feature importance, model updating, KDD 99

CLC Number: