Netinfo Security, 2016, Vol. 16, Issue (6): 1-7. doi: 10.3969/j.issn.1671-1122.2016.06.001

Research and Implementation on Network Traffic Anomaly Detection without Guidance Learning with Spark

Xiaoping WU, Zhou ZHOU, Hongcheng LI

  1. Department of Information Security, Naval University of Engineering, Wuhan, Hubei 430033, China
  • Received: 2016-04-28  Online: 2016-06-20  Published: 2020-05-13
  • About the authors:

    Xiaoping WU (born 1961), male, from Shanxi; professor, Ph.D.; research interests: information security and cryptography. Zhou ZHOU (born 1994), male, from Yunnan; undergraduate; research interests: network security, parallel computing and data mining. Hongcheng LI (born 1991), male, from Henan; Ph.D. candidate; research interests: information security and data mining.

  • Funding:
    National Natural Science Foundation of China [61100042]; Natural Science Foundation of Hubei Province [2015CFC867]

Abstract:

Intrusion detection on massive volumes of data is difficult. To address this, the paper designs and implements a Spark-based system that detects network traffic anomalies using unsupervised learning. Data preprocessing is implemented in Python and IPython, an enhanced interactive shell for Python. For anomaly detection, the system uses K-means, a fast unsupervised clustering method, to partition traffic records and predict the type of attack each record represents.

Traditional distributed computing frameworks such as MapReduce read from and write to disk frequently, which adds a large time overhead. To avoid this, the paper implements K-means anomaly detection on Spark. The data produced in each iteration is kept in memory rather than written to disk, which substantially improves the computational efficiency of K-means clustering detection.

Choosing the value of K for K-means is also difficult. To select the best K, the system uses Spark to compute, for each candidate K, the average distance from each cluster center to all points in its cluster, and then compares these values across candidates.

Finally, performance and functional tests show that the system meets its design requirements, with high computational efficiency and detection accuracy.

Key words: network traffic detection, Spark, unsupervised learning
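The K-selection criterion in the abstract can be written out as follows. This assumes that the "average distance" is taken over all N traffic records; the abstract does not say whether per-cluster averages are instead averaged across clusters. The symbols C_k (the k-th cluster), μ_k (its center) and N are introduced here for illustration only.

```latex
\[
\mathrm{score}(K) \;=\; \frac{1}{N}\sum_{k=1}^{K}\sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert_2 ,
\qquad
\mu_k \;=\; \frac{1}{\lvert C_k \rvert}\sum_{x_i \in C_k} x_i .
\]
```

Adding clusters tends to shrink this score. For that reason, a "best" K is usually taken where the score stops falling appreciably, not at its minimum. The abstract does not state the exact rule the authors apply.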

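The sweep over candidate K values described in the abstract can be sketched on a single machine in plain Python. This is not the authors' Spark code, and several choices here are hypothetical, made only for illustration:

- the helper names (`kmeans`, `clustering_score`, `choose_k`);
- initializing centers from the first K records;
- the relative-improvement threshold `tol` used to pick the "best" K.

The sketch also assumes records have already been preprocessed into numeric feature vectors. On Spark, the analogous job would keep the parsed feature data in memory (for example with `cache()`), so every iteration and every candidate K reuses it instead of rereading it from disk.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p: Sequence[float], centroids: List[Point]) -> int:
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))


def kmeans(points: List[Point], k: int, max_iter: int = 50) -> List[Point]:
    """Lloyd's K-means. Initial centers are the first k records: a hypothetical
    choice made for determinism, not the authors' initialization."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # New center = coordinate-wise mean; an empty cluster keeps its old center.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # centers stopped moving
            break
        centroids = updated
    return centroids


def clustering_score(points: List[Point], centroids: List[Point]) -> float:
    """Mean distance from each record to the center of the cluster it is assigned to."""
    return sum(dist(p, centroids[nearest(p, centroids)]) for p in points) / len(points)


def choose_k(points: List[Point], candidates: List[int],
             tol: float = 0.25) -> Tuple[int, Dict[int, float]]:
    """Hypothetical rule: return the smallest K whose score improves by less
    than `tol` (relative) when moving to the next candidate K."""
    scores = {k: clustering_score(points, kmeans(points, k)) for k in candidates}
    ordered = sorted(candidates)
    for k, nxt in zip(ordered, ordered[1:]):
        if scores[k] == 0 or (scores[k] - scores[nxt]) / scores[k] < tol:
            return k, scores
    return ordered[-1], scores


# Toy data: three well-separated groups, listed so the first three records
# come from different groups.
records = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0),
           (1.0, 0.0), (11.0, 0.0), (1.0, 10.0),
           (0.0, 1.0), (10.0, 1.0), (0.0, 11.0)]
best_k, scores = choose_k(records, [1, 2, 3, 4])
centres = kmeans(records, best_k)
print(best_k)                          # 3
print(nearest((10.2, 0.1), centres))   # 1: joins the group around (10, 0)
```

In this toy run the score falls from about 6.56 at K = 1 to about 0.55 at K = 4. Only the last step (K = 3 to K = 4) improves by less than 25%, so the sketch returns K = 3 instead of simply taking the smallest score.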