信息网络安全 ›› 2020, Vol. 20 ›› Issue (8): 71-80.doi: 10.3969/j.issn.1671-1122.2020.08.009

• 技术研究 • 上一篇    下一篇

网络入侵检测中K近邻高速匹配算法研究

徐国天()   

  1. 中国刑事警察学院网络犯罪侦查系,沈阳 110854
  • 收稿日期:2020-07-20 出版日期:2020-08-10 发布日期:2020-10-20
  • 通讯作者: 徐国天 E-mail:459536384@qq.com
  • 作者简介:徐国天(1978—),男,辽宁,副教授,硕士,主要研究方向为网络安全、电子物证
  • 基金资助:
    辽宁省自然科学基金(2019-ZD-0167);辽宁省自然科学基金(20180550841);辽宁省自然科学基金(2015020091);中央高校基本科研业务费(3242017013);公安部技术研究计划课题(2016JSYJB06)

Research on K-Nearest Neighbor High Speed Matching Algorithm in Network Intrusion Detection

XU Guotian()   

  1. Cyber Crime Investigation Department, Criminal Investigation Police University of China, Shenyang 110854, China
  • Received:2020-07-20 Online:2020-08-10 Published:2020-10-20
  • Contact: XU Guotian E-mail:459536384@qq.com

摘要:

K近邻匹配算法被广泛应用于网络入侵检测工作,当样本数量和特征维度显著增加时,基于Ball-tree结构的K近邻匹配算法查询效率显著降低,不能满足实时性检测需求。针对这一问题,文章提出一种基于“裁剪树”的K近邻高速匹配算法。首先对全部样本集进行裁剪,构建一棵最小规模的“裁剪树”,同时保证“裁剪树”最大限度地保留原始样本集在多维空间内的分布形态;其次进行K近邻查找时,在“裁剪树”内快速定位K_g(2≤K_g≤K)个初始近邻点,利用初始近邻点与目标点的空间距离作为剪枝半径对搜索二叉树进行K近邻查询。与原始K近邻匹配算法相比,改进算法的初始近邻不在固定位置,而是动态位于目标点周边,有效缩短了剪枝距离,在查询过程中有更多的样本点被剪枝删除,显著减少计算量,提升了查询效率。实验结果表明,改进的K近邻高速匹配算法在处理高维、海量样本数据时,维持较高查询效率,部分样本集增速比达到93.81%。

关键词: 入侵检测, K近邻, 匹配, 裁剪树, 欧氏距离

Abstract:

K-nearest neighbor matching algorithm is widely used in network intrusion detection. When the number of samples and feature dimensions increase significantly, the query efficiency of K-nearest neighbor matching algorithm based on Ball-tree structure decreases significantly and cannot meet the requirements of real-time detection. In order to solve this problem, this paper proposes a high-speed K-nearest neighbor matching algorithm based on "reduce tree". Firstly, the original sample set is effectively clipped to construct a minimum-scale "reduce tree" while ensuring that the "reduce tree" preserves the distribution morphology of the original sample set in multi-dimensional space to the greatest extent. Secondly, in K-nearest neighbor search, K_g(2≤K_g≤K ) initial nearest neighbor points are quickly located in the "reduce tree", and then K-nearest neighbor query is carried out on the search binary tree by using the spatial distance between the initial nearest neighbor points and the target point as the pruning radius. Compared with the original K-nearest neighbor matching algorithm, the initial nearest neighbor position of the improved algorithm is not fixed, but dynamically located around the target point, effectively shortening the pruning distance, more sample points are pruned and deleted in the query process, significantly reducing the calculation amount and improving the overall query efficiency. The experimental results show that the improved K-nearest neighbor high-speed matching algorithm maintains high query efficiency when processing high-dimensional and massive sample data, and the growth ratio of some sample sets reaches 93.81%.

Key words: intrusion detection, K-nearest-neighbor, matching algorithm, reduce tree, Euclidean distance

中图分类号: