Netinfo Security ›› 2020, Vol. 20 ›› Issue (8): 71-80.doi: 10.3969/j.issn.1671-1122.2020.08.009

Previous Articles     Next Articles

Research on K-Nearest Neighbor High Speed Matching Algorithm in Network Intrusion Detection

XU Guotian()   

  1. Cyber Crime Investigation Department, Criminal Investigation Police University of China, Shenyang 110854, China
  • Received:2020-07-20 Online:2020-08-10 Published:2020-10-20
  • Contact: XU Guotian E-mail:459536384@qq.com

Abstract:

K-nearest neighbor matching algorithm is widely used in network intrusion detection. When the number of samples and feature dimensions increase significantly, the query efficiency of K-nearest neighbor matching algorithm based on Ball-tree structure decreases significantly and cannot meet the requirements of real-time detection. In order to solve this problem, this paper proposes a high-speed K-nearest neighbor matching algorithm based on "reduce tree". Firstly, the original sample set is effectively clipped to construct a minimum-scale "reduce tree" while ensuring that the "reduce tree" preserves the distribution morphology of the original sample set in multi-dimensional space to the greatest extent. Secondly, in K-nearest neighbor search, K_g(2≤K_g≤K ) initial nearest neighbor points are quickly located in the "reduce tree", and then K-nearest neighbor query is carried out on the search binary tree by using the spatial distance between the initial nearest neighbor points and the target point as the pruning radius. Compared with the original K-nearest neighbor matching algorithm, the initial nearest neighbor position of the improved algorithm is not fixed, but dynamically located around the target point, effectively shortening the pruning distance, more sample points are pruned and deleted in the query process, significantly reducing the calculation amount and improving the overall query efficiency. The experimental results show that the improved K-nearest neighbor high-speed matching algorithm maintains high query efficiency when processing high-dimensional and massive sample data, and the growth ratio of some sample sets reaches 93.81%.

Key words: intrusion detection, K-nearest-neighbor, matching algorithm, reduce tree, Euclidean distance

CLC Number: