Netinfo Security ›› 2018, Vol. 18 ›› Issue (8): 56-63. doi: 10.3969/j.issn.1671-1122.2018.08.008

• Original Article •

Research on Hadoop-based Massive Security Log Clustering Algorithm

Xie LU1, Shoushan LUO1, Yumei ZHANG2

  1. Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Chinese People’s Armed Police Force Corps of Tianjin, Tianjin 300001, China
  • Received: 2018-04-04 Online: 2018-08-20 Published: 2020-05-11

Abstract:

In the big data environment, network security incidents occur one after another, and network security has become a major concern. As a form of dark data in this environment, security logs record important information about the running status of equipment. Analyzing them makes it possible to grasp the network security situation in real time, and they can serve as a security auditing tool for protection before an incident and accountability after it, supporting the detection of abnormal events. Given the importance of log auditing and the role of data mining in log analysis, and considering that a single-machine environment lags behind when processing massive data, a Hadoop-based clustering algorithm for massive security logs is proposed. First, the K-means clustering algorithm is improved using the maximum and minimum distance (MMD) and the mean value, which overcomes the random selection of initial cluster centers in the traditional K-means algorithm. Second, to process massive data effectively and to improve the efficiency and speed of clustering, the improved K-means algorithm is deployed on Map/Reduce for iterative computation. Experiments show that the proposed clustering algorithm performs better than other typical methods and that its clustering results are stable. It also achieves better running speed and speedup ratio on the cluster.
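The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact way the MMD rule and the mean value are combined, so the version below is an assumption: the first center is the point nearest the overall mean, and each further center is the point whose nearest already-chosen center is farthest away. The names `max_min_init`, `map_phase`, `reduce_phase`, and `kmeans` are hypothetical, and the map and reduce steps are plain in-process Python functions that only mimic the Map/Reduce data flow; they are not Hadoop API calls and not the authors' implementation.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_init(points, k):
    """Choose k initial centers deterministically with a max-min distance rule.

    Assumption: seeding from the point nearest the overall mean is one
    plausible reading of "MMD and the mean value", not the paper's stated rule.
    """
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [min(points, key=lambda p: dist(p, mean))]
    while len(centers) < k:
        # Next center: the point whose nearest chosen center is farthest away.
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    return centers


def map_phase(points, centers):
    """Mapper stand-in: emit (index of nearest center, point) pairs."""
    for p in points:
        yield min(range(len(centers)), key=lambda i: dist(p, centers[i])), p


def reduce_phase(pairs, centers):
    """Reducer stand-in: average the points under each key into a new center."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    new_centers = list(centers)  # a center with no points keeps its position
    for key, members in groups.items():
        dim = len(members[0])
        new_centers[key] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centers


def kmeans(points, k, max_iter=50, tol=1e-9):
    """Iterate map and reduce steps until the centers stop moving."""
    centers = max_min_init(points, k)
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

Because the seeding is deterministic, repeated runs on the same data give the same centers, which is the stability property the abstract attributes to the improved initialization. In a real Hadoop job each iteration would be a separate Map/Reduce pass with the current centers distributed to the mappers; that wiring is omitted here.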

Key words: security log, clustering, K-means, Map/Reduce, Hadoop

CLC Number: