Netinfo Security ›› 2015, Vol. 15 ›› Issue (4): 45-49. doi: 10.3969/j.issn.1671-1122.2015.04.008


Research and Implementation on Hybrid Clustering Algorithm in Big Data Processing

CHEN Xiao, ZHAO Jing-ling

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2015-01-12 Online: 2015-04-10 Published: 2018-07-16

Abstract:

With the rapid development of information technology, the era of big data has arrived. Data analysis has become a focus of research, and data mining, now a top priority, has been studied extensively. This paper studies clustering algorithms and proposes a hybrid clustering algorithm that integrates partition-based clustering with hierarchical clustering. The algorithm avoids the problem of randomly chosen initial cluster centers. It first uses partition-based clustering to initialize the data, then uses hierarchical clustering to analyze the resulting data from the bottom up, which greatly improves clustering speed. The algorithm combines the advantages of the two traditional kinds of clustering algorithm, offsets their respective deficiencies, and improves operating efficiency without loss of accuracy. Finally, simulation experiments carried out with R language tools confirm the effectiveness and feasibility of the proposed algorithm.
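
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the R tools used in the paper, and it is not the authors' implementation. Several details are assumptions: the partition stage is a plain k-means with deterministic farthest-point seeding (chosen here only so that no random initial centers are needed), the hierarchical stage merges the two groups with the closest centroids, and the names `hybrid_cluster`, `k_pre`, and `iters` are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def partition_stage(points, k_pre, iters=20):
    """Partition-based stage: k-means that splits the data into k_pre small groups.

    Assumption: seeds are picked by farthest-point selection, which is
    deterministic, instead of being drawn at random.
    """
    centers = [points[0]]
    while len(centers) < k_pre:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    groups = [[p] for p in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)), key=lambda i: math.dist(p, centers[i])
            )
            groups[nearest].append(p)
        # Keep the old center if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def hierarchical_stage(groups, k):
    """Bottom-up stage: merge the two closest groups until k clusters remain.

    Assumption: closeness is measured between group centroids.
    """
    clusters = [list(g) for g in groups]
    while len(clusters) > k:
        cents = [mean(c) for c in clusters]
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: math.dist(cents[ij[0]], cents[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def hybrid_cluster(points, k, k_pre):
    """Run the partition stage first, then merge its groups bottom-up."""
    return hierarchical_stage(partition_stage(points, k_pre), k)
```

Calling `hybrid_cluster(points, k=2, k_pre=4)` on a list of coordinate tuples returns two lists of points. In this sketch the speed gain comes from the hierarchical stage starting from `k_pre` groups rather than from every individual point, so the pairwise search runs over far fewer items.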

Key words: big data, data mining, clustering algorithm, partitioning algorithm, hierarchical algorithm
