Netinfo Security ›› 2018, Vol. 18 ›› Issue (8): 43-49. doi: 10.3969/j.issn.1671-1122.2018.08.006

• Original Article •

A Yarn and NMF Based Big Data Clustering Algorithm

Xinyang FENG1, Jianjing SHEN2

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou Henan 450046, China
  2. PLA Strategic Support Force Information Engineering University, Zhengzhou Henan 450002, China
  • Received: 2018-03-10 Online: 2018-08-20 Published: 2020-05-11

Abstract:

To improve on the big data processing performance of MapReduce version 1, this paper proposes a parallel hierarchical clustering algorithm based on Yarn and NMF (Non-negative Matrix Factorization). The combination of big data classification with the NMF algorithm and the task partitioning in our MapReduce approach are then discussed. Our approach uses the Yarn distributed computation programming model of Hadoop 2.0, with the big data stored in HDFS (Hadoop Distributed File System). The coding mechanism and the flow of hierarchical data clustering on Yarn are also described in detail. To demonstrate the efficiency of our approach, a series of simulation experiments was run on a telecommunication big data set. The results and performance analysis show that clustering of big data can be completed within an acceptable time using the Yarn framework, and that good performance and speedup were obtained in our tests.
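
The abstract does not state the factorization details, so the following is only a minimal single-machine sketch of how NMF can drive a clustering step, not the paper's Yarn implementation. It assumes the standard Lee-Seung multiplicative updates for the Frobenius objective, factoring a non-negative data matrix V (samples by features) as V ≈ WH and assigning each sample to the factor with the largest weight in its row of W. All function names (`nmf`, `cluster_labels`, `frobenius_error`) and the toy data are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x k) and h (k x m).

    Assumption: Lee-Seung multiplicative updates; the paper may use
    a different update rule or a distributed variant.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def cluster_labels(w):
    """Assign each sample (row of w) to its largest-weight factor."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual v - w h."""
    wh = matmul(w, h)
    return sum((a - b) ** 2
               for row_v, row_wh in zip(v, wh)
               for a, b in zip(row_v, row_wh)) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy data: six samples, four non-negative features.
    data = [[5, 5, 0, 0], [4, 6, 0, 0], [6, 4, 0, 0],
            [0, 0, 5, 5], [0, 0, 6, 4], [0, 0, 4, 6]]
    w, h = nmf(data, k=2)
    print(cluster_labels(w), frobenius_error(data, w, h))
```

In a MapReduce or Yarn setting, the products W^T V and W^T W are sums over rows of the data, so they could in principle be computed per data partition and then summed; whether the paper partitions its tasks this way is not stated in the abstract.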

Key words: cloud computing, big data, Yarn platform, non-negative matrix factorization, clustering algorithm
