信息网络安全 (Netinfo Security) ›› 2015, Vol. 15 ›› Issue (4): 45-49. doi: 10.3969/j.issn.1671-1122.2015.04.008

• Technical Research •

Research and Implementation on Hybrid Clustering Algorithm in Big Data Processing
(大数据处理中混合型聚类算法的研究与实现)

CHEN Xiao (陈晓), ZHAO Jing-ling (赵晶玲)

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2015-01-12 Online: 2015-04-10 Published: 2018-07-16
  • About the authors:

    CHEN Xiao (born 1988), male, from Shandong, master's student; main research interests: multimedia communication, mobile Internet, network security. ZHAO Jing-ling (born 1963), female, from Beijing, associate professor, Ph.D.; main research interests: multimedia communication, mobile Internet, network security.

Abstract:

With the rapid development of information technology, the era of big data has arrived. Data analysis and processing have become a focus of current research, and data mining in particular is widely studied and applied. Building on a study of clustering algorithms, this paper examines partition-based clustering and bottom-up hierarchical clustering, optimizes each, and fuses them into a hybrid clustering algorithm. The algorithm avoids the problem of randomly chosen initial cluster centers in partitioning algorithms: it first uses partition-based clustering to initialize the data set, then applies bottom-up hierarchical clustering to the processed data set, which greatly improves clustering speed and ultimately yields the desired analysis results. It combines the advantages of the two traditional kinds of clustering algorithm while offsetting their weaknesses, improving running efficiency without loss of accuracy. Finally, simulation experiments carried out in R confirm the effectiveness and feasibility of the proposed hybrid clustering algorithm.

Key words: big data, data mining, clustering algorithm, partitioning algorithm, hierarchical algorithm
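
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how initial centers are chosen or which optimizations are applied, and the paper's experiments were run in R. The sketch below uses Python's standard library only, and it assumes a farthest-first rule for deterministic seeding, a Lloyd-style k-means for the partition stage, and centroid-distance merging for the bottom-up stage. All function names and the parameters `m` (number of intermediate groups) and `k` (final number of clusters) are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def farthest_first_centers(points, m):
    """Deterministic seeding (an assumption, not from the paper): start at the
    first point, then repeatedly add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < m:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def partition_stage(points, m, max_iter=100):
    """Lloyd-style k-means splitting the data into at most m small groups.
    max_iter is assumed to be at least 1."""
    centers = farthest_first_centers(points, m)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        groups = [g for g in groups if g]  # drop groups that received no points
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return groups


def hierarchical_stage(groups, k):
    """Bottom-up merging: repeatedly merge the two groups whose centroids
    are closest, until only k groups remain."""
    groups = [list(g) for g in groups]
    while len(groups) > k:
        cents = [mean(g) for g in groups]
        i, j = min(
            ((a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))),
            key=lambda ab: dist(cents[ab[0]], cents[ab[1]]),
        )
        groups[i].extend(groups[j])
        del groups[j]  # j > i, so index i is unaffected
    return groups


def hybrid_cluster(points, k, m):
    """Partition into m groups first, then merge them down to k clusters."""
    return hierarchical_stage(partition_stage(points, m), k)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
    for cluster in hybrid_cluster(points, k=2, m=4):
        print(sorted(cluster))
```

In this sketch the hierarchical stage compares only the `m` intermediate groups rather than every pair of input points, which is one plausible source of the speed-up the abstract mentions. The choice of `m` is left to the caller here; the abstract gives no guidance on it.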

CLC number: