Research and Implementation on Hybrid Clustering Algorithm in Big Data Processing

doi:10.3969/j.issn.1671-1122.2015.04.008

Abstract

With the rapid development of information technology, the era of big data has arrived. Data analysis has become a research focus, and data mining, now a top priority, has been studied extensively. This paper studies clustering and proposes a hybrid clustering algorithm that integrates partition-based clustering with hierarchical clustering. The algorithm avoids the problem of randomly chosen initial cluster centers. It first uses partition-based clustering to initialize the data, then uses hierarchical clustering to analyze the processed data from the bottom up, which greatly increases clustering speed. By combining the strengths of the two traditional approaches and offsetting their respective weaknesses, the algorithm improves operating efficiency without loss of accuracy. Finally, simulation experiments carried out with R language tools confirm the effectiveness and feasibility of the proposed algorithm.
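
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's experiments use R, while the sketch below uses plain Python for illustration. The deterministic farthest-point seeding, the centroid-distance merge rule, and the names `hybrid_cluster`, `k_init`, and `k_final` are all assumptions made here, since the abstract does not specify how initial centers are chosen or which linkage is used.

```python
from math import dist


def mean(points):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def farthest_point_centers(points, k):
    """Assumed seeding rule (not from the paper): start at the first point,
    then repeatedly add the point farthest from all centers chosen so far.
    This is deterministic, so no random initial centers are needed."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    return centers


def partition_step(points, k_init, iterations=20):
    """Partition-based stage: k-means that splits the data into many
    small groups, returned as lists of points."""
    centers = farthest_point_centers(points, k_init)
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        groups = [g for g in groups if g]  # drop groups that received no points
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return groups


def hierarchical_step(groups, k_final):
    """Hierarchical stage: bottom-up merging of the small groups.
    Assumed rule: merge the two groups whose centroids are closest."""
    clusters = [list(g) for g in groups]
    while len(clusters) > k_final:
        centroids = [mean(c) for c in clusters]
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ab: dist(centroids[ab[0]], centroids[ab[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def hybrid_cluster(points, k_init, k_final):
    """Partition first, then merge the partitions from the bottom up."""
    return hierarchical_step(partition_step(points, k_init), k_final)


# Illustrative data: two well-separated groups of points.
blob_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
blob_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
clusters = hybrid_cluster(blob_a + blob_b, k_init=4, k_final=2)
print(sorted(len(c) for c in clusters))
```

In this sketch the hierarchical stage compares only the `k_init` group centroids rather than every pair of raw points, which is one plausible reading of where the speed-up claimed in the abstract would come from.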

Key words: big data, data mining, clustering algorithm, partitioning algorithm, hierarchical algorithm

CLC Number: TP309

CHEN Xiao, ZHAO Jing-ling. Research and Implementation on Hybrid Clustering Algorithm in Big Data Processing[J]. Netinfo Security, 2015, 15(4): 45-49.

