Research on k-means++ Clustering Algorithm Based on Laplace Mechanism for Differential Privacy Protection

doi:10.3969/j.issn.1671-1122.2019.02.006

Abstract

The k-means++ clustering algorithm was proposed to address the strong dependence of k-means clustering accuracy on the choice of initial center points. During clustering, the private data involved needs to be protected. The differential privacy model assumes an adversary with maximal background knowledge and allows the strength of privacy protection to be quantified. This paper proposes a differentially private k-means++ clustering algorithm based on the Laplace mechanism (the DPk-means++ clustering algorithm). Noise is added according to the Laplace mechanism both when the initial center points are selected and when the mean center points are computed in each iteration. This addresses the privacy leakage caused by the random selection of initial center points in k-means++ and the privacy leakage caused by iteratively computing the cluster centers. Experiments analyze the effect of dynamically varying the privacy budget and the resulting clustering accuracy. The results indicate that, within the range of privacy budget parameters and while preserving clustering accuracy, the DPk-means++ clustering algorithm can provide different levels of protection for data privacy.

Key words: differential privacy protection, Laplace mechanism, k-means++, clustering
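The abstract names two places where Laplace noise is added: the selection of initial center points and the iterative computation of mean center points. The sketch below illustrates that general idea in Python. It is not the authors' implementation. The function names, the assumption that every feature is scaled to [0, 1], the noise scales, and the uniform split of the privacy budget are all hypothetical choices made for illustration, and no formal privacy guarantee is claimed for the seeding step.

```python
import random
from typing import List, Sequence

Point = List[float]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def noisy_seeds(data: List[Point], k: int, epsilon: float,
                rng: random.Random) -> List[Point]:
    """k-means++ seeding (D^2 sampling), then Laplace perturbation of each seed.

    Assumption: features lie in [0, 1], so one point has L1 norm at most d.
    """
    d = len(data[0])
    seeds = [list(rng.choice(data))]
    while len(seeds) < k:
        # Weight each point by its squared distance to the nearest seed.
        weights = [min(sq_dist(p, c) for c in seeds) for p in data]
        if sum(weights) == 0:  # every point coincides with a seed
            seeds.append(list(rng.choice(data)))
        else:
            seeds.append(list(rng.choices(data, weights=weights, k=1)[0]))
    scale = d / epsilon  # hypothetical noise scale for the seeds
    return [[x + laplace_noise(scale, rng) for x in s] for s in seeds]


def noisy_update(data: List[Point], centers: List[Point], epsilon: float,
                 rng: random.Random) -> List[Point]:
    """One iteration: assign points, then release noisy sums / noisy counts."""
    d, k = len(data[0]), len(centers)
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in data:
        j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    # Adding or removing one point in [0, 1]^d changes one count by 1 and
    # one sum vector by at most d in L1 norm, hence sensitivity d + 1.
    scale = (d + 1) / epsilon
    new_centers = []
    for j in range(k):
        n = counts[j] + laplace_noise(scale, rng)
        if n < 1.0:
            # Noisy count is unusable; keep the previous center.
            new_centers.append(list(centers[j]))
        else:
            new_centers.append(
                [(sums[j][t] + laplace_noise(scale, rng)) / n for t in range(d)]
            )
    return new_centers


def dp_kmeans_pp(data: List[Point], k: int, epsilon: float,
                 iterations: int = 5, seed: int = 0) -> List[Point]:
    """Illustrative driver; the uniform budget split is an assumption."""
    rng = random.Random(seed)
    eps_step = epsilon / (iterations + 1)
    centers = noisy_seeds(data, k, eps_step, rng)
    for _ in range(iterations):
        centers = noisy_update(data, centers, eps_step, rng)
    return centers
```

A call such as `dp_kmeans_pp(data, k=2, epsilon=1.0)` returns `k` noisy centers. A smaller `epsilon` gives larger noise scales and therefore stronger protection at the cost of accuracy, which is the trade-off the abstract says the experiments examine.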

CLC number: TP309

Yanming FU, Zhenduo LI. Research on k-means++ Clustering Algorithm Based on Laplace Mechanism for Differential Privacy Protection[J]. Netinfo Security, 2019, 19(2): 43-52.

Table 1

Dataset name	Number of records	Number of attributes	Data type
User Knowledge Modeling Data Set	258	6	Numeric
Occupancy Detection Data Set	20099	7	Numeric