信息网络安全 ›› 2019, Vol. 19 ›› Issue (2): 43-52.doi: 10.3969/j.issn.1671-1122.2019.02.006

• 技术研究 • 上一篇    下一篇

基于拉普拉斯机制的差分隐私保护k-means++聚类算法研究

傅彦铭, 李振铎()   

  1. 广西大学计算机与电子信息学院,广西南宁 530004
  • 收稿日期:2018-12-20 出版日期:2019-02-10 发布日期:2020-05-11
  • 作者简介:

    作者简介:傅彦铭(1976—),男,广西,副教授,博士,主要研究方向为人工智能与信息安全;李振铎(1990—),男,河南,硕士研究生,主要研究方向为人工智能与信息安全。

  • 基金资助:
    国家自然科学基金[61662004]

Research on k-means++ Clustering Algorithm Based on Laplace Mechanism for Differential Privacy Protection

Yanming FU, Zhenduo LI()   

  1. School of Computer and Electronic Information, Guangxi University, Nanning Guangxi 530004, China
  • Received:2018-12-20 Online:2019-02-10 Published:2020-05-11

摘要:

k-means++聚类算法是为了解决k-means聚类算法的准确度受其初始中心点选取的影响较大的问题而提出的,在聚类过程中,需要对相关的隐私数据提供保护。差分隐私模型定义了一种具有最大背景知识假设的攻击模型,并且能对隐私保护强度进行量化分析。文章提出一种基于拉普拉斯机制的差分隐私保护k-means++聚类算法(DPk-means++聚类算法),在初始化选取中心点和迭代求均值中心点的过程中,分别根据拉普拉斯机制添加噪声,解决了k-means++聚类算法随机选取初始化中心点隐私泄露的问题和迭代求簇心隐私泄露问题。通过实验分别对隐私预算动态变化对比及聚类准确性结果进行分析,DPk-means++聚类算法能够在隐私预算参数范围内且保证聚类准确性的前提下,实现对数据隐私提供不同级别的保护。

关键词: 差分隐私保护, 拉普拉斯机制, k-means++, 聚类

Abstract:

The k-means++ clustering algorithm is proposed to solve the problem that the accuracy of the k-means clustering algorithm is greatly affected by the selection of its initial center point. In the clustering process, the related private data needs to be protected. The differential privacy model defines an attack model with the largest background knowledge and can quantify the privacy protection strength. This paper proposes a k-means++ clustering algorithm based on Laplace mechanism for differential privacy protection (DPk-means++ clustering algorithm), and in the process of initializing the selected center point and iterating the mean center point, the noise is added according to the Laplace mechanism, and the random selection initialization center of k-means++ clustering algorithm is solved. Point to privacy leaks and iterative clustering privacy issues. Comparative analysis of dynamic changes in privacy budgets and analysis of clustering accuracy results through experiments, the DPk-means++ clustering algorithm can provide different levels of protection for data privacy under the premise of privacy budget parameters and ensuring clustering accuracy.

Key words: differential privacy protection, Laplace mechanism, k-means++, clustering

中图分类号: