Netinfo Security, 2020, Vol. 20, Issue 10: 34-40. doi: 10.3969/j.issn.1671-1122.2020.10.005

K-means Clustering Algorithm Based on Differential Privacy with Distance and Sum of Square Error

HUANG Baohua, CHENG Qi, YUAN Hong, HUANG Pirong

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
  • Received: 2020-05-12 Online: 2020-10-10 Published: 2020-11-25
  • Contact: HUANG Baohua E-mail: bhhuang66@gxu.edu.cn

Abstract:

The K-means algorithm is simple, fast, and easy to implement, and it is widely used in data mining, but the clustering process can leak private information about individual records. Differential privacy gives a rigorous definition of privacy protection and allows the level of protection to be analyzed quantitatively. Existing K-means clustering algorithms based on differential privacy select their initial center points blindly, which lowers the availability of the clustering results. To address this problem, a BDPK-means clustering algorithm is proposed. The algorithm uses distance and the within-cluster sum of squared errors (SSE) to select reasonable initial center points for clustering. A theoretical analysis proves that the algorithm satisfies ε-differential privacy. In simulation experiments, the BDPK-means algorithm is compared with the DPK-means algorithm under the same conditions, and the results show that BDPK-means improves the availability of clustering.
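To make the terms in the abstract concrete, the following is a minimal Python sketch of one noisy K-means iteration in the general style of Laplace-mechanism differentially private K-means, together with an SSE helper. It is not the authors' BDPK-means: the abstract does not specify how distance and SSE are combined to choose initial centers, so that step is left out. The function names `sse` and `dp_kmeans_step`, the assumption that every coordinate is scaled to [0, 1], and the use of `epsilon` as a per-iteration budget are all illustrative assumptions.

```python
import numpy as np


def sse(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    diffs = points - centers[labels]
    return float(np.sum(diffs ** 2))


def dp_kmeans_step(points, centers, epsilon, rng):
    """One noisy Lloyd iteration (illustrative, not the paper's algorithm).

    Assumes every coordinate of every point lies in [0, 1], and that
    epsilon is the privacy budget spent on this single iteration.
    """
    k, d = centers.shape

    # Assign each point to its nearest current center.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)

    # Adding or removing one record changes one cluster's count by 1 and
    # its coordinate sums by at most d in L1 norm, so the combined
    # (sum, count) query has L1 sensitivity d + 1 under that assumption.
    scale = (d + 1) / epsilon

    new_centers = centers.copy()
    for j in range(k):
        members = points[labels == j]
        noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=d)
        noisy_count = len(members) + rng.laplace(0.0, scale)
        # Keep the old center when the noisy count is too small to divide by.
        if noisy_count >= 1.0:
            new_centers[j] = np.clip(noisy_sum / noisy_count, 0.0, 1.0)
    return new_centers, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random((200, 2))      # synthetic data already in [0, 1]
    centers = data[:3].copy()        # naive initial centers (hypothetical choice)
    for _ in range(5):
        centers, labels = dp_kmeans_step(data, centers, epsilon=1.0, rng=rng)
    print("SSE after 5 noisy iterations:", sse(data, labels, centers))
```

Because the budget is spent per iteration, the total privacy cost grows with the number of iterations under sequential composition. Starting from well-separated initial centers reduces the number of iterations needed, which is presumably why initial center selection matters for clustering availability.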

Key words: privacy protection, data mining, differential privacy, K-means clustering, SSE

CLC Number: