Netinfo Security ›› 2020, Vol. 20 ›› Issue (2): 37-48. doi: 10.3969/j.issn.1671-1122.2020.02.006

• Technical Research •

A Differential Private Data Publishing Algorithm via Principal Component Analysis Based on Maximum Information Coefficient

PENG Changgen1,2,3, ZHAO Yuanyuan1,3, FAN Meimei1

    1. State Key Laboratory of Public Big Data, College of Mathematics and Statistics, Guizhou University, Guiyang 550025, China
    2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
    3. Institute of Cryptography & Data Security, Guizhou University, Guiyang 550025, China
  • Received: 2019-10-27 Online: 2020-02-10 Published: 2020-05-11

Abstract:

Balancing data privacy and data availability is a central problem in privacy protection. Differential privacy based on principal component analysis (PCA) can protect high-dimensional data while keeping it highly usable, because it combines dimensionality reduction with noise added to the principal components. Existing PCA-based differential privacy algorithms rely on the Pearson correlation coefficient, which captures only linear relationships in high-dimensional data, and they do not optimize how the differential privacy budget is allocated over the dimension-reduced data set. As a result, their applicability is limited and the utility of the published data is low. To address this, a differentially private data publishing algorithm via PCA based on the maximum information coefficient (MIC-PCA-DPPD) is proposed. Experimental results show that the proposed algorithm is suitable for dimensionality reduction of data with linear, nonlinear, and multi-function relationships, among others, and that its principal components retain the information in the raw data efficiently. Compared with the classic differential privacy algorithm and the PCA-based PPDP algorithm, it adds less noise and maintains data availability effectively under the same privacy protection strength.
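The noise-addition and budget-allocation ideas mentioned in the abstract can be illustrated with a minimal sketch. It is not the MIC-PCA-DPPD algorithm itself: the abstract does not name the noise mechanism or the allocation rule, so the sketch assumes the standard Laplace mechanism (noise scale = sensitivity / ε) and a hypothetical proportional split of the total budget across components. The function names `split_budget` and `perturb`, the weights, and the sensitivity value are all illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exp(1) variables is Laplace(0, 1).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def split_budget(epsilon, weights):
    """Split a total privacy budget in proportion to the given weights.

    Hypothetical allocation rule: the parts sum to epsilon, so by
    sequential composition the overall release stays epsilon-DP.
    """
    total = sum(weights)
    return [epsilon * w / total for w in weights]


def perturb(rows, epsilons, sensitivity, rng):
    """Add Laplace(sensitivity / eps_j) noise to column j of every row.

    `rows` stands in for data already projected onto the retained
    principal components; `sensitivity` is assumed to be supplied by
    the caller (for example, from a known bound on each column).
    """
    scales = [sensitivity / e for e in epsilons]
    return [[x + laplace_noise(s, rng) for x, s in zip(row, scales)]
            for row in rows]


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed weights, e.g. the share of variance each component explains.
    eps = split_budget(1.0, [3.0, 1.0])
    projected = [[0.2, -0.1], [0.5, 0.3]]
    print(eps)
    print(perturb(projected, eps, sensitivity=1.0, rng=rng))
```

Under this split, a component with a larger weight receives a larger share of ε and therefore less noise, which is one plausible way to keep utility high for a fixed total budget.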

Key words: differential privacy, principal components analysis, dimension reduction, data publication
