信息网络安全 ›› 2020, Vol. 20 ›› Issue (2): 37-48.doi: 10.3969/j.issn.1671-1122.2020.02.006
收稿日期:
2019-10-27
出版日期:
2020-02-10
发布日期:
2020-05-11
作者简介:
作者简介:彭长根(1963—),男,贵州,教授,博士,主要研究方向为密码学与大数据安全;赵园园(1993—),女,河南,硕士研究生,主要研究方向为隐私保护与数据安全;樊玫玫(1981—),女,贵州,讲师,硕士,主要研究方向为密码学与信息安全。
基金资助:
PENG Changgen1,2,3, ZHAO Yuanyuan1,3(), FAN Meimei1
Received:
2019-10-27
Online:
2020-02-10
Published:
2020-05-11
摘要:
数据的私密性与可用性是隐私保护领域的核心问题。主成分分析差分隐私算法将高维数据的主成分降维与加噪融合,可有效对高维数据进行隐私保护并保持数据的高可用性。现有的主成分分析差分隐私保护算法依赖于皮尔逊相关系数,在隐私保护过程中仅能捕获高维数据的线性关系,且未考虑将差分预算在降维后的数据集上进行优化分配,导致算法普适性不足、数据效用不高。针对此问题,文章提出一种基于最大信息系数的主成分分析差分隐私(MIC-PCA-DP)数据发布算法。实验结果表明,文章所提出的隐私保护算法适用于线性关系、非线性关系、非函数依赖关系等数据关系的降维保持,其主成分降维可实现高效的原始数据信息承载;与经典的差分隐私算法以及PCA-based PPDP算法相比,在相同隐私保护强度下,对数据添加的噪声更小,能够有效保持数据的可用性。
中图分类号:
彭长根, 赵园园, 樊玫玫. 基于最大信息系数的主成分分析差分隐私数据发布算法[J]. 信息网络安全, 2020, 20(2): 37-48.
PENG Changgen, ZHAO Yuanyuan, FAN Meimei. A Differential Private Data Publishing Algorithm via Principal Component Analysis Based on Maximum Information Coefficient[J]. Netinfo Security, 2020, 20(2): 37-48.
[1] | DWORK C.Differential Privacy[C]//Springer. The 33rd International Colloquium on Automata, Languages, and Programming, July 9-16, 2006, Venice, Italy. Berlin: Spinger, 2006: 1-12. |
[2] | DWORK C, LEI J.Differential Privacy and Robust Statistics[C]//SIGACT. 41st ACM Symposium on Theory of Computing(STOC 2009), May 31-June 2, 2009, Bethesda, Maryland. New York: ACM, 2009: 371-380. |
[3] | HARDT M, ROTH A.Beating Randomized Response on Incoherent Matrices[C]//SIGACT. 44th ACM Symposium on Theory of Computing(STOC 2012), May 19-22, 2012, New York, USA. New York: ACM, 2012: 1255-1268. |
[4] | DWORK C, ROTH A.The Algorithmic Foundations of Differential Privacy[J]. Foundations and Trends in Theoretical Computer Science, 2013, 9(3-4): 211-407. |
[5] | BLUM A, DWORK C, MCSHERRY F, et al.Practical Privacy: The SULQ Framework[C]//SIGACT. 24th ACM Sigmod-Sigact-Sigart Symposium on Principles of Database Systems(PODS 2005), June 13-15, 2005, Baltimore, Maryland. New York: ACM, 2005: 128-138. |
[6] | JIANG Xiacqiang, JI Zhanglong, WANG Shuang, et al.Differential-Private Data Publishing Through Component Analysis[J]. Transactions on Data Privacy, 2013, 6(1): 19-34. |
[7] | DWORK C, TALWAR K, THAKURTA A, et al.Analyze Gauss: Optimal Bounds for Privacy-Preserving Principal Component Analysis[C]//SIGACT. 46th ACM Symposium on Theory of Computing(STOC 2014), May 31-June 3, 2014, New York, USA. New York: ACM, 2014: 11-20. |
[8] | CHAUDHURI K, SARWATE A D, SINHA K.A Near-Optimal Algorithm for Differentially-Private Principal Components[J]. Journal of Machine Learning Research, 2013, 14(9): 2905-2943. |
[9] | KAPRALOV M, TALWAR K.On Differentially Private Low Rank Approximation[C]//SIGACT. 24th Annual ACM-SIAM Symposium on Discrete Algorithms(SODA 2013), January 6-8, 2013, New Orleans, Los Angeles, USA. New York: ACM, 2013: 1395-1414. |
[10] | HARDT M, ROTH A.Beyond Worst-Case Analysis in Private Singular Vector Computation[C]//SIGACT. 45th ACM Symposium on the Theory of Computing, June 1-4, 2013, Palo Alto, California, USA. New York: ACM, 2013: 331-340. |
[11] | JIANG Wuxuan, XIE Cong, ZHANG Zhihua.Wishart Mechanism for Differentially Private Principal Components Analysis[J]. Computer Science, 2015, 18(VI): 458-473. |
[12] | HARDT M, PRICE E.The Noisy Power Method: A Meta Algorithm with Applications[C]//NIPS. Annual Conference on Neural Information Processing Systems 2014, December 8-13, 2014, Montreal, Quebec, Canada. Boston: MIT Press, 2014: 2861-2869. |
[13] | HALKON, MARTINSSON P G, TROPP J A. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions[J]. SIAM Review, 2011, 53(2): 217-288. |
[14] | KALAI A, VEMPALA S.Efficient Algorithms for Online Decision Problems[J]. Journal of Computer and System Sciences, 2005, 71(3): 291-307. |
[15] | ZHANG Yuxuan, WEI Jianghong, LI Ji, et al.Graph Degree Histogram Publication Method with Node-Differential Privacy[J]. Journal of Computer Research and Development, 2019, 6(3): 508-520. |
张宇轩,魏江宏,李霁,等.点差分隐私下图数据的度直方图发布方法[J]. 计算机研究与发展,2019,6(3):508-520. | |
[16] | MCSHERRY F.Privacy Integrated Queries: An Extensible Platform for Privacy-Preserving Data Analysis[C]//SIGACT. 2009 ACM SIGMOD International Conference on Management of Data, June 29-July 2, 2009, Providence, Rhode Island, USA. New York: ACM, 2009: 19-30. |
[17] | DWORK C.Differential Privacy: A Survey of Results[C]//Springer. 5th International Conference Theory and Applications of Models of Computation, April 25-29, Xi’an, China. Berlin: Spinger, 2008: 1-19. |
[18] | DWORK C, MCSHERRY F, NISSIM K, et al.Calibrating Data to Sensitivity in Private Data Analysis[C]// Springer. the 3rd Theory of Cryptography Conference, March 4-7, 2006, New York, USA. Berlin: Springer, 2006: 265-284. |
[19] | MCSHERRY F, TALWAR K.Mechanism Design via Differential Privacy[C]//IEEE. 48th Annual IEEE Symposium on Foundations of Computer Science, October 21-23, 2007, Providence, Rhode Island, USA. New York: IEEE, 2007: 94-103. |
[20] | RESHEF D N, RESHEF Y A, FINUCANE H K, et al.Detecting Novel Associations in Large Data Sets[J]. Science, 2011, 334(6062): 1518-1524. |
[21] | WOLD S.Principal Component Analysis[J]. Chemometrics & Intelligent Laboratory Systems, 1987, 2(1): 37-52. |
[22] | TIPPING M E, BISHOP C M.Probabilistic Principal Component Analysis[J]. Journal of the Royal Statistical Society Series B(Statistical Methodology), 1999, 61(3): 611-622. |
[23] | MOHAMMED N, CHEN R, FUNG B C M, et al. Differentially Private Data Release for Data Mining[C]//SIGACT. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 21-24, 2011, San Diego, California, USA. New York: ACM, 2011: 493-501. |
[24] | KAISER, H F.The Application of Electronic Computers to Factor Analysis[J]. Educational and Psychological Measurement, 1960, 20(1): 141-151. |
[25] | Residential Building Data Set Data Set[EB/OL]. 2018-2-19. |
[1] | 蒋辰, 杨庚, 白云璐, 马君梅. 面向隐私保护的频繁项集挖掘算法[J]. 信息网络安全, 2019, 19(4): 73-81. |
[2] | 刘敬浩, 毛思平, 付晓梅. 基于ICA算法与深度神经网络的入侵检测模型[J]. 信息网络安全, 2019, 19(3): 1-10. |
[3] | 傅彦铭, 李振铎. 基于拉普拉斯机制的差分隐私保护k-means++聚类算法研究[J]. 信息网络安全, 2019, 19(2): 43-52. |
[4] | 杨威超, 郭渊博, 钟雅, 甄帅辉. 基于设备型号分类和BP神经网络的物联网流量异常检测[J]. 信息网络安全, 2019, 19(12): 53-63. |
[5] | 戚名钰, 刘铭, 傅彦铭. 基于PCA的SVM网络入侵检测研究[J]. 信息网络安全, 2015, 15(2): 15-18. |
[6] | . 基于两个方向二维核主成分分析的手指静脉识别[J]. , 2014, 14(4): 51-. |
[7] | 王文明;谭毓安. 基于EM聚类算法的机器人视觉场景深度分类方法[J]. , 2013, 13(6): 0-0. |
[8] | 张新超;董建锋;张婧. 主成份分析在网络流量异常检测中的应用研究[J]. , 2012, 12(1): 0-0. |
[9] | 郑辉. 基于KPCA组合核函数SVM的网络危险因素识别[J]. , 2010, (2): 0-0. |
[10] | 孙子文;周治平;李慧. 基于小波子带特征函数矩和主成分分析的图像隐写分析方法[J]. , 2009, 9(7): 0-0. |
阅读次数 | ||||||||||||||||||||||||||||||||||||||||||||||||||
全文 311
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||
摘要 845
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||