Hierarchical Clustering Federated Learning Framework for Personalized Privacy-Preserving

doi:10.3969/j.issn.1671-1122.2024.08.006

Abstract

Federated learning (FL) is an emerging privacy-preserving distributed machine learning framework that uses cryptographic primitives to address the privacy leakage problem effectively. Preventing poisoning attacks in distributed settings has become an active research topic in FL. Most existing work assumes independent and identically distributed (IID) data and identifies malicious gradients in plaintext, so it cannot handle the challenges of data heterogeneity and risks significant privacy leakage by releasing unencrypted gradients. To address these challenges, this paper proposes a personalized privacy-preserving federated learning framework based on hierarchical clustering. The framework encrypts gradients with homomorphic encryption, using the coordinate-wise median as the benchmark, and employs a secure cosine similarity scheme to identify poisoned gradients. It uses clustering as part of the defense mechanism and applies hierarchical aggregation to enhance the model's robustness in both IID and non-IID scenarios. Experimental results on the MNIST, CIFAR-10 and Fashion-MNIST datasets indicate that the model has strong privacy-preserving capabilities. Compared with FedAVG, PPeFL, Median, Trimmed Mean and Clustering, the proposed method improves accuracy by an average of 14.90%, 9.59%, 29.50%, 26.57% and 23.19%, respectively.

Key words: federated learning, hierarchical aggregation, homomorphic encryption, privacy-preserving
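
The malicious-gradient check described in the abstract can be illustrated with a minimal plaintext sketch. It is not the paper's protocol: it works on unencrypted vectors, whereas the framework performs these steps under homomorphic encryption, and it omits the clustering and hierarchical aggregation stages. The function names and the `threshold` value are hypothetical, chosen only for illustration.

```python
import math
from statistics import median


def coordinate_median(gradients):
    """Coordinate-wise median of equally sized gradient vectors."""
    return [median(column) for column in zip(*gradients)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_gradients(gradients, threshold=0.0):
    """Keep the indices of gradients whose similarity to the median
    benchmark exceeds `threshold` (a hypothetical cut-off)."""
    benchmark = coordinate_median(gradients)
    kept = [
        i for i, g in enumerate(gradients)
        if cosine_similarity(g, benchmark) > threshold
    ]
    return benchmark, kept


def aggregate(gradients, kept):
    """Plain average of the gradients that passed the filter."""
    if not kept:
        raise ValueError("no gradient passed the filter")
    dim = len(gradients[0])
    return [sum(gradients[i][d] for i in kept) / len(kept) for d in range(dim)]


# Three similar updates and one sign-flipped (poisoned) update.
honest = [[1.0, 2.0, 3.0], [1.1, 1.9, 3.2], [0.9, 2.1, 2.8]]
poisoned = [[-1.0, -2.0, -3.0]]
updates = honest + poisoned

benchmark, kept = filter_gradients(updates)
global_update = aggregate(updates, kept)
```

In this toy run the sign-flipped update points away from the median benchmark, so its cosine similarity is negative and it is excluded from the average. A real deployment would need a threshold tuned to the data and, as the abstract stresses, would have to avoid exposing the gradients in plaintext.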

CLC number: TP309

GUO Qian, ZHAO Jin, GUO Yi. Hierarchical Clustering Federated Learning Framework for Personalized Privacy-Preserving[J]. Netinfo Security, 2024, 24(8): 1196-1209.

Figures/Tables: 13 (Figures 1-10, Tables 1-3)


Model	Dataset	non-IID ($\beta=0.5$)	non-IID ($\beta=5$)	non-IID ($\beta=10$)	non-IID ($\beta=100$)	IID
FedAVG	MNIST	79.46%	83.17%	88.51%	91.25%	95.67%
FedAVG	CIFAR-10	49.07%	56.48%	59.32%	64.80%	75.09%
FedAVG	Fashion-MNIST	59.40%	65.47%	68.14%	70.39%	72.67%
PPeFL	MNIST	80.28%	85.32%	89.13%	83.24%	96.28%
PPeFL	CIFAR-10	53.19%	60.59%	65.47%	73.42%	85.67%
PPeFL	Fashion-MNIST	60.53%	68.37%	73.28%	80.19%	83.14%
Proposed model	MNIST	95.31%	97.03%	96.93%	97.56%	97.83%
Proposed model	CIFAR-10	67.04%	75.64%	79.21%	82.45%	89.24%
Proposed model	Fashion-MNIST	64.37%	73.54%	76.43%	82.54%	87.14%
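
The table above does not say how the non-IID splits are generated. One common convention, assumed here and not confirmed by this page, is to draw each class's per-client shares from a Dirichlet distribution with concentration $\beta$, so that a small $\beta$ gives skewed label distributions and a large $\beta$ approaches IID. The sketch below illustrates that assumption with the standard library only; `partition_by_label` and its parameters are hypothetical names.

```python
import random


def dirichlet(concentration, size, rng):
    """Sample a symmetric Dirichlet vector by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_by_label(labels, num_clients, beta, seed=0):
    """Split sample indices across clients, class by class, using
    Dirichlet(beta) shares (assumed meaning of beta, see lead-in)."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]

    # Group sample indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        shares = dirichlet(beta, num_clients, rng)
        start, cumulative = 0, 0.0
        for client, share in enumerate(shares):
            cumulative += share
            # The last client takes the remainder so no index is lost.
            if client == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[client].extend(indices[start:end])
            start = end
    return clients


# Toy labels: 1000 samples over 10 balanced classes.
labels = [i % 10 for i in range(1000)]
skewed = partition_by_label(labels, num_clients=5, beta=0.5)
near_iid = partition_by_label(labels, num_clients=5, beta=100)
```

Every index is assigned to exactly one client in both splits; only the per-client class balance changes with `beta`.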

Model	Dataset	Learning rate / rounds	Communication rounds	Accuracy, targeted attack ($A=20\%$)	Accuracy, targeted attack ($A=35\%$)	Accuracy, targeted attack ($A=50\%$)	Accuracy, untargeted attack ($A=20\%$)	Accuracy, untargeted attack ($A=35\%$)	Accuracy, untargeted attack ($A=50\%$)
Median	MNIST	0.035/2	200	84.37%	79.21%	58.78%	81.49%	65.34%	50.13%
Median	CIFAR-10	0.035/2	1000	53.26%	43.16%	23.76%	56.07%	51.79%	27.39%
Median	Fashion-MNIST	0.035/2	1000	64.18%	57.29%	35.23%	60.38%	56.71%	37.10%
Trimmed mean	MNIST	0.035/2	200	89.03%	82.35%	61.34%	84.37%	67.47%	50.07%
Trimmed mean	CIFAR-10	0.035/2	1000	54.32%	46.13%	27.96%	57.42%	54.72%	32.15%
Trimmed mean	Fashion-MNIST	0.035/2	1000	65.83%	60.29%	37.14%	62.57%	58.23%	40.17%
Clustering	MNIST	0.035/2	200	88.46%	85.57%	64.17%	87.04%	83.09%	68.51%
Clustering	CIFAR-10	0.035/2	1000	56.73%	52.13%	29.01%	58.36%	55.48%	49.17%
Clustering	Fashion-MNIST	0.035/2	1000	66.18%	63.24%	41.47%	64.37%	60.29%	50.38%
Proposed model	MNIST	0.035/2	200	94.38%	92.74%	91.13%	95.78%	93.27%	89.16%
Proposed model	CIFAR-10	0.035/2	1000	85.15%	81.98%	74.58%	84.45%	75.49%	71.24%
Proposed model	Fashion-MNIST	0.035/2	1000	87.84%	82.13%	75.28%	85.17%	76.23%	73.85%

Attack	Attack rate	Accuracy ($LN_A=20\%$)	Accuracy ($LN_A=35\%$)	Accuracy ($LN_A=50\%$)
Untargeted attack	20%	91.46%	87.24%	83.17%
Untargeted attack	35%	90.13%	83.23%	78.05%
Untargeted attack	50%	85.42%	79.68%	68.53%
Targeted attack	20%	91.14%	86.35%	80.74%
Targeted attack	35%	87.35%	83.07%	75.79%
Targeted attack	50%	83.27%	76.47%	65.43%