Netinfo Security (信息网络安全), 2024, Vol. 24, Issue (8): 1196-1209. doi: 10.3969/j.issn.1671-1122.2024.08.006

• Theoretical Research •

Personalized Federated Learning Privacy-Preserving Framework Based on Hierarchical Clustering

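GUO Qian1, ZHAO Jin2, GUO Yi1

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237
  2. School of Computer Science and Technology, Fudan University, Shanghai 200433
  • Received: 2024-01-28 Online: 2024-08-10 Published: 2024-08-22
  • Corresponding author: GUO Yi guoyi@ecust.edu.cn
  • About the authors: GUO Qian (born 1990), female, from Anhui, Ph.D. candidate, main research interests: privacy computing and information extraction | ZHAO Jin (born 1993), male, from Henan, Ph.D. candidate, main research interests: knowledge graphs and information extraction | GUO Yi (born 1975), male, from Jiangsu, professor, Ph.D., CCF member, main research interests: text mining and information extraction.
  • Funding:
    Science and Technology Program of the Science and Technology Commission of Shanghai Municipality (22DZ1204903); Science and Technology Program of the Science and Technology Commission of Shanghai Municipality (22511104800)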

Hierarchical Clustering Federated Learning Framework for Personalized Privacy-Preserving

GUO Qian1, ZHAO Jin2, GUO Yi1

  1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  2. School of Computer Science, Fudan University, Shanghai 200433, China
  • Received: 2024-01-28 Online: 2024-08-10 Published: 2024-08-22

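Abstract (translated from the Chinese):

Federated learning, an emerging privacy-preserving framework for distributed machine learning, uses cryptographic primitives to address privacy leakage effectively, and preventing poisoning attacks in distributed settings has become a research hotspot in the field. Most existing work assumes that data are independent and identically distributed (IID) and identifies malicious gradients in plaintext, so it cannot handle the challenges posed by data heterogeneity. To address these problems, this paper proposes a personalized federated learning privacy-preserving framework based on hierarchical clustering. The framework encrypts gradients based on a coordinate-aware median algorithm, uses a secure cosine similarity scheme to identify malicious gradients, and strengthens the model's robustness in both IID and non-IID scenarios through a hierarchical aggregation method. Experimental results on three public datasets (MNIST, CIFAR-10, and Fashion-MNIST) show that the model has strong privacy-preserving capability. Compared with FedAVG, PPeFL, Median, Trimmed Mean, and Clustering, the model improves accuracy by 14.90%, 9.59%, 29.50%, 26.57%, and 23.19%, respectively.

Keywords: federated learning, hierarchical aggregation, homomorphic encryption, privacy preservation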

Abstract:

Federated learning (FL) is an emerging framework for privacy-preserving distributed machine learning that uses cryptographic primitives to deal effectively with privacy leakage. However, preventing poisoning attacks in distributed settings has recently become a research hotspot in FL. Most existing works assume independent and identically distributed (IID) data and identify malicious gradients in plaintext, so they cannot handle data heterogeneity and they pose significant privacy leakage risks by releasing unencrypted gradients. To address these challenges, this paper proposes a hierarchical clustering federated learning framework for personalized privacy preservation. The framework applies homomorphic encryption and uses the coordinate-wise median as the benchmark. It then employs a secure cosine similarity scheme to identify poisoned gradients, uses clustering as part of the defense mechanism, and develops a hierarchical aggregation method that enhances the proposed model's robustness in IID and non-IID scenarios. Experimental results on the MNIST, CIFAR-10, and Fashion-MNIST datasets indicate that the framework has strong privacy-preserving capabilities. Compared with the existing defense strategies FedAVG, PPeFL, Median, Trimmed Mean, and Clustering, the proposed method achieves average accuracy improvements of 14.90%, 9.59%, 29.50%, 26.57%, and 23.19%, respectively.

Key words: federated learning, hierarchical aggregation, homomorphic encryption, privacy-preserving
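
The abstract names two building blocks: a coordinate-wise median used as a benchmark, and a cosine similarity check against that benchmark to flag poisoned gradients. The following is a minimal plaintext sketch of that idea only. It is not the paper's protocol: the paper performs these steps on homomorphically encrypted gradients and adds clustering with hierarchical aggregation, neither of which is shown here. The function names, the similarity threshold of 0.0, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def coordinate_median(grads: np.ndarray) -> np.ndarray:
    """Coordinate-wise median of client gradients (shape: n_clients x dim)."""
    return np.median(grads, axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine of the angle between two vectors; eps guards against zero norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def filter_and_aggregate(grads: np.ndarray, threshold: float = 0.0):
    """Keep gradients whose similarity to the median benchmark exceeds the
    threshold (an assumed value), then average the survivors."""
    benchmark = coordinate_median(grads)
    sims = np.array([cosine_similarity(g, benchmark) for g in grads])
    keep = sims > threshold
    if not keep.any():
        # Fall back to the benchmark itself if every gradient is rejected.
        return benchmark, keep
    return grads[keep].mean(axis=0), keep


# Toy data (assumed): 8 honest clients near +1, 2 poisoned clients pointing
# in the opposite direction with a larger magnitude.
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(8, 5))
poisoned = -5.0 * np.ones((2, 5))
grads = np.vstack([honest, poisoned])

aggregate, keep = filter_and_aggregate(grads)
print("kept clients:", np.flatnonzero(keep))
print("aggregate:", aggregate)
```

In this toy setting the honest clients form the majority, so the median stays close to the honest direction and the two opposing gradients receive a negative similarity score. A plain average over all ten gradients would instead be pulled toward the poisoned values, which is the failure mode that median-based benchmarks are meant to avoid.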

CLC Number: