Netinfo Security ›› 2025, Vol. 25 ›› Issue (4): 630-639. doi: 10.3969/j.issn.1671-1122.2025.04.011

• Special Topic: Intelligent System Security •

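Adaptive Sampling-Based Machine Unlearning Method

HE Ke, WANG Jianhua, YU Dan, CHEN Yongle

  1. School of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
  • Received: 2025-01-09 Online: 2025-04-10 Published: 2025-04-25
  • Corresponding author: CHEN Yongle chenyongle@tyut.edu.cn
  • About the authors: HE Ke (born 2002), female, from Henan, master's student; main research interest: artificial intelligence security | WANG Jianhua (born 1995), male, from Shanxi, lecturer, Ph.D., CCF member; main research interest: artificial intelligence security | YU Dan (born 1983), female, from Shanxi, lecturer, Ph.D.; main research interest: Internet of Things security | CHEN Yongle (born 1983), male, from Shandong, professor, Ph.D., CCF senior member; main research interest: Internet of Things security.
  • Funding:
    Central Government Guided Local Science and Technology Development Fund Project (YDZJSX2024C003); Shanxi Province Science and Technology Achievement Transformation Guidance Special Project (202304021301037)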

Abstract:

With the rapid development of artificial intelligence technologies, intelligent systems have been widely applied in fields such as healthcare and industry. However, these systems store large amounts of user data, and a malicious attack on that data poses a serious threat to user privacy. To protect user data privacy, many countries have introduced laws and regulations that guarantee users the “right to be forgotten”. Machine unlearning, which is typically divided into exact unlearning and approximate unlearning, aims to remove the influence of specific data from a trained model by adjusting the model parameters. Exact unlearning methods retrain the model on the remaining data, which is computationally expensive. Approximate unlearning methods achieve unlearning with a small number of parameter updates, but existing approaches suffer from insufficient unlearning performance and long unlearning times. This paper proposes an adaptive sampling-based machine unlearning method. The method first samples the gradients produced during model training and then uses a small amount of this gradient information to complete unlearning. It is widely applicable and can be combined with a variety of machine unlearning methods. Experimental results show that the “sample first, unlearn later” strategy significantly improves the performance of approximate unlearning, while reducing exact unlearning time by about 22.9% and approximate unlearning time by about 38.6%.

Key words: machine unlearning, privacy protection, adaptive sampling, right to be forgotten
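
As a rough illustration of the “sample first, unlearn later” idea described in the abstract, the following Python sketch logs only part of the per-sample updates during training and later rolls back the logged updates of the samples to be forgotten. It is not the authors' algorithm: the abstract does not specify the sampling rule or the unlearning step, so the per-epoch magnitude threshold (`keep_quantile`), the rollback rule, the toy one-parameter model, and all function names here are assumptions made for illustration only.

```python
import random


def train_with_sampling(data, lr=0.05, epochs=20, keep_quantile=0.5, seed=0):
    """Per-sample SGD on y ~ w * x that logs only the larger updates.

    Hypothetical sampling rule (an assumption, not taken from the paper):
    in each epoch, keep the updates whose magnitude reaches that epoch's
    `keep_quantile` threshold and discard the rest.
    """
    rng = random.Random(seed)
    w = 0.0
    log = {}  # sample index -> list of logged updates applied to w
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        steps = []
        for i in order:
            x, y = data[i]
            grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            delta = -lr * grad
            w += delta
            steps.append((i, delta))
        # Threshold adapts to the update magnitudes seen in this epoch.
        mags = sorted(abs(d) for _, d in steps)
        threshold = mags[int(keep_quantile * (len(mags) - 1))]
        for i, d in steps:
            if abs(d) >= threshold:
                log.setdefault(i, []).append(d)
    return w, log


def unlearn(w, log, forget_indices):
    """Approximate unlearning: subtract the logged updates of forgotten samples."""
    for i in forget_indices:
        w -= sum(log.get(i, []))
    return w


# Toy data: three points on y = 2x plus one sample (index 3) to be forgotten.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (1.0, 10.0)]
w_trained, update_log = train_with_sampling(data)
w_unlearned = unlearn(w_trained, update_log, forget_indices=[3])
print(w_trained, w_unlearned)
```

Because only the logged updates are rolled back, the result is an approximation of retraining without the forgotten sample; how well it matches retraining depends on which updates were kept, which is the role the paper assigns to its sampling strategy.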

CLC Number: