Netinfo Security (信息网络安全), 2020, Vol. 20, Issue (10): 19-26. doi: 10.3969/j.issn.1671-1122.2020.10.003

• Technical Research •

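Research on k-anonymity Algorithm for Personalized Quasi-identifier Attributes

HE Jingsha (何泾沙), DU Jinhui (杜晋晖), ZHU Nafei (朱娜斐)

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2020-07-03 Online: 2020-10-10 Published: 2020-11-25
  • Corresponding author: DU Jinhui E-mail: 1290344719@qq.com
  • About the authors: HE Jingsha (born 1961), male, from Shaanxi, professor, Ph.D.; main research interests: network security, testing and analysis, and cloud computing | DU Jinhui (born 1994), male, from Shanxi, master's student; main research interests: network security and privacy protection | ZHU Nafei (born 1981), female, from Henan, associate professor, Ph.D.; main research interests: network security, privacy protection, and blockchain
  • Funding:
    National Natural Science Foundation of China (61602456)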

Abstract:

k-anonymity largely mitigates linking attacks in privacy protection, but existing k-anonymity models pay little attention to individual privacy autonomy. Existing improved k-anonymity models cannot meet the needs of different people for different types of data: after the data table is published, the whole table still has a single k value, that is, all tuples are generalized uniformly, which fails to reflect users' personalized privacy requirements and causes considerable information loss. Building on the k-anonymity model and a clustering-based generalization approach, this paper proposes a k-anonymity-based algorithm for personalizing quasi-identifier attributes (KAUP). The algorithm applies different k values within the same data table according to users' privacy requirements, thereby achieving personalized k-anonymity. Comparative experiments on the Adult dataset evaluate running time, information loss, and scalability. The results show that personalized anonymization within a single data table is feasible and incurs little information loss, which supports further research on personalized anonymization of quasi-identifier attributes.
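To make the idea of different k values within one table concrete, the following Python sketch checks a personalized k-anonymity condition on a toy table. It is not the KAUP algorithm, whose steps the abstract does not give. The condition used here (each equivalence class must be at least as large as the strictest k requested by any of its members) is one plausible reading, and the attribute names, the `k` field, and the `generalize` helper are all hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group record indices by their quasi-identifier values."""
    classes = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rec[attr] for attr in quasi_identifiers)
        classes[key].append(idx)
    return classes


def is_personalized_k_anonymous(records, quasi_identifiers, k_field="k"):
    """Assumed condition: every equivalence class is at least as large as
    the largest k requested by any record inside it."""
    for members in equivalence_classes(records, quasi_identifiers).values():
        strictest_k = max(records[i][k_field] for i in members)
        if len(members) < strictest_k:
            return False
    return True


def generalize(rec):
    """Hypothetical generalization: 10-year age bands, 4-digit zip prefixes."""
    low = rec["age"] // 10 * 10
    return {**rec, "age": f"{low}-{low + 9}", "zip": rec["zip"][:4] + "**"}


# Toy table (invented values): "age" and "zip" are quasi-identifiers,
# "k" is the privacy level each user asks for.
raw = [
    {"age": 34, "zip": "100124", "k": 2},
    {"age": 36, "zip": "100125", "k": 2},
    {"age": 51, "zip": "100871", "k": 3},
    {"age": 55, "zip": "100872", "k": 3},
    {"age": 58, "zip": "100873", "k": 2},
]
published = [generalize(r) for r in raw]

print(is_personalized_k_anonymous(raw, ["age", "zip"]))        # False
print(is_personalized_k_anonymous(published, ["age", "zip"]))  # True
```

In the generalized table the two records that asked for k = 2 form a class of size 2, while the class containing the k = 3 requests has size 3, so one table satisfies two different k values at once.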

Key words: personalization, k-anonymity, privacy protection

CLC Number: