Netinfo Security ›› 2023, Vol. 23 ›› Issue (4): 80-89. doi: 10.3969/j.issn.1671-1122.2023.04.009

• Technical Research •

Domain Name Anonymization Algorithm Based on Character Space Construction

YIN Shu1,2, CHEN Xingshu1,2, ZHU Yi1,2, ZENG Xuemei1,2

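  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  • Received: 2022-10-26 Online: 2023-04-10 Published: 2023-04-18
  • Corresponding author: CHEN Xingshu E-mail: chenxsh@scu.edu.cn
  • About the authors: YIN Shu (b. 1998), female, from Sichuan, master's student; research interests: cloud computing and big data security | CHEN Xingshu (b. 1968), female, from Sichuan, professor, Ph.D.; research interests: cloud computing, data security architecture, threat detection, and open-source intelligence analysis | ZHU Yi (b. 1991), male, from Sichuan, Ph.D. candidate; research interests: network behavior and threat identification | ZENG Xuemei (b. 1976), female, from Sichuan, engineer, Ph.D.; research interests: network traffic identification, network behavior analysis, and IPv6 network security.
  • Supported by:
    National Natural Science Foundation of China (U19A2081); National Natural Science Foundation of China (61802270); National Natural Science Foundation of China (61802271); Fundamental Research Funds for the Central Universities (SCU2021D048); Sichuan University Engineering Characteristic Team Project (2020SCUNG129)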

Anonymous Domain Name Algorithm Based on Character Space Construction

YIN Shu1,2, CHEN Xingshu1,2, ZHU Yi1,2, ZENG Xuemei1,2

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  • Received: 2022-10-26 Online: 2023-04-10 Published: 2023-04-18
  • Contact: CHEN Xingshu E-mail: chenxsh@scu.edu.cn

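Abstract:

Domain name data carried in network traffic poses a data privacy challenge for sharing that traffic. Existing domain name anonymization methods mostly rely on text generalization and substitution; they protect privacy well but destroy the original structure and textual properties of domain names, so they cannot meet the needs of network security analysis. This paper proposes a domain name anonymization method oriented to network security analysis. Using a layered anonymization strategy based on domain name structure and an anonymization algorithm based on character space construction, it reconstructs the domain name text while preserving the structural and textual attribute features that security analysis relies on, so that the data remains usable for researchers while the private information it contains is removed. To resist exhaustive attacks, the method reconstructs text randomly according to parameters, which reduces the probability that the same domain name yields the same anonymized result in different batches. The method was validated on real traffic data from a campus network. Experimental results show that the proposed method effectively improves the unidentifiability and irreversibility of anonymized domain name data while preserving its structural and semantic utility.

Keywords: anonymization, domain name data, privacy protection, character space construction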

Abstract:

Domain name data contained in network traffic brings data privacy challenges to network traffic sharing. Existing anonymization algorithms for domain names mostly use text generalization and replacement. They protect privacy well, but they destroy the original structure and text characteristics of domain names and cannot meet the needs of network security analysis scenarios. This paper proposes a domain name anonymization method for network security analysis. Through a hierarchical anonymization strategy based on the domain name structure and an anonymization algorithm based on character space construction, the domain name text is reconstructed while retaining the structural and linguistic features that network security analysis depends on, so as to maintain the availability of the domain name data required by researchers while removing the private information in that data. To resist exhaustive attacks, the method reconstructs domain names randomly according to parameters, which reduces the probability that the same domain name produces the same anonymized result in different batches. The proposed method was verified on real network traffic data from a campus network. The experimental results show that the method effectively improves the unidentifiability and irreversibility of anonymized domain name data while retaining its structural and linguistic utility.

Key words: anonymization, domain name data, privacy protection, character space construction
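
To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes that "hierarchical" means handling each dot-separated label separately, that "character space" can be approximated by three classes (vowels, consonants, digits), and that the per-batch "parameter" is a secret key. The names `anonymize_domain`, `batch_key`, and `keep_labels` are hypothetical.

```python
import hashlib
import hmac
import random
import string

# Assumed character classes; the paper's actual character spaces may differ.
VOWELS = "aeiou"
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)


def _substitute(ch: str, rng: random.Random) -> str:
    """Replace a character with a random one drawn from the same class."""
    if ch in VOWELS:
        return rng.choice(VOWELS)
    if ch in CONSONANTS:
        return rng.choice(CONSONANTS)
    if ch in string.digits:
        return rng.choice(string.digits)
    return ch  # hyphens and any other characters pass through unchanged


def anonymize_domain(domain: str, batch_key: bytes, keep_labels: int = 1) -> str:
    """Rewrite every label except the rightmost `keep_labels` ones.

    Label count, label lengths, and the letter/digit/hyphen pattern are
    preserved. The random stream for a label is seeded from the batch key
    and the label's suffix context, so sibling names share the same
    anonymized parent label within a batch, while a new key reseeds
    every label.
    """
    labels = domain.lower().rstrip(".").split(".")
    n_private = max(len(labels) - keep_labels, 0)
    out = []
    for depth, label in enumerate(labels):
        if depth >= n_private:
            out.append(label)  # e.g. the top-level domain is kept as-is
            continue
        context = ".".join(labels[depth:])
        digest = hmac.new(batch_key, context.encode("utf-8"), hashlib.sha256).digest()
        rng = random.Random(int.from_bytes(digest, "big"))
        out.append("".join(_substitute(c, rng) for c in label))
    return ".".join(out)


if __name__ == "__main__":
    for key in (b"batch-1", b"batch-2"):
        print(key, anonymize_domain("mail.example-01.com", key))
```

Seeding from a keyed hash keeps the mapping repeatable inside one batch, which lets analysts still group subdomains under a common parent. Because this simple substitution keeps length and character-class patterns, it should be read only as an illustration of structure-preserving reconstruction, not as evidence of the privacy guarantees reported in the paper.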

CLC number: