Netinfo Security, 2023, Vol. 23, Issue (4): 80-89. doi: 10.3969/j.issn.1671-1122.2023.04.009

Anonymous Domain Name Algorithm Based on Character Space Construction

YIN Shu1,2, CHEN Xingshu1,2, ZHU Yi1,2, ZENG Xuemei1,2

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
  • Received: 2022-10-26 Online: 2023-04-10 Published: 2023-04-18
  • Contact: CHEN Xingshu E-mail: chenxsh@scu.edu.cn

Abstract:

Domain name data contained in network traffic poses a privacy challenge for sharing that traffic. Existing domain name anonymization algorithms mostly rely on text generalization and replacement. They protect privacy well, but they destroy the original structure and textual characteristics of domain names and therefore cannot meet the needs of network security analysis. This paper proposes a domain name anonymization method designed for network security analysis. It combines a hierarchical anonymization strategy based on domain name structure with an anonymization algorithm based on character space construction. The domain name text is reconstructed while the structural and linguistic features that matter to network security analysis are retained, so that the data remains usable for researchers while the private information in it is removed. To resist exhaustive attacks, the reconstruction is randomized by parameters, which reduces the probability that the same domain name yields the same anonymized result in different batches. The method was validated on real network traffic data from a campus network. The experimental results show that the proposed method effectively improves the unidentifiability and irreversibility of anonymized domain name data while retaining its structural and linguistic utility.
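
The abstract does not give the algorithm's details, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes three hypothetical character spaces (vowels, consonants, digits), leaves the rightmost label unchanged as a stand-in for the hierarchical strategy (the hypothetical `keep_levels` parameter), and uses a hypothetical per-batch secret `batch_key` as the randomizing parameter. Each character is redrawn from its own space, so label count, label lengths, and the letter/digit pattern are preserved.

```python
import hashlib
import hmac
import random

# Assumed character spaces; the paper's actual construction is not given in the abstract.
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
DIGITS = "0123456789"


def char_space(ch: str) -> str:
    """Return the assumed space a character is redrawn from ('' means keep as-is)."""
    low = ch.lower()
    for space in (VOWELS, CONSONANTS, DIGITS):
        if low in space:
            return space
    return ""  # hyphens and anything else are left untouched


def anonymize_label(label: str, batch_key: bytes) -> str:
    """Rebuild one label character by character within each character's space."""
    # Seed from HMAC(batch_key, label): stable inside a batch, different across batches.
    digest = hmac.new(batch_key, label.lower().encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    out = []
    for ch in label:
        space = char_space(ch)
        if not space:
            out.append(ch)
            continue
        new = rng.choice(space)
        out.append(new.upper() if ch.isupper() else new)
    return "".join(out)


def anonymize_domain(domain: str, batch_key: bytes, keep_levels: int = 1) -> str:
    """Anonymize all labels except the rightmost keep_levels (an assumed policy)."""
    labels = domain.split(".")
    cut = max(len(labels) - keep_levels, 0)
    rebuilt = [anonymize_label(label, batch_key) for label in labels[:cut]]
    return ".".join(rebuilt + labels[cut:])


if __name__ == "__main__":
    for key in (b"batch-2023-04-10", b"batch-2023-04-11"):
        print(key.decode(), anonymize_domain("mail.example123.com", key))
```

Running the script prints one anonymized form per batch key. Both keep the `xxxx.xxxxxxxNNN.com` shape, which is the kind of structural and linguistic utility the abstract refers to. A real deployment would need a vetted keyed construction and an evaluation of re-identification risk, which this sketch does not attempt.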

Key words: anonymization, domain name data, privacy protection, character space construction
