Netinfo Security (信息网络安全) ›› 2024, Vol. 24 ›› Issue (1): 60-68. doi: 10.3969/j.issn.1671-1122.2024.01.006

• Privacy Protection •

Design and Implementation of Tor Traffic Detection Algorithm Based on Federated Learning

ZHAO Jia1,2, YANG Bokai1,2, RAO Xinyu1,2, GUO Yating1,2

  1. Intelligent Traffic Data Security and Privacy Protection Laboratory, Beijing Jiaotong University, Beijing 100044, China
  2. School of Computer Science and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2023-11-14 Online: 2024-01-10 Published: 2024-01-24
  • Contact: YANG Bokai E-mail: 23120488@bjtu.edu.cn
  • About the authors: ZHAO Jia (born 1980), female, from Inner Mongolia, associate professor, Ph.D., CCF member; main research interests: cryptography and privacy protection | YANG Bokai (born 2000), male, from Heilongjiang, master's student; main research interest: federated learning | RAO Xinyu (born 1999), female, from Sichuan, master's student; main research interests: federated learning and cryptography | GUO Yating (born 1998), female, from Shandong, master's student; main research interests: federated learning and cryptography
  • Funding:
    National Key Research and Development Program of China (2020YFB2103800)

Abstract:

The Tor network, a second-generation anonymous Internet communication system, is often exploited by cybercriminals for malicious activities such as network attacks and fraud, posing serious threats and challenges to network security. To address this problem, this paper proposes a Tor traffic detection method based on federated learning. Existing Tor traffic detection relies mainly on single-host detection, which is inefficient and cannot support data sharing. This paper combines federated learning with the DP-SGD algorithm so that participants can jointly build a global model while protecting user privacy, which addresses the data silo problem. Experimental results show that the model achieves 92% overall accuracy, 90% precision, and 92% recall while preserving user data privacy. Comparative experiments further confirm the model's advantages in privacy protection and classification performance.

Key words: federated learning, Tor traffic, detection systems, DP-SGD, data privacy
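
The abstract names two building blocks, federated learning and DP-SGD, without giving implementation details. As a rough illustration only, the sketch below shows how the two can be combined in general: each client clips its per-example gradients and adds Gaussian noise (DP-SGD), and a server averages the resulting client models weighted by data size (FedAvg-style aggregation). The logistic-regression model, the toy two-feature data, the hyperparameters, and every function name here are assumptions made for illustration, not the authors' model or code, and no privacy accounting is performed.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_sgd_step(weights, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD step on a logistic-regression stand-in model.

    Each per-example gradient is clipped, the clipped gradients are summed,
    Gaussian noise with std noise_multiplier * max_norm is added, and the
    noisy sum is averaged over the batch before the weight update.
    """
    total = [0.0] * len(weights)
    for x, y in batch:
        z = sum(w * xi for w, xi in zip(weights, x))
        z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
        p = 1.0 / (1.0 + math.exp(-z))
        per_example_grad = [(p - y) * xi for xi in x]
        for i, g in enumerate(clip(per_example_grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / len(batch) for t in total]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


def fed_avg(client_weights, client_sizes):
    """Average client models, weighting each by its number of examples."""
    n = sum(client_sizes)
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / n
        for i in range(len(client_weights[0]))
    ]


def federated_round(global_w, client_batches, lr, max_norm, noise_multiplier, rng):
    """One round: every client takes a local DP-SGD step, the server averages."""
    local = [
        dp_sgd_step(global_w, b, lr, max_norm, noise_multiplier, rng)
        for b in client_batches
    ]
    return fed_avg(local, [len(b) for b in client_batches])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: (features, label), label 1 standing in for "Tor".
    clients = [
        [([1.0, 0.2], 1), ([0.9, 0.1], 1)],
        [([0.1, 1.0], 0), ([0.2, 0.8], 0), ([0.0, 0.9], 0)],
    ]
    w = [0.0, 0.0]
    for _ in range(20):
        w = federated_round(w, clients, lr=0.5, max_norm=1.0,
                            noise_multiplier=0.5, rng=rng)
    print(w)
```

Only model weights leave each client in this sketch; raw traffic records stay local, which is the property the abstract relies on for data sharing across participants. A real deployment would also track the cumulative privacy budget across rounds.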

CLC Number: