Netinfo Security ›› 2021, Vol. 21 ›› Issue (12): 118-125. doi: 10.3969/j.issn.1671-1122.2021.12.016
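Received: 2021-08-16
Online: 2021-12-10
Published: 2022-01-11
Contact: LU Tianliang, E-mail: lutianliang@ppsuc.edu.cn
About the authors: WANG Xirui (born 1998), male, from Jiangsu, master's student; research interests: network information security, network attack and defense. LU Tianliang (born 1985), male, from Hebei, associate professor, Ph.D.; research interests: network information security, malicious code analysis and detection. ZHANG Jianling (born 1965), male, from Hebei, associate professor, master's degree; research interests: computer science and technology, artificial intelligence. DING Meng (born 1980), male, from Beijing, associate professor, master's degree; research interests: electronic evidence examination.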
WANG Xirui, LU Tianliang, ZHANG Jianling, DING Meng
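Abstract:
The Tor network is frequently used by criminals for a range of illegal activities, so identifying Tor traffic efficiently matters for network supervision and for fighting crime. To address the sparsity of Tor traffic in real environments and the resulting low identification accuracy, this paper draws on ensemble learning and proposes a Tor traffic identification method built on a weighted Stacking model. Time-related features are extracted at the flow level, and the 14 features with the highest information gain are selected to form the input dataset. KNN, SVM, and XGBoost each receive a different weighting modification and serve as base learners, while XGBoost serves as the meta-learner of a two-layer Stacking model. In a comparison with 10 other algorithms on a public dataset, the experimental results show that the proposed model is more accurate than most of the algorithms and has a lower miss rate (false negative rate), which makes it a better fit for Tor traffic identification in real network environments.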
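The two-layer structure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how each base learner is weighted, so the distance-weighted KNN votes and the balanced SVM class weights are assumptions, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost (both as a base learner and as the meta-learner) so that the sketch needs only one library, and the data is synthetic rather than the public Tor dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 14 selected flow features (NOT the paper's data).
# Label 1 plays the role of the rare Tor class.
X, y = make_classification(n_samples=600, n_features=14, n_informative=6,
                           weights=[0.85, 0.15], class_sep=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Layer 1: three base learners. The weighting schemes are ASSUMPTIONS.
base_learners = [
    # Closer neighbours get a larger vote.
    ("knn", make_pipeline(StandardScaler(),
                          KNeighborsClassifier(n_neighbors=5, weights="distance"))),
    # Errors on the rare class are penalised more heavily.
    ("svm", make_pipeline(StandardScaler(),
                          SVC(class_weight="balanced", probability=True,
                              random_state=0))),
    # Stand-in for XGBoost (swapped in to avoid an extra dependency).
    ("gbdt", GradientBoostingClassifier(random_state=0)),
]

# Layer 2: the meta-learner is trained on out-of-fold base-learner probabilities.
model = StackingClassifier(
    estimators=base_learners,
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5,
)
model.fit(X_train, y_train)

pred = model.predict(X_test)
recall = recall_score(y_test, pred)
print(f"recall={recall:.3f}  miss rate={1 - recall:.3f}")
```

Training the meta-learner on out-of-fold predictions (`cv=5`) keeps it from seeing base-learner outputs for samples those learners were fitted on, which is the usual guard against leakage in Stacking.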
WANG Xirui, LU Tianliang, ZHANG Jianling, DING Meng. Tor Anonymous Traffic Identification Method Based on Weighted Stacking Ensemble Learning[J]. Netinfo Security, 2021, 21(12): 118-125.
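Table 1 Feature set
Feature | Description | Importance |
---|---|---|
Bwd IAT Std | Standard deviation of backward inter-arrival time | 138.9 |
Bwd IAT Max | Maximum backward inter-arrival time | 36.3 |
Flow Bytes/s | Flow bytes per second | 28.3 |
Flow Duration | Flow duration | 18.1 |
Bwd IAT Min | Minimum backward inter-arrival time | 13.8 |
Flow IAT Min | Minimum flow inter-arrival time | 12.0 |
Flow IAT Std | Standard deviation of flow inter-arrival time | 11.6 |
Flow IAT Max | Maximum flow inter-arrival time | 11.2 |
Fwd IAT Mean | Mean forward inter-arrival time | 10.9 |
Bwd IAT Mean | Mean backward inter-arrival time | 9.5 |
Fwd IAT Std | Standard deviation of forward inter-arrival time | 9.5 |
Fwd IAT Min | Minimum forward inter-arrival time | 9.5 |
Flow Packets/s | Flow packets per second | 6.8 |
Fwd IAT Max | Maximum forward inter-arrival time | 6.0 |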
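Ranking features by information gain and keeping the best k can be sketched as below. This is an illustrative sketch using only the Python standard library: the equal-width binning of numeric features, the helper names, and the toy values are all assumptions, and the scores it produces are in bits, so they are not meant to reproduce the importance values in Table 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=10):
    """Information gain of one numeric feature after equal-width binning.

    The binning scheme is an assumption made for this sketch.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: avoid a zero width
    groups = {}
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), bins - 1)
        groups.setdefault(idx, []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k=14):
    """Return the names of the k features with the highest information gain."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy flows: 1 = Tor, 0 = non-Tor. Values are made up.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "Bwd IAT Std": [9.0, 8.5, 9.2, 8.8, 1.0, 1.2, 0.9, 1.1],  # separates classes
    "Flow Bytes/s": [10, 10, 10, 2, 2, 2, 2, 10],             # partly informative
    "Flow Duration": [5, 1, 5, 1, 5, 1, 5, 1],                # uninformative
}
selected = top_k_features(columns, labels, k=2)
print(selected)
```

With real data, `k=14` would correspond to the feature count reported in the paper.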
Table 6 Comparison of different algorithms
Algorithm | Precision | Recall | F-measure | AUC |
---|---|---|---|---|
SVM | 0.575 | 0.683 | 0.624 | 0.808 |
KNN | 0.843 | 0.867 | 0.855 | 0.923 |
NB | 0.349 | 0.934 | 0.508 | 0.851 |
MLP | 0.826 | 0.838 | 0.832 | 0.907 |
RF | 0.960 | 0.923 | 0.941 | 0.959 |
GBDT | 0.914 | 0.768 | 0.835 | 0.879 |
LightGBM | 0.956 | 0.951 | 0.953 | 0.972 |
XGBoost | 0.964 | 0.959 | 0.962 | 0.977 |
CNN[ | 0.982 | 0.884 | 0.930 | 0.940 |
SAE[ | 0.974 | 0.877 | 0.922 | 0.936 |
Weighted Stacking model (proposed) | 0.978 | 0.975 | 0.976 | 0.986 |
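The columns of Table 6 are related by simple arithmetic, which the short check below makes explicit. It assumes that the F-measure is the usual F1 score (the harmonic mean of precision and recall) and that recall is measured on the Tor class, so the miss rate mentioned in the abstract is 1 minus recall; the page does not state either definition. Because the inputs are already rounded to three decimals, the recomputed F-measure can differ from the reported one by about 0.001.

```python
# (precision, recall, reported F-measure) copied from Table 6.
table6 = {
    "SVM": (0.575, 0.683, 0.624),
    "KNN": (0.843, 0.867, 0.855),
    "NB": (0.349, 0.934, 0.508),
    "MLP": (0.826, 0.838, 0.832),
    "RF": (0.960, 0.923, 0.941),
    "GBDT": (0.914, 0.768, 0.835),
    "LightGBM": (0.956, 0.951, 0.953),
    "XGBoost": (0.964, 0.959, 0.962),
    "CNN": (0.982, 0.884, 0.930),
    "SAE": (0.974, 0.877, 0.922),
    "Weighted Stacking": (0.978, 0.975, 0.976),
}


def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def miss_rate(recall):
    """False negative rate: share of positive (Tor) flows that go undetected."""
    return 1.0 - recall


for name, (p, r, reported) in table6.items():
    print(f"{name:18s} F={f_measure(p, r):.3f} (reported {reported:.3f})  "
          f"miss rate={miss_rate(r):.3f}")

# Algorithm with the lowest miss rate, i.e. the highest recall.
lowest_miss = min(table6, key=lambda name: miss_rate(table6[name][1]))
```

Under these assumptions the proposed model has the highest recall in the table, which is what the abstract's claim of a lower miss rate refers to.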
[1] LASHKARI A H, DRAPER-GIL G, MAMUN M S I, et al. Characterization of Tor Traffic Using Time Based Features[C]// INSTICC. Proceedings of the 3rd International Conference on Information Systems Security and Privacy, February 19-21, 2017, Porto, Portugal. Lisbon: INSTICC, 2017: 253-262.
[2] YAO Zhongjiang, GE Jingguo, ZHANG Xiaodan, et al. Research Review on Traffic Obfuscation and Its Corresponding Identification and Tracking Technologies[J]. Journal of Software, 2018, 29(10):313-330. (in Chinese)
[3] QI Yaxuan, XU Lianghong, YANG Baohua, et al. Packet Classification Algorithms: From Theory to Practice[C]// IEEE. INFOCOM 2009: International Conference on Computer Communications, April 19-25, 2009, Rio De Janeiro, Brazil. New York: IEEE, 2009: 648-656.
[4] YEGANEH S H, EFTEKHAR M, GANJALI Y, et al. Cute: Traffic Classification Using Terms[C]// IEEE. 2012 21st International Conference on Computer Communications and Networks (ICCCN 2012), July 30-August 2, 2012, Munich, Germany. New York: IEEE, 2012: 1-9.
[5] HE Gaofeng, YANG Ming, LUO Junzhou, et al. Online Identification of Tor Anonymous Communication Traffic[J]. Journal of Software, 2013, 24(3):540-556. (in Chinese) doi: 10.3724/SP.J.1001.2013.04253
[6] WANG Liang, DYER K P, AKELLA A, et al. Seeing Through Network-protocol Obfuscation[C]// ACM. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, October 12-16, 2015, Denver, CO, USA. New York: ACM, 2015: 57-69.
[7] TAN Qingfeng, SHI Jinqiao, FANG Binxing, et al. Towards Measuring Unobservability in Anonymous Communication Systems[J]. Journal of Computer Research and Development, 2015, 52(10):2373-2381. (in Chinese)
[8] HU Bin, ZHOU Zhihong, YAO Lihong, et al. Malicious Traffic Detection Combining Features of Packet Payload and Stream Fingerprint[J]. Computer Engineering, 2020, 46(11):157-163. (in Chinese)
[9] CAI Zhenzhen, JIANG Bo, LU Zhigang, et al. IsAnon: Flow-based Anonymity Network Traffic Identification Using Extreme Gradient Boosting[C]// IEEE. 2019 International Joint Conference on Neural Networks (IJCNN 2019), July 14-19, 2019, Budapest, Hungary. New York: IEEE, 2019: 1-8.
[10] LIANG Di, HE Yongzhong. Obfs4 Traffic Identification Based on Multiple-feature Fusion[C]// IEEE. 2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS 2020), July 28-30, 2020, Shenyang, China. New York: IEEE, 2020: 323-327.
[11] LOTFOLLAHI M, SIAVOSHANI M J, ZADE R S H, et al. Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning[J]. Soft Computing, 2020, 24(3):1999-2012. doi: 10.1007/s00500-019-04030-2
[12] WANG Wei, ZHU Ming, ZENG Xuewen, et al. Malware Traffic Classification Using Convolutional Neural Network for Representation Learning[C]// IEEE. 2017 International Conference on Information Networking (ICOIN 2017), January 11-13, 2017, Da Nang, Vietnam. New York: IEEE, 2017: 712-717.
[13] HWANG R H, PENG Min Chun, NGUYEN V L, et al. An LSTM-based Deep Learning Approach for Classifying Malicious Traffic at the Packet Level[J]. Applied Sciences, 2019, 9(16):3414-3420. doi: 10.3390/app9163414
[14] DONG Xibin, YU Zhiwen, CAO Wenming, et al. A Survey on Ensemble Learning[J]. Frontiers of Computer Science, 2020, 14(2):241-258. doi: 10.1007/s11704-019-8208-z
[15] BREIMAN L. Bagging Predictors[J]. Machine Learning, 1996, 24(2):123-140.
[16] SCHAPIRE R E. A Brief Introduction to Boosting[C]// IJCAI. Proceedings of the 16th International Joint Conference on Artificial Intelligence, July 31-August 6, 1999, Stockholm, Sweden. San Mateo: Morgan Kaufmann, 1999: 1401-1406.
[17] TING K M, WITTEN I H. Stacking Bagged and Dagged Models[EB/OL]. https://xueshu.baidu.com/usercenter/paper/show?paperid=bd4c4a7d1a38256b5e5099961f824232, 2021-06-21.
[18] KUMAR G, THAKUR K, AYYAGARI M R. MLEsIDSs: Machine Learning-based Ensembles for Intrusion Detection Systems: A Review[J]. The Journal of Supercomputing, 2020, 64(11):1-34. doi: 10.1007/s11227-012-0817-3
[19] CHEN Tianqi, GUESTRIN C. XGBoost: A Scalable Tree Boosting System[C]// ACM. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 13-17, 2016, San Francisco, CA, USA. New York: ACM, 2016: 785-794.
[20] WANG Tengfei, CAI Manchun, YUE Ting, et al. Tor Anonymous Traffic Identification Based on Histogram-XGBoost[J]. Computer Engineering and Applications, 2021, 57(14):110-115. (in Chinese)
[21] KE Guolin, MENG Qi, FINLEY T, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree[J]. Advances in Neural Information Processing Systems, 2017, 30(6):3146-3154.