HTTP Payload Covert Channel Detection Method Based on Deep Learning

doi:10.3969/j.issn.1671-1122.2023.07.006

Abstract:

Existing network traffic statistical features and packet payload features cannot effectively detect HTTP payload covert channels. To address this problem, this article proposes a convolutional neural network detection method based on a session flow payload representation. First, the packets generated by HTTP communication are aggregated into bidirectional session flows according to the five-tuple and an expiration-time condition. Then, a set of packets that reflects the communication interaction behavior and the session flow structure is selected, and the raw byte sequences of their transport-layer payloads are extracted to form the session flow payload that represents each HTTP session flow. Finally, the detection model is built with a 2D-CNN, which can fully mine the temporal and spatial information in the byte sequences. Experimental results show that the proposed session flow payload representation characterizes HTTP communication from more perspectives than the session flow packet payload representation, and therefore provides more useful information for the detection task. The proposed method reaches a detection accuracy of 99%, outperforming traditional machine learning detection methods based on statistical features of network flow behavior.

Key words: HTTP, covert channel, convolutional neural network, detection task
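
The first two steps of the abstract (flow aggregation and session flow payload construction) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Packet` record, the 120-second timeout, the choice of the first `n_packets` payload-carrying packets, and the `n_bytes` truncation length are all assumptions made for illustration; the abstract does not state which packets are selected or how many bytes are kept.

```python
from collections import namedtuple

# Hypothetical packet record: timestamp, endpoints, protocol, transport-layer payload.
Packet = namedtuple("Packet", "ts src sport dst dport proto payload")


def flow_key(pkt):
    """Direction-independent five-tuple, so both directions share one key."""
    a = (pkt.src, pkt.sport)
    b = (pkt.dst, pkt.dport)
    return (min(a, b), max(a, b), pkt.proto)


def aggregate_flows(packets, timeout=120.0):
    """Group packets into bidirectional session flows.

    A packet arriving more than `timeout` seconds after the previous packet
    with the same key starts a new flow (assumed expiration condition).
    """
    active = {}  # key -> flow currently open for that key
    flows = []
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = flow_key(pkt)
        flow = active.get(key)
        if flow is None or pkt.ts - flow[-1].ts > timeout:
            flow = []
            active[key] = flow
            flows.append(flow)
        flow.append(pkt)
    return flows


def flow_payload_matrix(flow, n_packets=4, n_bytes=8):
    """Build an n_packets x n_bytes matrix of byte values for one flow.

    Assumption: take the first n_packets packets that carry payload, then
    truncate or zero-pad each payload to n_bytes.
    """
    rows = []
    for pkt in flow:
        if not pkt.payload:
            continue
        chunk = pkt.payload[:n_bytes]
        rows.append(list(chunk) + [0] * (n_bytes - len(chunk)))
        if len(rows) == n_packets:
            break
    while len(rows) < n_packets:
        rows.append([0] * n_bytes)
    return rows
```

A matrix of this shape (one row per packet, one column per byte offset) is the kind of input a 2D convolution can scan along both the packet axis and the byte axis, which is one way to read the abstract's reference to temporal and spatial information.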

CLC number: TP309

YUAN Wenxin, CHEN Xingshu, ZHU Yi, ZENG Xuemei. HTTP Payload Covert Channel Detection Method Based on Deep Learning[J]. Netinfo Security, 2023, 23(7): 53-63.




Data type	Data source	Session flows	Total session flows
Normal	ISCXIDS2012 dataset	5000	9423
Normal	Campus network traffic	4423	9423
Abnormal	Malware-traffic-analysis.net	5901	5901

Method	Accuracy	Precision	Recall	F1-score
Flow payload	99.752%	99.786%	99.786%	99.786%
Packet payload	89.556%	85.831%	94.322%	89.877%

Model	Accuracy	Precision	Recall	F1-score
2D-CNN	99.752%	99.786%	99.786%	99.786%
1D-CNN	97.321%	97.343%	96.948%	97.138%
GRU	97.170%	97.014%	96.961%	96.987%
LSTM	96.694%	96.423%	96.557%	96.489%
DNN	94.793%	94.727%	94.176%	94.437%

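No.	Feature
1	Total bytes sent within the flow
2	Proportion of bytes sent within the flow
3	Total bytes received within the flow
4	Proportion of bytes received within the flow
5~12	Packet length (mean, median, mode, variance, standard deviation, coefficient of variation, mean deviation, mode deviation)
13~20	Inter-packet interval (mean, median, mode, variance, standard deviation, coefficient of variation, mean deviation, mode deviation)
21~28	Request time/response time (mean, median, mode, variance, standard deviation, coefficient of variation, mean deviation, mode deviation)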
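
The flow-level statistics listed in the feature table above could be computed along the following lines. This is a hedged sketch using only the Python standard library. The function names are hypothetical, population (not sample) variance is an assumption, and the byte proportions are assumed to be taken over the sum of sent and received bytes. The last two statistics in each group (the deviations from the mean and from the mode) are left out because the table does not define them.

```python
import statistics


def byte_features(sent_lengths, received_lengths):
    """Features 1-4: byte totals and proportions per direction.

    Assumption: each proportion is relative to sent + received bytes.
    """
    sent = sum(sent_lengths)
    received = sum(received_lengths)
    total = sent + received
    return {
        "sent_total": sent,
        "sent_ratio": sent / total if total else 0.0,
        "received_total": received,
        "received_ratio": received / total if total else 0.0,
    }


def summary_features(values):
    """Six of the eight per-series statistics (packet length, interval, ...).

    Population variance and standard deviation are an assumption.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.pvariance(values),
        "std": std,
        "cv": std / mean if mean else 0.0,  # coefficient of variation
    }
```

Applying `summary_features` separately to the packet lengths, the inter-packet intervals, and the request/response times of one flow would give the three groups of statistics, which a classical classifier could then consume as a fixed-length vector.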

Method	Accuracy	Precision	Recall	F1-score
2D-CNN	99.752%	99.786%	99.786%	99.786%
RF	98.597%	98.499%	98.730%	98.614%
DT	98.263%	98.352%	98.213%	98.282%
KNN	96.626%	96.585%	93.045%	94.782%
SVM	94.832%	92.592%	91.635%	92.111%
NB	71.777%	54.385%	88.580%	67.393%
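
As a consistency check on the table above, the F1 column can be recomputed from the precision and recall columns using the standard definition F1 = 2PR / (P + R). The sketch below copies the reported values; the names `f1_score`, `reported`, and `max_deviation` are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the comparison table.
reported = {
    "2D-CNN": (99.786, 99.786, 99.786),
    "RF": (98.499, 98.730, 98.614),
    "DT": (98.352, 98.213, 98.282),
    "KNN": (96.585, 93.045, 94.782),
    "SVM": (92.592, 91.635, 92.111),
    "NB": (54.385, 88.580, 67.393),
}

# Largest gap between the recomputed and the reported F1, in percentage points.
max_deviation = max(
    abs(f1_score(p, r) - f1) for p, r, f1 in reported.values()
)
```

Because the table rounds precision and recall to three decimals, small differences in the last digit are expected. The low precision of NB (54.385%) against its high recall (88.580%) is what pulls its F1 down to 67.393%.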