Anti-noise Application Layer Binary Protocol Format Reverse Method

Netinfo Security, 2021, Vol. 21, Issue 7: 72-79. doi: 10.3969/j.issn.1671-1122.2021.07.009

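FANG Minzhi¹,², CHENG Guang¹,², KONG Panyu¹,²

1. School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
2. International Governance Research Base of Cyberspace, Southeast University, Nanjing 211189, China

Received: 2021-02-04  Online: 2021-07-10  Published: 2021-07-23
Corresponding author: FANG Minzhi, E-mail: mzfang@njnet.edu.cn
About the authors: FANG Minzhi (born 1996), male, from Jiangsu, master's student; main research interest: protocol reverse analysis. CHENG Guang (born 1972), male, from Anhui, professor, Ph.D.; main research interest: encrypted traffic. KONG Panyu (born 1996), male, from Chongqing, master's student; main research interest: encrypted traffic analysis.
Funding: National Key Research and Development Program of China (2018YFB1800602)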


Abstract

Existing traffic-based methods for reverse engineering binary protocol formats infer the format by comparing multiple messages of the same type, but noise messages in the message set lower the accuracy of format recognition. This paper proposes a method that automatically removes noise and infers the protocol format. The method first mines the frequent items at each position of the message sequences to identify the special identifiers (FD) in the message set, and then removes noise messages according to the sum of the FD frequencies at each position. Next, it performs recursive denoising and message segmentation based on the FDs in the message header. It then applies k-means clustering to each message set produced by segmentation, using the silhouette coefficient to determine the number of clusters k automatically, which yields message subsets that each follow a single protocol format. Finally, a progressive multiple sequence alignment algorithm is applied to each subset to obtain the protocol format. Experimental results show that the method effectively removes the mixed noise messages found in real-world traffic and extracts the keywords of the protocol format, from which the protocol format is inferred.
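The first two steps of the abstract (per-position frequent-item mining, then noise removal by summed FD frequency) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the byte-level granularity, the `min_support` and `keep_ratio` cutoffs, the `header_len` window, and all function names are assumptions chosen for illustration only.

```python
from collections import Counter

def mine_position_fds(messages, min_support=0.6, header_len=8):
    """Return {offset: (byte_value, frequency)} for bytes that are frequent at an offset.

    min_support and header_len are illustrative assumptions, not values from the paper.
    """
    fds = {}
    total = len(messages)
    for pos in range(header_len):
        counts = Counter(m[pos] for m in messages if len(m) > pos)
        if not counts:
            continue
        value, count = counts.most_common(1)[0]
        freq = count / total
        if freq >= min_support:
            fds[pos] = (value, freq)
    return fds

def fd_score(message, fds):
    """Sum the frequencies of the position-specific FDs that this message matches."""
    return sum(freq for pos, (value, freq) in fds.items()
               if len(message) > pos and message[pos] == value)

def remove_noise(messages, min_support=0.6, keep_ratio=0.5, header_len=8):
    """Keep messages whose FD score reaches keep_ratio of the best possible score."""
    fds = mine_position_fds(messages, min_support, header_len)
    best = sum(freq for _, freq in fds.values())
    if best == 0:
        return list(messages), fds
    kept = [m for m in messages if fd_score(m, fds) >= keep_ratio * best]
    return kept, fds

# Toy data: eight SMB-like headers plus two unrelated (noise) messages.
smb = [b"\xffSMB" + bytes([i]) + b"\x00\x00\x00" for i in range(8)]
noise = [b"GET / HTTP/1.1", b"\x16\x03\x01\x00\x2e\x01\x00\x00"]
kept, fds = remove_noise(smb + noise)
```

In this toy set the shared `\xffSMB` prefix becomes an FD at offsets 0 to 3, while offset 4 (which varies per message) does not, so the two unrelated messages score too low to be kept.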

Keywords: binary protocol reverse engineering, special identifier, recursive clustering, sequence alignment, frequent item mining
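The clustering step named in the abstract, k-means with the number of clusters chosen by the silhouette coefficient, can be sketched as follows. This is a standard-library illustration under stated assumptions: the source does not say which message features are clustered, how centroids are initialised, or what range of k is searched, so the 2-D toy feature vectors, the farthest-point seeding, and `k_max` are all hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point seeding (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance inside the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, k_max=6):
    """Try k = 2..k_max and return the k with the highest mean silhouette."""
    best_k, best_score = None, -1.0
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy feature vectors forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best_k, best_score = choose_k(points)
```

For the three well-separated toy groups above, the search settles on k = 3. In practice a library implementation (for example scikit-learn's `KMeans` and `silhouette_score`) would normally replace the hand-written loops.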

CLC number: TP309

FANG Minzhi, CHENG Guang, KONG Panyu. Anti-noise Application Layer Binary Protocol Format Reverse Method[J]. Netinfo Security, 2021, 21(7): 72-79.


Table 1. Example of the n-gram element-position matrix (partial)

p \ value	001	010	110
1	$C_{(001,1)}$	$C_{(010,1)}$	$C_{(110,1)}$
2	$C_{(001,2)}$	$C_{(010,2)}$	$C_{(110,2)}$
3	$C_{(001,3)}$	$C_{(010,3)}$	$C_{(110,3)}$
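The element-position matrix of Table 1 can be illustrated with a small Python sketch that counts, for each position p, how many messages carry a given n-bit value there. The details are assumptions: the page does not state whether positions advance one bit at a time or how $C_{(value,p)}$ is normalised, so this sketch slides a 3-bit window by one bit, uses 1-indexed positions, and stores raw counts. The function names are hypothetical.

```python
from collections import defaultdict

def to_bits(message: bytes) -> str:
    """Render a message as a string of '0'/'1' characters, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in message)

def ngram_position_matrix(messages, n=3, max_positions=None):
    """counts[p][value] = number of messages whose n-bit gram at 1-indexed position p is value."""
    counts = defaultdict(lambda: defaultdict(int))
    for message in messages:
        bits = to_bits(message)
        last = len(bits) - n + 1
        if max_positions is not None:
            last = min(last, max_positions)
        for start in range(last):
            counts[start + 1][bits[start:start + n]] += 1
    return counts

# Three one-byte toy messages: 00110000, 00101111, 11000000.
messages = [bytes([0b00110000]), bytes([0b00101111]), bytes([0b11000000])]
matrix = ngram_position_matrix(messages, n=3, max_positions=3)
```

Here `matrix[1]["001"]` plays the role of $C_{(001,1)}$: two of the three toy messages begin with the bits 001.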




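Protocol	Sessions	Protocol format types	Messages
SMB	2290	8	21523
TLS	2000	2	4000

(threshold1, threshold2)	TNR	FPR
(0.05, 0.95)	0	0.86
(0.1, 0.9)	0	0
(0.2, 0.8)	0.23	0

Target protocol	Single-protocol noise messages (TNR)	Single-protocol noise messages (FPR)	Mixed noise messages (TNR)	Mixed noise messages (FPR)
SMB	0	0	0	0
TLS	0	0	0	0

Target protocol	Single-protocol noise messages (TNR)	Single-protocol noise messages (FPR)	Mixed noise messages (TNR)	Mixed noise messages (FPR)
SMB	0	0.8	0	0
TLS	0	0.9	0	0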
