Anti-noise Application Layer Binary Protocol Format Reverse Method

doi:10.3969/j.issn.1671-1122.2021.07.009

Abstract

Existing traffic-based methods for reverse engineering binary protocol formats infer the format by comparing multiple messages of the same type, but noise messages in the message set reduce the accuracy of format recognition. This paper proposes a method that automatically removes the noise and infers the protocol format. First, the method mines the frequent items at each position of the message sequence, identifies the special identification (FD) in the message set, and removes noise messages according to the sum of the FD frequencies at each position. Next, it performs recursive denoising and message segmentation according to the FD of the message header, applies k-means clustering to the message sets produced by segmentation, and automatically determines the number of clusters k from the silhouette coefficient, yielding one message subset per protocol format. Finally, it obtains the protocol format by applying a progressive multiple sequence alignment algorithm to each message subset. Experimental results show that the proposed method removes the mixed noise messages found in real-environment traffic, extracts the keywords of the protocol format, and infers the protocol format.
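The first step of the abstract (per-position frequent item mining followed by noise removal based on summed frequencies) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed header length, the support threshold `min_support`, and the score threshold `min_score` are all assumptions made for illustration, and the sample messages are invented.

```python
from collections import Counter

def frequent_items(messages, header_len, min_support):
    """For each byte position, keep the values whose relative frequency
    across the message set is at least min_support (assumed threshold)."""
    n = len(messages)
    result = {}
    for pos in range(header_len):
        counts = Counter(m[pos] for m in messages if len(m) > pos)
        result[pos] = {v: c / n for v, c in counts.items() if c / n >= min_support}
    return result

def fd_score(message, freq):
    """Sum the frequencies of the frequent values that this message carries."""
    return sum(
        items.get(message[pos], 0.0)
        for pos, items in freq.items()
        if pos < len(message)
    )

def remove_noise(messages, header_len=4, min_support=0.5, min_score=1.0):
    """Drop messages whose summed frequency score falls below min_score."""
    freq = frequent_items(messages, header_len, min_support)
    return [m for m in messages if fd_score(m, freq) >= min_score]

# Invented sample: eight messages sharing a 4-byte header, plus two noise messages.
target = [b"\xffSMB" + bytes([i]) for i in range(8)]
noise = [b"\x16\x03\x01\x00\x2a", b"GET /"]
clean = remove_noise(target + noise)
print(len(clean))
```

In this sketch a message that shares none of the frequent header values scores zero and is discarded, while messages carrying the common header keep a high score. The recursive denoising, segmentation, clustering, and alignment stages described in the abstract are not covered here.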

Key words: binary protocol reverse, special identification, recursive clustering, sequence alignment, frequent item mining

CLC Number: TP309

FANG Minzhi, CHENG Guang, KONG Panyu. Anti-noise Application Layer Binary Protocol Format Reverse Method[J]. Netinfo Security, 2021, 21(7): 72-79.

Figures/Tables 9

Position p \ Value	001	010	110
1	$C_{(001,1)}$	$C_{(010,1)}$	$C_{(110,1)}$
2	$C_{(001,2)}$	$C_{(010,2)}$	$C_{(110,2)}$
3	$C_{(001,3)}$	$C_{(010,3)}$	$C_{(110,3)}$
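One way to read the table above is as a positional count table, where $C_{(v,p)}$ is the number of messages whose value at position p equals v. The following Python sketch builds such a table under that reading. The function name `positional_counts`, the 3-bit string tokens, and the sample messages are hypothetical and chosen only to match the column labels.

```python
from collections import Counter

def positional_counts(messages):
    """Return {position (1-based): Counter mapping value -> count}."""
    table = {}
    for msg in messages:
        for p, value in enumerate(msg, start=1):
            table.setdefault(p, Counter())[value] += 1
    return table

# Invented sample: four messages, each a sequence of three 3-bit values.
messages = [
    ["001", "010", "110"],
    ["001", "110", "110"],
    ["001", "010", "010"],
    ["010", "010", "110"],
]
counts = positional_counts(messages)
print(counts[1]["001"])  # plays the role of C_(001,1)
```

A `Counter` returns 0 for values never seen at a position, so every cell of the table is defined even when a value does not occur.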

Protocol	Sessions	Protocol format types	Messages
SMB	2290	8	21523
TLS	2000	2	4000

(threshold1, threshold2)	TNR	FPR
(0.05, 0.95)	0	0.86
(0.1, 0.9)	0	0
(0.2, 0.8)	0.23	0

Target protocol \ Noise type	Single-protocol messages		Mixed messages
	TNR	FPR	TNR	FPR
SMB	0	0	0	0
TLS	0	0	0	0

Target protocol \ Noise type	Single-protocol messages		Mixed messages
	TNR	FPR	TNR	FPR
SMB	0	0.8	0	0
TLS	0	0.9	0	0