Netinfo Security, 2021, Vol. 21, Issue (7): 72-79. doi: 10.3969/j.issn.1671-1122.2021.07.009


Anti-noise Application Layer Binary Protocol Format Reverse Method

FANG Minzhi1,2, CHENG Guang1,2, KONG Panyu1,2

  1. School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
    2. International Governance Research Base of Cyberspace, Southeast University, Nanjing 211189, China
  • Received: 2021-02-04 Online: 2021-07-10 Published: 2021-07-23
  • Contact: FANG Minzhi, E-mail: mzfang@njnet.edu.cn

Abstract:

Existing traffic-based methods for reverse engineering binary protocol formats infer the format by comparing multiple messages of the same type, but noise messages mixed into the message set reduce the accuracy of format recognition. This paper proposes a method that automatically removes the noise and infers the protocol format. First, the method mines the frequent items at each position of the message sequences, identifies the special identifiers (FDs) in the message set, and removes noise messages according to the sum of the FD frequencies at each position. Next, it recursively denoises and segments the messages according to the FDs in the message header, applies k-means clustering to each message set produced by segmentation, and automatically determines the number of clusters k using the silhouette coefficient, yielding a message subset for each single protocol format. Finally, it applies a progressive multiple sequence alignment algorithm to each message subset to obtain the protocol format. Experimental results show that the proposed method effectively removes the noise messages mixed into real-environment traffic, extracts the keywords of the protocol format, and infers the protocol format.
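
The first step (mining frequent items at each position, then scoring messages by summed frequency) might look roughly like the following. This is a minimal illustrative sketch, not the paper's implementation: the function names, the `min_support` and `keep_ratio` thresholds, and the normalization of the score are all assumptions, since the abstract does not specify how FDs are selected or where the noise cutoff lies.

```python
from collections import Counter


def frequent_bytes(messages, min_support=0.6):
    """Return {offset: {byte_value: frequency}} for byte values that occur
    at that offset in at least `min_support` of all messages.
    `min_support` is an assumed threshold, not taken from the paper."""
    n = len(messages)
    max_len = max((len(m) for m in messages), default=0)
    frequent = {}
    for pos in range(max_len):
        counts = Counter(m[pos] for m in messages if len(m) > pos)
        for value, count in counts.items():
            if count / n >= min_support:
                frequent.setdefault(pos, {})[value] = count / n
    return frequent


def score(message, frequent):
    """Sum the frequencies of the frequent values that `message` matches."""
    return sum(
        freqs.get(message[pos], 0.0)
        for pos, freqs in frequent.items()
        if pos < len(message)
    )


def remove_noise(messages, min_support=0.6, keep_ratio=0.5):
    """Keep messages whose score reaches `keep_ratio` of the best possible
    score. Both thresholds are hypothetical values chosen for illustration."""
    frequent = frequent_bytes(messages, min_support)
    max_score = sum(max(freqs.values()) for freqs in frequent.values())
    if max_score == 0:
        # No frequent items were found, so there is no basis for filtering.
        return list(messages)
    return [m for m in messages if score(m, frequent) / max_score >= keep_ratio]
```

For example, given four messages that share the two-byte header `01 02` and one unrelated message, the unrelated message matches no frequent value, scores zero, and is dropped. The later steps (recursive segmentation, k-means with silhouette-based selection of k, and progressive multiple sequence alignment) are not covered by this sketch.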

Key words: binary protocol reverse, special identification, recursive clustering, sequence alignment, frequent item mining
