Netinfo Security ›› 2024, Vol. 24 ›› Issue (8): 1277-1290. doi: 10.3969/j.issn.1671-1122.2024.08.013

IoT Device Identification Method Based on Pre-Trained Transformers

XING Changyou, WANG Zipeng, ZHANG Guomin, DING Ke

  1. Command and Control Engineering College, Army Engineering University, Nanjing 210007, China
  • Received: 2024-03-27 Online: 2024-08-10 Published: 2024-08-22

Abstract:

Efficient IoT device identification helps network administrators quickly isolate anomalous and vulnerable IoT devices on a LAN, preventing attackers from exploiting device vulnerabilities to penetrate the internal network, lie dormant, and launch deeper follow-on attacks. However, existing machine learning-based classification methods generally suffer from a cumbersome feature selection process and unstable flow features, both of which reduce identification accuracy. To address these problems, this paper proposes an IoT device identification method based on pre-trained Transformers. The method identifies IoT devices by processing device traffic with a model named IoTBERT, which consists of two major components: a pre-training module and a device identification module. The pre-training module trains an ALBERT model on unlabeled IoT device flow data, encoding traffic features into high-dimensional feature vectors to obtain a traffic feature representation model. The device identification module fine-tunes the weights of the pre-trained model on labeled data and combines it with a residual network to identify IoT devices from packet-level information. The method learns traffic feature representations and makes classification decisions automatically, eliminating manual feature engineering and hand-built multi-stage processing pipelines. It maps raw packet encodings directly to the corresponding category labels, achieving end-to-end IoT device identification. Experimental results on the public Aalto, UNSW, and CIC IoT datasets show that the method identifies and classifies IoT devices effectively, with average identification accuracies of 97.2%, 92.1%, and 99.8%, respectively.
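As a rough illustration of the pre-training step described in the abstract, the sketch below prepares one raw packet for a masked-token objective of the kind ALBERT is trained with. The abstract does not specify IoTBERT's tokenization or masking scheme, so the byte-level vocabulary, the special-token ids, the 15% masking rate, the `-100` ignore label, and the function names `packet_to_ids` and `mask_for_pretraining` are all assumptions made for this example, not the authors' design.

```python
import random
from typing import List, Tuple

# Hypothetical special-token ids; byte values 0..255 are shifted past them.
PAD, CLS, SEP, MASK = 0, 1, 2, 3
OFFSET = 4
VOCAB_SIZE = 256 + OFFSET
IGNORE = -100  # label for positions that do not contribute to the loss


def packet_to_ids(packet: bytes, max_len: int = 32) -> List[int]:
    """Encode one raw packet as [CLS] byte-tokens [SEP], padded to max_len."""
    body = [b + OFFSET for b in packet[: max_len - 2]]
    ids = [CLS] + body + [SEP]
    return ids + [PAD] * (max_len - len(ids))


def mask_for_pretraining(
    ids: List[int], rng: random.Random, mask_prob: float = 0.15
) -> Tuple[List[int], List[int]]:
    """Build (inputs, labels) for a masked-token objective.

    Simplification: a selected token is always replaced by MASK.
    Special tokens (PAD, CLS, SEP) are never selected.
    """
    inputs, labels = list(ids), [IGNORE] * len(ids)
    for i, tok in enumerate(ids):
        if tok >= OFFSET and rng.random() < mask_prob:
            inputs[i] = MASK
            labels[i] = tok  # the model must recover the original byte token
    return inputs, labels


# Usage: the first six bytes of an example packet header, no labels required.
rng = random.Random(0)
ids = packet_to_ids(bytes.fromhex("450000281c46"))
inputs, labels = mask_for_pretraining(ids, rng)
```

Because the labels are derived from the packet itself, this step needs no device annotations, which matches the abstract's use of unlabeled flow data for pre-training; labeled data is needed only in the later fine-tuning stage.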

Key words: Internet of things, device identification, representation learning, pre-trained model
