Netinfo Security, 2024, Vol. 24, Issue (8): 1277-1290. doi: 10.3969/j.issn.1671-1122.2024.08.013

• Technical Research •

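IoT Device Identification Method Based on Pre-Trained Transformers

XING Changyou, WANG Zipeng, ZHANG Guomin, DING Ke

  1. Command and Control Engineering College, Army Engineering University, Nanjing 210007, China
  • Received: 2024-03-27 Online: 2024-08-10 Published: 2024-08-22
  • Corresponding author: WANG Zipeng 17641235907@163.com
  • About the authors: XING Changyou (born 1982), male, from Henan, professor, Ph.D., CCF member; research interests: software-defined networking, network security, and network function virtualization | WANG Zipeng (born 2000), male, from Liaoning, master's student; research interest: network security | ZHANG Guomin (born 1979), male, from Shandong, associate professor, Ph.D.; research interests: software-defined networking, network security, network measurement, and distributed systems | DING Ke (born 1978), male, from Jiangsu, lecturer, Ph.D.; research interests: network virtualization technology and network security
  • Funding:
    National Natural Science Foundation of China (62172432)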

Abstract:

Efficient IoT device identification helps network administrators quickly isolate anomalous or vulnerable IoT devices in a LAN, preventing attackers from exploiting device vulnerabilities to penetrate the internal network, lie dormant, and launch deeper follow-up attacks. However, existing machine learning-based identification methods generally suffer from a complicated feature selection process and unstable flow features, which degrade identification accuracy. To address this, the paper proposes an IoT device identification method based on pre-trained Transformers, which processes device traffic with the IoTBERT model. IoTBERT consists of two major components, a pre-training unit and a device identification unit. The pre-training unit trains an ALBERT model on unlabeled IoT device traffic, embedding encoded data features into high-dimensional feature vectors to obtain a traffic feature representation model. The device identification unit fine-tunes the parameter weights of the pre-trained model on labeled data and combines it with a residual network to identify IoT devices at the packet level. The method learns traffic feature representations and makes classification decisions automatically, with no need for hand-designed feature engineering or manually built multi-stage processing pipelines; it maps raw packet encodings directly to the corresponding class labels, achieving end-to-end IoT device identification. Experimental results on the public datasets Aalto, UNSW, and CIC IoT show that the method identifies IoT devices effectively from data packets, with average identification accuracies of 97.2%, 92.1%, and 99.8%, respectively.
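The abstract does not say how packets are turned into model input, so the following is only a minimal sketch of one plausible front end for a BERT-family encoder. It assumes a byte-level vocabulary (one token per byte value plus four special tokens) and a masked-token objective for pre-training on unlabeled traffic. The names `encode_packet` and `mask_tokens`, the special-token IDs, and the sequence length are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical byte-level vocabulary (an assumption, not the paper's scheme):
# four special tokens followed by one token per byte value 0..255.
PAD, CLS, SEP, MASK = 0, 1, 2, 3
NUM_SPECIAL = 4


def encode_packet(raw, max_len=16):
    """Map raw packet bytes to a fixed-length list of token IDs."""
    # Truncate so that the CLS and SEP markers always fit.
    body = [b + NUM_SPECIAL for b in raw[: max_len - 2]]
    ids = [CLS] + body + [SEP]
    # Right-pad short packets up to max_len.
    return ids + [PAD] * (max_len - len(ids))


def mask_tokens(ids, prob=0.15, seed=0):
    """Build a masked-token training example from unlabeled token IDs.

    Returns (masked_ids, labels). labels holds the original ID at each
    masked position and -1 everywhere else; special tokens are never masked.
    """
    rng = random.Random(seed)
    masked, labels = list(ids), [-1] * len(ids)
    for i, tok in enumerate(ids):
        if tok >= NUM_SPECIAL and rng.random() < prob:
            masked[i], labels[i] = MASK, tok
    return masked, labels


packet = bytes.fromhex("450000280001")  # made-up bytes for illustration
ids = encode_packet(packet, max_len=10)
masked, labels = mask_tokens(ids, prob=0.5, seed=1)
```

In a pipeline like the one the abstract describes, sequences such as `masked` would feed the pre-training stage, while unmasked sequences such as `ids`, paired with device labels, would feed the fine-tuning stage. The encoder and the residual classification head are omitted here.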

Key words: Internet of Things, device identification, representation learning, pre-trained model

CLC Number: