Netinfo Security ›› 2025, Vol. 25 ›› Issue (1): 124-132. doi: 10.3969/j.issn.1671-1122.2025.01.011

• Theoretical Research •

A Method for Subtree Sequence Rule Mining-Based Dockerfile Misconfiguration Detection and Repair

WANG Jinshuang, ZHAO Ning, CUI Shuai

  1. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2024-05-14 Online: 2025-01-10 Published: 2025-02-14
  • Contact: ZHAO Ning E-mail: zhaonig@yeah.net
  • About the authors: WANG Jinshuang (born 1978), male, from Jiangsu, associate professor, Ph.D., main research interest: system security | ZHAO Ning (born 1993), female, from Liaoning, master's student, CCF member, main research interest: system security | CUI Shuai (born 1998), female, from Shandong, master's student, main research interest: system security

Abstract:

A Dockerfile is a text file used to build Docker container images. It contains a series of instructions and configurations that describe how to assemble a container's runtime environment. A misconfigured Dockerfile, however, can cause numerous performance and security problems. Existing rule-mining-based detection and repair methods focus mainly on associations within common commands and neglect the dependencies between commands. They also tend to target high-frequency commands and therefore miss patterns hidden in low-frequency commands. To address these issues, this paper proposes a Dockerfile misconfiguration detection and repair method based on subtree sequence rule mining. First, the Dockerfile is converted into an abstract syntax tree, which is decomposed into ordered subtrees; the subtrees are then serialized to form an intermediate representation. Next, the subtrees are clustered into groups, and a sequence rule mining algorithm is applied to each group. During mining, the antecedent (left-hand side) of each rule is restricted to the root node of a subtree, which focuses the rules on the target instruction and curbs the explosive growth in the number of generated rules. Finally, the maximal sequence rules are selected to extract common command usage patterns, from which semantic rules are summarized as a baseline for Dockerfile violation detection and automatic repair. Experimental results show that the method mines 31 semantic rules, 12 of which were previously unpublished. Compared with the baseline method, it improves violation detection precision by 10% and the repair success rate by 5.6%.
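The core idea of restricting each rule's antecedent to the subtree root can be illustrated with a small Python sketch. This is not the authors' implementation: it replaces real abstract syntax tree construction with simple tokenization of RUN commands, replaces clustering with grouping by root token, mines only single-token consequents, and omits maximal-rule filtering and repair. The function names (`serialize_subtrees`, `mine_root_rules`, `detect_violations`), the thresholds, and the three-file corpus are all hypothetical and chosen only for illustration.

```python
import shlex
from collections import Counter, defaultdict


def serialize_subtrees(dockerfile_text):
    """Toy stand-in for AST construction (assumption, not the paper's parser).

    Each shell command inside a single-line RUN instruction becomes one
    'subtree', serialized as [root, child, child, ...]. Line continuations
    and other shell operators are not handled.
    """
    subtrees = []
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if not line.upper().startswith("RUN "):
            continue
        for command in line[4:].split("&&"):
            tokens = shlex.split(command)
            if tokens:
                subtrees.append(tokens)
    return subtrees


def mine_root_rules(subtrees, min_support=2, min_confidence=0.8):
    """Mine rules 'root -> token' whose antecedent is fixed to the root."""
    groups = defaultdict(list)
    for tokens in subtrees:
        groups[tokens[0]].append(tokens[1:])  # group by root token
    rules = {}
    for root, bodies in groups.items():
        # Count each token at most once per subtree.
        counts = Counter(tok for body in bodies for tok in set(body))
        for tok, n in counts.items():
            confidence = n / len(bodies)
            if n >= min_support and confidence >= min_confidence:
                rules[(root, tok)] = confidence
    return rules


def detect_violations(dockerfile_text, rules):
    """Report (command, missing_token) pairs that break a mined rule."""
    violations = []
    for tokens in serialize_subtrees(dockerfile_text):
        for root, tok in rules:
            if tokens[0] == root and tok not in tokens[1:]:
                violations.append((" ".join(tokens), tok))
    return violations


# Hypothetical training corpus of three tiny Dockerfiles.
CORPUS = [
    "FROM python:3.12\nRUN pip install --no-cache-dir flask",
    "FROM python:3.12\nRUN pip install --no-cache-dir requests"
    " && pip install --no-cache-dir numpy",
    "FROM python:3.11\nRUN pip install --no-cache-dir -r requirements.txt",
]

rules = mine_root_rules([t for text in CORPUS for t in serialize_subtrees(text)])
report = detect_violations("FROM python:3.12\nRUN pip install flask", rules)
# report == [("pip install flask", "--no-cache-dir")]
```

Because every candidate rule starts from the root token, the number of rules grows with the number of distinct child tokens per group rather than with all token pairs, which is the effect the abstract attributes to the antecedent constraint.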

Key words: Docker, Dockerfile, rule mining, violation detection, automatic repair

CLC Number: