信息网络安全 (Netinfo Security), 2026, Vol. 26, Issue 4: 591-604. doi: 10.3969/j.issn.1671-1122.2026.04.007

• Academic Research •

Cross-Language Compiler Fuzzing Based on LLM Translation and Differential Testing

LI Yan (李岩)¹, YANG Wenzhang (杨文章)², XUE Yinxing (薛吟兴)²

  1. School of Software Engineering, University of Science and Technology of China, Hefei 230026, China
  2. Institute of AI for Industries, Chinese Academy of Sciences, Nanjing 211135, China
  • Received: 2025-12-16 Online: 2026-04-10 Published: 2026-04-29
  • Corresponding author: XUE Yinxing, E-mail: yxxue@iaii.ac.cn
  • About the authors: LI Yan (born 2000), male, from Shandong, master's student, CCF member, main research interest: fuzzing | YANG Wenzhang (born 1994), male, from Zhejiang, assistant researcher, Ph.D., CCF member, main research interests: software engineering and programming languages | XUE Yinxing (born 1982), male, from Jiangsu, researcher, Ph.D., CCF member, main research interests: software security, artificial intelligence security, and cyberspace security
  • Funding:
    National Natural Science Foundation of China (61972373)

Abstract:

As modern software systems grow more complex, the correctness and reliability of compilers become critical. Traditional compiler fuzzing techniques are limited in multi-language scenarios by the high cost of maintaining generation rules and the difficulty of verifying cross-language consistency. The code translation and semantic reasoning capabilities of large language models (LLMs) offer a new way to address these problems. This paper proposes Fuzpiler, a cross-language compiler fuzzing framework based on LLM-driven translation and semantic reasoning, to uncover potential compiler vulnerabilities. Fuzpiler first uses existing fuzzing tools to generate test seeds asynchronously and selects test cases through multi-objective optimization. It then uses an LLM to translate the selected seeds into semantically equivalent programs in multiple programming languages, building cross-language “homologous” seed sets. For semantic validation, the framework uses the reasoning capability of LLMs to align the semantics of the multi-language programs and applies differential testing to detect behavioral inconsistencies in compilers across different language front ends or optimization stages. Fuzpiler is evaluated on three compilers: Clang, Clang++, and Rustc. Experimental results show that, compared with baseline tools, Fuzpiler improves branch coverage by 5.19%, 36.57%, and 23.91% on the three compilers, respectively, which supports the effectiveness of LLMs in cross-language test generation, semantic alignment, and consistency verification.

Key words: compiler fuzzing, large language models, code translation, differential testing
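The abstract does not describe how the differential-testing step is implemented. As a rough illustration only, the following Python sketch shows one common way such a check can be organized: each "homologous" program variant is run, and any variant whose observable result differs from the majority is flagged for inspection. The names `RunResult` and `differential_test`, and the majority-vote oracle itself, are assumptions made for this sketch and are not taken from Fuzpiler.

```python
from collections import Counter
from typing import Callable, Dict, List, NamedTuple


class RunResult(NamedTuple):
    """Observable behavior of one compiled program (hypothetical type)."""
    exit_code: int
    stdout: str


def differential_test(runners: Dict[str, Callable[[], RunResult]]) -> List[str]:
    """Run every variant and return the names that disagree with the majority.

    Each key labels one variant, e.g. a translated program under one compiler
    configuration. Each value is a callable that compiles and executes that
    variant and reports what it observed. How the callable does this is left
    open here (assumption: it wraps something like a subprocess call).
    """
    results = {name: run() for name, run in runners.items()}
    if not results:
        return []

    # Count identical (exit_code, stdout) pairs and take the most frequent one.
    majority, votes = Counter(results.values()).most_common(1)[0]

    # Without a strict majority there is no trustworthy reference result,
    # so every variant is reported as suspect.
    if votes <= len(results) / 2:
        return sorted(results)

    return sorted(name for name, result in results.items() if result != majority)
```

In practice each runner would wrap a real compile-and-execute step for one variant, and a flagged name would still need manual triage: a disagreement may come from an imperfect LLM translation rather than from a compiler defect, which is presumably why the framework includes a semantic alignment step.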
