Netinfo Security ›› 2026, Vol. 26 ›› Issue (4): 591-604. doi: 10.3969/j.issn.1671-1122.2026.04.007


Cross-Language Compiler Fuzzing Based on LLM Translation and Differential Testing

LI Yan1, YANG Wenzhang2, XUE Yinxing2

  1 School of Software Engineering, University of Science and Technology of China, Hefei 230026, China
  2 Institute of AI for Industries, Chinese Academy of Sciences, Nanjing 211135, China
  • Received: 2025-12-16 Online: 2026-04-10 Published: 2026-04-29

Abstract:

As modern software systems grow more complex, the correctness and reliability of compilers become critical. Traditional compiler fuzzing techniques face limitations in multi-language scenarios, including the high cost of maintaining generation rules and the difficulty of verifying consistency across languages. The capabilities of large language models (LLMs) in code translation and semantic reasoning offer a new way to address these challenges. This paper proposes Fuzpiler, a cross-language compiler fuzzing framework based on LLM-driven translation and semantic reasoning, to uncover potential compiler vulnerabilities. Fuzpiler first uses existing fuzzing tools to generate fuzzing seeds asynchronously and selects promising samples through multi-objective optimization. It then uses an LLM to translate the selected seeds into semantically equivalent programs in multiple programming languages, constructing cross-language “homologous” fuzzing seed sets. For semantic validation, the framework uses the reasoning capability of LLMs to align the semantics of the multi-language programs and performs differential testing to detect behavioral inconsistencies in compilers across different language front ends or optimization stages. Fuzpiler is evaluated experimentally on three compilers: Clang, Clang++, and Rustc. The results show that, compared with baseline tools, Fuzpiler improves branch coverage by 5.19%, 36.57%, and 23.91% on the three compilers, respectively, demonstrating the effectiveness of LLMs in cross-language test generation, semantic alignment, and consistency verification.
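The differential-testing step described in the abstract can be illustrated with a minimal Python sketch. It is not Fuzpiler's implementation: the `Observation` record, the function names, and the majority-vote rule are all assumptions made for illustration. The sketch takes the observed behavior of each "homologous" program (one per compiler and optimization setting) and reports the builds that disagree with the largest group.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    """Observable behavior of one compiled program (hypothetical record)."""
    exit_code: int
    stdout: str


def group_by_behavior(results: dict[str, Observation]) -> list[list[str]]:
    """Cluster build labels whose programs behaved identically."""
    groups: dict[Observation, list[str]] = defaultdict(list)
    for label, obs in results.items():
        groups[obs].append(label)
    # Largest cluster first; ties are broken by label order for determinism.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


def find_divergent(results: dict[str, Observation]) -> list[str]:
    """Return labels outside the majority cluster; empty if all builds agree."""
    clusters = group_by_behavior(results)
    return [label for cluster in clusters[1:] for label in cluster]


# Illustrative, made-up observations for one homologous seed set.
results = {
    "clang -O0": Observation(0, "42\n"),
    "clang++ -O2": Observation(0, "42\n"),
    "rustc -O": Observation(0, "41\n"),
}
print(find_divergent(results))
```

A flagged build is only a candidate inconsistency. In a cross-language setting the disagreement may also come from an imperfect translation, which is presumably why the abstract pairs differential testing with LLM-based semantic alignment.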

Key words: compiler fuzzing, large language models, code translation, differential testing
