Cross-Language Compiler Fuzzing Based on LLM Translation and Differential Testing

doi:10.3969/j.issn.1671-1122.2026.04.007

Abstract

Modern software systems have become increasingly complex, making the correctness and reliability of compilers critical. Traditional compiler fuzzing techniques face limitations in multi-language scenarios, including the high cost of rule maintenance and the difficulty of verifying cross-language consistency. The code translation and semantic reasoning capabilities of large language models (LLMs) offer a new way to address these challenges. This paper proposes Fuzpiler, a cross-language compiler fuzzing framework based on LLM-driven translation and semantic reasoning, to uncover potential compiler vulnerabilities. Fuzpiler first uses existing fuzzing tools to generate fuzzing seeds asynchronously and selects promising samples through multi-objective optimization. It then uses an LLM to translate the selected seeds into semantically equivalent programs in multiple programming languages, constructing cross-language “homologous” seed sets. For semantic validation, the framework uses the reasoning capability of LLMs to align the semantics of the multi-language programs, and performs differential testing to detect behavioral inconsistencies in compilers across different language front ends or optimization stages. Fuzpiler was evaluated on three compilers: Clang, Clang++, and Rustc. Experimental results show that, compared with the baseline tools, Fuzpiler improves branch coverage by 5.19%, 36.57%, and 23.91% on the three compilers, respectively, demonstrating the effectiveness of LLMs in cross-language test generation, semantic alignment, and consistency verification.
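
The abstract states that promising seeds are selected through multi-objective optimization but does not name the objectives. The sketch below shows one common way to make such a selection, keeping only Pareto-optimal (non-dominated) seeds. The `Seed` type and its three objectives (`new_branches` and `features` to maximize, `loc` to minimize as a rough proxy for translation cost) are hypothetical stand-ins, not Fuzpiler's actual criteria.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Seed:
    # All three objectives are hypothetical, chosen only for illustration.
    name: str
    new_branches: int  # maximize: branches not covered by earlier seeds
    features: int      # maximize: distinct language constructs used
    loc: int           # minimize: program size, a proxy for LLM translation cost


def dominates(a: Seed, b: Seed) -> bool:
    """True if `a` is no worse than `b` on every objective and better on at least one."""
    no_worse = (
        a.new_branches >= b.new_branches
        and a.features >= b.features
        and a.loc <= b.loc
    )
    strictly_better = (
        a.new_branches > b.new_branches
        or a.features > b.features
        or a.loc < b.loc
    )
    return no_worse and strictly_better


def pareto_front(seeds: list[Seed]) -> list[Seed]:
    """Keep the seeds that no other seed dominates, preserving input order."""
    return [s for s in seeds if not any(dominates(o, s) for o in seeds if o is not s)]


pool = [
    Seed("a", new_branches=10, features=5, loc=100),
    Seed("b", new_branches=8, features=4, loc=120),  # dominated by "a"
    Seed("c", new_branches=3, features=9, loc=50),   # trades coverage for size
]
print([s.name for s in pareto_front(pool)])  # ['a', 'c']
```

A Pareto filter avoids having to weight the objectives against one another; if the front is still larger than the translation budget allows, a secondary ranking on a single objective could trim it.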

Key words: compiler fuzzing, large language models, code translation, differential testing

CLC Number: TP309

LI Yan, YANG Wenzhang, XUE Yinxing. Cross-Language Compiler Fuzzing Based on LLM Translation and Differential Testing[J]. Netinfo Security, 2026, 26(4): 591-604.

Figures/Tables 10

References 32

[1]	RAHMAN A, BOSE D B, BARSHA F L, et al. Defect Categorization in Compilers: A Multi-Vocal Literature Review[J]. ACM Computing Surveys, 2024, 56(4): 1-42.
[2]	MANÈS V J M, HAN H, HAN C, et al. The Art, Science, and Engineering of Fuzzing: A Survey[J]. IEEE Transactions on Software Engineering, 2021, 47(11): 2312-2331.
[3]	YANG Xuejun, CHEN Yang, EIDE E, et al. Finding and Understanding Bugs in C Compilers[C]// ACM. The 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation. New York: ACM, 2011: 283-294.
[4]	LIVINSKII V, BABOKIN D, REGEHR J. Random Testing for C and C++ Compilers with YARPGen[J]. Proceedings of the ACM on Programming Languages, 2020, 4: 1-25.
[5]	SHARMA M, YU Pingshi, DONALDSON A F. RustSmith: Random Differential Compiler Testing for Rust[C]// ACM. The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2023: 1483-1486.
[6]	HOLLER C, HERZIG K, ZELLER A. Fuzzing with Code Fragments[C]// USENIX. 21st USENIX Security Symposium. Berkeley: USENIX, 2012: 445-458.
[7]	CHALIASOS S, SOTIROPOULOS T, SPINELLIS D, et al. Finding Typing Compiler Bugs[C]// ACM. The 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation. New York: ACM, 2022: 183-198.
[8]	LE V, AFSHARI M, SU Zhendong. Compiler Validation via Equivalence Modulo Inputs[J]. ACM SIGPLAN Notices, 2014, 49(6): 216-226.
[9]	LE V, SUN Chengnian, SU Zhendong. Finding Deep Compiler Bugs via Guided Stochastic Program Mutation[J]. ACM SIGPLAN Notices, 2015, 50(10): 386-399.
[10]	LIDBURY C, LASCU A, CHONG N, et al. Many-Core Compiler Fuzzing[J]. ACM SIGPLAN Notices, 2015, 50(6): 65-76.
[11]	JIANG Bo, WANG Xiaoyan, CHAN W K, et al. CUDAsmith: A Fuzzer for CUDA Compilers[C]// IEEE. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). New York: IEEE, 2020: 861-871.
[12]	XIAO Dongwei, LIU Zhibo, YUAN Yuanyuan, et al. Metamorphic Testing of Deep Learning Compilers[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2022, 6(1): 1-28.
[13]	CUMMINS C, PETOUMENOS P, MURRAY A, et al. Compiler Fuzzing through Deep Learning[C]// ACM. The 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2018: 95-105.
[14]	LEE S, HAN H S, CHA S K, et al. Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer[C]// USENIX. 29th USENIX Security Symposium. Berkeley: USENIX, 2020: 2613-2630.
[15]	LIU Xiao, LI Xiaoting, PRAJAPATI R, et al. DeepFuzz: Automatic Generation of Syntax Valid C Programs for Fuzz Testing[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 1044-1051.
[16]	XU Haoran, WANG Yongjun, FAN Shuhui, et al. DSmith: Compiler Fuzzing through Generative Deep Learning Model with Attention[C]// IEEE. 2020 International Joint Conference on Neural Networks (IJCNN). New York: IEEE, 2020: 1-9.
[17]	XIA C S, PALTENGHI M, JIA Letian, et al. Fuzz4All: Universal Fuzzing with Large Language Models[C]// ACM. The IEEE/ACM 46th International Conference on Software Engineering. New York: ACM, 2024: 1-13.
[18]	LIU Fang, LIU Yang, SHI Lin, et al. Beyond Functional Correctness: Exploring Hallucinations in LLM-Generated Code[EB/OL].(2024-05-11)[2025-10-25]. https://arxiv.org/abs/2404.00971.
[19]	ZHU Xiaogang, ZHOU Wei, HAN Qinglong, et al. When Software Security Meets Large Language Models: A Survey[J]. IEEE/CAA Journal of Automatica Sinica, 2025, 12(2): 317-334.
[20]	MIAO Siwei, WANG Juan, ZHANG Chong, et al. Deep Learning in Fuzzing: A Literature Survey[C]// IEEE. 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI). New York: IEEE, 2022: 220-223.
[21]	ALAGARSAMY S, TANTITHAMTHAVORN C, ALETI A. A3Test: Assertion-Augmented Automated Test Case Generation[EB/OL].(2024-08-30)[2025-10-25]. https://doi.org/10.1016/j.infsof.2024.107565.
[22]	DENG Yinlin, XIA C S, YANG Chenyuan, et al. Large Language Models Are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT[EB/OL].(2023-04-04)[2025-10-25]. https://arxiv.org/abs/2304.02014.
[23]	ZHANG Hongxiang, RONG Yuyang, HE Yifeng, et al. LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing[EB/OL].(2025-10-03)[2025-10-25]. https://arxiv.org/abs/2406.07714.
[24]	DENG Yinlin, XIA C S, PENG Haoran, et al. Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models[C]// ACM. The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2023: 423-435.
[25]	NASHID N, SINTAHA M, MESBAH A. Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning[C]// IEEE. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). New York: IEEE, 2023: 2450-2462.
[26]	VIKRAM V, LEMIEUX C, SUNSHINE J, et al. Can Large Language Models Write Good Property-Based Tests?[EB/OL].(2024-07-22)[2025-10-25]. https://arxiv.org/abs/2307.04346.
[27]	CHEN Yinghao, HU Zehao, ZHI Chen, et al. ChatUniTest: A Framework for LLM-Based Test Generation[C]// ACM. The 32nd ACM International Conference on the Foundations of Software Engineering. New York: ACM, 2024: 572-576.
[28]	MAHBUB P, RAHMAN M M, SHUVO O, et al. Bugsplainer: Leveraging Code Structures to Explain Software Bugs with Neural Machine Translation[C]// IEEE. 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME). New York: IEEE, 2023: 530-535.
[29]	YUAN Zhiqiang, LIU Mingwei, DING Shiji, et al. Evaluating and Improving ChatGPT for Unit Test Generation[J]. Proceedings of the ACM on Software Engineering, 2024, 1: 1703-1726.
[30]	SHOU Chaofan, LIU Jing, LU Doudou, et al. LLM4Fuzz: Guided Fuzzing of Smart Contracts with Large Language Models[EB/OL].(2024-01-20)[2025-10-25]. https://arxiv.org/abs/2401.11108.
[31]	LI Yuekang, XUE Yinxing, CHEN Hongxu, et al. Cerebro: Context-Aware Adaptive Fuzzing for Effective Vulnerability Detection[C]// ACM. The 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2019: 533-544.
[32]	LI Zekun, PENG Baolin, HE Pengcheng, et al. Guiding Large Language Models via Directional Stimulus Prompting[J]. Advances in Neural Information Processing Systems, 2023, 36: 62630-62656.

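Differential testing dimension	Compared content
Cross-compiler differential testing	Source code
Cross-compiler differential testing	Translated code
Cross-optimization-level differential testing	O0 (optimization disabled; code generated in the most straightforward way)
	O1 (basic optimizations only)
	O2 (more aggressive optimizations, including loop unrolling and dead code elimination)
	O3 (highest, performance-oriented optimization level)
	Ofast (aggressive optimizations that may violate the language standard)
	Os (optimizes for smaller binary size)
	Oz (reduces binary size further)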
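
The cross-optimization-level dimension can be sketched as follows: compile one program at each level, run each binary, and group the levels by observable outcome; more than one group flags a candidate miscompilation. This is a minimal sketch assuming a `clang` executable on `PATH` and test programs that read no input; `run_at_levels` and `find_inconsistencies` are hypothetical helper names. The same grouping works unchanged when the keys are languages (the C, C++ and Rust versions of one homologous seed) rather than optimization levels.

```python
import os
import subprocess
import tempfile
from collections import defaultdict

# Optimization levels from the table above, spelled as Clang command-line flags.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Ofast", "-Os", "-Oz"]


def run_at_levels(src: str, compiler: str = "clang", timeout: float = 10.0) -> dict:
    """Compile `src` at every level, run each binary, record (status, exit code, stdout)."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for level in OPT_LEVELS:
            exe = os.path.join(tmp, "prog" + level)
            try:
                build = subprocess.run([compiler, level, src, "-o", exe],
                                       capture_output=True, text=True, timeout=timeout)
            except subprocess.TimeoutExpired:
                results[level] = ("compile_timeout", None, "")
                continue
            if build.returncode != 0:
                results[level] = ("compile_error", build.returncode, "")
                continue
            try:
                run = subprocess.run([exe], capture_output=True, text=True,
                                     errors="replace", timeout=timeout)
                results[level] = ("ok", run.returncode, run.stdout)
            except subprocess.TimeoutExpired:
                results[level] = ("timeout", None, "")
    return results


def find_inconsistencies(results: dict) -> dict:
    """Group keys (levels or languages) by identical outcome; return {} if all agree."""
    groups = defaultdict(list)
    for key, outcome in results.items():
        groups[outcome].append(key)
    return dict(groups) if len(groups) > 1 else {}
```

A call such as `find_inconsistencies(run_at_levels("seed.c"))` returns an empty dict when every level agrees. Differences involving `-Ofast` on floating-point code may be legitimate, since that level permits standard-violating optimizations, and programs with undefined behavior can diverge at any level, so both cases need filtering before a difference is reported as a compiler bug.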

Programming language	Compiler	Baseline tool	Tested version	Paradigms
C	Clang	CSmith[3]	20.1.0	Imperative
C++	Clang++	YarpGen[4]	20.1.0	Object-oriented, imperative, generic
Rust	Rustc	RustSmith[5]	1.89.0	Functional, imperative, concurrent
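
The optimization levels in the differential-testing table are Clang spellings; Rustc expresses them through `-C opt-level` and has no `Ofast` equivalent. A hypothetical helper that maps one level name to a command line for each compiler in this table might look like the sketch below. Pinning `-C overflow-checks=off` for Rustc is a design choice of this sketch, not something the source specifies: by default Rustc enables debug assertions, and with them overflow checks, only at opt-level 0, so an unpinned integer overflow would panic at O0 but wrap at O2 and produce a spurious difference.

```python
from __future__ import annotations

# Clang and Clang++ accept these directly as -O<level>.
CLANG_LEVELS = {"O0", "O1", "O2", "O3", "Ofast", "Os", "Oz"}
# Rustc spells them as -C opt-level=<value>; there is no Ofast equivalent.
RUSTC_LEVELS = {"O0": "0", "O1": "1", "O2": "2", "O3": "3", "Os": "s", "Oz": "z"}


def build_command(compiler: str, level: str, src: str, out: str) -> list[str] | None:
    """Return the argv that compiles `src` at `level`, or None if the level is unsupported."""
    if compiler in ("clang", "clang++"):
        if level not in CLANG_LEVELS:
            return None
        return [compiler, f"-{level}", src, "-o", out]
    if compiler == "rustc":
        opt = RUSTC_LEVELS.get(level)
        if opt is None:
            return None
        # Pin overflow checks so O0 and optimized builds share wrapping semantics.
        return ["rustc", "-C", f"opt-level={opt}", "-C", "overflow-checks=off",
                src, "-o", out]
    raise ValueError(f"unsupported compiler: {compiler}")
```

Returning `None` for an unsupported pair lets a driver skip that cell of the level-by-compiler matrix instead of failing the whole run.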

Compiler (total branches)	Testing tool	Test seeds	Covered branches	Coverage difference	Cost (USD)
Clang (2001432)	CSmith	1145.0	451293.0	n/a	n/a
Clang (2001432)	Fuzpiler	1039.0	474699.0	23406.0 (+5.19%)	2.08
Clang++ (1777523)	YarpGen	2143.2	151343.0	n/a	n/a
Clang++ (1777523)	Fuzpiler	928.8	206681.6	55338.6 (+36.57%)	2.76
Rustc (619666)	RustSmith	6889.4	135555.0	n/a	n/a
Rustc (619666)	Fuzpiler	1045.4	167966.4	32411.4 (+23.91%)	2.32
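
The comparison column combines the absolute difference in covered branches with that difference as a percentage of the baseline tool's coverage. A small check using only the numbers in this table reproduces all three rows:

```python
from __future__ import annotations


def coverage_gain(baseline: float, candidate: float) -> tuple[float, float]:
    """Return (absolute branch gain, gain as a percentage of the baseline)."""
    delta = candidate - baseline
    return delta, 100.0 * delta / baseline


# (baseline tool, Fuzpiler) covered-branch counts taken from the table above.
ROWS = {
    "Clang": (451293.0, 474699.0),
    "Clang++": (151343.0, 206681.6),
    "Rustc": (135555.0, 167966.4),
}

for compiler, (base, fuzz) in ROWS.items():
    delta, pct = coverage_gain(base, fuzz)
    print(f"{compiler}: {delta:.1f} (+{pct:.2f}%)")
```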

Source language	Target language	Test seeds	Valid seeds	Valid ratio (of test seeds)	Equivalent seeds	Equivalent ratio (of valid seeds)
C	C++	464.0	358.8	77.33%	354.2	98.72%
C	Rust	524.2	325.4	62.08%	288.2	88.57%
C++	C	520.6	396.2	76.10%	332.0	83.80%
C++	Rust	521.2	236.0	45.28%	225.6	95.59%
Rust	C	518.4	425.4	82.06%	188.6	44.33%
Rust	C++	464.8	300.4	64.63%	191.0	63.58%
Total	n/a	3013.2	2042.2	67.78%	1579.6	77.35%
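
Recomputing the percentages in this table shows that the valid ratio is taken over test seeds, the equivalent ratio over valid seeds, and that the total row pools the counts before dividing rather than averaging the per-direction percentages (the plain mean of the six equivalent ratios would be about 79.10%, not 77.35%). A sketch of that computation, with all counts copied from the table:

```python
from __future__ import annotations


def seed_ratios(tested: float, valid: float, equivalent: float) -> tuple[float, float]:
    """Return (valid %, equivalent %): valid over tested, equivalent over valid."""
    return 100.0 * valid / tested, 100.0 * equivalent / valid


# (test seeds, valid seeds, equivalent seeds) per translation direction.
DIRECTIONS = {
    ("C", "C++"): (464.0, 358.8, 354.2),
    ("C", "Rust"): (524.2, 325.4, 288.2),
    ("C++", "C"): (520.6, 396.2, 332.0),
    ("C++", "Rust"): (521.2, 236.0, 225.6),
    ("Rust", "C"): (518.4, 425.4, 188.6),
    ("Rust", "C++"): (464.8, 300.4, 191.0),
}

# The total row sums each count column first, then divides.
totals = [sum(col) for col in zip(*DIRECTIONS.values())]
valid_pct, equiv_pct = seed_ratios(*totals)
print(f"Total: {valid_pct:.2f}% valid, {equiv_pct:.2f}% equivalent")
```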

Compiler	Testing tool	Test seeds	Valid seeds	Valid ratio (of test seeds)	Equivalent seeds	Equivalent ratio (of valid seeds)
Clang	Fuzpiler	1039.0	821.6	79.08%	510.6	62.15%
	w/o OOM	1104.4	814.4	73.74%	467.4	57.39%
	w/o SEC	1461.8	1073.4	73.43%	544.8	50.75%
Clang++	Fuzpiler	928.8	659.2	70.97%	545.2	82.71%
	w/o OOM	970.2	609.4	62.81%	491.2	80.60%
	w/o SEC	1183.4	736.2	62.21%	548.2	74.46%
Rustc	Fuzpiler	1045.4	561.4	53.70%	543.8	96.86%
	w/o OOM	1136.0	576.8	50.77%	523.8	90.81%
	w/o SEC	1452.6	723.8	49.83%	539.4	74.52%

Compiler	Testing tool	Test seeds	Covered branches	Coverage difference
Clang	Fuzpiler	1039.0	474699.0	n/a
	w/o OOM	1104.4	471431.4	-3267.6 (-0.69%)
	w/o SEC	1461.8	473025.2	-1673.8 (-0.35%)
Clang++	Fuzpiler	928.8	206681.6	n/a
	w/o OOM	970.2	199463.0	-7218.6 (-3.65%)
	w/o SEC	1183.4	203331.0	-3350.6 (-1.62%)
Rustc	Fuzpiler	1045.4	167966.4	n/a
	w/o OOM	1136.0	154930.4	-13036.0 (-7.76%)
	w/o SEC	1452.6	159716.0	-8250.4 (-4.91%)