Survey on Fuzzing Test in Deep Learning Frameworks

doi:10.3969/j.issn.1671-1122.2024.10.006

Abstract

With the widespread application of deep learning across many fields, the security and stability of deep learning frameworks have become critical. Taking the user's perspective, this paper analyzes the types of vulnerabilities that different user groups may encounter and the corresponding fuzzing methods. It first introduces the development background and importance of deep learning frameworks. It then discusses in detail the state of fuzzing research on model libraries, deep learning frameworks, and deep learning compilers, reviews key techniques such as model mutation, weight generation, sample construction, and model testing, and analyzes the root causes of bugs using PyTorch and MLIR vulnerabilities as examples. Finally, it outlines future research directions, including error localization and automatic repair, and fuzzing enhanced by large language models.

Key words: deep learning, fuzzing, test case generation, machine learning

CLC number: TP309

ZHANG Zihan, LAI Qingnan, ZHOU Changling. Survey on Fuzzing Test in Deep Learning Frameworks[J]. Netinfo Security (信息网络安全), 2024, 24(10): 1528-1536.

Figures and tables (8): Figure 1, Figure 2, Table 1, Figure 3, Figure 4, Table 2, Figure 5, Table 3

References (44)

[1]	HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep Residual Learning for Image Recognition[EB/OL]. (2016-12-12)[2024-05-10]. https://ieeexplore.ieee.org/document/7780459/metrics#metrics.
[2]	GRIGORESCU S, TRASNEA B, COCIAS T, et al. A Survey of Deep Learning Techniques for Autonomous Driving[J]. Journal of Field Robotics, 2020, 37(3): 362-386. doi: 10.1002/rob.21918
[3]	TORFI A, SHIRVANI R A, KENESHLOO Y, et al. Natural Language Processing Advancements by Deep Learning: A Survey[EB/OL]. (2020-03-02)[2024-06-01]. http://arxiv.org/abs/2003.01200.
[4]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet Classification with Deep Convolutional Neural Networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[5]	HICKMANN B, CHEN Jiesheng, ROTZIN M, et al. Intel Nervana Neural Network Processor-T(NNP-T) Fused Floating Point Many-Term Dot Product[C]// IEEE. 2020 IEEE 27th Symposium on Computer Arithmetic(ARITH). New York: IEEE, 2020: 133-136.
[6]	NVIDIA. NVIDIA Tensor Cores: Versatility for HPC & AI[EB/OL]. [2024-05-10]. https://www.nvidia.com/en-us/data-center/tensor-cores/.
[7]	CHEN Tianqi, MOREAU T, JIANG Ziheng, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning[EB/OL]. (2021-02-27)[2024-05-30]. http://arxiv.org/abs/1802.04799.
[8]	PHAM H V, LUTELLIER T, QI Weizhen, et al. CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries[C]// IEEE. 2019 IEEE/ACM 41st International Conference on Software Engineering(ICSE). New York: IEEE, 2019: 1027-1038.
[9]	KLIPPENSTEIN K. Exclusive: Surveillance Footage of Tesla Crash on SF’s Bay Bridge Hours After Elon Musk Announces “Self-Driving” Feature[EB/OL]. (2023-01-10)[2024-06-01]. https://theintercept.com/2023/01/10/tesla-crash-footage-autopilot/.
[10]	ZHANG Xiaoyu, JIANG Weipeng, SHEN Chao, et al. Survey: A Survey of Deep Learning Library Testing Methods[EB/OL]. (2024-04-27)[2024-06-02]. http://arxiv.org/abs/2404.17871.
[11]	JI Jiahe, KONG Wei, TIAN Jianwen, et al. Survey on Fuzzing Techniques in Deep Learning Libraries[C]// IEEE. 2023 8th International Conference on Data Science in Cyberspace(DSC). New York: IEEE, 2023: 461-467.
[12]	PAN R, BISWAS S, CHAKRABORTY M, et al. An Empirical Study on the Bugs Found while Reusing Pre-Trained Natural Language Processing Models[EB/OL]. (2022-11-30)[2024-06-01]. http://arxiv.org/abs/2212.00105.
[13]	CHEN Junjie, LIANG Yihua, SHEN Qingchao, et al. Toward Understanding Deep Learning Framework Bugs[J]. ACM Transactions on Software Engineering and Methodology, 2023, 32(6): 1-31.
[14]	DENG Yao, ZHENG Xi, ZHANG Tianyi, et al. A Declarative Metamorphic Testing Framework for Autonomous Driving[J]. IEEE Transactions on Software Engineering, 2023, 49(4): 1964-1982.
[15]	CAO Junming, CHEN Bihuan, SUN Chao, et al. Understanding Performance Problems in Deep Learning Systems[C]// ACM. 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2022: 357-369.
[16]	WEI Moshi, HARZEVILI N S, HUANG Yuekai, et al. Demystifying and Detecting Misuses of Deep Learning APIs[C]// ACM. IEEE/ACM 46th International Conference on Software Engineering. New York: ACM, 2024: 1-12.
[17]	WANG Zan, YAN Ming, CHEN Junjie, et al. LEMON: Deep Learning Library Testing via Effective Model Generation[C]// ACM. 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2020: 788-799.
[18]	GUO Qianyu, XIE Xiaofei, LI Yi, et al. Audee: Automated Testing for Deep Learning Frameworks[C]// ACM. 35th IEEE/ACM International Conference on Automated Software Engineering. New York: ACM, 2020: 486-498.
[19]	GU Jiazhen, LUO Xuchuan, ZHOU Yangfan, et al. Muffin: Testing Deep Learning Libraries via Neural Architecture Fuzzing[C]// ACM. The 44th International Conference on Software Engineering. New York: ACM, 2022: 1418-1430.
[20]	LIU Jiawei, LIN Jinkun, RUFFY F, et al. NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers[C]// ACM. The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2023: 530-543.
[21]	SHI Jingyi, XIAO Yang, LI Yuekang, et al. ACETest: Automated Constraint Extraction for Testing Deep Learning Operators[C]// ACM. The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2023: 690-702.
[22]	XIE Danning, LI Yitong, KIM Mijung, et al. Documentation-Guided Fuzzing for Testing Deep Learning API Functions[C]// ACM. The 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2022: 176-188.
[23]	WEI Anjiang, DENG Yinlin, YANG Chenyuan, et al. FreeFuzz: Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source[C]// ACM. The 44th International Conference on Software Engineering. New York: ACM, 2022: 995-1007.
[24]	DENG Yinlin, YANG Chenyuan, WEI Anjiang, et al. Fuzzing Deep-Learning Libraries via Automated Relational API Inference[C]// ACM. The 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2022: 44-56.
[25]	YANG Chenyuan, DENG Yinlin, YAO Jiayi, et al. Fuzzing Automatic Differentiation in Deep-Learning Libraries[C]// IEEE. 2023 IEEE/ACM 45th International Conference on Software Engineering(ICSE). New York: IEEE, 2023: 1174-1186.
[26]	CHRISTOU N, JIN Di, KEMERLIS V. IvySyn: Automated Vulnerability Discovery in Deep Learning Frameworks[EB/OL]. (2022-09-29)[2024-05-10]. https://www.semanticscholar.org/paper/IvySyn%3A-Automated-Vulnerability-Discovery-in-Deep-Christou-Jin/58b1b17a04279361fb5d138f0cd8f8ab94029d69.
[27]	GitHub. Remove Some Interface Block Decoration by Llehtahw Pull Request #8102 Apache/TVM[EB/OL]. [2024-05-28]. https://github.com/apache/tvm/pull/8102.
[28]	GitHub. dpankratz/TVMFuzz[EB/OL]. (2024-02-18)[2024-05-12]. https://github.com/dpankratz/TVMFuzz.
[29]	WANG Zihan, NIE Pengbo, MIAO Xinyuan, et al. GenCoG: A DSL-Based Approach to Generating Computation Graphs for TVM Testing[C]// ACM. The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2023: 904-916.
[30]	WANG Haoyu, CHEN Junjie, XIE Chuyue, et al. MLIRSmith: Random Program Generation for Fuzzing MLIR Compiler Infrastructure[C]// IEEE. 38th IEEE/ACM International Conference on Automated Software Engineering(ASE). New York: IEEE, 2023: 1555-1566.
[31]	SU Qidong, GENG Chuqin, PEKHIMENKO G, et al. TorchProbe: Fuzzing Dynamic Deep Learning Compilers[EB/OL]. (2023-10-30)[2024-06-02]. http://arxiv.org/abs/2310.20078.
[32]	LIMPANUKORN B, WANG Jiyuan, KANG Hongjin, et al. Fuzzing MLIR by Synthesizing Custom Mutations[EB/OL]. (2024-04-25)[2024-05-12]. http://arxiv.org/abs/2404.16947.
[33]	LIU Jiawei, WEI Yuxiang, YANG Sen, et al. Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation[J]. Proceedings of the ACM on Programming Languages, 2022, 6: 1-26.
[34]	MA Haoyang, SHEN Qingchao, TIAN Yongqiang, et al. Fuzzing Deep Learning Compilers with HirGen[EB/OL]. (2022-08-03)[2024-05-10]. http://arxiv.org/abs/2208.02193.
[35]	LIN Kuiliang, SONG Xiangpu, ZENG Yingpei, et al. DeepDiffer: Find Deep Learning Compiler Bugs via Priority-Guided Differential Fuzzing[C]// IEEE. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security(QRS). New York: IEEE, 2023: 616-627.
[36]	AGRAWAL H, DEMILLO R A, SPAFFORD E H. Debugging with Dynamic Slicing and Backtracking[J]. Software: Practice and Experience, 1993, 23(6): 589-616.
[37]	ZELLER A, HILDEBRANDT R. Simplifying and Isolating Failure-Inducing Input[J]. IEEE Transactions on Software Engineering, 2002, 28(2): 183-200.
[38]	HU Mingzhe, ZHAO Qi, ZHANG Yu, et al. FROG: Cross-Language Call Graph Construction Supporting Different Host Languages[C]// IEEE. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering(SANER). New York: IEEE, 2023: 155-166.
[39]	LI Wen, MING Jiang, LUO Xiapu, et al. POLYCRUISE: A Cross-Language Dynamic Information Flow Analysis[C]// USENIX. 31st USENIX Security Symposium(USENIX Security 22). Berkeley: USENIX, 2022: 2513-2530.
[40]	KIM M, KIM Y, LEE E. Denchmark: A Bug Benchmark of Deep Learning-Related Software[C]// IEEE. 2021 IEEE/ACM 18th International Conference on Mining Software Repositories(MSR). New York: IEEE, 2021: 540-544.
[41]	DENG Yinlin, XIA C S, PENG Haoran, et al. Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models[C]// ACM. The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2023: 423-435.
[42]	DENG Yinlin, XIA Chunqiu, YANG Chenyuan, et al. Large Language Models are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries[C]// ACM. The IEEE/ACM 46th International Conference on Software Engineering. New York: ACM, 2024: 1-13.
[43]	CHEN M, TWOREK J, JUN H, et al. Evaluating Large Language Models Trained on Code[EB/OL]. (2021-07-07)[2024-05-29]. https://arxiv.org/abs/2107.03374.
[44]	NIJKAMP E, PANG B, HAYASHI H, et al. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis[EB/OL]. (2022-03-05)[2024-05-29]. http://arxiv.org/abs/2203.13474.

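Model generator	Model generation strategy	Mutation guidance rule	Model weight generation	Sample construction
LEMON^[17]	6 mutation strategies, such as adding, swapping, and copying layers	Markov chain Monte Carlo	5 strategies, such as adding Gaussian noise and changing the activation function state	6 datasets, including MNIST, CIFAR-10, and ImageNet
Audee^[18]	Uses 7 models, such as LeNet-5 and ResNet20, as seeds and randomly changes layer parameters	Genetic algorithm	Adds Cauchy noise to pretrained models	LeNet-5, MNIST, CIFAR-10
Muffin^[19]	Generates a basic structure from templates and instantiates the computation graph with operators such as convolution and pooling	Fitness-proportionate selection	Not mentioned	6 datasets, including MNIST, F-MNIST, and CIFAR-10
NNSmith^[20]	Generates graphs incrementally from hand-written operator attribute constraints and instantiates layer attributes with an SMT solver	None	Uses backpropagation to generate model weights that do not produce exceptional values	Uses backpropagation to generate computation inputs that do not produce exceptional values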
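
Two of the ideas in the table above, perturbing weights with Gaussian noise and choosing candidates by fitness-proportionate selection, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `gaussian_fuzz` and `roulette_select`, the `fraction` and `sigma` parameters, and the operator names in the demo are hypothetical and are not taken from LEMON, Audee, or Muffin.

```python
import random


def gaussian_fuzz(weights, fraction=0.1, sigma=0.5, rng=None):
    """Return a copy of `weights` with Gaussian noise added to a random subset.

    `weights` must be a non-empty flat list of floats. `fraction` and `sigma`
    are illustrative knobs, not values taken from any of the surveyed tools.
    """
    rng = rng if rng is not None else random.Random()
    mutated = list(weights)  # leave the seed weights untouched
    count = max(1, int(len(weights) * fraction))
    for index in rng.sample(range(len(weights)), count):
        mutated[index] += rng.gauss(0.0, sigma)
    return mutated


def roulette_select(candidates, fitness, rng=None):
    """Pick one candidate with probability proportional to its fitness.

    The fitness values must be non-negative and must not all be zero.
    """
    rng = rng if rng is not None else random.Random()
    return rng.choices(candidates, weights=fitness, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    seed_weights = [0.0] * 20
    print(gaussian_fuzz(seed_weights, fraction=0.25, sigma=1.0, rng=rng))
    # Hypothetical mutation operators with made-up fitness scores.
    print(roulette_select(["add_layer", "swap_layer"], [0.2, 0.8], rng=rng))
```

Passing an explicit `random.Random` instance keeps a fuzzing run reproducible, which matters when a failing mutant has to be regenerated for debugging.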

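Vulnerability type	Error description	Issue number
Automatic differentiation error	When the base and exponent of torch.pow are inconsistent, forward-mode automatic differentiation of torch.pow raises an error	77493
Automatic differentiation error	logaddexp2 does not support backpropagation	77963
Inconsistent backend results	When an incorrect number of rows or columns is passed to torch.nn.functional.embedding, the CPU backend raises an error but the CUDA backend does not	66751
Inconsistent backend results	The CUDA implementation of nn.Conv2d and the cuDNN-based implementation produce inconsistent results	55381
Unsupported type	torch.allclose does not support comparison between different types (e.g., float32 and float16)	55356
Unsupported type	torch.trace does not support float16 on CPU	65447
Runtime error	Running an int8 matrix multiplication on GPU raises an error	49890
Runtime error	torch.sigmoid raises an error when its input is of a complex type	55359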
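
The "inconsistent backend results" rows are the kind of bug a differential oracle is designed to catch: run the same input through two implementations and flag any disagreement in either the value or the error behavior. The sketch below is a minimal illustration in plain Python and does not call PyTorch. `strict_lookup` and `lenient_lookup` are hypothetical stand-ins for two backends of a table lookup, where one validates the index and the other silently accepts it.

```python
import math


def differential_check(op_a, op_b, inputs, rel_tol=1e-6, abs_tol=1e-9):
    """Run two implementations on the same input and classify the outcome."""
    results = []
    for op in (op_a, op_b):
        try:
            results.append(("ok", op(inputs)))
        except Exception as exc:  # any failure counts as an observable behavior
            results.append(("error", type(exc).__name__))
    (status_a, out_a), (status_b, out_b) = results
    if status_a != status_b:
        return "crash-mismatch"  # one backend fails, the other does not
    if status_a == "error":
        return "both-error"
    if math.isclose(out_a, out_b, rel_tol=rel_tol, abs_tol=abs_tol):
        return "consistent"
    return "value-mismatch"


def strict_lookup(args):
    """Hypothetical backend that validates the index before reading."""
    table, index = args
    if not 0 <= index < len(table):
        raise IndexError("index out of range")
    return table[index]


def lenient_lookup(args):
    """Hypothetical backend that silently wraps an out-of-range index."""
    table, index = args
    return table[index % len(table)]


if __name__ == "__main__":
    table = [1.0, 2.0, 3.0]
    print(differential_check(strict_lookup, lenient_lookup, (table, 1)))
    print(differential_check(strict_lookup, lenient_lookup, (table, 5)))
```

The tolerances are arbitrary defaults. In practice they have to be tuned per operator and data type, since low-precision kernels legitimately differ by more than full-precision ones.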

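Error type	Error description	Issue number
Conversion between dialects	In the --convert-scf-to-openmp pass, a division-by-zero error occurs when the second operand of an index.rems operation is 0	59714
Conversion between dialects	In the --convert-scf-to-spirv pass, the verifier does not validate the floating-point type f80, causing a crash	60199
Conversion between dialects	In the --gpu-to-llvm pass, the maskOp operand in vector.mask is not checked for a null pointer, causing a segmentation fault	61094
General transformation within a dialect	In the --inline pass, inlining a function that contains only an llvm.return statement causes a crash	60093
General transformation within a dialect	In the --cse pass, the case of a rank-0 vector is not handled, causing a crash	60193
General transformation within a dialect	In the --canonicalize pass, tensor dimensions are not checked for validity, causing a crash	59703
Dialect-specific transformation	In the func dialect's --convert-func-to-llvm pass, an assertion expects the dim operation to be a constant, whereas it can actually be a variable, causing an assertion failure	59993
Dialect-specific transformation	In the llvm dialect's --llvm-legalize-for-export pass, the llvm.br operation is not registered, so generating llvm.br causes a crash	59462
Dialect-specific transformation	In the affine dialect's --affine-loop-unroll pass, values yielded in the loop body are wrongly assumed to always reside in the block corresponding to the loop, which produces references to nonexistent values and causes a crash	59234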
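
Several rows above share one root cause: a transformation evaluates or rewrites an operation without first checking a precondition. The divide-by-zero case in the first row can be illustrated with a toy constant folder. This is a sketch in plain Python, not MLIR code. The names `fold_rems_unchecked` and `fold_rems_checked` are hypothetical, and the remainder semantics (sign follows the dividend) are an assumption chosen for the illustration.

```python
def _truncated_rem(lhs, rhs):
    """Signed remainder whose sign follows the dividend (truncating division)."""
    magnitude = abs(lhs) % abs(rhs)  # raises ZeroDivisionError when rhs == 0
    return -magnitude if lhs < 0 else magnitude


def fold_rems_unchecked(lhs, rhs):
    """Buggy pattern: fold at compile time without validating the divisor."""
    return _truncated_rem(lhs, rhs)


def fold_rems_checked(lhs, rhs):
    """Guarded pattern: decline to fold when the divisor is zero.

    Returning None means "leave the operation in place"; the compiler keeps
    running and the behavior is decided at run time instead.
    """
    if rhs == 0:
        return None
    return _truncated_rem(lhs, rhs)


if __name__ == "__main__":
    print(fold_rems_checked(-7, 3))
    print(fold_rems_checked(7, 0))
    try:
        fold_rems_unchecked(7, 0)
    except ZeroDivisionError as exc:
        print("unchecked folder crashed:", exc)
```

A fuzzer reaches this kind of bug simply by generating operand values at the edge of the valid range, such as 0, which is why boundary-value generation features in the compiler fuzzers surveyed here.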