Detection Method for C Language Family Based on Graph Neural Network and Generic Vulnerability Analysis Framework

doi:10.3969/j.issn.1671-1122.2022.10.009

Netinfo Security, 2022, Vol. 22, Issue (10): 59-68.

ZHU Lina¹, MA Mingrui²,³,⁴, ZHU Dongzhao⁵

1. Department of Network Information Security, Guangdong Police College, Guangzhou 510442, China
2. School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
3. Hubei Key Laboratory of Distributed System Security, Wuhan 430074, China
4. Hubei Engineering Research Center on Big Data Security, Wuhan 430074, China
5. Heilongjiang Branch of China Mobile Information Technology Co., Ltd., Harbin 150001, China

Received: 2022-07-01 Online: 2022-10-10 Published: 2022-11-15
Contact: MA Mingrui E-mail: jkpathfinder@126.com
About the authors: ZHU Lina (born 1974), female, from Shandong, lecturer, M.S.; main research interest: network information security | MA Mingrui (born 2000), male, from Heilongjiang, master's student; main research interests: neural networks, deep learning, and network information security | ZHU Dongzhao (born 1977), male, from Shandong, senior engineer, M.S.; main research interests: big data and network information security
Supported by:
National Natural Science Foundation of China (6217071437); National Natural Science Foundation of China (62072200); National Natural Science Foundation of China (62127808); Natural Science Foundation of Guangdong Province (2020A1515011096); Natural Science Foundation of Guangdong Province (2019A1515011841); Guangdong Police College Institutional Research Project (2022SY02)

Abstract:

Most existing automated vulnerability mining tools generalize poorly and suffer from high false positive and false negative rates. This paper proposes CSVDM, a static detection model for multi-class vulnerabilities in the C language family. CSVDM mines vulnerabilities at the source code level using two modules: a code similarity comparison module and a generic vulnerability analysis framework module. The code similarity comparison module applies the longest common subsequence (LCS) algorithm and a graph neural network to perform code clone and homology detection between the source code under test and vulnerability templates, and generates a vulnerability similarity list according to a preset threshold. The generic vulnerability analysis framework module performs context-dependent data flow and control flow analysis on the source code under test and generates a vulnerability analysis list, compensating for the similarity comparison module's high false negative rate on vulnerabilities that are not caused by code cloning. CSVDM combines the vulnerability similarity list and the vulnerability analysis list to generate the final vulnerability detection report. Experimental results show that CSVDM substantially improves on vulnerability mining tools such as Checkmarx across the evaluation metrics.

Key words: generic vulnerability analysis framework, LCS algorithm, Skip-Gram model, graph neural network, graph attention mechanism
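
The LCS-based part of the similarity comparison described in the abstract can be illustrated with a minimal sketch. The tokenizer, the normalization by the longer token sequence, the function names, and the default threshold of 0.6 are assumptions made for illustration; the abstract does not specify them, and the graph neural network stage is not shown.

```python
import re

def tokenize(code):
    # Hypothetical tokenizer: identifiers, integer literals, single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def lcs_length(a, b):
    # Classic dynamic-programming LCS over two sequences, O(len(a) * len(b)) time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_similarity(code_a, code_b):
    # Assumed normalization: LCS length divided by the longer token sequence.
    a, b = tokenize(code_a), tokenize(code_b)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))

def similarity_list(target, templates, threshold=0.6):
    # Keep templates whose similarity reaches the (assumed) threshold, best first.
    scored = [(name, lcs_similarity(target, src)) for name, src in templates.items()]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: -s[1])

# Usage with made-up templates: only the strcpy template passes the threshold.
templates = {"unchecked_copy": "strcpy(dst, input);", "plain_return": "return 0;"}
print(similarity_list("strcpy(buf, src);", templates))
```

Comparing token sequences rather than raw characters makes the score less sensitive to whitespace, although renamed identifiers still lower it; that limitation is one reason a purely LCS-based score would need to be complemented by other analyses.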

CLC number: TP309

ZHU Lina, MA Mingrui, ZHU Dongzhao. Detection Method for C Language Family Based on Graph Neural Network and Generic Vulnerability Analysis Framework[J]. Netinfo Security, 2022, 22(10): 59-68.



State / Character	$Q'$	$L$	$D$	$U$
0	$\{X\}$	$\{1,2,Y\}$	$\varphi$	$\{1,2,Y\}$
1	$\{1,2,Y\}$	$\{2,Y\}$	$\{2,Y\}$	$\{2,Y\}$
2	$\{2,Y\}$	$\{2,Y\}$	$\{2,Y\}$	$\{2,Y\}$
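
One way to read the table above is as a transition table produced by subset construction, where each row is a set of original states and the columns L, D, and U are input classes. The sketch below simulates such a table. Interpreting L, D, and U as letter, digit, and underscore, treating the φ entry as a dead state, and accepting any state set that contains Y are all assumptions, since the table carries no caption here.

```python
# State sets copied from the rows of the table; None stands for the phi entry
# (assumed to be the empty set, i.e. a dead state).
S0 = frozenset({"X"})
S1 = frozenset({"1", "2", "Y"})
S2 = frozenset({"2", "Y"})

TRANSITIONS = {
    S0: {"L": S1, "D": None, "U": S1},
    S1: {"L": S2, "D": S2, "U": S2},
    S2: {"L": S2, "D": S2, "U": S2},
}

def char_class(ch):
    # Assumed meaning of the column labels: L = letter, D = digit, U = underscore.
    if ch == "_":
        return "U"
    if ch.isascii() and ch.isalpha():
        return "L"
    if ch.isascii() and ch.isdigit():
        return "D"
    return None

def accepts(text):
    state = S0
    for ch in text:
        cls = char_class(ch)
        if cls is None:
            return False  # character outside the three input classes
        state = TRANSITIONS[state][cls]
        if state is None:
            return False  # reached the dead state
    # Assumption: a state set is accepting when it contains the final state Y.
    return "Y" in state

for sample in ["buf_1", "_tmp", "1abc", ""]:
    print(repr(sample), accepts(sample))
```

Under these assumptions the automaton accepts strings that start with a letter or underscore and continue with letters, digits, or underscores, which is the usual shape of a C identifier.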

Count	I	like	enjoy	deep	Learning	NLP	Flying	.
I	0	2	1	0	0	0	0	0
like	2	0	0	1	0	1	0	0
enjoy	1	0	0	0	0	0	1	0
deep	0	1	0	0	1	0	0	0
Learning	0	0	0	1	0	0	0	1
NLP	0	1	0	0	0	0	0	1
Flying	0	0	1	0	0	0	0	1
.	0	0	0	0	1	1	1	0
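
The counts above are consistent with a window-1 word co-occurrence matrix over a three-sentence toy corpus. The sketch below rebuilds the matrix under that reading; the corpus text, the window size, and the choice not to count pairs across sentence boundaries are inferred from the counts and are not stated in the table.

```python
VOCAB = ["I", "like", "enjoy", "deep", "Learning", "NLP", "Flying", "."]

# Assumed corpus, chosen because it reproduces the counts in the table.
CORPUS = [
    ["I", "like", "deep", "Learning", "."],
    ["I", "like", "NLP", "."],
    ["I", "enjoy", "Flying", "."],
]

def cooccurrence(sentences, vocab, window=1):
    # Count, for every word, how often each other word appears within
    # `window` positions of it inside the same sentence.
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    matrix[index[word]][index[sent[j]]] += 1
    return matrix

for word, row in zip(VOCAB, cooccurrence(CORPUS, VOCAB)):
    print(word, row)
```

The resulting matrix is symmetric because every neighbouring pair is counted once from each side.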

Model / Metric	W_AvgFPR	W_AvgTPR	W_AvgF1
CSVDM	1.03%	92.88%	91.37%
Vuldeepecker	19.84%	63.87%	65.32%
Checkmarx	23.81%	46.52%	43.28%
Flawfinder	39.64%	32.38%	31.73%
Similarity model	10.33%	69.84%	69.24%
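
The column names W_AvgFPR, W_AvgTPR, and W_AvgF1 in the table above suggest class-weighted averages of the per-class false positive rate, true positive rate, and F1 score. A minimal sketch of that computation follows. Weighting each class by its number of true samples and using one-vs-rest counts are assumptions, as the exact definition is not given in this section, and the labels in the usage example are made up.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    # One-vs-rest FPR, TPR and F1 per class, averaged with
    # weights proportional to class support (assumed weighting).
    support = Counter(y_true)
    total = len(y_true)
    w_fpr = w_tpr = w_f1 = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        tn = total - tp - fp - fn
        tpr = tp / n
        fpr = fp / (fp + tn) if fp + tn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        weight = n / total
        w_fpr += weight * fpr
        w_tpr += weight * tpr
        w_f1 += weight * f1
    return w_fpr, w_tpr, w_f1

# Usage with made-up labels for two vulnerability classes.
y_true = ["overflow", "overflow", "format", "format"]
y_pred = ["overflow", "format", "format", "format"]
print(weighted_metrics(y_true, y_pred))
```

Support weighting keeps rare vulnerability classes from dominating the average, at the cost of hiding poor performance on those classes; a macro average would make the opposite trade-off.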

Model / Metric	W_AvgFPR	W_AvgTPR	W_AvgF1
Model1	5.84%	82.17%	80.65%
Model2	8.89%	77.69%	75.64%
Model3	8.17%	78.17%	76.21%
Model4	1.96%	89.69%	87.24%
Model5	1.03%	92.88%	91.37%

$\theta$ / Metric	W_AvgFPR	W_AvgTPR	W_AvgF1
0.2	7.83%	82.37%	79.66%
0.3	6.25%	84.36%	81.74%
0.4	4.89%	87.87%	85.23%
0.5	2.52%	89.97%	88.35%
0.6	1.04%	92.78%	91.29%
0.7	1.74%	90.46%	89.33%
0.8	3.84%	88.71%	84.99%
