A Data Augmentation Method Based on Graph Node Centrality and Large Model for Vulnerability Detection

doi:10.3969/j.issn.1671-1122.2025.04.004

Abstract

Source code vulnerabilities in intelligent systems are an important factor affecting their security. Deep learning-based source code vulnerability detection suffers from insufficient detection and generalization ability because datasets are imbalanced, small, and of low quality. Sampling and data augmentation techniques alleviate some of these problems, but they perform poorly on real-world datasets. To address this, the paper proposes DA_GLvul, a data augmentation method for vulnerability detection based on graph node centrality and large models. First, the method abstracts source code into a graph structure using the code property graph and computes a priority value for the code through graph node centrality analysis. The code line corresponding to the node with the maximum value is taken as the key code statement, so key statements can be located in an original dataset that carries no information about known vulnerable statements. Second, it defines a mutation instruction template containing comprehensive mutation rules, fills the template with the original sample and its key code, and feeds it to different large models to generate augmented code samples. Finally, the augmented samples and the original samples are used together to train the vulnerability detection model. Experimental results show that 73.82% of the generated samples are valid. Compared with different sampling techniques and sample augmentation methods on two mainstream graph neural network-based vulnerability detection models, the method improves on the original results in all evaluation metrics. The F1 value increases by 168.85% on average over training without augmentation and by 8.21% on average over the best baseline method.

Key words: vulnerability detection, code generation, data augmentation, large language models
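As a rough illustration of the first two steps described in the abstract, the sketch below ranks the nodes of a toy graph by centrality, takes the source line of the top-ranked node as the key statement, and fills a prompt template with the sample and that statement. The toy graph, the use of Freeman degree centrality as the priority value, the wording of `MUTATION_TEMPLATE`, and all function names are assumptions made for illustration. The abstract does not specify which centrality measure, mutation rules, or prompt text DA_GLvul actually uses.

```python
from collections import defaultdict

# Hypothetical sample and a toy stand-in for its code property graph:
# each node maps to a source line number, edges are undirected.
SOURCE = [
    "void copy(char *src) {",
    "    char buf[8];",
    "    strcpy(buf, src);",
    "}",
]
NODE_LINE = {"n1": 1, "n2": 2, "n3": 3, "n4": 4}
EDGES = [("n1", "n2"), ("n1", "n3"), ("n2", "n3"), ("n3", "n4")]

# Assumed template text; the real mutation rules are not given in the abstract.
MUTATION_TEMPLATE = (
    "Rewrite the function below so that its behaviour is preserved.\n"
    "Keep this key statement semantically unchanged: {key}\n"
    "Function:\n{code}\n"
)


def degree_centrality(node_ids, edges):
    """Degree centrality: degree / (n - 1) for every node."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(node_ids)
    if n < 2:
        return {node: 0.0 for node in node_ids}
    return {node: degree[node] / (n - 1) for node in node_ids}


def key_line(node_line, edges):
    """Line number of the node with the highest centrality score."""
    scores = degree_centrality(list(node_line), edges)
    best = max(scores, key=scores.get)
    return node_line[best]


def build_prompt(source_lines, line_no):
    """Fill the template with the whole sample and its key statement."""
    return MUTATION_TEMPLATE.format(
        key=source_lines[line_no - 1].strip(),
        code="\n".join(source_lines),
    )


line_no = key_line(NODE_LINE, EDGES)
prompt = build_prompt(SOURCE, line_no)
print(line_no)
print(prompt)
```

In this toy graph node `n3` touches every other node, so line 3 (the `strcpy` call) is selected. The resulting `prompt` string is what would be sent to a large model; the model call itself is left out because it depends on the deployment.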

CLC number:

TP309

ZHANG Xuewang (张学旺), LU Hui (卢荟), XIE Haofei (谢昊飞). A Data Augmentation Method Based on Graph Node Centrality and Large Model for Vulnerability Detection[J]. Netinfo Security (信息网络安全), 2025, 25(4): 550-563.

Figures and tables (10): Figures 1-6, Tables 1-4

Table 1
Dataset	Total samples	Vulnerable samples	Non-vulnerable samples	Ratio (vulnerable : non-vulnerable)
Devign	22792	10768	12024	1:1.1
Reveal	22734	2240	20494	1:9.1
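The totals and ratios in the dataset table follow directly from the two sample counts. The short sketch below recomputes them; the dictionary layout and the function name `imbalance_ratio` are illustrative choices, not part of the paper.

```python
# Sample counts copied from the dataset table above.
DATASETS = {
    "Devign": {"vulnerable": 10768, "non_vulnerable": 12024},
    "Reveal": {"vulnerable": 2240, "non_vulnerable": 20494},
}


def imbalance_ratio(vulnerable, non_vulnerable):
    """Number of non-vulnerable samples per vulnerable sample."""
    return non_vulnerable / vulnerable


for name, counts in DATASETS.items():
    total = counts["vulnerable"] + counts["non_vulnerable"]
    ratio = imbalance_ratio(**counts)
    print(f"{name}: total={total}, ratio=1:{ratio:.1f}")
```

The Reveal dataset has roughly nine non-vulnerable samples for every vulnerable one, which is the imbalance the augmentation method is meant to offset.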

Table 2
Actual label	Predicted label
	1	0
1	TP	FN
0	FP	TN
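The six metrics reported in the results tables (FPR, FNR, accuracy A, precision P, recall R, and F1) can all be derived from the four cells of this confusion matrix. The sketch below assumes the usual textbook definitions, which this page does not spell out; the function name and the example counts are hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return the six metrics as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. both classes occur
    and at least one sample is predicted positive.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "FPR": fp / (fp + tn),          # non-vulnerable flagged as vulnerable
        "FNR": fn / (tp + fn),          # vulnerable samples that were missed
        "A": (tp + tn) / (tp + fn + fp + tn),
        "P": precision,
        "R": recall,
        "F1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = confusion_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these definitions recall and FNR always sum to 1, which gives a quick consistency check for any row of reported results.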

Table 3
Detection model	Augmentation method	Large model	FPR	FNR	Accuracy	Precision	Recall	F1
Devign	-	-	48.95%	45.19%	51.39%	10.21%	54.81%	17.22%
	OSS	-	46.92%	42.49%	53.09%	10.31%	53.11%	17.27%
	SMOTE	-	48.89%	43.25%	51.63%	10.55%	56.75%	17.79%
	VGX	VGX	63.15%	30.75%	39.84%	10.02%	69.25%	17.51%
	VulScribeR	ChatGPT3.5	56.59%	39.91%	44.95%	9.74%	60.09%	16.76%
	VulScribeR	CodeQwen1.5	60.15%	37.73%	41.92%	9.51%	62.27%	16.51%
	Proposed method	GLM-4	36.45%	44.19%	63.84%	13.46%	55.81%	21.69%
	Proposed method	Qwen2.5	41.55%	42.14%	58.40%	12.39%	57.86%	20.41%
Reveal	-	-	33.72%	65.96%	63.31%	9.30%	34.04%	14.61%
	VGX	VGX	58.82%	43.13%	42.63%	8.94%	56.87%	15.46%
	VulScribeR	ChatGPT3.5	54.81%	51.00%	45.55%	8.33%	49.00%	14.23%
	VulScribeR	CodeQwen1.5	58.62%	47.89%	42.37%	8.28%	52.11%	14.29%
	Proposed method	GLM-4	38.70%	56.87%	59.63%	10.17%	43.13%	16.46%
	Proposed method	Qwen2.5	43.57%	57.28%	55.16%	9.06%	42.72%	14.95%

Table 4
Detection model	Augmentation method	Large model	FPR	FNR	Accuracy	Precision	Recall	F1
Devign	-	-	2.54%	97.63%	52.45%	52.27%	3.06%	5.78%
	OSS	-	1.97%	96.94%	52.42%	52.30%	2.37%	4.53%
	SMOTE	-	6.36%	91.76%	52.92%	54.14%	8.24%	14.30%
	VGX	VGX	27.42%	72.04%	51.30%	48.16%	27.96%	35.38%
	VulScribeR	ChatGPT3.5	26.79%	73.56%	50.91%	47.35%	26.44%	33.93%
	VulScribeR	CodeQwen1.5	19.62%	80.26%	51.47%	47.83%	19.74%	27.95%
	Proposed method	GLM-4	34.27%	62.69%	52.18%	49.80%	37.31%	42.66%
	Proposed method	Qwen2.5	18.28%	80.29%	52.16%	49.56%	19.71%	28.20%
Reveal	-	-	14.35%	84.57%	52.17%	49.48%	15.43%	23.53%
	VGX	VGX	51.45%	48.72%	49.85%	47.59%	51.28%	49.37%
	VulScribeR	ChatGPT3.5	47.81%	52.76%	49.83%	47.37%	47.24%	47.30%
	VulScribeR	CodeQwen1.5	49.10%	50.50%	50.23%	47.88%	49.50%	48.67%
	Proposed method	GLM-4	59.55%	35.58%	51.88%	49.64%	64.42%	56.07%
	Proposed method	Qwen2.5	55.43%	41.35%	51.28%	49.08%	58.65%	53.44%
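The average F1 gains quoted in the abstract can be checked against the two results tables above. The sketch below assumes the average is the unweighted mean of the relative F1 gain over the eight combinations of table, detection model, and large model, and that VGX is the baseline treated as best. Neither assumption is stated on this page, but together they reproduce the reported 168.85% and 8.21%. The dictionary keys and the function name are illustrative.

```python
# F1 scores (%) copied from the two results tables above, keyed by
# (table, detection model). "none" is the row without augmentation.
F1 = {
    ("first", "Devign"): {"none": 17.22, "VGX": 17.51, "GLM-4": 21.69, "Qwen2.5": 20.41},
    ("first", "Reveal"): {"none": 14.61, "VGX": 15.46, "GLM-4": 16.46, "Qwen2.5": 14.95},
    ("second", "Devign"): {"none": 5.78, "VGX": 35.38, "GLM-4": 42.66, "Qwen2.5": 28.20},
    ("second", "Reveal"): {"none": 23.53, "VGX": 49.37, "GLM-4": 56.07, "Qwen2.5": 53.44},
}


def mean_relative_gain(reference):
    """Mean relative F1 gain (%) of the proposed method over `reference`."""
    gains = [
        (row[llm] - row[reference]) / row[reference] * 100
        for row in F1.values()
        for llm in ("GLM-4", "Qwen2.5")
    ]
    return sum(gains) / len(gains)


print(f"vs. no augmentation: {mean_relative_gain('none'):.2f}%")  # 168.85%
print(f"vs. VGX baseline:    {mean_relative_gain('VGX'):.2f}%")   # 8.21%
```

Most of the 168.85% average comes from the second table's Devign rows, where the F1 without augmentation is only 5.78%, so a mean of relative gains is sensitive to that small denominator.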